Stats 24X7.com

Data Mining

This page is set up as a lecture course. If you have a quick question, check the TOPIC TABLE.

Black and white handouts of the lectures are available in this zip file.

DataMhandout.zip

You will need these zipped data files for some of the lectures.

DataMiningData.zip

1. Introduction to Data Mining

A preview of what you will learn

  2. Data Mining Tools 

a. Descriptive Tools

i. Descriptive Statistics 

Strip Chart, Box Plot, Histogram, Dotplot, Sample mean, sample median, sample variance and standard deviation (sd), semi-interquartile range, scatter plot, covariance, correlation.
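
As a rough illustration of the numerical summaries listed above, here is a minimal sketch in Python (the lectures themselves use R). The data values and the helper names `describe`, `sample_covariance`, and `correlation` are made up for this example. The semi-interquartile range is taken as half the distance between the first and third quartiles, and quartile conventions differ between packages, so the quartile-based value may not match R's default.

```python
import statistics

def describe(xs):
    """Return basic sample summaries for a list of numbers."""
    # quantiles(n=4) returns the three quartile cut points (Q1, Q2, Q3).
    q1, _, q3 = statistics.quantiles(xs, n=4)
    return {
        "mean": statistics.mean(xs),
        "median": statistics.median(xs),
        "variance": statistics.variance(xs),  # sample variance, divisor n - 1
        "sd": statistics.stdev(xs),           # sample standard deviation
        "semi_iqr": (q3 - q1) / 2,            # semi-interquartile range
    }

def sample_covariance(xs, ys):
    """Sample covariance of two equal-length lists (divisor n - 1)."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return sample_covariance(xs, ys) / (statistics.stdev(xs) * statistics.stdev(ys))

# Hypothetical data, for illustration only.
x = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]
y = [1.0, 3.0, 2.0, 5.0, 6.0, 8.0]

summary = describe(x)
print(summary)
print(sample_covariance(x, y), correlation(x, y))
```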

ii. Cluster Analysis (under construction)

iii. Discriminant Analysis (under construction)

b1. Predictive Regression Methods in R

i. Least Squares Multiple Linear Regression

ii. Advanced Regression Methods in R

Robust Regression, Weighted Regression, Ridge Regression

b2. Categorical Response Variable 

i. Logistic Regression

Computing the confusion matrix for the logistic regression example from the last lecture, splitting data sets into training and test sets, building logistic models.
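
The two mechanical steps named above, splitting the data and tabulating a confusion matrix, can be sketched as follows. This is an illustrative Python sketch, not the lecture's R code. The function names, the 0.5 cutoff, the 30% test fraction, and the example labels and probabilities are all assumptions, and the predicted probabilities are simply typed in rather than produced by a fitted logistic model.

```python
import random

def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def confusion_matrix(actual, predicted_prob, threshold=0.5):
    """Count true/false positives and negatives for 0/1 labels."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for y, p in zip(actual, predicted_prob):
        y_hat = 1 if p >= threshold else 0  # classify by the cutoff
        if y == 1 and y_hat == 1:
            counts["TP"] += 1
        elif y == 0 and y_hat == 1:
            counts["FP"] += 1
        elif y == 0 and y_hat == 0:
            counts["TN"] += 1
        else:
            counts["FN"] += 1
    return counts

# Hypothetical test-set labels and model probabilities.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
probs = [0.9, 0.2, 0.4, 0.7, 0.6, 0.1, 0.8, 0.3]

cm = confusion_matrix(actual, probs)
accuracy = (cm["TP"] + cm["TN"]) / len(actual)
print(cm, accuracy)

train, test = train_test_split(list(range(10)))
print(len(train), len(test))
```

The confusion matrix should be computed on the test set only, so that the reported accuracy is not inflated by data the model has already seen.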

ii. Poisson Regression

When to use Poisson regression, how to estimate parameters, fitting regression models in R, testing goodness of fit, adjusting for heterogeneity.

iii. Multinomial Logistic Regression

The qualitative dependent variable (DV) is not binary but takes on K values.

iv. Ordinal Logistic Regression

The qualitative DV is not binary but takes on K ordered values.

  3.  Classification and Regression Trees (CART)

a. CART - 1

set up in R, impurity measures, parametric models   

b.  CART - 2

pruning a tree, prediction using tree, classification trees   

  4. Market Basket Analysis

Association discovery from customer transactions data, sequence discovery
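
Association discovery is usually summarized with three standard measures for a rule A -> B: support, confidence, and lift. The sketch below computes them in Python on a handful of made-up transactions. The item names and helper functions are hypothetical, and it does not implement a full rule-mining algorithm such as Apriori.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule A -> B."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

def lift(transactions, antecedent, consequent):
    """Confidence relative to the consequent's baseline support."""
    return confidence(transactions, antecedent, consequent) / support(
        transactions, consequent
    )

# Hypothetical customer baskets, one set of items per transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]

s = support(transactions, {"bread", "milk"})
c = confidence(transactions, {"bread"}, {"milk"})
l = lift(transactions, {"bread"}, {"milk"})
print(s, c, l)
```

A lift below 1, as in this toy data, suggests the two items appear together less often than they would if purchases were independent.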

  5. Random Forests

Variable importance measures in Random Forests, computing in R

  6. Generalized Linear Models (Not General Linear Models)

  7. Structural Equation Model

Theory, Path Diagrams, Covariance Matrix Algebra, Two Stage Least Squares

  8. ARIMA Modeling

Tutorial through example in R