Data Mining
This page is set up as a lecture course. If you have a quick question,
check the TOPIC TABLE.
Black-and-white handouts of the lectures are available in
this zip file.
DataMhandout.zip
You will need these zipped data files for some of the
lectures.
DataMiningData.zip
1. Introduction to Data Mining
A preview of what you will learn
2. Data Mining Tools
a. Descriptive Tools
i. Descriptive Statistics
Strip chart, box plot, histogram, dot plot, sample mean, sample median, sample
variance and standard deviation (sd), semi-interquartile range, scatter plot, covariance,
correlation.
ii. Cluster Analysis (under construction)
iii. Discriminant Analysis (under construction)
b1. Predictive Regression Methods in R
i. Least Squares Multiple Linear Regression
ii. Advanced Regression Methods in R
Robust Regression, Weighted Regression, Ridge Regression
b2. Categorical Response Variable
i. Logistic Regression
Computing the confusion matrix for the logistic regression example from the
last lecture, splitting data sets into training and test sets, building
logistic models.
ii. Poisson Regression
When to use Poisson regression, how to estimate
parameters, fitting regression models in R, testing goodness of fit, adjusting for
heterogeneity.
iii. Multinomial Logistic Regression
The qualitative dependent variable (DV) is not binary but takes on K values.
iv. Ordinal Logistic Regression
The qualitative DV is not binary but takes on K ordinal values.
3. Classification and Regression Trees (CART)
a. CART - 1
Setup in R, impurity measures, parametric models
b. CART - 2
Pruning a tree, prediction using a tree, classification trees
4. Market Basket Analysis
Association discovery from customer transactions data, sequence discovery
5. Random Forests
Variable importance measure in Random Forests, computing in R
6. Generalized Linear Models (Not General Linear Models)
7. Structural Equation Model
Theory, Path Diagrams, Covariance Matrix Algebra, Two Stage Least
Squares
8. ARIMA Modeling
Tutorial through example in R