Chapter 7: Classification and Prediction
What is classification? What is prediction?
Issues regarding classification and prediction
Classification by decision tree induction
Bayesian Classification
Classification by Neural Networks
Classification by Support Vector Machines (SVM)
Classification based on concepts from association rule mining
Other Classification Methods
Prediction
Classification accuracy
Summary
2017/11/10
1
Data Mining: Concepts and Techniques
Classification vs. Prediction
Classification:
predicts categorical class labels (discrete or nominal)
constructs a model from the training set and the values (class labels) of a classifying attribute, then uses that model to classify new data
Prediction:
models continuous-valued functions, i.e., predicts unknown or missing values
Typical Applications
credit approval
target marketing
medical diagnosis
treatment effectiveness analysis
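The difference between classification and prediction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data values, the nearest-neighbor classifier, and the least-squares line are choices made for this example and are not taken from the slides.

```python
# Hypothetical toy data: years of service paired with a class label
# (classification) or with a salary in thousands (prediction).
labeled = [(1, "no"), (3, "no"), (7, "yes"), (9, "yes")]
salaries = [(1, 25.0), (3, 35.0), (5, 45.0), (7, 55.0)]

def nearest_neighbor_label(train, x):
    """Classification: return the categorical label of the closest training point."""
    closest = min(train, key=lambda pair: abs(pair[0] - x))
    return closest[1]

def fit_line(points):
    """Prediction: least-squares fit of y = a * x + b for a continuous target."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

# A classifier outputs one of a fixed set of labels.
print(nearest_neighbor_label(labeled, 6))

# A predictor outputs a number on a continuous scale.
a, b = fit_line(salaries)
print(a * 6 + b)
```

The first call returns a label drawn from a fixed set ("yes" or "no"). The second returns a numeric estimate for a value that was never observed.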
Classification—A Two-Step Process
Model construction: describing a set of predetermined classes
Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute
The set of tuples used for model construction is the training set
The model is represented as classification rules, decision trees, or mathematical formulae
Model usage: for classifying future or unknown objects
Estimate accuracy of the model
The known label of each test sample is compared with the classified result from the model
Accuracy rate is the percentage of test set samples that are correctly classified by the model
The test set must be independent of the training set; otherwise over-fitting goes undetected and the accuracy estimate is too optimistic
If the accuracy is acceptable, use the model to classify data tuples whose class labels are not known
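The two steps above can be sketched as follows. This is a minimal illustration, not the method of any particular algorithm in this chapter: the learner simply memorizes the majority class label for each value of a single attribute, and all data and function names are hypothetical.

```python
from collections import Counter, defaultdict

def build_model(training_set):
    """Step 1 (model construction): majority class label per attribute value."""
    counts = defaultdict(Counter)
    for value, label in training_set:
        counts[value][label] += 1
    rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
    # Fall back to the overall majority label for unseen attribute values.
    default = Counter(label for _, label in training_set).most_common(1)[0][0]
    return rules, default

def classify(model, value):
    """Step 2 (model usage): assign a class label to a tuple."""
    rules, default = model
    return rules.get(value, default)

def accuracy(model, samples):
    """Percentage of samples whose known label matches the model's output."""
    correct = sum(1 for value, label in samples if classify(model, value) == label)
    return 100.0 * correct / len(samples)

# Hypothetical (rank, tenured) tuples; the test set is kept separate.
training_set = [
    ("assistant", "no"), ("assistant", "no"), ("assistant", "yes"),
    ("associate", "yes"), ("associate", "yes"), ("associate", "no"),
    ("professor", "yes"), ("professor", "yes"),
]
test_set = [
    ("assistant", "no"), ("associate", "no"),
    ("professor", "yes"), ("assistant", "yes"),
]

model = build_model(training_set)
print(accuracy(model, training_set))  # accuracy on data the model has seen
print(accuracy(model, test_set))      # accuracy on independent data
```

With this toy data the model scores 75% on the training set but only 50% on the independent test set. That gap is why accuracy should be estimated on tuples that were not used for model construction.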
Classification Process (1): Model Construction
[Figure: Training Data -> Classification Algorithms -> Classifier (Model)]
Example rule produced by the classifier:
IF rank = ‘professor’ OR years > 6 THEN tenured = ‘yes’
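The rule shown in the figure can be written directly as a small function. The slide gives only the THEN branch, so returning 'no' when the condition fails is an assumption made for this sketch, as are the function name and the sample inputs.

```python
def tenured(rank, years):
    """Apply the rule: IF rank = 'professor' OR years > 6 THEN tenured = 'yes'.

    Assumption: any tuple that does not satisfy the rule is labeled 'no'.
    """
    if rank == "professor" or years > 6:
        return "yes"
    return "no"

# Hypothetical tuples to classify with the rule.
for rank, years in [("professor", 2), ("assistant", 7), ("assistant", 3)]:
    print(rank, years, tenured(rank, years))
```

A model expressed as a rule like this is easy to inspect. Each classification can be traced back to the condition that fired.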