Chapter 8: Cluster Analysis
What is Cluster Analysis?
Types of Data in Cluster Analysis
A Categorization of Major Clustering Methods
Partitioning Methods
Hierarchical Methods
Density-Based Methods
Grid-Based Methods
Model-Based Clustering Methods
Outlier Analysis
Summary
Computational Intelligence Lab, Zhejiang University
Clustering Examples
Segment customer database based on similar buying patterns.
Group houses in a town into neighborhoods based on similar features.
Identify new plant species
Identify similar Web usage patterns
Spatial Data Analysis
create thematic maps in GIS by clustering feature spaces
detect spatial clusters and explain them in spatial data mining
Image Processing
Clustering Customers
Clustering Houses
Size Based
Clustering vs. Classification
No prior knowledge
Number of clusters
Meaning of clusters
Unsupervised learning
Clustering Problem
Given a database D = {t1, t2, …, tn} of tuples and an integer value k, the Clustering Problem is to define a mapping f: D → {1, …, k} where each ti is assigned to one cluster Kj, 1 <= j <= k.
A Cluster, Kj, contains precisely those tuples mapped to it.
Unlike in the classification problem, the clusters are not known a priori.
* Fuzzy Clustering
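The mapping f in the problem statement can be illustrated with a small sketch. It assumes 2-D numeric tuples, Euclidean distance, and cluster centers picked by hand; in a real clustering task the centers are not known in advance and must be found by the algorithm. The names assign_clusters and clusters_from_mapping are hypothetical helpers, not part of any library.

```python
from math import dist  # Euclidean distance, Python 3.8+

def assign_clusters(D, centers):
    """Return f as a list: f[i] is the cluster label (1..k) of tuple D[i]."""
    f = []
    for t in D:
        # Pick the label j whose center is closest to t (labels start at 1).
        j = min(range(len(centers)), key=lambda c: dist(t, centers[c])) + 1
        f.append(j)
    return f

def clusters_from_mapping(D, f, k):
    """Build K_1..K_k: each cluster holds precisely the tuples mapped to it."""
    K = {j: [] for j in range(1, k + 1)}
    for t, j in zip(D, f):
        K[j].append(t)
    return K

# Toy database and hand-picked centers (assumptions for illustration only).
D = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.1, 4.9)]
centers = [(1.0, 1.0), (5.0, 5.0)]

f = assign_clusters(D, centers)
K = clusters_from_mapping(D, f, k=2)
```

Here every tuple receives exactly one label, which matches the crisp mapping above; a fuzzy variant would instead store a degree of membership for each tuple in each cluster.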
What Is Good Clustering?
A good clustering method produces high-quality clusters with
high intra-class similarity
low inter-class similarity
The quality of a clustering result depends on both the similarity measure used by the method and its implementation.
The quality of a clustering method is also measured by its ability to discover some or all of the hidden patterns.
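One way to make "high intra-class similarity, low inter-class similarity" concrete is to compare average pairwise distances inside clusters with those across clusters. The sketch below treats Euclidean distance as the inverse of similarity, which is an assumption; the slides do not fix a particular measure. The function names are hypothetical, and the code assumes at least one cluster has two or more points and that there are at least two clusters.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+

def mean_intra_distance(K):
    """Average distance between pairs of points that share a cluster."""
    pairs = [dist(a, b) for pts in K.values() for a, b in combinations(pts, 2)]
    return sum(pairs) / len(pairs)

def mean_inter_distance(K):
    """Average distance between pairs of points in different clusters."""
    pairs = [dist(a, b)
             for p, q in combinations(K.values(), 2)
             for a in p for b in q]
    return sum(pairs) / len(pairs)

# Toy clustering result (illustrative data, not from the source).
K = {1: [(1.0, 1.0), (1.2, 0.8)], 2: [(5.0, 5.0), (5.1, 4.9)]}

intra = mean_intra_distance(K)  # small value: points in a cluster are alike
inter = mean_inter_distance(K)  # large value: clusters are well separated
```

A small intra value together with a large inter value indicates a good result under this measure; a different similarity measure can rank the same clustering differently, which is the dependence the slide points out.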
Requirements of Clustering in D