Course code: 091M5023H
Class hours: 46
Credits: 2.0
Course type: Specialized general course
Instructors: Liu Ying (刘莹) et al.
English title: Data Mining
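Course Objectives and Requirements
This is a specialized general course for graduate students in computer software. It introduces the origins, principles, main algorithms, and key techniques of data mining. Topics include the importance, characteristics, and application areas of data mining, data warehouses, data preprocessing, association rules, classification, prediction, clustering, and sequential patterns.
The course is taught entirely in English and emphasizes combining theory with practice, so that graduate students in computer science both master the concepts of data mining and build the ability to solve practical problems, laying a foundation for future research.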
Prerequisites
Data structures, algorithms, C programming, databases, probability and statistics
Textbook
Course Content
Chapter 1: Introduction (3 hours)
1.1 Motivation
1.2 Major issues
1.3 Major applications
1.4 Characteristics
Chapter 2: Data Warehouse (3 hours)
2.1 Model
2.2 Architecture
2.3 Operations
Chapter 3: Data Pre-processing (3 hours)
3.1 Data cleaning
3.2 Data transformation
3.3 Data reduction
Chapter 4: Association Rules Mining (6 hours)
4.1 Apriori
4.2 Single-pass frequent itemset mining
4.3 FP-Growth
4.4 Multi-level & Multi-dimensional association rules mining
Chapter 5: Classification (6 hours)
5.1 Decision tree
5.2 Bayesian Classifier
5.3 Classification by backpropagation
5.4 KNN classifier
5.5 Prediction models
Chapter 6: Clustering (6 hours)
6.1 Partitioning methods
6.2 Hierarchical methods
6.3 Density-based methods
6.4 Grid-based methods
6.5 Outlier detection
Chapter 7: Applications (2 hours)
7.1 Credit scoring
7.2 Oil exploration
7.3 Customer relationship management
Chapter 8: Big Data Mining (4 hours)
8.1 Big data
8.2 Big data characteristics
8.3 Big data mining techniques, including high-performance mining, Web mining, stream mining, graph mining, text mining, and cloud mining, among others
References
Jiawei Han and Micheline Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann, 2006.