Data Mining (Informationsutvinning)Fourth Year Course, Period 2, 2005 (1DL105 and 1DL111) |
---|
Instructor:
|
uu.it
domain to be able to access it.)
uu.se
domain.
After the first lecture, to register in the course you have to send the instructor mail with the following information:
hotmail, gmail, yahoo,
... address),
After sending the mail, it is recommended to check the registration list. Only after you see your name in the list, you can assume that you are indeed registered in the course. The registration list will close (its final version will be sent to the Kansliet) on Friday November 11th. After that date, no additions or changes to it will be made.
Data mining, or knowledge discovery from data repositories, has during
the last few years emerged as one of the most exciting fields in
computer science. Data mining aims at finding useful regularities
in large data sets. Interest in the field is motivated by the
growth of computerized data collections which are routinely kept by
many organizations and commercial enterprises, and by the high
potential value of patterns discovered in those collections. For
instance, bar code readers at supermarkets produce extensive amounts
of data about purchases. An analysis of this data can reveal
previously unknown, yet useful information about the shopping
behavior of the customers. Data mining refers to a set of techniques that have been designed to efficiently find interesting pieces of information or knowledge in large amounts of data. Association rules, for instance, are a class of patterns that tell which products tend to be purchased together. There is currently a large commercial interest in the area, both for the development of data mining software and for the offering of consulting services on data mining. In this course we explore how this interdisciplinary field brings together techniques from databases, statistics, machine learning, and information retrieval. We will discuss the main data mining methods currently used, including data cleaning, clustering and classification techniques, algorithms for association rule mining, text indexing and seaching algorithms, how search engines rank pages, and recent techniques for web mining and for privacy-preserving data mining. Designing algorithms for these tasks is difficult because the input data sets are typically very large, and the tasks may be very complex. One of the main focuses in the field is the integration of these algorithms with relational databases and the mining of information from semi-structured data. We will examine the additional complications that come up in this case.
|
Besides assignments, there will also be a written, final examination; see the schedule below.
Margaret H. Dunham | Pang-Ning Tan, Michael Steinbach, and Vipin Kumar |
Data Mining: Introductory and Advanced Topics | Introduction to Data Mining |
Prentice Hall, 2002 | Addison-Wesley, 2005 |
Lecture 1 | Lecture 2 | Lecture 3 | Lecture 4 | Lecture 5 | Lecture 6 |
Lecture 7 | Lecture 8 | Lecture 9 | Lecture 10 | Lecture 11 | Lecture 12 |
Below is the stuff you have to know to take and hopefully pass the exam, not necessarily the stuff you need to read to know about the topics covered in the course.
The table contains chapter suggestions, depending on the book you follow. Paper numbers are those of the required reading material. The table should be read as follows: for each set of lectures, one should read what it appears on the last column and either what's in column two or three. Chapter suggestions of the form "i – j" should be interpreted as read from chapter i to chapter j (both inclusive).
Lectures | Read from the Durham book | Read from the Tan et al. book | Additional required reading |
1 | Ch 1.1–1.6 | Ch 1 | |
2 | Ch 2.2, 2.3, 2.9, 2.10, 3.1–3.4 | Ch 2.1–2.4 | |
3 | Ch 4.1–4.3 | Ch 4.1, 4.2, 5.2 | Slides |
4 | Ch 4.4, 4.7 | Ch 4.3, 4.5, 5.1 | Slides |
5 | Ch 5.1–5.3, 5.5.3 | Ch 8.1, 8.2 | Slides |
6 | Ch 5.5.1, 5.5.2, 5.5.5, 5.6 | Ch 8.3, 8.4, 9.5 | Slides; Papers 1 & 2 |
7 | Ch 6.1–6.2.3 | Ch 6.1–6.3.1 | Papers 3 & 4 |
8 | Slides; Paper 5 | Ch 6.3, 6.4; Paper 5 | |
9 | Slides | Ch 6.7, 7.3 | |
10 | Slides | Ch 7.4 | Paper 6 |
11 | Ch 7 | Slides | |
12 | Slides; Papers 7 & 8 | ||
13 & 14 | Nothing | Nothing | Nothing |
Num | Date | Time | Topics Covered |
1 | Monday 31/10 | 13-15 | Introduction to Data Mining |
2 | Tuesday 1/11 | 13-15 | Overview of Data Mining Techniques |
3 | Wednesday 2/11 | 8-10 | Classification (1) |
Tut 1 | Wednesday 2/11 | 10-12 | Introduction to MATLAB |
Tut 2 | Monday 7/11 | 10-12 | Tutorial on Assignment 1 |
4 | Wednesday 9/11 | 10-12 | Classification (2) |
5 | Thursday 10/11 | 10-12 | Clustering (1): Partitional Techniques |
6 | Tuesday 15/11 | 13-15 | Clustering (2): Hierarchical Techniques |
Tut 3 | Tuesday 15/11 | 15-17 | Tutorial on Assignment 2 |
7 | Wednesday 16/11 | 15-17 | Association Rules (1): Frequent Itemsets |
8 | Thursday 17/11 | 10-12 | Association Rules (2): Fast Algorithms & Rule Generation |
9 | Tuesday 22/11 | 15-17 | Association Rules (3): Evaluation of Association Patterns |
10 | Wednesday 23/11 | 15-17 | Mining Sequential Patterns |
11 | Monday 28/11 | 13-15 | Web Content Mining |
12 | Tuesday 29/11 | 13-15 | Search Engines |
13 | Tuesday 6/12 | 13-15 | Data Mining and Privacy (1) |
14 | Tuesday 6/12 | 15-17 | Data Mining and Privacy (2) |
15 | Thursday 15/12 | 13-15 | Course Overview, Course Evaluation & Exam Tips |
Monday 19/12 | 8-13 | EXAM |