Faculty of Information Technology, BUT

Course details

Data Storage and Preparation

UPA, Acad. year 2019/2020, Winter semester, 5 credits

The course introduces fundamental data classification from the viewpoint of data mining and knowledge discovery. It also provides insight into selected modern database systems, and particular topics are studied in depth: object-relational databases, spatial databases (including issues of spatial data storage and indexing), NoSQL databases, XML databases, and multimedia databases. Advanced queries on relational databases are discussed as well. The course then explains the process of data mining and knowledge discovery and its individual steps. The explanation focuses on typical tasks performed in data pre-processing before the extraction of potentially useful knowledge from data. The process of data mining and knowledge discovery is presented on selected use cases.

Guarantor

Deputy Guarantor

Language of instruction

Czech

Completion

Examination (written)

Time span

26 hrs lectures, 6 hrs exercises, 6 hrs PC labs, 14 hrs projects

Assessment points

60 exam, 20 mid-term exam, 20 projects

Department

Lecturer

Instructor

Subject specific learning outcomes and competences

Students will be able to classify data from the viewpoint of data mining and knowledge discovery, store and manipulate data in suitable database systems, quickly search for required data, inspect data features, and prepare data for subsequent knowledge extraction.

Generic learning outcomes and competences

- Students improve their ability to manipulate data in various situations
- Students improve their ability to contribute to a small project as members of a small team

Learning objectives

The aim of the course is to explain the fundamental classification of data and of data resources, to give deeper insight into selected database systems (object-relational, spatial, NoSQL, XML, and multimedia) and efficient data manipulation, and to explain the core ideas and individual steps of the data mining and knowledge discovery process, with a focus on data pre-processing and exploratory analysis.

Why is the course taught

The aim of this course is to demonstrate how to work with the complex data around us: how to store such data, how to find one's way around it, how to obtain useful characteristics from it, and how to prepare it for the extraction of hidden information and knowledge using machine learning and other advanced analytical methods.

Prerequisite knowledge and skills

Fundamentals of relational data model theory. Formal design of relational databases. Data storage at the internal level. Data safety and integrity. Transactions. Conceptual modeling and database design from a conceptual model. SQL programming language. Fundamentals of computer graphics. Fundamentals of computational geometry. Object paradigm. Fundamentals of statistics and probability.

Study literature

  • Lecture materials (slides, scripts, etc.)
  • Lemahieu, W., Broucke, S., Baesens, B.: Principles of Database Management. Cambridge University Press, 2018, 780 p.
  • Kim, W. (ed.): Modern Database Systems. ACM Press, 1995, ISBN 0-201-59098-0
  • Melton, J.: Advanced SQL: 1999 - Understanding Object-Relational and Other Advanced. Morgan Kaufmann, 2002, 562 p., ISBN 1-558-60677-7
  • Shekhar, S., Chawla, S.: Spatial Databases: A Tour. Prentice Hall, 2002/2003, 262 p., ISBN 0-13-017480-7
  • Dunckley, L.: Multimedia Databases: An Object-Relational Approach. Pearson Education, 2003, 464 p., ISBN 0-201-78899-3
  • Han, J., Kamber, M.: Data Mining: Concepts and Techniques. Third Edition. Morgan Kaufmann Publishers, 2012, 703 p., ISBN 978-0-12-381479-1

Fundamental literature

  • Lemahieu, W., Broucke, S., Baesens, B.: Principles of Database Management. Cambridge University Press, 2018, 780 p.
  • Kim, W. (ed.): Modern Database Systems. ACM Press, 1995, ISBN 0-201-59098-0
  • Melton, J.: Advanced SQL: 1999 - Understanding Object-Relational and Other Advanced. Morgan Kaufmann, 2002, 562 p., ISBN 1-558-60677-7
  • Shekhar, S., Chawla, S.: Spatial Databases: A Tour. Prentice Hall, 2002/2003, 262 p., ISBN 0-13-017480-7
  • Dunckley, L.: Multimedia Databases: An Object-Relational Approach. Pearson Education, 2003, 464 p., ISBN 0-201-78899-3
  • Gaede, V., Günther, O.: Multidimensional Access Methods. ACM Computing Surveys, Vol. 30, No. 2, 1998, pp. 170-231.
  • Han, J., Kamber, M.: Data Mining: Concepts and Techniques. Third Edition. Morgan Kaufmann Publishers, 2012, 703 p., ISBN 978-0-12-381479-1

Syllabus of lectures

  1. Introduction: course contents, data characteristics, introduction to data mining and knowledge discovery, recap of the history of database technology
  2. Object-relational DB, object-relational mapping, advanced SQL features
  3. Spatial DB: spatial data storage and manipulation issues
  4. Spatial DB: possible solutions of spatial data storage
  5. Indexing in spatial DB I - points
  6. Indexing in spatial DB II - multi-dimensional objects
  7. Mid-term exam
  8. Multimedia and XML databases
  9. NoSQL databases
  10. Data mining and knowledge discovery process, data pre-processing in this process - data characteristics, exploratory data analysis
  11. Data pre-processing during data mining and knowledge discovery process - pre-processing methods
  12. Fundamental tasks in data mining and knowledge discovery, examples of corresponding methods
  13. Programming languages used for data mining and knowledge discovery, illustrative use-cases on data mining and knowledge discovery

Syllabus of numerical exercises

DEMO exercises
  1. Object-relational and spatial databases, data definition and manipulation, peculiarities
  2. Multimedia and XML databases, data indices
  3. NoSQL databases

Syllabus - others, projects and individual work of students

  1. Development and demonstration of a solution for processing both structured and unstructured data, where the data may be of various kinds.

Progress assessment

  • Mid-term exam, held on a single date only; there is no possibility of a second attempt.
  • One project, to be completed and submitted by a given date during the term.

Controlled instruction

  • Mid-term exam: written, with questions answered in full sentences; no second or alternative attempt is possible. (20 points)
  • Project: 1 project (program development according to a given specification) with appropriate documentation. (20 points)
  • Final exam: written, with questions answered in full sentences. The maximum is 60 points; a student must obtain at least 25 points on the final exam, otherwise the student is awarded no points. The exam has one regular term and two resit terms. The regular term is always fully written. Resit terms are either fully written or combined (written in the morning and oral in the afternoon of the same day). The form of each resit term is announced once the previous term has been evaluated; the combined form is used when no more than 16 students are registered for that term.

Exam prerequisites

By the end of the term, a student must have obtained at least 50% of the points available during the term, that is, at least 20 out of 40 points.
Plagiarism and unauthorized cooperation result in the involved students not being graded, and disciplinary action may be initiated.

Schedule

Day  Type      Weeks                                           Room            Start  End    Lect.grp   Groups  Info
Tue  lecture   lectures                                        E104 E105 E112  08:00  09:50  1MIT 2MIT  xx
Wed  exercise  3., 4., 5., 6., 9., 10., 11., 12. of lectures   D105            16:00  17:50  1MIT 2MIT  xx
Thu  comp.lab  3., 4., 5. of lectures                          N203 N204 N205  14:00  15:50  1MIT 2MIT  xx
Fri  comp.lab  3., 4., 5. of lectures                          N203 N204 N205  10:00  11:50  1MIT 2MIT  xx
Fri  comp.lab  3., 4., 5. of lectures                          N203 N204 N205  12:00  13:50  1MIT 2MIT  xx

Course inclusion in study plans
