Data Mining

Abstract

The widespread use of computers has driven the active development of data mining technology. It emerged from the need to process the large amounts of information accumulated in modern data warehouses and to search for new knowledge or patterns that cannot be detected by standard information processing methods or by experts. The ability to apply well-known methods of mathematical statistics, machine learning, pattern recognition and database theory to problems of this kind has opened up new opportunities for analysts, researchers and engineers in various fields. In practice, big data processing tasks involve implementing computer systems or complexes and programmable control systems for analyzing large and extra-large volumes of data. The complexity and variety of big data processing technologies require knowledge of data mining methods to solve typical information analysis problems.

The course is based on lecture and computer class materials from the disciplines “Data Mining” and “Big Data Analysis”, taught to Bachelor’s and Master’s students of the Faculty of Radiophysics and Computer Technologies of the Belarusian State University from 2011 to the present.

What will be studied

A detailed course syllabus can be found using this link. The program consists of 8 sections, which include 12 lectures (2 hours each) and 10 practical computer classes (4 hours each). Successful mastery of the discipline also requires independent work. Course sections:

  • Basic concepts of the discipline.
  • Data dimensionality reduction methods (principal components and coordinates, factor analysis).
  • Cluster analysis methods (hierarchical, k-means, fuzzy k-means, k-medoids, PAMk, CLARA, DBSCAN and spectral).
  • Classification methods (k-nearest neighbors, Bayesian networks, support vector machines, decision trees – ID3, coverages, Conditional Inference Tree, CART, Random Forests, as well as V-fold cross-validation and bootstrap methods).
  • Neural networks (Hebb, adaptive and backpropagation learning algorithms, Kohonen neural networks and deep learning).
  • Stochastic search methods (simple stochastic and random search, Metropolis, simulated annealing and genetic algorithms).
  • Association rules (Apriori and FP-Growth algorithms).
  • Data visualization methods and data mining process pipeline.

Requirements for students

Course participants are expected to have studied the following academic disciplines:

  • Linear algebra
  • Statistics
  • R programming

Course learning materials

  • Author’s lectures
  • Computer class manuals, accompanied by the author’s guidelines for practical work (assigned by variant)
  • Electronic tests based on course materials (testing)
  • Forum for discussing course issues (online resource)
  • Course instructor consultations (chats and emails)

How does the course differ from similar existing courses?

The course covers the main groups of data mining algorithms using original educational materials developed by teachers who have 20 years of experience in data analysis and have taught this course at the Belarusian State University since 2011, training more than 2,000 students (BSU ranks 288th in the QS World University Rankings 2023: Top global universities). The lecture and computer class materials, as well as the individual assignments by variant, were developed taking into account the experience of similar courses at leading universities worldwide. The course lecturers were trained at the Universities of Wageningen (Netherlands), Bern (Switzerland) and Luxembourg (Luxembourg).

The mathematical foundations of data mining methods are explained at a level that allows students to develop their own software implementations of the algorithms in the R language and to solve practical problems. Most existing courses encourage students to use method libraries as black boxes. As a result, the student develops a superficial perception of the applied method and a shallow understanding of the analysis results, which makes it difficult to draw reliable conclusions when solving applied problems.

Teachers

Mikalai Yatskou

Ph.D. in Physics and Mathematics, Associate Professor at the Department of System Analysis and Computer Modelling of the Belarusian State University.

Phone: +375 (17) 326-02-22
E-mail: yatskou@bsu.by

Guest lecturers

Petr Nazarov

Guest Lecturer, Ph.D. in Physics and Mathematics, Head of the Bioinformatics Unit of the Luxembourg Institute of Health.

Phone: +352 26970-385
E-mail: Petr.Nazarov@lih.lu

Course enrollment

Admission to the course is based on the results of an interview. The key selection factor is the applicant’s motivation.

Course completion

To obtain a certificate, the student must successfully complete the practical classes and solve the tasks assigned for independent work.

Contacts

Mikalai Yatskou

Phone: +375 17 326 7042
E-mail: yatskou@bsu.by