CSC321 Data Mining and Machine Learning

Overview

This course introduces Data Mining: the automatic extraction of previously unknown and potentially useful information from data sources, using regularities or patterns implicit in the data. Such patterns can be used to make predictions about future data, and to explain and understand the nature of that data. Machine Learning is one mechanism by which data mining is achieved; it is used to discover and extract information from raw data. This course covers the tools and techniques of machine learning that are used in practical data mining.

Learning Objectives

  • Theoretical and practical aspects of machine learning as a tool for mining data
  • Knowledge of methods and techniques for learning, application and evaluation
  • Ability to implement and evaluate basic machine learning algorithms
  • Familiarity with open source machine learning tools, such as SciKit-Learn

Required Text

None. There will be readings during the term, which you will be expected to find yourself.

Assignments

There will be programming assignments. You will complete each one and hand it in, via Nexus, before the beginning of class the following week. In addition to the programming assignments, the course includes:

  • A challenge problem, in which students preprocess data, apply a range of machine learning approaches, and evaluate the results. Examples include classification of restaurant reviews, or using data from the biomedical domain.
  • A research paper critique
  • A midterm exam
  • A final project
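To give a feel for the challenge-problem workflow (preprocess the data, apply a learning method, evaluate the results), here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the toy restaurant reviews are invented, the function names are hypothetical, and Naive Bayes with add-one smoothing is just one possible method, not necessarily the one the course will use. In practice a library such as SciKit-Learn would replace most of this code.

```python
import math
import re
from collections import Counter, defaultdict

# Invented toy reviews, for illustration only (not course data).
TRAIN = [
    ("the food was great and the service was friendly", "pos"),
    ("great pasta and a lovely friendly staff", "pos"),
    ("delicious meal, great value", "pos"),
    ("the food was cold and the service was slow", "neg"),
    ("terrible pasta and rude staff", "neg"),
    ("slow, cold, and overpriced", "neg"),
]
TEST = [
    ("friendly staff and delicious food", "pos"),
    ("rude service and cold food", "neg"),
]


def tokenize(text):
    """Preprocessing step: lowercase the text and split it into words."""
    return re.findall(r"[a-z']+", text.lower())


def train_naive_bayes(examples):
    """Count labels and per-label word frequencies from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-probability for the text."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total)
        # Add-one (Laplace) smoothing so unseen words do not zero out a label.
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, examples):
    """Evaluation step: fraction of examples whose label is predicted correctly."""
    correct = sum(predict(model, text) == label for text, label in examples)
    return correct / len(examples)


model = train_naive_bayes(TRAIN)
print("held-out accuracy:", accuracy(model, TEST))
```

Note that the evaluation uses reviews that were held out of training; measuring accuracy on the training data itself would give an overly optimistic picture.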

Late Assignments

There are ZERO late assignments allowed in this course. A late assignment will not be graded.

Academic Integrity

Union College recognizes the need to create an environment of mutual trust as part of its educational mission. Trust among students ensures that no student has an unfair advantage over another; trust between faculty and students ensures that the effort both parties put into preparation and evaluation of assigned work is not wasted, but can truly advance understanding and learning for students. Creation of this environment of trust is the responsibility of the entire academic community: faculty, staff and students. It requires that students submit work that is prepared in accordance with the course instructor’s requirements and that faculty foster an environment of academic honesty. Toward this end, professors will uphold the high ethical standards of their discipline, provide to their students clear guidance on the policy and practice of academic integrity, and fairly evaluate students’ work. To help establish mutual assurance of intellectual honesty, Union College expects students to sign the Honor Code Affirmation. Matriculation at the College is taken to signify implicit agreement with the Code.

Specifically with regard to this class: there are a number of programming assignments. As with ALL CS classes, DO NOT share your code with anyone. Each assignment must be your own work.

You MAY use resources like Stack Overflow, but ONLY to look for help with USING specific functions, and NOT for answers to homework (complete implementations of code). If you are unsure, DO NOT use them, and ASK ME.