CSC233 Introduction to Data Analytics

Overview

Data analytics is the process of analyzing, revealing, interpreting and visualizing information concealed inside big data, and it is revolutionizing daily life. Companies such as Amazon, Google and Facebook rely on it. It is used in diagnosing medical conditions and handling medical claims, and in setting investment strategies and real estate prices. In academia, it supports the analysis of historical texts, the study of the deliberations of the Supreme Court or the European Commission, and the processing of large amounts of genomics data.

In this class, students will be introduced to techniques to acquire data from the web, manipulate and pre-process data into manageable forms, and perform descriptive and predictive analyses. Students will also learn the basics of visualizing results, all with a focus on storytelling through data and enhancing data literacy.

You will use the Python programming language to scrape web data, prepare data for analysis, analyze it to produce explainable results, and visualize those results. Throughout, we will discuss the pitfalls, ethics and challenges of working with data.
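To give a flavor of the "prepare, then analyze" steps, here is a minimal sketch that uses only Python's standard library. The sample data, the column names and the helper functions `load_prices` and `summarize` are made up for illustration. The tools and datasets we actually use in class may differ.

```python
import csv
import io
import statistics

# Made-up sample data standing in for something collected from the web.
# One row is deliberately missing its price.
RAW = """city,price
Albany,210000
Troy,185000
Schenectady,
Saratoga,340000
"""


def load_prices(text):
    """Parse CSV text into {city: price}, dropping rows with no price."""
    rows = csv.DictReader(io.StringIO(text))
    return {row["city"]: int(row["price"]) for row in rows if row["price"]}


def summarize(prices):
    """Return a few simple descriptive statistics for the prices."""
    values = list(prices.values())
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


prices = load_prices(RAW)   # prepare: parse and clean
summary = summarize(prices)  # analyze: describe the cleaned data
print(summary)
```

Note that the cleaning choice matters: dropping the row with the missing price changes the count and the mean, which is one of the pitfalls we will discuss.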

Assignments & Grades

Each week I will assign problem exercises, which you will complete and submit online. There will also be a final project involving the acquisition, manipulation and analysis of data. Finally, class attendance and participation are critical components of the course.

Policies

Late Assignments

Late assignments are NOT accepted without prior agreement.

Academic Integrity

Scholastic dishonesty means misrepresenting someone else’s work as your own. It is a form of stealing and will not be tolerated.

For this class in particular, we encourage working together during class time. If you missed something, talk to me, talk to your classmates, or reach out via Piazza or the helpdesk. However, ALL HOMEWORK must be completed individually.

You MAY:

  • Talk about concepts in solutions
  • Discuss ideas
  • Look up online documentation or examples

You MAY NOT:

  • Share code
  • Look at another student’s code
  • Look up solutions to specific problems on the internet

Ultimately, I may choose to call you into office hours to explain the choices you made in solving a problem. You should be comfortable explaining each choice you made, how you implemented it and why.

You are responsible for reading and understanding Union’s policies regarding Academic Conduct in the student handbook (http://www.union.edu/offices/dean/handbook/). If you need help understanding how and when to cite sources, please see me.

Schedule

Note: Subject to change. Be sure to check for updates on Nexus: http://nexus.union.edu/

  • Introduction to Data Analytics
  • Using Descriptive Statistics to Talk About Data
  • Python Tools for Data Analytics
  • Lists, Dictionaries and Arrays
  • pandas for Exploratory Data Analysis
  • Collecting Data from the Web
  • Data Wrangling and Cleaning
  • Merging, Selecting and Transforming Data
  • Using Simple Machine Learning Methods to Explore Data
  • Regression
  • Classification
  • Clustering
  • Evaluating Performance
  • Ethics of Data