CSC103 Taming Big Data
Overview
Introduction to the field of computer science with the theme of natural and social science applications. Introduces students to algorithms, basic data structures, and programming techniques. Includes development of programs and use of existing applications and tools for computational applications including simulation, data analysis, visualization, and other computational experiments.
Required Text
We will use an online introductory text from ZyBooks. The code for subscription is available on the NEXUS page for this course.
Student Learning Outcomes
You will learn to program in Python. This requires learning the syntax and semantics of the Python programming language, which will enable you to solve problems in an endless array of subjects and disciplines. This course focuses on the problems of working with data, so we’ll cover:
- fundamental components and operation of a computer
- data types (integers, floats, strings, lists, dictionaries)
- repetition and selection
- files, both reading and writing
- breaking down complex problems into computational steps
- using data to answer questions
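To give a flavor of how these topics fit together, here is a minimal sketch of a small Python program. It is only an illustration, not an actual course assignment: the file name, the place names, and the numbers are all invented for this example.

```python
import os
import tempfile


def write_readings(path, rows):
    """Write (name, value) pairs to a text file, one 'name,value' per line."""
    with open(path, "w") as f:
        for name, value in rows:          # repetition: one pass per row
            f.write(f"{name},{value}\n")


def read_readings(path):
    """Read the file back into a dictionary mapping name (string) to value (float)."""
    result = {}
    with open(path) as f:
        for line in f:
            name, value = line.strip().split(",")
            result[name] = float(value)
    return result


def names_above(data, threshold):
    """Return a list of the names whose value is greater than the threshold."""
    selected = []
    for name, value in data.items():
        if value > threshold:             # selection: keep only some entries
            selected.append(name)
    return selected


if __name__ == "__main__":
    # Hypothetical data, made up purely for illustration.
    rows = [("Avalon", 4.2), ("Borduria", 11.5), ("Cascadia", 0.9)]
    path = os.path.join(tempfile.mkdtemp(), "readings.csv")
    write_readings(path, rows)
    data = read_readings(path)
    print(names_above(data, 5.0))
```

The sketch breaks one question ("which places are above a threshold?") into three steps: write the data, read it back, and select the answer.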
Common Curriculum: Justice, Equity, Identity and Diversity (JEID)
Algorithms and computational systems increasingly shape the world we live in. Advances in AI and machine learning are in the news on a daily basis. But there is also a growing awareness of the limitations and dangers of these systems. In particular, they often promote biases and inequalities. For example, facial recognition software has been shown to work much less accurately for people of color than for white people, an AI hiring algorithm was found to reject female applicants for technical positions at a higher rate than male applicants, and search results can be full of racial stereotypes.
As an introduction to computer science, this course is the first step that students take to go from being a user of computing to becoming a creator of computing tools. Our goal is to ensure that they understand the many ways in which their choices, as programmers, can include or exclude users, can benefit or harm users.
Throughout the term, students will practice technical concepts by completing programming assignments that are couched in real-world application scenarios. The scenarios will be picked to highlight ways in which the programmer’s implementation decisions impact other people’s experiences, often reinforcing biases and disadvantaging already minoritized groups. Some examples:
- Look at demographic data and see how fields can be used as proxies of race or gender
- Analyze vote counts for districts across states, to determine if gerrymandering is a danger given the number of wasted votes.
- Train a simple machine learning system: Using biased data will create a biased system.
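As a rough sketch of the wasted-votes example above, the code below assumes one common definition, which this syllabus does not itself specify: in a two-party district, every vote for the losing party is wasted, and so is every vote for the winning party beyond the bare majority needed to win. The function names and the vote counts are hypothetical, chosen only for illustration.

```python
def wasted_votes(votes_a, votes_b):
    """Return (wasted_a, wasted_b) for one two-party district.

    Assumed definition: the loser wastes all of its votes; the winner
    wastes every vote beyond the bare majority needed to win.
    """
    if votes_a == votes_b:
        raise ValueError("a tied district has no winner in this sketch")
    needed = (votes_a + votes_b) // 2 + 1   # smallest winning vote count
    if votes_a > votes_b:
        return votes_a - needed, votes_b
    return votes_a, votes_b - needed


def wasted_vote_gap(districts):
    """Difference in total wasted votes (A minus B) as a share of all votes.

    A value far from zero suggests one party wastes many more votes
    than the other across the districts.
    """
    total_wasted_a = 0
    total_wasted_b = 0
    total_votes = 0
    for votes_a, votes_b in districts:
        wasted_a, wasted_b = wasted_votes(votes_a, votes_b)
        total_wasted_a += wasted_a
        total_wasted_b += wasted_b
        total_votes += votes_a + votes_b
    return (total_wasted_a - total_wasted_b) / total_votes


if __name__ == "__main__":
    # Hypothetical vote counts: (party A, party B) per district.
    print(wasted_vote_gap([(60, 40), (30, 70)]))
```

Whether a given gap actually indicates gerrymandering is a judgment call that depends on the data and the context, which is exactly the kind of question the scenarios are meant to raise.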
Common Curriculum: Perspectives
Data & Quantitative Reasoning (DQR)
Using algorithms, students will implement programs to analyze data to answer questions, and to form hypotheses. Examples of data used include:
- Country-based carbon emissions data, 1970-2012
- US Census-based demographic data
- Department of Labor data, including race, sex and age
- Texts of popular novels
Students will then critically engage with the results, to understand that the answer is only ever as good as the data it is extracted from. The ethics of data use will be discussed throughout the class.
Engineering, Technology & Society (ETS)
Students will learn to programmatically break down larger problems into a series of smaller, computational steps (creating an algorithm). The impact of these algorithms on individuals and societies will be explored. Algorithms will be implemented in Python, to show how problems are solved, and also to more deeply understand the nature of computers themselves. We will discuss the ways algorithms can be used both positively and negatively, and emphasize that it is the job of the programmer to consider the implications of any code they create.
Assignments & Grades
There are numerous types of assignments for this course. Programming is a very hands-on activity: the more you do, and the more ways you think about it, the better you will become at it.
Working together is a great way to more fully explore the concepts of the course. At the same time, independent work is also critical so that you fully understand the material on your own. Thus assignments are designed to balance opportunities to work together and individually. In lab, I am happy for you to work together. YOU MUST indicate whom you worked with on your lab cover sheet.
However, it is NEVER acceptable to collaborate on HOMEWORK assignments.
There will be weekly homework exercises. Homework exercises are for you to play with and reinforce the concepts we talk about in class. Each person must hand in their own solution. Where these exercises require Python programs, you MUST hand in working code. If a section of code does not work for you, it is OK to comment it out. This will be explained more fully at the appropriate time.
At the end of the term there will be a programming project. This project will combine different programming concepts, and provide opportunities to play with your own data. Each student must complete their own programming project.
In any individual project or assignment you may discuss algorithms with each other, but you may NOT look at each other’s code. To complete these projects on time, it is critical that you start each as early as possible and get help as soon as possible when needed.
At no point in the course, for assignments, labs or projects, may you use code you find in online resources.
There will be labs where you will work on exercises in class and receive help from your peers and me.
There will be two in-class midterm exams (roughly Week 4 and Week 7), and a final exam that must be completed individually. There may be “pop quizzes” and independent in-class exercises. The intent is not that these be punitive in any way, but rather that they motivate you to keep up and provide feedback on your progress. Learning to program is like learning a foreign language: if you do not speak it during some part of every day, your progress will be quite slow.
Finally, class attendance and participation are critical components of the course. Please discuss any necessary absences with me.
Handing in assignments: For both homework exercises and programming projects, you will turn in a hard copy of the source code and submit the program electronically using the NEXUS website.
Whether you work on your own computer or on the system at Union, ultimately your programming projects and homework exercises must run, so be sure to test them before handing them in. Labs and in-class exercises will also be submitted on NEXUS.