Taming Big Data uses big data and computational science as the theme for learning about computers and computing. It will focus on applications from the sciences and social sciences.
We'll start the course with an overview of how we communicate with a computer and what programming is all about. Then we will move on to look at how we get the computer to manipulate data. By the end of the course, students will be able to develop programs that manipulate data for problems such as simulation, classification, and financial analysis. They will also be able to prepare data so that it can be processed by existing applications and tools.
The course starts out with a series of relatively small assignments. In the later portion of the term the problems grow in size, in keeping with students' increased ability to handle large amounts of data that are stored in files.
In this course the bulk of programming is done in Python. Students who are especially interested in the course topic are also encouraged to explore R.
In class, you are required to use our lab iMacs. However, when working on your projects outside of class, you have a choice. If you'd like to continue using our iMacs, feel free! We have three spaces that you can use:
19600101,28.4
where the first item on the line is the date in YYYYMMDD format (year, month, day), followed by the daily high temperature. So the example line is for January 1, 1960, and the high temperature was 28.4 degrees (all temperatures are in Fahrenheit).
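As a small illustration of this record format, the sketch below splits one such line into its date and its temperature. The function name `parse_record` is hypothetical and is not part of any assignment starter code; it simply assumes every line has exactly two comma-separated fields, as in the example line.

```python
from datetime import datetime

def parse_record(line):
    """Split a 'YYYYMMDD,temperature' line into a date and a float."""
    # strip() removes the trailing newline that comes with lines read from a file
    date_text, temp_text = line.strip().split(",")
    day = datetime.strptime(date_text, "%Y%m%d").date()
    return day, float(temp_text)

day, high = parse_record("19600101,28.4\n")
print(day.isoformat(), high)  # prints: 1960-01-01 28.4
```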
Your task is to compute the average closing price over the entire period that the stock has been traded.
DO NOT USE readlines() or read(). DO USE readline().
Remember that the easy way to handle file access is to work in your own directory, and run python from the OS prompt in a Terminal window.
How to do this? If you want to plunge on ahead with no helpful hints, go for it! If you want some hints and tips, click here.
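For those who want a starting point, here is a minimal sketch of the readline() approach. It assumes, purely for illustration, that each line of the data file has the form `date,value` with the number to be averaged in the second field; the actual assignment file may be laid out differently. The function name `average_of_second_field` is made up for this example.

```python
def average_of_second_field(filename):
    """Average the second comma-separated field, reading one line at a time."""
    total = 0.0
    count = 0
    with open(filename) as infile:
        line = infile.readline()          # read only the first line
        while line != "":                 # readline() returns "" at end of file
            line = line.strip()
            if line:                      # skip blank lines
                fields = line.split(",")
                total += float(fields[1])
                count += 1
            line = infile.readline()      # advance to the next line
    if count == 0:
        return None                       # no data, so there is no average
    return total / count
```

Because only one line is held in memory at a time, this pattern works the same way on a file with ten lines or ten million, which is likely why readlines() and read() are ruled out here.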
There will be regular programming assignments. As the term progresses these will use an increasing number of features of the Python language. We may also use other packages, such as the R statistical analysis package, later in the term.
There will be two in-class exams and a final exam. There may also be pop quizzes on material covered in prior classes and the reading. The intent is not that these be "punitive" in any way but, rather, that they provide motivation for you to keep up. Learning to program is like learning a foreign language. If you don't speak it during some part of every day, your progress will be quite slow.
The allocation of emphasis among the course components is as follows:
No homework will be accepted late unless a prior arrangement is made. Just in case you missed that the first time: no homework will be accepted late unless a prior arrangement is made.
Hard copies of all homework are due at the beginning of class on the due date. Electronic submission of program executables must be completed before you arrive at class.
(subject to change)
Week 1 & 2
|How do we communicate with a computer? How do we make the computer do what we want? What is Computer Science? What is computational science? What is programming?||Introduction to algorithms, programs, functions, variables, arithmetic. Working with Python||PP: Ch 1, Ch 2|
|We have all this data! How do we manipulate it, and make decisions based on it?||Lists, Introduction to control flow, Modules, Introduction to Objects & Methods||PP: Ch 4, Ch 5, Ch 7|
|What about text data?||Strings||PP: Ch 3|
|Exam on 10/4|
|Can we do things more than once?||More control flow, making choices, more repetition||PP: Ch 6 & 7|
|Need to find something in that data? Is your data stored in a file?||Search, Nested Lists, File Processing||PP: Ch 8|
|Sometimes data comes in interesting groups or relationships||Sets and Dictionaries||PP: Ch 9|
|Sometimes things are easier if information is in
Sometimes data items are connected to each other.
And sometimes programs blow up!
|Finish dictionaries and
Search and sort
Exam on 10/30
|Computation in various disciplines||PP: Chapter 14; Wilson: 2.1-2.3, 3.1-3.4 (will be provided)|