CSc 150
Project 1 -- Looking Out for Number One
Due: Wednesday, April 7, 2010
Objectives
- To practice the object-oriented way of programming, which
involves the design of classes and the creation of objects
- To practice writing Java methods (you may know them
as functions in other languages). This involves writing
prototypes, signatures, return types,
modifiers, and parameters
- To further practice with typed variables
- To reacquaint yourself with familiar constructs like conditionals,
loops, and arrays, and their Java syntax
- To practice creating programs with third-party classes
- To work with file I/O
- To get used to writing and consulting standard Javadoc documentation to
figure out how classes work, or to see if a method that will do what you want
has already been written.
Introduction
Say you were given a list of statistics. The source could be almost anything:
batting averages, city populations, number of sunspots observed each day for a year--
whatever. If you were to look at each number and figure out how
many of them started with a 1, how many started with a 2, etc., you'd expect
an equal distribution, since the digits that make up those numbers can
vary pretty randomly.
But you'd be wrong.
There is a strange, but true, law known as Benford's Law that says that
in any given list of statistics, the number 1 will appear most often as
a leading digit (about 30% of the time). 2 will appear next most often,
followed by 3, etc. The graph below shows the law at work for a variety of
data sets.
Most people find Benford's Law surprising and counterintuitive (but neat!)
If you'd like to know more about why Benford's Law works and its history,
read this
article by Jon Walthoe which does a pretty good
job of explaining it. Furthermore,
a computer program is great at showing mathematical phenomena like this
one in action. That's exactly what you'll do in this assignment: number
crunching statistical files to prove to yourself that Benford's Law really is true.
Your Mission
Create a program that takes a file of statistics as input, computes
how many of the numbers have x as a leading digit for all x 1-9,
and then plots a bar graph showing the relative frequency of each of the digits
when they lead off a number. The output of your program should look something like
this:
File: "sunspot_data.txt"
frequency of the lead digit
---------------------------
1: *****************************************************************
2: **********************************************
3: ****************************************
4: ********************************
5: *************************
6: **********************
7: ********************
8: ********************
9: ******************
Don't forget to include the filename and a title for your bar graph.
Well-labeled output is part of good design!
The Details
To help you with the file I/O, I'm providing a Tokenizer class on
Blackboard that you should use
to help obtain input. It has only one instance method, nextToken,
that returns the next unread token (i.e. a number) as a string.
Consult these Javadocs to learn how
to use it, and test it out first! For example, you
could write a small program that simply reads every number from the input
file and then prints it.
You'll also want to consult the
Java API docs to
learn about methods that can help you parse the string you get from
the Tokenizer. You'll need to strip out the first character
from the string and turn it into a number (i.e. something of type int).
Both the String and Character classes will be useful to look at.
Get used to looking up stuff in the docs like this -- that's how
you save yourself time in the long run by not having to
reinvent code that already does what you want.
In addition, here are some sample input files to test your code on.
Your tests should not stop here though. You are required to find two
other input files that demonstrate Benford's law and include them
with your project. Remember, the data cannot be too random or too
constrained or else Benford's Law won't apply.
(Read
the article above for why.) But that still leaves lots
of choices. And make sure the two files you pick are different types of data --
not two files of population data from two different states, for example.
Get creative to make a more convincing argument of how widespread
Benford's law really is.
PITFALL ALERT: The input files need to be in a particular format
(see the Tokenizer docs).
So look for spreadsheet files where the data you want is in a single
column that you can easily cut and paste into a text file. That'll make
your life much easier.
Document in Javadoc style
You are required to comment your code in Javadoc style. (The
Tokenizer
class is commented in this way.) I'll show examples of how to
do this in class, but you should also consult
Sun's reference page which shows good examples. At minimum,
every parameter and return value of every method should be commented
using the @param and @return tags.
Every class should have a descriptive comment too. If you're using
Eclipse, you can automatically generate the Javadoc web pages
by going to the Project menu - Generate Javadoc.
Some Thoughts on Design
Remember that you're being graded not just on if the program works,
but how well you designed your code. Try to think in an object-oriented way.
Since this first program is mainly to get you used to Java, I'll help you out
this first time. Try to follow what we did in class. There is always a
main method that the program starts with, and it's in a
class by itself. That method creates and sends messages to other objects
to get everything else done. What kinds of objects would be involved here?
One way to go is to have a BenfordTester object whose job (behavior)
is to process the input file and do the graphing.
This BenfordTester class could then have instance methods
to define this behavior, like those below:
/** Counts the number of times the number n is the first digit
* in file filename for all n=1...9.
*
* You'll be using the Tokenizer class in this method to get
* each number from filename and then figuring out
* what its leading digit is. How you use this information
* to figure out the total count for all 9 digits is up to you.
*/
public void counter(String filename)
/** Uses the nine total counts (one for each digit 1-9) to form
* a histogram (a bar graph) of leading digit frequency.
*
* One asterisk will be printed in the bar graph for every
* unitsPerAsterisk in the count. For example, say you had
* counted that '2' was the leading digit 100 times. If
* unitsPerAsterisk was 25, then you'd print 4 asterisks for the
* '2' bar (1 asterisk for each 25 in the count). If
* unitsPerAsterisk was 1, then you'd print 100 asterisks for
* the '2' bar (1 asterisk for each 1 in the count). Being able to
* adjust this for different statistics files will let you see
* coarser or finer levels of detail so you can make an easy-to-read
* bar graph.
*/
public void grapher(int unitsPerAsterisk)
In this way, you separate the counting task from the graphing task.
You are not required to use the methods above. They're just
suggestions. You may also use them but alter the parameters and
return types, if you wish.
Remember to practice good programming skills:
- Figure out how to use the starter code first. Try to
read one of the sample text files and run nextToken
on it. What gets returned? Can you read the whole file?
How do you know when you're at the file's end?
- Write what the method does (the comments) first. Then
write the code. And don't just copy and paste the comments
I have above! Those comments are written more as instructions for
you. Write all comments in your own words.
- Test your code often! Write one method (or just part of one method)
and test just that.
Grading
You should check out the grading standards
on our course web page that will be used for all programming
projects.
This project will be worth 50 points. It will
be divided up this way:
- 20 points for correctly working code.
- 10 points for the overall design of your code, including
what you did to compute all nine counts correctly.
- 10 points for breaking up tasks into appropriate methods as
well as good understandability.
Your code should reflect straightforward logic that you should
be able to easily explain out loud. If you can't articulate it,
then your code is probably too complicated for no good reason.
- 5 points for finding appropriate input files to test with.
- 5 points for Javadoc documentation.
What to turn in
Be sure to include your own input files in your project when you turn it in.
If you're using Eclipse, you can just include it in the project folder
before you zip it up. Please turn in both
a paper and an electronic copy of your
project. The zipped up project will
be turned in to the Dropbox on Blackboard.
Having trouble?
Don't wait until the last minute! Come see me and get
your $80 worth.
Gentle Reminder
Programming assignments, like homework assignments,
are individual
projects. I encourage you to talk
to others about the general nature
of the project and ideas about how to pursue it. However, the
technical work, the writing, and the inspiration behind these
must be substantially your own. If any person besides
you contributes in any way to the project,
you must credit their work on your project.
Similarly, if you include
information that you have gleaned from other published sources, you
must cite them as references.
Looking at, and/or copying, other people's programs or written work is
inappropriate, and will be considered cheating.
Return
to Project Index