CS211 Project 2

Due Monday, November 13

For project 2, create a program (from scratch, not from a given starting point) that calculates the vocabulary used in a text file, both as a count of the total number of unique words and as a count of how many times each word is used. The program should open a text file and process it one word at a time until it reaches the end. Use an std::map to store a count of how many times each word occurs. Once this is done, you should be able to print the number of uses of each word, as well as the total number of unique words and the most used word.
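Below is a minimal sketch of the std::map part. It assumes the words have already been read and cleaned up. The helper names countWords and printCounts are hypothetical and are not required by the assignment.

```cpp
#include <map>
#include <ostream>
#include <string>
#include <vector>

// Count how many times each word occurs. std::map keeps its keys sorted,
// so iterating over the result lists the words alphabetically.
std::map<std::string, int> countWords(const std::vector<std::string>& words) {
    std::map<std::string, int> counts;
    for (const std::string& w : words) {
        // operator[] inserts a missing key with a value-initialized count (0),
        // so this one line handles both "new word" and "word already seen".
        ++counts[w];
    }
    return counts;
}

// Print each word with its count, then the number of distinct words
// (the map's size()).
void printCounts(const std::map<std::string, int>& counts, std::ostream& out) {
    for (const auto& entry : counts) {
        out << entry.first << ' ' << entry.second << '\n';
    }
    out << "The file contained: " << counts.size() << " words\n";
}
```

Because std::map is ordered by key, the alphabetical listing in the sample runs falls out of the container for free. No separate sort is needed.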

Since English separates words with whitespace, you can use the extraction operator (>>) to extract words from the input stream. It skips any spaces, tabs, and newlines between words. In general, the algorithm should go something like this:

Open the file
while (a word can be read from the file) {
	Remove punctuation and convert the word to lower case
	If nothing is left (for example, a lone "--"), skip it
	If the word is in the map, increase the associated count
	If the word is not in the map, add it with a count of 1
}
Display a list of words with number of uses
Display the most used word
Display the number of unique words (size() of the std::map)
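Here is one way to express that loop in C++. It is a sketch under stated assumptions, not the required solution. The extraction itself is the loop condition: `in >> word` is false once no word could be read, so the last word is never processed twice. By contrast, a `do`/`while` that only tests for end-of-file after processing can count the final word a second time when the file ends with a newline. The cleanup policy (keep letters only, lower-cased) and the helper names cleanWord, countStream, countFile, and mostUsed are hypothetical.

```cpp
#include <algorithm>
#include <cctype>
#include <fstream>
#include <istream>
#include <map>
#include <string>
#include <utility>

// Assumed cleanup policy for this sketch: keep letters (lower-cased) and
// drop everything else, so "Asotin." becomes "asotin" and "10" becomes "".
std::string cleanWord(const std::string& raw) {
    std::string out;
    for (unsigned char ch : raw) {
        if (std::isalpha(ch)) {
            out += static_cast<char>(std::tolower(ch));
        }
    }
    return out;
}

// Read whitespace-separated words until extraction fails and count each
// cleaned word. Tokens that clean to an empty string are skipped.
std::map<std::string, int> countStream(std::istream& in) {
    std::map<std::string, int> counts;
    std::string word;
    while (in >> word) {             // false once no further word can be read
        const std::string w = cleanWord(word);
        if (!w.empty()) {
            ++counts[w];             // a new word starts at 0, then becomes 1
        }
    }
    return counts;
}

// Open the named file and count its words; returns false if it can't be opened.
bool countFile(const std::string& filename, std::map<std::string, int>& counts) {
    std::ifstream in(filename);
    if (!in) {
        return false;
    }
    counts = countStream(in);
    return true;
}

// Return the entry with the highest count. std::max_element returns the first
// maximum, so ties go to the alphabetically first word. counts must not be empty.
std::pair<std::string, int> mostUsed(const std::map<std::string, int>& counts) {
    auto best = std::max_element(
        counts.begin(), counts.end(),
        [](const std::pair<const std::string, int>& a,
           const std::pair<const std::string, int>& b) { return a.second < b.second; });
    return *best;
}
```

A `main` would obtain the filename (from a prompt or from argv[1]), call countFile, print the map, and then print mostUsed(counts). On the seven-line sample file of insect names, this counts bee 3 times.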

Depending on how you remove punctuation, the results may vary a little. For example, if you remove all punctuation, "I'll" is reduced to "ill", which is a valid but different word. If you reject strings that contain punctuation, you'll miss words that are part of a contraction. Leaving apostrophes alone means contractions are tracked as words of their own, which could be OK. Choose whichever approach seems reasonable to you.
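As an illustration of the "leave apostrophes alone" option, here is a minimal sketch. It shows only one possible policy. The function name normalizeKeepApostrophes is hypothetical. The choice to also trim apostrophes from the ends of a word (so a single-quoted 'boat' is counted as boat) is an added assumption.

```cpp
#include <cctype>
#include <string>

// Assumed policy: keep letters (lower-cased) and apostrophes, drop every
// other character, then trim apostrophes from both ends so quoted words
// lose their quote marks while contractions such as "I'll" stay "i'll".
std::string normalizeKeepApostrophes(const std::string& raw) {
    std::string out;
    for (unsigned char ch : raw) {
        if (std::isalpha(ch)) {
            out += static_cast<char>(std::tolower(ch));
        } else if (ch == '\'') {
            out += '\'';
        }
    }
    const std::string::size_type first = out.find_first_not_of('\'');
    if (first == std::string::npos) {
        return "";  // empty, or nothing but apostrophes: caller should skip it
    }
    const std::string::size_type last = out.find_last_not_of('\'');
    return out.substr(first, last - first + 1);
}
```

This only recognizes the plain ASCII apostrophe. A typographic (curly) apostrophe in a UTF-8 file is several bytes long and would simply be dropped, which turns "I’ll" into "ill" again.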

Here is an example of what my program does on a simple file with a list of words in it:
seth@nimrod:~/cs211 $ cat words
spider
bee
ant
grasshopper
bee
beetle
bee
seth@nimrod:~/cs211 $ ./a.out
Enter the filename:  words
ant 1
bee 3
beetle 1
grasshopper 1
spider 1
The file contained: 5 words
Most Used:  bee (3 uses)

Here's an example on a longer file:
seth@nimrod:~/cs211 $ cat story
Once when I was new to the risks and problems of boat ownership, I had 10 people on my boat, plus my daughter on skiis behind the boat, heading South towads Asotin.  I thought there was deep enough water quite a ways from shore, but I was wrong, and hit rocks.  Nobody hurt, but the propeller was smashed, and I broke some other pieces in the outdrive.  

Many stories have an upside, and this one does too.  Previously, my boat would steer like a fish, and I did fix this as part of the repair.  Otherwise, knowing my nature, I would have put up with bad steering for years.

So if you try boating in the river here, give the water just off Asotin a wide berth.  A depth finder and a few afternoons studying the geography of the river bottom is a good idea too.  Better yet might be to stay on shore and spend your time programming instead.
seth@nimrod:~/cs211 $ ./a.out story
a 6
afternoons 1
an 1
and 7
as 1
asotin 2
bad 1
be 1
behind 1
berth 1
better 1
boat 4
boating 1
bottom 1
broke 1
but 2
daughter 1
deep 1
depth 1
did 1
does 1
enough 1
few 1
finder 1
fish 1
fix 1
for 1
from 1
geography 1
give 1
good 1
had 1
have 2
heading 1
here 1
hit 1
hurt 1
i 7
idea 1
if 1
in 2
instead 1
is 1
just 1
knowing 1
like 1
many 1
might 1
my 4
nature 1
new 1
nobody 1
of 3
off 1
on 3
once 1
one 1
other 1
otherwise 1
outdrive 1
ownership 1
part 1
people 1
pieces 1
plus 1
previously 1
problems 1
programming 1
propeller 1
put 1
quite 1
repair 1
risks 1
river 2
rocks 1
shore 2
skiis 1
smashed 1
so 1
some 1
south 1
spend 1
stay 1
steer 1
steering 1
stories 1
studying 1
the 9
there 1
this 2
thought 1
time 1
to 2
too 2
towads 1
try 1
up 1
upside 1
was 4
water 2
ways 1
when 1
wide 1
with 1
would 2
wrong 1
years 1
yet 1
you 1
your 1
The file contained: 110 words
Most Used:  the (9 uses)