CS312 Lab 4: Letter Frequency in Names vs. Words


Due Wednesday, February 20, at 1:30 PM (because of President's Day)

English letter frequencies are reasonably well established and are used in cryptography. However, many names are not English in origin, so they may diverge from the standard letter frequencies.

Write a C++ program that calculates the frequency of each letter in the spell-check dictionary located at /usr/share/dict/american-english on isoptera. Also run the program on the provided list of names, in the file only_names. This list was generated by the US Census. It is sorted by frequency, but the frequencies are not listed in the file (if you would like them, look at names). Set up your program to report results in the same unit for each file, rather than as the raw number of times each letter is encountered: for example, the frequency of each letter per 100 letters.
As a suggestion: set up a loop that reads one character at a time from the file. Track the letter counts in an array of 26 integers, where each element holds the count for one letter. Also track the total number of letters added to the array, or calculate the array sum at the end. Then print the frequency of each letter, calculated from that letter's count and the total number of letters. Do this for each file, then print the frequencies for both files and the differences in frequencies between them.
Here is a file read demo