Lab 7: Statistics and Steam




Data on eruption timing from the Old Faithful Geyser in Yellowstone National Park has been collected and is available in a number of places, including here. The data can be loaded into a Python program using file I/O and string processing, which haven't been discussed in class yet, so the code is provided for you in faithful_list.py in the class examples area. Download faithful.dat to the same folder as your Python program.

Averages

Use the "eruptions" and "waits" lists to calculate the average wait time between eruptions and the average eruption length. Remember, the average of a list is the sum of the list divided by the length of the list.
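A minimal sketch of the averaging step, assuming `eruptions` and `waits` are plain Python lists of numbers as built by faithful_list.py. The helper name `average` is hypothetical and is not part of the provided code.

```python
def average(values):
    """Return the mean of a list: its sum divided by its length."""
    # float() avoids integer (floor) division if this runs under Python 2
    return float(sum(values)) / len(values)

# Usage, assuming faithful_list.py has already built both lists:
# print("Average wait time:", average(waits))
# print("Average eruption length:", average(eruptions))
```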

Plot of Wait Times

Use ASCII art to plot how many times each wait time occurs. For this, you'll need a count of each wait time: use the .count list method inside a loop that runs from the shortest wait time up to the longest wait time. For each wait time, draw one asterisk (*) per eruption that had that wait.
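One way to structure the plot is sketched below. It assumes the wait times are whole minutes (stored as ints, or floats such as 79.0), so that .count finds exact matches; the function name `wait_histogram` and the "wait | stars" line layout are illustrative choices, not requirements of the lab.

```python
def wait_histogram(waits):
    """Build one ASCII bar per wait time, from the shortest wait to the longest."""
    lines = []
    # range() needs integers; int() also handles waits stored as floats like 79.0
    shortest = int(min(waits))
    longest = int(max(waits))
    for wait in range(shortest, longest + 1):
        count = waits.count(wait)  # number of eruptions with exactly this wait
        # Right-align the wait time in 3 columns, then one '*' per eruption
        lines.append("{:3d} | {}".format(wait, "*" * count))
    return lines

# Usage, assuming `waits` comes from faithful_list.py:
# for line in wait_histogram(waits):
#     print(line)
```

Wait times that never occur still get a line with no asterisks, which keeps the horizontal axis evenly spaced.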

Lab Report

Besides showing me your lab, prepare and upload a brief report summarizing your results. The report should contain the graph you generated (a screenshot is fine, or you can copy and paste the ASCII graph). The report should contain three sections: Averages, Plot of Wait Times, and Methods Used. Put your source code in the Methods Used section.

Extra Credit: Standard Deviation

For 10% extra credit on this lab, check whether the wait time is related to the eruption length. Make another list containing each wait time divided by its eruption time. Calculate the standard deviation of the list of wait times and divide it by the average wait time. Then calculate the standard deviation of the wait-time/eruption-time list and divide it by the average of that list. Both numbers will be less than 1. If the second number is less than the first, this suggests that the eruption time is related to the wait for the next eruption.
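The comparison can be sketched as follows, assuming Python 3.4 or later (for the statistics module) and that waits[i] and eruptions[i] describe the same eruption. Standard deviation divided by the mean is also known as the coefficient of variation. The helper names `relative_spread` and `compare_spreads` are hypothetical, and using the population standard deviation (`statistics.pstdev`) rather than the sample version is an assumption.

```python
import statistics  # available in Python 3.4 and later


def relative_spread(values):
    """Standard deviation divided by the mean (coefficient of variation)."""
    return statistics.pstdev(values) / statistics.mean(values)


def compare_spreads(eruptions, waits):
    """Return (relative spread of waits, relative spread of wait/eruption ratios)."""
    # One ratio per eruption: its wait time divided by its eruption length
    ratios = [w / e for w, e in zip(waits, eruptions)]
    return relative_spread(waits), relative_spread(ratios)

# Usage, assuming eruptions and waits come from faithful_list.py:
# wait_spread, ratio_spread = compare_spreads(eruptions, waits)
# print(wait_spread, ratio_spread)
# if ratio_spread < wait_spread:
#     print("Eruption length appears related to the wait time.")
```

If waits were exactly proportional to eruption lengths, every ratio would be the same and the second number would drop to 0.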
The statistics module is not available in Python versions before 3.4, so you'll have to either use a more recent Python version, calculate the standard deviation yourself (the formula is on Wikipedia), or use a library like NumPy. Calculating it yourself from the formula is probably easiest.
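Without the statistics module, the population standard deviation can be computed straight from its definition: the square root of the average squared distance from the mean, sqrt((1/N) * sum((x - mean)^2)). The sketch below uses only the math module, so it should also run on older Python versions; the name `std_dev` is hypothetical.

```python
import math


def std_dev(values):
    """Population standard deviation: square root of the mean squared deviation."""
    n = len(values)
    mean = float(sum(values)) / n  # float() avoids integer division in Python 2
    squared_diffs = [(x - mean) ** 2 for x in values]
    return math.sqrt(sum(squared_diffs) / n)

# Usage, assuming `waits` comes from faithful_list.py:
# print(std_dev(waits) / (float(sum(waits)) / len(waits)))
```

Dividing by N - 1 instead of N gives the sample standard deviation. Because the wait list and the ratio list have the same length, either choice scales both numbers by the same factor, so the extra-credit comparison comes out the same.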