Introduction
Fig 2.0: Racer Times
One goal of this lab is to become familiar with standard deviation and other descriptive statistics. The second goal is to understand the difference between mean centers and weighted mean centers. The lab is broken up into two parts.
Part one consists of defining and calculating a variety of statistics: range, median, mode, kurtosis, skewness, and standard deviation. The statistics will be derived from the data set shown in figure 2.0. All statistics will be calculated in Excel except for the standard deviation, which will be done by hand. The scenario for this lab is a cycling race (Tour de Geographia), which has both a team and an individual component. The individual who wins the race is awarded $300,000, with 25% going to the team owner, while the team that wins is awarded $400,000, with 35% going to the team owner. After calculating the statistics from previous race results, either team Astana or team Tobler will be chosen, depending on which team and team owner are most likely to make the most money at the race.
Part two consists of calculating mean centers and weighted mean centers. Population data from 2000 and 2015 for Wisconsin counties will be used to determine the weighted mean center of population for both 2000 and 2015. Also, the geographic mean center of Wisconsin counties will be shown on the map. The difference between geographic mean center and weighted mean center will be discussed. Then, there will be some discussion about the patterns displayed in the map.
Part I: Defining and Calculating Statistics
Definitions
Range: The difference between the highest and lowest values in the data set.
Mean: The sum of all the values divided by the number of items in the data set.
Median: The exact middle value of the sorted data set. If the data set has an even number of values, the median is the mean of the two middle values.
Mode: The most common value in the data set.
Skewness: Describes how asymmetric the data distribution is. Skewness can be positive or negative. As a rule of thumb, a skew above 1 is considered positive skewness and a skew below -1 is considered negative skewness. When the skew is between -1 and 1, the distribution is generally treated as roughly symmetric, like a normal distribution. Visually, a positively skewed distribution has a long tail to the right because of large outliers in that direction, and a negatively skewed distribution has a long tail to the left because of small outliers in that direction.
Fig 2.1: Different Types of Kurtosis
Kurtosis: Describes how peaked a data distribution is. Kurtosis has no units and is given as a plain number. Positive kurtosis (leptokurtic) means the peak is very steep and the value is greater than 4, while negative kurtosis (platykurtic) means the peak is flat and spread out and the value is less than 2. A distribution whose peak resembles a normal distribution, with a value between 2 and 4, is called mesokurtic. Figure 2.1 shows the different types of kurtosis. When kurtosis is calculated in Excel, 3 is subtracted from the original value, so for Excel's output these thresholds become 1 and -1.
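The definitions above can be sketched in a few lines of Python. This is only an illustration, not the Excel workbook used in the lab: the times below are hypothetical and are NOT the values from figure 2.0, and the helper names (excel_skew, excel_kurt, describe) are made up for this example. The skewness and kurtosis helpers use the sample-adjusted formulas that, to my understanding, Excel documents for its SKEW and KURT functions.

```python
from statistics import mean, median, multimode, stdev


def excel_skew(values):
    """Sample skewness, using the adjusted formula documented for Excel's SKEW."""
    n = len(values)
    m = mean(values)
    s = stdev(values)  # sample standard deviation (divides by n - 1)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in values)


def excel_kurt(values):
    """Excess kurtosis (3 already subtracted), as documented for Excel's KURT."""
    n = len(values)
    m = mean(values)
    s = stdev(values)
    fourth = sum(((x - m) / s) ** 4 for x in values)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourth - correction


def describe(values):
    """Collect the statistics defined in this section into one dictionary."""
    return {
        "range": max(values) - min(values),
        "mean": mean(values),
        "median": median(values),
        "mode": multimode(values),  # list, because ties are possible
        "skewness": excel_skew(values),
        "kurtosis": excel_kurt(values),
    }


# Hypothetical finishing times in minutes (assumption, not the lab data).
times = [2250, 2260, 2260, 2275, 2280, 2290, 2400]
stats = describe(times)
for name, value in stats.items():
    print(name, value)
```

In this made-up sample the single slow time of 2400 minutes pulls the tail to the right, so the skewness comes out positive. Because the kurtosis helper already subtracts 3, a result near 0 rather than near 3 corresponds to a mesokurtic distribution.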
Fig 2.2: Sample vs Population Standard Deviation
Calculating Standard Deviation
Figures 2.3 and 2.4 below show the standard deviations calculated by hand for team Astana and team Tobler, based on the data set in figure 2.0. The means were not calculated by hand, but they are used in the calculations. The mean of team Astana's times was 2276.66 minutes and the mean of team Tobler's times was 2285.46 minutes. All values in the calculations were converted to minutes to make the math easier. In team Astana's calculation, the variables are listed in the upper right corner and labeled with what they stand for. Team Astana's standard deviation calculation is shown in figure 2.3, and team Tobler's is shown in figure 2.4.
Team Astana
Fig 2.3: Team Astana's Standard Deviation Calculation
Team Tobler
Fig 2.4: Team Tobler's Standard Deviation Calculation
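The hand calculations above follow the usual steps: subtract the mean from each time, square the differences, add them up, divide, and take the square root. As a rough companion, here is a minimal Python sketch of both formulas contrasted in figure 2.2. The text does not say which of the two was used by hand, so both are shown. The function names and the input values are hypothetical and are not taken from figure 2.0.

```python
from math import sqrt


def population_sd(values):
    """Population standard deviation: divide the squared deviations by N."""
    n = len(values)
    m = sum(values) / n
    return sqrt(sum((x - m) ** 2 for x in values) / n)


def sample_sd(values):
    """Sample standard deviation: divide the squared deviations by n - 1."""
    n = len(values)
    m = sum(values) / n
    return sqrt(sum((x - m) ** 2 for x in values) / (n - 1))


# Hypothetical values (assumption, not the racer times from the lab).
values = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_sd(values))
print(sample_sd(values))
```

The sample version divides by n - 1 instead of N, so for the same data it is always slightly larger than the population version.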
The range, mean, median, mode, kurtosis, and skewness were all calculated in Excel. The results are shown below in figure 2.5. All times are rounded to the nearest minute. Kurtosis and skewness have no units, so they are left exactly as they were calculated. Note that the standard deviation is displayed as well.
Fig 2.5: All Statistics for both team Astana and team Tobler
Before choosing the team that would earn me the most money, I made a graph, shown in figure 2.6, that pairs each racer with the equally ranked racer from the other team. For both teams, the best racer is at position one, the second best racer at two, the third best at three, and so on.
Fig 2.6: Tour de Geographia Cycling Times in Minutes
Part II: Calculating Mean Centers and Weighted Mean Centers
Geographic Mean Center: A measure of central tendency calculated by averaging the x values and the y values of a set of points. It is the exact center of that set of points.
Weighted Mean Center: The geographic mean center of a set of points, adjusted for the value associated with each point. Each point is given a weight depending on the value it holds. For example, in the Wisconsin population map below (Figure 2.7), each county is stored as a point, and each point holds the population of its county. The weighted mean center is found by multiplying each county's center x and y coordinates by that county's population, summing these weighted coordinates across all counties, and dividing by the total population.
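The two definitions above can be compared directly in code. The following is a minimal Python sketch of the textbook formulas, not a reproduction of what any GIS tool does internally. The function names, coordinates, and populations are hypothetical and do not come from the Wisconsin data.

```python
def mean_center(points):
    """Unweighted mean center: the average x and the average y."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def weighted_mean_center(points, weights):
    """Weighted mean center: each coordinate counts in proportion to its weight."""
    total = sum(weights)
    x = sum(w * px for (px, _), w in zip(points, weights)) / total
    y = sum(w * py for (_, py), w in zip(points, weights)) / total
    return (x, y)


# Hypothetical county center points and populations (assumption, not lab data).
centroids = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
population = [100, 100, 100, 700]

print(mean_center(centroids))                       # (5.0, 5.0)
print(weighted_mean_center(centroids, population))  # (8.0, 8.0)
```

The unweighted center sits in the middle of the four points, while the weighted center is pulled toward the point holding most of the population.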
Map
Figure 2.7 below is a map displaying the geographic mean center and the weighted mean centers of Wisconsin population at the county level. To find the geographic mean center, the Mean Center tool was run with the Wisconsin counties feature class as input. To find the weighted mean centers of population for 2000 and 2015, the Mean Center tool was used again, but this time the respective population field was entered as the weight field.
Fig 2.7: Geographic Center and Weighted Mean Center of Wisconsin by County
The map above shows that the geographic mean center is a little more than 50 miles northwest of the weighted mean centers of population for 2000 and 2015. The geographic center of Wisconsin is located in Wood county, while both the 2000 and 2015 mean centers of population are located in Green Lake county. This is because the high-population cities of Green Bay, Milwaukee, and Madison are given considerable weight when the mean center is weighted by population.
From the map, it is clear that the weighted mean center of population has moved ever so slightly to the north and west between 2000 and 2015. This means that, overall, a larger percentage of the population now resides west of the 2000 population mean center. According to the Excel file the population data came from, both Sawyer county and St. Croix county experienced large increases in population over the last 15 years. Both of these counties are located in the western third of the state, which helps explain the westward movement of the population mean center. The St. Croix county population has grown by 52,071, and Sawyer county has grown by 46,796. Conversely, Sheboygan county, which borders Lake Michigan, has experienced a population loss of 71,083 over the last 15 years. Together, these changes suggest that the Wisconsin population is moving west. One reason people may be moving west is the presence of fast-growing cities such as Hudson and Eau Claire, where the demand for jobs is on the increase. Another reason could be that there isn't as much space for the population to expand in the southeast as there is across the rest of the state.
Conclusion
In conclusion, statistics and measures of central tendency can be used for different kinds of analysis. The range, median, mode, mean, skewness, kurtosis, and standard deviation help one understand the general shape of a data distribution without having to graph the data. Although looking at the statistics themselves is sometimes enough to answer a question, many times one will have to look through the data set and see what the values are, so that outliers can be identified and the general distribution can be eyeballed. This only works, though, if the data set is fairly small, like the one for Tour de Geographia.
Sources
Math Is Fun, Standard Deviation Formulas
https://www.mathsisfun.com/data/standard-deviation-formulas.html
Schall Blog, Kurtosis
http://schaal15.blog.sbc.edu/tag/kurtosis/
Esri, GIS Dictionary
http://support.esri.com/other-resources/gis-dictionary/term/weighted%20mean%20center
Census Bureau, Fact Finder
https://factfinder.census.gov/faces/nav/jsf/pages/index.xhtml