Wednesday, February 1, 2017

Data Types and Classification Methods

Overview
  This assignment is divided into two main parts. Part I defines and explains the differences between nominal, ordinal, interval, and ratio data. Maps found on the internet serve as examples of each data type to demonstrate an understanding of them. Part II defines three methods for classifying data for display: equal interval (based on range), quantile, and natural breaks. There are three maps, one for each method, showing where a hypothetical agriculture marketing company should concentrate its message to increase the number of female principal farm operators. All of the maps are based on the same data, but they look different because of the different classification methods used. To conclude, the map that displays the data best is chosen for the clients, along with an argument explaining why that map was picked.

Part I: Explaining Different Data Types


Nominal
  Nominal data carries no numeric value. Any value associated with a nominal feature is arbitrary. Nominal data can consist of numbers, but those numbers hold no mathematical meaning. Nominal values are used only to distinguish one feature from another. Figure 1.0, shown below, is an example of a nominal map. The map depicts regional soil types throughout Wisconsin. Each soil region is given a color only to show that it is distinct from the others. The colors are there for cartographic purposes, and there is no numeric value associated with them. This map is not mine and the sole purpose of using it is to demonstrate the use of nominal data in maps. The website the map came from is listed under Sources.
Fig 1.0: Soil Regions of Wisconsin. A Nominal Data Map.

Ordinal
  Ordinal data can be thought of as ranking data. The value of ordinal data is determined by its rank. There are two types of ordinal data: strongly ordered and weakly ordered. Strongly ordered data is less common than weakly ordered data and occurs when each unit of data is given a specific position or ranking of its own. A good example appears in survey questions. One such question might ask respondents to rate their happiness level, with the options excellent, good, fair, and poor. Because each option stands on its own and has its own rank, this is strongly ordered ordinal data. Weakly ordered ordinal data is much more common. The data is still ranked, but data classes emerge. These classes group similar values together to display the data in a more readable manner. Below, in Figure 1.1, is a map which displays unemployment rate by county in the state of Wisconsin. The data in this map is split into six classes. Each class contains a range of values, hence this is an example of a weakly ordered ordinal map. This map is not mine and the sole purpose of using it is to demonstrate the use of ordinal data in maps. The website the map came from is listed under Sources.
Fig 1.1: Wisconsin Unemployment Rate, 2009. An Ordinal Data Map

Interval
  Interval data is associated with continuous data. Data values are placed on a scale, but the zero point is relative and is not a true zero. The values carry numeric meaning relative to that scale. A good example of this is temperature on the Fahrenheit scale. 0 degrees is a relative zero and doesn't actually mean the absence of temperature. There can be values both above and below this zero. Simple arithmetic can still be performed, though. The difference between 45 degrees and 30 degrees is the same as the difference between 30 degrees and 15 degrees. Another good example of interval data is elevation. Figure 1.2, shown below, is a map which displays the low temperatures across Wisconsin on January 30, 2017. This map uses the Fahrenheit temperature scale. Values both above and below zero are displayed in the map. Because the zero isn't an absolute zero, the data is classified as interval. This map is not mine and the sole purpose of using it is to demonstrate the use of interval data in maps. The website the map came from is listed under Sources.
Fig 1.2: Lows Across Wisconsin on January 30, 2017. An Interval Data Map.

Ratio
  Ratio data is similar to interval data with the exception that ratio data has an absolute zero. It is also associated with continuous data. Ratio data is placed along a scale which contains a fixed and meaningful zero point. Because of this, multiplication and division give meaningful results, so one value can be described as a multiple of another. Examples of ratio data include densities, precipitation, snowfall accumulation, counts, and rates. An example, Figure 1.3, is shown below. It is a map of snow accumulation across Wisconsin from a November winter storm in 2014. More advanced mathematical operations can be used to describe the differences in snow accumulation on this map. For example, Green Bay picked up 2.5 inches in the storm while Chippewa Falls picked up 5.8 inches. Because this is ratio data, it can be stated that snow accumulation in Chippewa Falls was 2.32 times that in Green Bay. This map is not mine and the sole purpose of using it is to demonstrate the use of ratio data in maps. The website the map came from is listed under Sources.

Fig 1.3: Snowfall Accumulation November 10-11, 2014. A Ratio Data Map.
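
The difference between interval and ratio data can be checked with a little arithmetic. The short Python sketch below uses the snowfall totals and Fahrenheit temperatures quoted above. The Kelvin conversion is added here only as an illustration and is not part of the original assignment.

```python
# Ratio data: snowfall totals from the text (inches). Zero means no snow,
# so dividing one value by another is meaningful.
green_bay = 2.5
chippewa_falls = 5.8
ratio = chippewa_falls / green_bay
print(round(ratio, 2))  # 2.32

# Interval data: Fahrenheit temperatures. Differences are meaningful...
diff_high = 45 - 30
diff_low = 30 - 15
print(diff_high == diff_low)  # True

# ...but ratios are not, because 0 degrees F is not a true zero.
# Converting to Kelvin (which has an absolute zero) shows that
# 30 F is nowhere near "twice as warm" as 15 F.
def fahrenheit_to_kelvin(temp_f):
    return (temp_f - 32) * 5 / 9 + 273.15

kelvin_ratio = fahrenheit_to_kelvin(30) / fahrenheit_to_kelvin(15)
print(round(kelvin_ratio, 2))
```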

Part II: Classification Methods

Equal Interval - Based on Range
  The method of creating equal interval data classes based on range is pretty simple. First, the number of classes must be determined. In this case, it's four. Then, the range of the values (highest minus lowest) is calculated. This range is divided by the number of classes, and the result is the interval. The first class always runs from the lowest value to the lowest value plus the interval, and each later class starts where the previous one ends and spans the same interval. The map below in figure 1.4 shows this method being used. It displays the number of women who are the principal operator of a farm across the counties of Wisconsin. The interval in figure 1.4 is 96.25, which with four classes corresponds to a range of 385. Generally, the equal interval method is best suited to an evenly distributed data set.

Fig 1.4: Number of Female Principal Operators Using the Equal Interval Method.
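
The equal interval steps described above can be sketched in a few lines of Python. The function name `equal_interval_breaks` and the county counts are hypothetical. The counts are not the Census of Agriculture values, and were only chosen so that the range is 385 and the interval works out to the 96.25 mentioned in the text.

```python
def equal_interval_breaks(values, n_classes):
    """Return the upper bound of each class for equal interval classification."""
    lowest, highest = min(values), max(values)
    interval = (highest - lowest) / n_classes  # range divided by class count
    return [lowest + interval * i for i in range(1, n_classes + 1)]

# Hypothetical counts (NOT the real data): lowest 10, highest 395, range 385.
counts = [10, 25, 40, 62, 88, 120, 175, 240, 395]
breaks = equal_interval_breaks(counts, 4)
print(breaks)  # [106.25, 202.5, 298.75, 395.0]
```

With these made-up numbers, six of the nine values fall in the first class, which hints at why equal intervals can hide detail when the data is not evenly distributed.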

Quantile
  The quantile classification method sorts values into classes which each contain an equal number of values. This can often lead to very large ranges within a class. This method is best used when the differences in values between classes are small and the data is evenly distributed. The map below in figure 1.5 shows this method being used. It displays the number of women who are the principal operator of a farm across the counties of Wisconsin. Looking at the class intervals, one can clearly see that the range varies quite a bit between classes: the first class has a range of 46 while the last class has a range of 246.

Fig 1.5: Number of Female Principal Operators Using the Quantile Method.
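
A minimal Python sketch of the quantile idea is shown below. The helper name `quantile_classes` and the counts are hypothetical, and ArcMap may handle ties and leftover values differently. The point is only that every class gets the same number of values, so the class ranges can differ a lot.

```python
def quantile_classes(values, n_classes):
    """Split sorted values into n_classes groups of (nearly) equal size."""
    ordered = sorted(values)
    size, extra = divmod(len(ordered), n_classes)
    classes, start = [], 0
    for i in range(n_classes):
        # The first `extra` classes take one additional value each.
        end = start + size + (1 if i < extra else 0)
        classes.append(ordered[start:end])
        start = end
    return classes

# Hypothetical counts (NOT the real data), right skewed like the farm data.
counts = [10, 25, 40, 62, 88, 120, 175, 395]
classes = quantile_classes(counts, 4)
ranges = [c[-1] - c[0] for c in classes]
print(classes)  # [[10, 25], [40, 62], [88, 120], [175, 395]]
print(ranges)   # [15, 22, 32, 220]
```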

Natural Breaks
  The natural breaks method is best suited for skewed data distributions. By hand, the break points between classes are determined by looking at the numbers and finding where the largest (natural) gaps are. When ArcMap determines the break points, it looks for the same thing, but it does so by minimizing the variance within each class and maximizing it between classes. Because of this process, the method is the most commonly used and works very well with skewed data distributions. Below, in figure 1.6, is a third map of the number of female principal operators of a farm by county in Wisconsin.

Fig 1.6: Number of Female Principal Operators Using the Natural Breaks Method.
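
The idea of minimizing variance within classes can be illustrated with a brute-force Python sketch. This is not ArcMap's own routine, which is not reproduced here. It simply tries every way of cutting the sorted values into classes and keeps the split with the smallest total squared deviation from the class means. The function names and the counts are hypothetical, and the approach is only practical for small lists.

```python
from itertools import combinations

def sum_sq_dev(group):
    """Sum of squared deviations of a group from its own mean."""
    mean = sum(group) / len(group)
    return sum((v - mean) ** 2 for v in group)

def natural_breaks(values, n_classes):
    """Brute-force search for the split with the least within-class variation."""
    ordered = sorted(values)
    best_cost, best_classes = None, None
    # Each combination is a set of cut positions in the sorted list.
    for cuts in combinations(range(1, len(ordered)), n_classes - 1):
        bounds = (0, *cuts, len(ordered))
        classes = [ordered[a:b] for a, b in zip(bounds, bounds[1:])]
        cost = sum(sum_sq_dev(c) for c in classes)
        if best_cost is None or cost < best_cost:
            best_cost, best_classes = cost, classes
    return best_classes

# Hypothetical counts (NOT the real data) with two obvious gaps.
counts = [10, 12, 15, 60, 64, 70, 200, 210]
print(natural_breaks(counts, 3))  # [[10, 12, 15], [60, 64, 70], [200, 210]]
```

The breaks land in the two large gaps, which is the same result one would reach by eye.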

Conclusion

  All three maps above show where the most female principal operators are by county, yet the clients from the hypothetical agriculture marketing company wanted to know which areas to target in order to attract female farmers. The areas the clients need to target are the lightly shaded areas, which are found mostly throughout northern and eastern Wisconsin. The hue scheme wasn't flipped because of the unwritten rule that the highest values get the darkest hues in a single-hue choropleth map. If the hues were flipped, the focus areas would be darker and easier to see, but the maps would get confusing because that is not what people are used to seeing. This is why the normal hue scheme from light to dark was used.
  Looking at the maps, the natural breaks method looks the most pleasing. This map clearly shows where the clients should target. The quantile and equal interval maps are much less pleasing to look at. The data distribution backs this up as well. To tell whether a data set is skewed, normal, or evenly distributed, the statistics button in the attribute table can be used. Figure 1.7, shown below, displays the frequency distribution of the Femal_Op field. Looking at the distribution, one can see that it is right skewed. There are a couple of fairly high values which drive the skew. This distribution matches the natural breaks method, which displays this type of data distribution the best. Hence, it is the one that would be recommended to the clients.

Fig 1.7: Right Skewed Frequency Distribution of the Femal_Op Field.
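
Outside of ArcMap, a right skew can also be checked numerically. The Python sketch below uses hypothetical counts, not the actual Femal_Op values, and computes a simple moment-based skewness with the standard library. A positive result, along with a mean larger than the median, points to a right-skewed distribution.

```python
from statistics import mean, median, pstdev

def skewness(values):
    """Moment-based (population) skewness: positive means right skewed."""
    m, s = mean(values), pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

# Hypothetical counts (NOT the real Femal_Op data): mostly low values
# plus one very high value that pulls the tail to the right.
counts = [10, 25, 40, 62, 88, 120, 175, 395]
print(skewness(counts) > 0)           # True
print(mean(counts) > median(counts))  # True
```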

Sources
Nominal data map retrieved from eeinwisconsin.org
Ordinal data map retrieved from reform-dem.blogspot.com
Interval data map retrieved from aos.wisc.edu
Ratio data map retrieved from archive.jsonline.com
Wisconsin Shapefile downloaded from factfinder.census.gov
Agriculture data retrieved from 2012 Census of Agriculture.
Data type information read from Modeling Our World, Second Edition, by Michael Zeiler
Data type information also from Dr. Weichelt's lecture slides
