Tuesday, April 4, 2017

Hypothesis Testing

Introduction

  This assignment consists of four questions on hypothesis testing. The goal is to demonstrate an understanding of significance levels, Z-Tests, T-Tests, critical values, and hypothesis testing. The questions use real-world data to connect statistics to geography.

Question 1

  This first question entails filling out a table that initially included the interval type, confidence level, and number of samples. The table was completed using class materials such as T tables, Z tables, and notes. The three fields filled out were α, Z-Test or T-Test, and Z or T value(s). The chart can be seen below in figure 4.0.
Fig 4.0: Table showing statistical information about data.
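  As a rough illustration of how the α and Z columns of a table like this can be filled in, here is a minimal Python sketch. The function names are hypothetical, and the confidence levels in the loop are common examples rather than the rows of the actual table. It covers Z values only; T values depend on the degrees of freedom and would still come from a T table.

```python
from statistics import NormalDist


def alpha_from_confidence(confidence_level):
    """Significance level: alpha = 1 - confidence level (0.95 -> 0.05)."""
    return 1.0 - confidence_level


def z_critical(confidence_level, two_tailed):
    """Positive Z critical value for a given confidence level.

    A two-tailed test splits alpha evenly between both tails,
    so the lookup uses alpha / 2 instead of alpha.
    """
    alpha = alpha_from_confidence(confidence_level)
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1.0 - tail_area)


# Example confidence levels (assumed for illustration, not taken from figure 4.0).
for level in (0.90, 0.95, 0.99):
    print(
        f"confidence={level:.2f}  alpha={alpha_from_confidence(level):.2f}  "
        f"two-tailed Z={z_critical(level, True):.3f}  "
        f"one-tailed Z={z_critical(level, False):.3f}"
    )
```

  At a 95% confidence level this gives a Z value of about 1.96 for a two-tailed test and about 1.645 for a one-tailed test, which is where the 1.64 used later comes from.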

Question 2

  This next question uses hypothesis testing to see whether crop yields in a specific district in Kenya are statistically different from the rest of the country's crop yields. The Department of Agriculture and Live Stock Development is concerned about three crops: ground nuts, cassava, and beans. A sample of 23 farmers was taken from the district in question, with yields measured in metric tons per hectare. Ground nuts had an average of .52 with a standard deviation of .3, cassava had an average of 3.3 with a standard deviation of .75, and beans had an average of .34 with a standard deviation of .12.
  The null hypothesis for the ground nuts, cassava, and beans is that there is no difference between the sample crop yield and the country's average crop yield. The alternative hypothesis for all three crops is that there is a difference between the sample crop yield and the country's average crop yield. Because only 23 farmers were surveyed in the district, a T-Test will be used. When deciding between a T-Test and a Z-Test, one should look at the sample size. If the sample size is 30 or larger, a Z-Test should be used; if the sample size is less than 30, a T-Test should be used.
Fig 4.1: Z-Test and T-Test Equation
  Next, the specific test statistic values will be determined for each crop. The equations for a Z-Test and a T-Test have the same form, shown on the right in figure 4.1. The top of the equation is the sample mean minus the population mean, and the bottom is the sample standard deviation divided by the square root of the number of samples. The confidence level used is 95% (a significance level of .05), and a two-tailed test will be used.
  Using this equation, a test statistic of -.7993 is calculated for the ground nuts, a test statistic of -2.5578 is calculated for the cassava, and a test statistic of 1.9983 is calculated for the beans.
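  The calculation described for figure 4.1 can be sketched in a few lines of Python. The sample means, standard deviations, and sample size come from the question, but the country averages are not stated in this post. The values used below (.57, 3.7, and .29) are assumptions, back-calculated so that the formula reproduces the test statistics used in the comparison; the function name is also hypothetical.

```python
import math


def compute_statistic(sample_mean, population_mean, sample_sd, n):
    """(sample mean - population mean) / (sample sd / sqrt(n))."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - population_mean) / standard_error


N_FARMERS = 23  # sample size given in the question

# crop: (sample mean, sample sd, ASSUMED country mean, not given in the post)
crops = {
    "ground nuts": (0.52, 0.30, 0.57),
    "cassava": (3.3, 0.75, 3.7),
    "beans": (0.34, 0.12, 0.29),
}

for crop, (mean, sd, assumed_country_mean) in crops.items():
    t_stat = compute_statistic(mean, assumed_country_mean, sd, N_FARMERS)
    print(f"{crop}: test statistic = {t_stat:.4f}")
```

  With different country averages the test statistics would change, so the assumed means should be replaced with the real figures before reusing this.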
  Then, these values are compared against the T-Table in the back of the textbook on page 369. The degrees of freedom (sample size minus one, here 22) are used to look up the critical value for a 95% confidence level. Because a two-tailed test is used, the critical value is actually pulled from the 97.5% column. The critical values for a two-tailed test at a 95% confidence level are -2.074 and 2.074.
  Comparing the test statistics to the critical values for a two-tailed test at the 95% level gives the results of the hypothesis tests. For the ground nuts, the test statistic of -.7993 falls between the critical values, so I fail to reject the null hypothesis, meaning that statistically there is no difference between the crop yield of the district sample and the country's average. For cassava, the test statistic of -2.5578 falls outside the critical value range. This means that I reject the null hypothesis and that there is a statistical difference between the crop yield of the sample and the country's average. Also, because the test statistic is negative, the district has a statistically lower cassava yield than the country's average. For the beans, the test statistic of 1.9983 falls between the critical values, meaning that I fail to reject the null hypothesis and that statistically there is no difference between the sample and the country's average crop yield.
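  The comparison above follows one rule for every crop, which can be written as a small Python sketch. The test statistics and the critical value of 2.074 are the ones reported in this post; the function name is hypothetical.

```python
def two_tailed_decision(test_stat, critical_value):
    """Reject the null when the statistic lies outside +/- critical_value."""
    if abs(test_stat) > abs(critical_value):
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"


T_CRITICAL = 2.074  # T table value from the post: 22 degrees of freedom, two-tailed, 95%

# Test statistics as reported in the post.
test_statistics = {
    "ground nuts": -0.7993,
    "cassava": -2.5578,
    "beans": 1.9983,
}

for crop, t_stat in test_statistics.items():
    print(f"{crop}: {two_tailed_decision(t_stat, T_CRITICAL)}")
```

  Only the cassava statistic lies outside the range from -2.074 to 2.074, so it is the only crop for which the null hypothesis is rejected.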
  Using the probability chart in the back of the textbook, the cumulative probability for each test statistic (the probability of a value at or below it) can be looked up. For the ground nuts the probability found is .21656. For the cassava the probability found is .00856. Lastly, the probability found for the beans is .97037.

Question 3

  This question also uses hypothesis testing. This time, the scenario is that a researcher thinks that the level of a particular stream's pollutant content is higher than the allowable limit of 4.2 mg/L. Taking 17 samples in the stream, the researcher finds an average pollutant level of 6.4 mg/L with a standard deviation of 4.4. For this question a one-tailed test and a 95% confidence level will be used.
  The null hypothesis is that the pollutant content of the stream is not statistically higher than the allowable pollutant content. The alternative hypothesis, in line with the one-tailed test, is that the pollutant content of the stream is statistically higher than the allowable pollutant content.
  Because only 17 samples of the stream were taken, a T-Test will be used. Using the equation in figure 4.1, a test statistic of 2.062 is calculated. Then, this test statistic is compared to the critical value of 1.64, which was found using figure 4.0. (For a T-Test with 16 degrees of freedom, the one-tailed critical value at this level is 1.746; the test statistic exceeds both, so the result is the same.) Because the test statistic is greater than the critical value, I reject the null hypothesis. This means that statistically there is a difference between the sample's mean pollutant level of 6.4 mg/L and the allowable pollutant level of 4.2 mg/L. It also indicates that the stream's pollutant level is over the allowable limit. Looking in the back of the book, a probability value of .97347 was found.
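  The stream calculation can be checked with a short Python sketch using the numbers given in the question. The function name is hypothetical. The critical value of 1.64 is the one used in this post; 1.746 is the standard one-tailed T table value for 16 degrees of freedom, included here as an added comparison rather than something taken from figure 4.0.

```python
import math

SAMPLE_MEAN = 6.4      # mg/L, from the question
ALLOWABLE_LIMIT = 4.2  # mg/L, from the question
SAMPLE_SD = 4.4        # from the question
N_SAMPLES = 17         # from the question

# Same form as figure 4.1: (sample mean - hypothesized mean) / (sd / sqrt(n)).
t_stat = (SAMPLE_MEAN - ALLOWABLE_LIMIT) / (SAMPLE_SD / math.sqrt(N_SAMPLES))


def upper_tail_reject(test_stat, critical_value):
    """One-tailed (upper) test: reject the null only if the statistic exceeds the critical value."""
    return test_stat > critical_value


print(f"test statistic = {t_stat:.3f}")
print("reject at 1.64 (value used in the post):", upper_tail_reject(t_stat, 1.64))
print("reject at 1.746 (T table, 16 df, one-tailed):", upper_tail_reject(t_stat, 1.746))
```

  The statistic comes out at about 2.062, and the null hypothesis is rejected under either critical value.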

Question 4

  For this question a hypothesis test was performed to see if there is a statistical difference between the average home value by block group in the city of Eau Claire compared to the block groups in the county of Eau Claire. The null hypothesis is that there is no difference between the average home values by block group between the city and county. The alternative hypothesis is that there is a difference between the average home values by block group between the city and county.
  Because there are 53 block groups within the city of Eau Claire, a Z-Test will be used to find the test statistic. The sample mean, population mean, standard deviation, and number of samples were found using the statistics feature in the attribute table window. Using the equation from figure 4.1, a test statistic of -2.572 is calculated. Because no confidence level was stated in the question, a 95% confidence level was chosen. A 95% confidence level is pretty standard for census data. Also, a one-tailed test will be performed. The critical value determined with these parameters is -1.64, based on the table in figure 4.0.
  Because the test statistic is lower than the critical value, I reject the null hypothesis. This means that statistically there is a difference between the average home values at the block group level in the city of Eau Claire compared to the county. According to the Z-Score chart in the back of the textbook, the probability for the city of Eau Claire's block groups is .0051. This means that the sample block groups in the city of Eau Claire fall at about the 0.51 percentile, which is really low.
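  The Z table lookups for this question can be reproduced with Python's standard library. This is a minimal sketch that starts from the test statistic of -2.572 reported above, since the underlying block group means and standard deviation are not listed in this post.

```python
from statistics import NormalDist

Z_STAT = -2.572  # test statistic reported in the post
ALPHA = 0.05     # 95% confidence level, one-tailed (lower tail)

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Lower-tail critical value: the Z score with 5% of the area below it.
critical_value = standard_normal.inv_cdf(ALPHA)

# Probability of a Z score at or below the test statistic.
lower_tail_probability = standard_normal.cdf(Z_STAT)

print(f"critical value = {critical_value:.3f}")
print(f"lower-tail probability = {lower_tail_probability:.4f}")
print("reject null:", Z_STAT < critical_value)
```

  The critical value comes out near -1.645, which the table rounds to -1.64, and the lower-tail probability is close to the .0051 found in the Z-Score chart.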
  A map was created showing average home values by block groups. This is shown below in figure 4.2. The city block groups are shown in the purple map and the county block groups are shown in the green map.
Fig 4.2: Map Comparing Average Home Values at the Block Group Level in the City and County of Eau Claire
  The two different color schemes were chosen because the two maps have different values in their legends, and it would be misleading if only one color scheme were used. Looking at the city map, many of the city block groups appear to have a lower average home value than most of the county block groups. Put another way, the county block groups have a higher average home value than the city block groups. Many of the block groups in the city are smaller than the ones in the rest of the county, which is why a separate map showing the city block groups was created.
