Tuesday, March 23, 2010

Distribution Functions (Examples)

Below is a chart that shows a number of common distribution functions and their relationships.


Source

Sunday, March 21, 2010

Video: Z-Scores

Standard Normal Table

See the image below.
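If R is handy, the same probabilities can be computed without the printed table. Below is a minimal sketch using base R's pnorm() and qnorm(). Which of the two areas matches the image depends on whether the table reports the area to the left of z or the area between 0 and z, so check the table's header first.

```r
# Cumulative area to the left of z under the standard normal curve
left_area <- pnorm(1.96)   # about 0.975

# Area between 0 and z (the layout some printed tables use)
mid_area <- left_area - 0.5   # about 0.475

# Going the other way: the z value with 95% of the area to its left
z_95 <- qnorm(0.95)   # about 1.645

print(c(left_area = left_area, mid_area = mid_area, z_95 = z_95))
```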




Tuesday, March 9, 2010

Basic Demographics of #AgChat Facebook Group Members

Created using R and the ‘members to .csv’ Facebook app
March 9, 2010


Breakdown by Gender



Representation by City and State

Augusta, Illinois
Chicago, Illinois
Indianapolis, Indiana
Hampton, Iowa
Miltonvale, Kansas
Louisville, Kentucky
Caneyville, Kentucky
Frankfort, Kentucky
Winnipeg, Manitoba (Canada)
Saginaw, Michigan
Deckerville, Michigan
Springfield, Missouri
Tecumseh, Oklahoma
Portland, Oregon
Fredrikstad, Ostfold (Norway)
Dallas, Texas
Selah, Washington
Union, West Virginia
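The post says the charts were made in R but does not show the code. The following is only a sketch of how the state and country tallies could be produced with base R's table(). The data frame and its column names are illustrative assumptions. The last entry is read as Union, West Virginia, which follows the state-alphabetical order of the list. The counts cover only the 18 locations listed above, so they may not match the charts exactly; the note below says 24 observations had hometown data.

```r
# Hometown locations as listed above (one row per listed city).
# Column names are illustrative, not from the original analysis.
locations <- data.frame(
  city = c("Augusta", "Chicago", "Indianapolis", "Hampton", "Miltonvale",
           "Louisville", "Caneyville", "Frankfort", "Winnipeg", "Saginaw",
           "Deckerville", "Springfield", "Tecumseh", "Portland", "Fredrikstad",
           "Dallas", "Selah", "Union"),
  state = c("Illinois", "Illinois", "Indiana", "Iowa", "Kansas",
            "Kentucky", "Kentucky", "Kentucky", "Manitoba", "Michigan",
            "Michigan", "Missouri", "Oklahoma", "Oregon", "Ostfold",
            "Texas", "Washington", "West Virginia"),
  country = c(rep("U.S.", 8), "Canada", rep("U.S.", 5), "Norway",
              rep("U.S.", 3))
)

# Frequency tables: number of listed members per state/province and per country
state_counts <- table(locations$state)
country_counts <- table(locations$country)

print(state_counts)
print(country_counts)
```

From here, barplot(state_counts) or barplot(country_counts) would draw simple bar charts of the counts.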

Number of Members by State



Representation by Country
(Canada, Norway & the U.S.)



Note: This data is for demonstration purposes only. The #AgChat Facebook group actually had 643 members as of this date, but the ‘members to .csv’ app limits data retrieval to 499 observations, so this represents only a subset of the actual members. Observations with missing values for the variables in each analysis are also omitted. For instance, only 24 of the 499 available observations listed hometown data.

Monday, February 1, 2010

R Code for Sample Statistics Problems

#################################################
# SELECTED PRACTICE PROBLEMS USING R - SCHAUM'S
#################################################

# P.49-50

# 3.1
sales<-c(.10,.10,.25,.25,.25,.35,.40,.53,.90,1.25,1.35,2.45,2.71,3.09,4.10)

summary(sales)
sum(sales)

#3.5

salary<-c(240,240,240,240,240,240,240,240,255,255,265,265,280,280,290,300,305,325,330,340)

sum(salary)# just to check your work done by hand

summary(salary)


#---------------------------------
# EXAMPLE OF HISTOGRAM
#---------------------------------


library(lattice) # optional here: hist() and density() are base R functions

hist(salary) # graph histogram

d <- density(salary) # fit curve to data

plot(d) # plot curve

# 4.9

minutes<-c(5,5,5,7,9,14,15,15,16,18)

print(minutes)

summary(minutes)

var(minutes)

sd(minutes)

library(lattice) # optional: hist() and density() below don't need it

hist(minutes)

d<-density(minutes)

plot(d)

#4.24

weights<-c(21,18,30,12,14,17,28,10,16,25)

sum(weights)

summary(weights)

sum(weights*weights) # gives sum of X squared

var(weights)

sd(weights)
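The sum of X squared is the quantity the hand calculation needs. The sketch below assumes the usual computational ("shortcut") formula for the sample variance, s² = (ΣX² − (ΣX)²/n)/(n − 1), and uses R to confirm that the hand-style result matches var(). The intermediate variable names are illustrative.

```r
weights <- c(21, 18, 30, 12, 14, 17, 28, 10, 16, 25)
n <- length(weights)

sum_x  <- sum(weights)     # sum of X: 191
sum_x2 <- sum(weights^2)   # sum of X squared: 4059

# Computational formula for the sample variance (divides by n - 1, like var())
var_by_hand <- (sum_x2 - sum_x^2 / n) / (n - 1)   # about 45.66
sd_by_hand  <- sqrt(var_by_hand)                  # about 6.76

# Both should agree with R's built-in functions
all.equal(var_by_hand, var(weights))
all.equal(sd_by_hand, sd(weights))
```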


#4.33

cars<-c(2,4,7,10,10,10,12,12,14,15)

print(cars)

sum(cars)

summary(cars)

sum(cars*cars)

var(cars)

sd(cars)

(4.16866/9.6)*100 # coefficient of variation

#################################################
# SELECTED PRACTICE PROBLEMS USING R -WILLIAMS
#################################################

#P.107

#1a

sample<-c(10,20,12,17,16)

print(sample)

summary(sample)

#2a

sample<-c(10,20,21,17,16,12)

print(sample)

summary(sample)

#p.151

#61

loans<-c(10.1,14.8,5,10.2,12.4,12.2,2,11.5,17.8,4)

print(loans)

sum(loans)

summary(loans)

sum(loans^2)

var(loans)

sd(loans)

#63

public<-c(28,29,32,37,33,25,29,32,41,34)

print(public)

sum(public)

summary(public)

sum(public^2)

var(public)

sd(public)

(4.64/32)*100 #CV

auto<-c(29,31,33,32,34,30,31,32,35,33)

print(auto)

sum(auto)

summary(auto)

sum(auto^2)

var(auto)

sd(auto)

(1.83/32)*100 # CV: divide the sd (1.83) by the mean (32), not the variance (3.33)
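Typing rounded values by hand makes it easy to divide by the wrong quantity. The coefficient of variation is the standard deviation as a percentage of the mean. Below is a minimal sketch of a small helper, here called cv (a hypothetical name, not a base R function), applied to the three data sets from the problems above.

```r
# Hypothetical helper: coefficient of variation = sd / mean * 100
cv <- function(x) sd(x) / mean(x) * 100

cars   <- c(2, 4, 7, 10, 10, 10, 12, 12, 14, 15)
public <- c(28, 29, 32, 37, 33, 25, 29, 32, 41, 34)
auto   <- c(29, 31, 33, 32, 34, 30, 31, 32, 35, 33)

round(cv(cars), 2)    # 43.42
round(cv(public), 2)  # 14.51
round(cv(auto), 2)    # 5.71
```

Because the helper works from the raw data, it avoids rounding errors and mix-ups between the variance and the mean.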

Statistics References

I will post links to resources that might be useful as you learn statistics.

Handbook of Biological Statistics - Online stats textbook

NetMBA - Statistics

Stat Trek

Standard Normal Table (TAMU) (PDF)

t-table (one- and two-tailed) (image)

t-table (image)

Video: Z-scores

Distribution Functions

Data Sets - from Math Forum

Video: Regression Demo

Using R for Introductory Statistics - John Verzani

Sunday, January 31, 2010

Animal Cruelty and Statistical Reasoning

In a recent article, animal rights activists from Mercy for Animals (MFA) describe going undercover and observing animal abuse on dairy farms. See:
Governor Paterson, Shut This Dairy Down

The author of the above article states:

"But the grisly footage that every farm randomly chosen for investigation--MFA has investigated 11--seems to yield, indicates the violence is not isolated, not coincidental, but agribusiness-as-usual."

The statement above could be carried too far if someone applied it not only to dairy farms in that state or region but to the industry as a whole. It's not clear how broadly the author is using the term 'agribusiness as usual,' but suppose a reader of the article wanted to apply it to the entire dairy industry.

This is exactly why economists and scientists employ statistical methods. Anyone can make outrageous claims about any number of policies, but are those claims really consistent with the evidence? How do we determine whether some claims are more valid than others?

Statistical inference is the process by which we take a sample and then try to make statements about the population based on what we observe from the sample. If we take a sample (like a sample of dairy farms) and make observations, the fact that our sample was 'random' doesn't necessarily make our conclusions about the population it came from valid.

Before we can say anything about the population, we need to know 'how rare is this sample?' We need to know something about our 'sampling distribution' to make these claims.

According to the USDA, there were 75,000 dairy operations in the U.S. in 2006. According to the activists' claims, they 'randomly' sampled 11 dairies and found abuse on all of them. That represents just 11/75,000, or about 0.015%, of all dairies. Suppose we wanted to estimate the proportion of dairy farms that abuse animals. We want to be 90% confident in our estimate (that is, construct a 90% confidence interval), and we want a margin of error of 0.05. Then the required sample size for estimating this proportion is given by the following formula:

n = (z/(2E))^2, where

z = the value from the standard normal distribution associated with a 90% confidence interval (1.645)

E = the margin of error

The sample size we would need is (1.645/(2*.05))^2 = (16.45)^2 = 270.60, which rounds up to 271 farms!
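The same arithmetic can be checked in R. The sketch below takes z from qnorm() instead of a printed table, and it rounds up with ceiling() because a sample size has to be a whole number at least as large as the formula's result. The variable names are illustrative.

```r
conf_level <- 0.90
E <- 0.05                              # margin of error

# z value leaving 5% in each tail for a 90% confidence interval (about 1.645)
z <- qnorm(1 - (1 - conf_level) / 2)

n <- (z / (2 * E))^2                   # about 270.6
n_needed <- ceiling(n)                 # 271 farms
n_needed
```

Using the unrounded z (1.6449 rather than 1.645) changes n only slightly, and it still rounds up to 271.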

To do this we have to make some assumptions:

Since we don't know the actual proportion of dairy farms that abuse animals, the most objective estimate may be 50%, and the formula above is derived under that assumption. Because p(1 - p) is largest at p = 0.5, this is also the most conservative choice: it gives the largest required sample size. (If we assumed 90%, then it turns out, based on the math (not shown), that the sample size would be the same as if we assumed that only 10% of farms abused their animals. That gives a sample size of about 98, still far more than 11.) This also assumes that the sample proportion is approximately normally distributed. And to calculate anything at all, we would still depend on someone's subjective judgment of whether a farm was engaging in abuse.
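The math behind the parenthetical can be sketched with the general sample-size formula for a proportion, n = z^2 * p * (1 - p) / E^2. It relies on the normal approximation and reduces to (z/(2E))^2 when p = 0.5. The function name sample_size is a hypothetical helper, not a built-in R function.

```r
# Hypothetical helper: sample size to estimate a proportion p within margin E
# at the given confidence level (normal approximation), rounded up
sample_size <- function(p, conf_level = 0.90, E = 0.05) {
  z <- qnorm(1 - (1 - conf_level) / 2)
  ceiling(z^2 * p * (1 - p) / E^2)
}

# p(1 - p) is symmetric around 0.5, so 10% and 90% give the same answer
sizes <- sapply(c(0.1, 0.5, 0.9), sample_size)
sizes   # 98 271 98
```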

I'm sure the article I'm referring to above was never intended to be scientific, but the author should have chosen their words more carefully. What they have is allegedly a 'random' observation and nothing more. Their 'random' samples provide no empirical basis for inferring that these abuses are 'agribusiness-as-usual' for the whole population of dairy farmers.

While MFA may have sufficient evidence to take action against these individual dairies, the question becomes: how high should the burden of proof be to support increased government oversight of the industry as a whole (which seems to be the goal of many activist organizations)? This kind of analysis involves weighing tradeoffs, and that may depend partly on subjective views. We can use statistics to evaluate claims made on both sides of the debate, but statistical tests have no 'power' to weigh one person's preferences against another's. Economics has no way to make interpersonal comparisons of utility.

Note: The University of Iowa has a number of statistical calculators for doing these sorts of calculations. The sample size option can be found here. In the box, select 'CI for one proportion' and deselect the finite population option (since the population of dairies is quite large at 75,000). Then select your level of confidence and margin of error.

References:

Profits, Costs, and the Changing Structure of Dairy Farming (ERR-47).
Economic Research Service, USDA.

"Governor Paterson, Shut This Dairy Down," Jan. 27, 2010. OpEdNews.com.