Wednesday, February 9, 2011

Chebyshev, the Empirical Rule, and a Few Other Basic Concepts



Chebyshev's theorem can be used to make statements about the proportion of data values that must be within a specified number of standard deviations of the mean, regardless of the distribution: at least 1 - 1/k² of the values must lie within k standard deviations of the mean, for any k > 1.
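As a quick sketch of the bound in R (the function name here is my own, just for illustration):

```r
# Chebyshev's theorem: for ANY distribution, at least 1 - 1/k^2 of the
# data values lie within k standard deviations of the mean (k > 1)
chebyshev_bound <- function(k) 1 - 1/k^2

chebyshev_bound(2) # at least 0.75 of the data within 2 standard deviations
chebyshev_bound(3) # at least about 0.889 within 3 standard deviations
```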
The empirical rule can be used to determine the percentage of data values that fall within one, two, and three standard deviations of the mean for data having a bell-shaped distribution: approximately 68%, 95%, and 99.7%, respectively.
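A simple simulation sketch of the empirical rule (the seed and sample size are arbitrary choices of mine):

```r
# empirical rule check on simulated bell-shaped (normal) data
set.seed(123)  # arbitrary seed so the simulation is reproducible
z <- rnorm(10000)
mean(abs(z - mean(z)) <= 1*sd(z)) # roughly 0.68
mean(abs(z - mean(z)) <= 2*sd(z)) # roughly 0.95
mean(abs(z - mean(z)) <= 3*sd(z)) # roughly 0.997
```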

Correlation coefficient: measures the strength and direction of the linear relationship between x and y. It ranges from -1 (a perfect negative linear relationship) to +1 (a perfect positive linear relationship), with values near 0 indicating no linear relationship.

POPULATION PARAMETERS VS SAMPLE STATISTICS
When we take sample data and calculate a mean, we are calculating a sample statistic. The sample mean is used to estimate the actual mean of the population we are sampling from. The population mean is referred to as a population parameter. Sample statistics are used to estimate corresponding population parameters, and for this reason sample statistics are often referred to as 'estimators.' In statistics we often use different symbols to distinguish sample statistics from population parameters: for example, the sample mean x̄ estimates the population mean μ, and the sample standard deviation s estimates the population standard deviation σ.
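A small simulation sketch of this idea (the population mean and standard deviation below are values I picked for illustration; in practice the population mean is unknown):

```r
# draw a sample from a population with a known mean, then estimate it
set.seed(123)                # arbitrary seed for reproducibility
mu <- 50                     # population parameter (usually unknown in practice)
x_sample <- rnorm(100, mean = mu, sd = 10)
mean(x_sample)               # sample statistic: an estimator of mu, close to 50
```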
 
For those interested, below is the R code used in tonight's handout.
 


# *------------------------------------------------------------------
# | PROGRAM NAME: EX_CORRELATION_R
# | DATE: 2/8/11
# | CREATED BY: MATT BOGARD 
# | PROJECT FILE:              
# *----------------------------------------------------------------
# | PURPOSE: DEMONSTRATION OF CORRELATION USING R              
# |
# *------------------------------------------------------------------
# | COMMENTS:               
# |
# |  1:  
# |  2: 
# |  3: 
# |*------------------------------------------------------------------
# | DATA USED:               
# |
# |
# |*------------------------------------------------------------------
# | CONTENTS:               
# |
# |  PART 1: positive linear relationship 
# |  PART 2: negative linear relationship
# |  PART 3: no linear relationship
# *-----------------------------------------------------------------
# | UPDATES:               
# |
# |
# *------------------------------------------------------------------
 
 
 
# *------------------------------------------------------------------
# |                
# |PART 1: positive linear relationship
# |  
# |  
# *-----------------------------------------------------------------
 
 
 
x <- c(1,2,3,4,5,6,7,8,9,10)
 
y <- c(2,3,4,4,5,6,9,8,8,10)
 
# plot x and y
 
plot(x,y)  
title("positive linear relationship") 
 
# fit a linear regression line to the data (a topic for later in the semester)
 
reg1 <- lm(y~x) 
print(reg1) # output
abline(reg1) # plot line
 
cov(x,y) # covariance between x and y
 
sd(x) # standard deviation of x
sd(y) # standard deviation of y
 
cor(x,y) # correlation coefficient for x and y
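The three quantities just computed are related: the correlation coefficient is simply the covariance rescaled by the two standard deviations.

```r
# cor(x,y) = cov(x,y) / (sd(x) * sd(y))
cov(x,y) / (sd(x)*sd(y)) # gives the same value as cor(x,y)
```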
 
 
# *------------------------------------------------------------------
# |                
# |PART 2: negative linear relationship
# |  
# |  
# *-----------------------------------------------------------------
 
 
 
# let's keep the same x as above, but look at new data for y: 
 
y2 <- c(9,10,8,7,5,4,6,4,2,1) # read in data for y2
 
# plot x and y2
 
plot(x,y2)
 
# fit line to x and y2 
 
reg2 <- lm(y2 ~ x) # regress y2 on x
summary(reg2) 
abline(reg2)
title("negative linear relationship")
 
cov(x,y2) # covariance between x and y2
sd(x) # standard deviation of x
sd(y2) # standard deviation of y2
cor(x,y2) # correlation between x and y2
 
 
# *------------------------------------------------------------------
# |                
# |PART 3: no linear relationship
# |  
# |  
# *-----------------------------------------------------------------
 
 
y3 <- c(5,7,10,5,1,8,7,4,5,9) # read in y3 data
 
plot(x,y3) # plot x and y3 data
 
# fit line to x and y3
 
 
reg3 <- lm(y3 ~ x) # regress y3 on x
summary(reg3)
abline(reg3)
title("no linear relationship")
 
cov(x,y3) # covariance between x and y3
sd(x) # standard deviation of x
sd(y3) # standard deviation of y3
cor(x,y3) # correlation between x and y3