__Chebyshev's theorem__ can be used to make statements about the proportion of data values that must be within a specified number of standard deviations of the mean, regardless of the distribution.
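As a quick sketch (not part of the original handout), Chebyshev's bound says at least 1 - 1/k^2 of any data set lies within k standard deviations of the mean. The helper name `chebyshev_bound` below is made up for illustration, and the exponential data is just one convenient non-bell-shaped example:

```r
# Chebyshev's theorem: for ANY distribution, at least 1 - 1/k^2 of the
# values lie within k standard deviations of the mean (k > 1).
chebyshev_bound <- function(k) 1 - 1/k^2  # hypothetical helper name

chebyshev_bound(2)  # at least 0.75 of values within 2 std devs
chebyshev_bound(3)  # at least ~0.889 of values within 3 std devs

# sanity check against decidedly non-bell-shaped (skewed) data
set.seed(1)
z <- rexp(10000)          # exponential data: very skewed
m <- mean(z)
s <- sd(z)
mean(abs(z - m) <= 2*s)   # observed proportion; must be >= 0.75
```

The observed proportion is usually well above the bound; Chebyshev trades tightness for working on every distribution.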

__The empirical rule__ can be used to determine the percentage of data values that must be within one, two, and three standard deviations of the mean for data having a bell-shaped distribution.
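A short check (added here, not from the handout): for a bell-shaped (normal) distribution, the exact proportions behind the familiar 68–95–99.7 rule can be computed directly from the normal CDF `pnorm`:

```r
# Empirical rule for bell-shaped data:
# roughly 68% within 1 sd, 95% within 2 sd, 99.7% within 3 sd of the mean
pnorm(1) - pnorm(-1)  # ~0.6827
pnorm(2) - pnorm(-2)  # ~0.9545
pnorm(3) - pnorm(-3)  # ~0.9973
```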

__Correlation coefficient:__ **measures the strength of the linear relationship between x and y.**
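One way to see where the correlation coefficient comes from (a sketch, using the same x and y data as Part 1 of the script below): it is the covariance rescaled by the two standard deviations, r = cov(x, y) / (s_x * s_y), which is exactly what `cor` computes:

```r
x <- c(1,2,3,4,5,6,7,8,9,10)
y <- c(2,3,4,4,5,6,9,8,8,10)

# correlation "by hand": covariance scaled by the standard deviations
r <- cov(x, y) / (sd(x) * sd(y))

r         # strong positive linear relationship (close to 1)
cor(x, y) # matches the built-in correlation function
```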

**POPULATION PARAMETERS VS SAMPLE STATISTICS**

When we take sample data and calculate a mean, we are calculating a *sample statistic.* The sample mean is used to *'estimate'* the actual mean of the population we are sampling from. The population mean is referred to as a *population parameter.*

**Sample statistics are used to estimate corresponding population parameters. For this reason, sample statistics are often referred to as 'estimators.'** In statistics we often use different symbols to represent sample statistics vs. population parameters.
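A minimal sketch of this idea (added here for illustration; the population values are simulated, not real data): we pretend the whole population is known, draw one sample, and compare the sample statistic to the population parameter it estimates.

```r
set.seed(42)
# simulated "population" with known parameter mu = 100
population <- rnorm(1e6, mean = 100, sd = 15)

samp <- sample(population, 50)  # one random sample of n = 50

mean(samp)        # sample statistic (x-bar): our estimator of mu
mean(population)  # population parameter (mu): rarely known in practice
```

The sample mean will vary from sample to sample, but each draw gives an estimate of the fixed population parameter.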

For those interested, below is the R code used in tonight's handout.

```r
# *------------------------------------------------------------------
# | PROGRAM NAME: EX_CORRELATION_R
# | DATE: 2/8/11
# | CREATED BY: MATT BOGARD
# | PROJECT FILE:
# *------------------------------------------------------------------
# | PURPOSE: DEMONSTRATION OF CORRELATION USING R
# *------------------------------------------------------------------
# | CONTENTS:
# | PART 1: positive linear relationship
# | PART 2: negative linear relationship
# | PART 3: no linear relationship
# *------------------------------------------------------------------

# *------------------------------------------------------------------
# | PART 1: positive linear relationship
# *------------------------------------------------------------------

x <- c(1,2,3,4,5,6,7,8,9,10)
y <- c(2,3,4,4,5,6,9,8,8,10)

# plot x and y
plot(x, y)
title("positive linear relationship")

# fit a linear regression line to the data (a topic for later in the semester)
reg1 <- lm(y ~ x)
print(reg1)   # output
abline(reg1)  # plot line

cov(x, y)  # covariance between x and y
sd(x)      # standard deviation of x
sd(y)      # standard deviation of y
cor(x, y)  # correlation coefficient for x and y

# *------------------------------------------------------------------
# | PART 2: negative linear relationship
# *------------------------------------------------------------------

# let's keep the same x as above, but look at new data for y:
y2 <- c(9,10,8,7,5,4,6,4,2,1)  # read in data for y2

# plot x and y2
plot(x, y2)

# fit line to x and y2
reg2 <- lm(y2 ~ x)
summary(reg2)
abline(reg2)
title("negative linear relationship")

cov(x, y2)  # covariance between x and y2
sd(x)       # standard deviation of x
sd(y2)      # standard deviation of y2
cor(x, y2)  # correlation between x and y2

# *------------------------------------------------------------------
# | PART 3: no linear relationship
# *------------------------------------------------------------------

y3 <- c(5,7,10,5,1,8,7,4,5,9)  # read in y3 data

plot(x, y3)  # plot x and y3 data

# fit line to x and y3
reg3 <- lm(y3 ~ x)
summary(reg3)
abline(reg3)
title("no linear relationship")

cov(x, y3)  # covariance between x and y3
sd(x)       # standard deviation of x
sd(y3)      # standard deviation of y3
cor(x, y3)  # correlation between x and y3
```