Thursday, January 27, 2011

Example: Data Distributions

Given the following data:

240 240 240 240 240 240 240 240 255 255
265 280 280 290 300 305 325 330 340 265

The graph below shows the distribution of this data. The bars form a histogram, and the green curve is a kernel density estimate, which is an estimate of the probability density function of the data.
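To make the idea of a kernel density estimate more concrete, here is a minimal sketch that builds one by hand: the estimate at a point is the average of Gaussian bumps centered on each data value. The helper name kde_at and the 200-point grid are assumptions made for illustration; bw.nrd0() is the bandwidth rule that density() uses by default.

```r
salary <- c(240, 240, 240, 240, 240, 240, 240, 240, 255, 255,
            265, 280, 280, 290, 300, 305, 325, 330, 340, 265)

# bandwidth: the default rule of thumb used by density()
bw <- bw.nrd0(salary)

# hypothetical helper: average of Gaussian kernels centered on each data value
kde_at <- function(x, data, bw) mean(dnorm(x, mean = data, sd = bw))

# evaluate the estimate on a grid extending 3 bandwidths past the data
grid <- seq(min(salary) - 3 * bw, max(salary) + 3 * bw, length.out = 200)
est <- sapply(grid, kde_at, data = salary, bw = bw)

# draw the hand-built estimate; it should closely follow density(salary)
plot(grid, est, type = "l", col = "green")
```

Because each Gaussian bump has area 1 and the estimate is their average, the area under the curve is approximately 1, which is what makes it a density.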


The R code used to generate this graph and to calculate the mean is below:

salary <- c(240, 240, 240, 240, 240, 240, 240, 240, 255, 255, 265, 280, 280, 290, 300, 305, 325, 330, 340, 265)
 
mean(salary) # calculate the mean (270.5)
 
hs <- hist(salary) # plot the histogram and store the data about the
                   # distribution (breaks, counts, etc.) as the variable hs
 
d <- density(salary) # compute the kernel density estimate for salary
 
rs <- max(hs$counts)/max(d$y) # scale factor that stretches the density curve to the height
                              # of the tallest bar so it can be graphed on the same plot
 
 
lines(d$x, d$y*rs, type = "l", col = 51)  # overlay the rescaled density curve on the histogram
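As an alternative to rescaling the density curve, the histogram itself can be drawn on the density scale, so the curve needs no scale factor. The sketch below is one possible variation, not the code used for the original graph; the ylim setting is an added assumption to keep the peak of the curve inside the plot area.

```r
salary <- c(240, 240, 240, 240, 240, 240, 240, 240, 255, 255,
            265, 280, 280, 290, 300, 305, 325, 330, 340, 265)

hs <- hist(salary, plot = FALSE)  # compute the histogram without drawing it
d <- density(salary)              # kernel density estimate

# freq = FALSE plots bar heights as densities, so the bar areas sum to 1
plot(hs, freq = FALSE, ylim = c(0, max(hs$density, d$y)))

# the curve is already on the same scale as the bars
lines(d, col = "green")

# total area of the bars
sum(hs$density * diff(hs$breaks))
```

With this approach the y-axis reads as density rather than counts, which is the trade-off compared with the rescaled version above.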