Thursday, April 29, 2010

Standard Normal Distribution & Z-values

############################################
# Below is R code for plotting normal distributions, illustrating
# how the distribution changes when the standard deviation changes.
# Selected graphics are embedded below.
#
# It also calculates the area under the normal curve for a given Z.
#
# Finally, it plots a 3-D bivariate normal density.
#
#############################################



# PROGRAM NAME: NORMAL_PLOT_R
#
# ECON 206
#
# ORIGINAL SOURCE: The Standard Normal Distribution in R:
# http://msenux.redwoods.edu/math/R/StandardNormal.php
#
#
#
#


# PLOT NORMAL DENSITY

x=seq(-4,4,length=200)
y=dnorm(x,mean=0,sd=1)
plot(x,y,type="l",lwd=2,col="red")

#######################################

# INCREASE THE STANDARD DEVIATION

x=seq(-4,4,length=200)
y=dnorm(x,mean=0,sd=2.5)
plot(x,y,type="l",lwd=2,col="red")


#####################################

# DECREASE THE STANDARD DEVIATION

x=seq(-4,4,length=200)
y=dnorm(x,mean=0,sd=.5)
plot(x,y,type="l",lwd=2,col="red")
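
Because each plot() call above starts a fresh set of axes, the three curves are easier to compare when drawn together. The following is a minimal sketch of one way to overlay them; it is not part of the original program, and the object names (sds, cols, dens), colors, and legend placement are illustrative assumptions.

```r
# Overlay normal densities with mean 0 and three standard deviations
x    <- seq(-4, 4, length.out = 200)
sds  <- c(0.5, 1, 2.5)                  # the three values used above
cols <- c("blue", "red", "darkgreen")   # illustrative color choices

# One column of density values per standard deviation (a 200 x 3 matrix)
dens <- sapply(sds, function(s) dnorm(x, mean = 0, sd = s))

# matplot() draws every column against x on shared axes
matplot(x, dens, type = "l", lty = 1, lwd = 2, col = cols,
        xlab = "x", ylab = "density")
legend("topright", legend = paste("sd =", sds),
       col = cols, lty = 1, lwd = 2)
```

On shared axes the smaller standard deviation gives a taller, narrower curve and the larger one a flatter, wider curve, since the total area under each curve is always 1.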

# CALCULATING AREA FOR GIVEN Z-VALUES

# RECALL: FOR THE STANDARD NORMAL DISTRIBUTION THE MEAN = 0 AND
# THE STANDARD DEVIATION = 1.
# THE pnorm FUNCTION RETURNS THE CUMULATIVE PROBABILITY P(Z <= z), I.E. THE
# AREA UNDER THE CURVE TO THE LEFT OF THE SPECIFIED Z-VALUE (THE FIRST
# ARGUMENT OF THE FUNCTION).
# THE OUTPUT SHOULD MATCH WHAT YOU GET FROM THE NORMAL TABLE IN YOUR BOOK
# OR THE HANDOUT I SENT YOU, PROVIDED THE TABLE REPORTS LEFT-TAIL AREAS.

pnorm(0, mean=0, sd=1) # P(Z <= 0) = 0.5

pnorm(1, mean=0, sd=1) # P(Z <= 1), about 0.8413

pnorm(1.55, mean=0, sd=1) # P(Z <= 1.55), about 0.9394

pnorm(1.645, mean=0, sd=1) # P(Z <= 1.645), about 0.9500
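
Since pnorm returns the area to the left of Z, other areas follow by subtraction, and qnorm goes the other way, from a probability back to a Z-value. A short sketch; the particular Z-values and the object names (upper, middle, z95) are chosen only for illustration.

```r
# Area to the right of Z = 1.645: subtract the left-tail area from 1
upper <- 1 - pnorm(1.645, mean = 0, sd = 1)
# equivalently: pnorm(1.645, lower.tail = FALSE)

# Area between Z = -1.96 and Z = 1.96: difference of two left-tail areas
middle <- pnorm(1.96) - pnorm(-1.96)   # mean = 0 and sd = 1 are the defaults

# qnorm is the inverse of pnorm: the Z-value with 95% of the area to its left
z95 <- qnorm(0.95)

print(round(c(upper = upper, middle = middle, z95 = z95), 4))
```

The three results come out at roughly 0.05, 0.95, and 1.645, which ties the Z-values above to the usual 5% and 95% cutoffs.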


# LET'S LOOK AT A BIVARIATE NORMAL
# DISTRIBUTION


# first simulate a bivariate normal sample

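library(MASS) # provides mvrnorm() and kde2d()

# 1000 draws with mean (0, 0) and identity covariance (unit variances, no correlation)
bivn <- mvrnorm(1000, mu = c(0, 0), Sigma = matrix(c(1, 0, 0, 1), 2))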

# now we do a kernel density estimate

bivn.kde <- kde2d(bivn[,1], bivn[,2], n = 100)

# now plot your results

contour(bivn.kde)
image(bivn.kde)

persp(bivn.kde, phi = 45, theta = 30)


# fancy contour with image

image(bivn.kde); contour(bivn.kde, add = TRUE) # TRUE rather than T, which can be reassigned

# fancy perspective

persp(bivn.kde, phi = 45, theta = 30, shade = .1, border = NA)
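
The surface above is a kernel density estimate built from a simulated sample, so it is slightly lumpy and changes from run to run. Because Sigma is the identity matrix in this example, the two components are independent standard normals and the exact joint density is the product of two dnorm values. A sketch of plotting that exact surface for comparison; the grid size and the object names (gx, gy, fxy) are illustrative assumptions.

```r
# Exact bivariate normal density for mu = (0, 0) and Sigma = identity.
# With independent components, f(x, y) = dnorm(x) * dnorm(y).
gx <- seq(-4, 4, length.out = 50)
gy <- seq(-4, 4, length.out = 50)

# outer() evaluates the product at every (gx, gy) grid point
fxy <- outer(gx, gy, function(a, b) dnorm(a) * dnorm(b))

persp(gx, gy, fxy, phi = 45, theta = 30, shade = 0.1, border = NA,
      xlab = "x", ylab = "y", zlab = "density")
```

This shortcut only works for uncorrelated components; with a nonzero off-diagonal term in Sigma the density no longer factors into a product.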

###############################################

Regression Demo

Sunday, April 25, 2010

Basic Regression in R

#
# COMMENTS: BASIC INTRODUCTION TO REGRESSION USING R
#


#
# WILLIAMS P 572 #1
#


# GET DATA

x<-c(1,2,3,4,5)

y<-c(3,7,5,11,14)


# PLOT DATA

plot(x,y)

reg1 <- lm(y~x) # COMPUTE REGRESSION ESTIMATES

summary(reg1) # PRINT OUTPUT

abline(reg1) # PLOT REGRESSION LINE
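
For a single predictor, the estimates that lm() reports can be reproduced from the least-squares formulas: slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x). A sketch using the data from the example above; the names sxy, sxx, b1, and b0 are illustrative.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(3, 7, 5, 11, 14)

# Sums of cross-products and squares about the means
sxy <- sum((x - mean(x)) * (y - mean(y)))   # 26
sxx <- sum((x - mean(x))^2)                 # 10

b1 <- sxy / sxx                 # slope: 26 / 10 = 2.6
b0 <- mean(y) - b1 * mean(x)    # intercept: 8 - 2.6 * 3 = 0.2

# lm() should give the same estimates
reg1 <- lm(y ~ x)
print(c(intercept = b0, slope = b1))
print(coef(reg1))

# Predicted y at x = 6 from the fitted line: 0.2 + 2.6 * 6 = 15.8
predict(reg1, newdata = data.frame(x = 6))
```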



#
# SCHAUM'S P. 274 #14.2
#


#GET DATA

x<-c(20,16,34,23,27,32,18,22)
y<-c(64,61,84,70,88,92,72,77)

plot(x,y) #PLOT DATA

reg1 <- lm(y~x) # COMPUTE REGRESSION ESTIMATES

summary(reg1) # PRINT OUTPUT

abline(reg1) # PLOT REGRESSION LINE

Social Network Analysis of Tweets Using R

This past week I attended the AACRAO conference in New Orleans. This provided a great opportunity to demonstrate one of the many things you can do with R, the statistical programming language. R is capable of many outside-the-box applications beyond basic statistical techniques, such as social network analysis.

The image below represents the network of the last 100 tweets using the hashtag #aacrao10. The data was captured and the network was constructed using R. Thanks to Drew Conway for showing me how to do this.




Labeled dots indicate users who used the specified hashtag, while unlabeled dots indicate 'friends' of those users. The code also allows you to compute important network metrics, such as measures of centrality, that are helpful in key actor analysis.
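
The code behind the image is not reproduced in this post. As a rough illustration of what a centrality measure is, the sketch below computes degree centrality (the number of connections per user) in base R for a small made-up edge list. The user names and edges are hypothetical and are not the #aacrao10 data, and this is not the method used to build the network above.

```r
# Hypothetical edge list: each row is a link between two users (made-up names)
edges <- data.frame(
  from = c("userA", "userA", "userA", "userB", "userC"),
  to   = c("userB", "userC", "userD", "userC", "userE"),
  stringsAsFactors = FALSE
)

# Degree centrality for an undirected network:
# count how many edges touch each user
degree <- table(c(edges$from, edges$to))

# Users sorted from most to least connected;
# the top entries are candidate key actors
print(sort(degree, decreasing = TRUE))
```

Degree is the simplest centrality measure; measures such as betweenness need path calculations and are usually taken from a dedicated network package.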

Tuesday, April 6, 2010

Statistics Definitions

SAMPLE POINT - each individual outcome of an experiment
SAMPLE SPACE - the collection of all possible sample points in an experiment
EVENT - a collection of sample points
PRIOR PROBABILITY - an initial estimate of the probability of an event
RANDOM VARIABLE - a numerical description of the outcome of an experiment
EXPECTED VALUE OF A RANDOM VARIABLE - a measure of the average value of a random variable
VARIANCE OF A RANDOM VARIABLE - a measure of the dispersion of a random variable around its expected value
PARAMETER - a numerical measure computed from a population
STATISTIC - a numerical measure computed from a sample
SAMPLING DISTRIBUTION - the probability distribution of all possible values of a statistic
PROPERTIES OF ESTIMATORS:
Unbiasedness: the expected value of the statistic/estimator equals the parameter being estimated
Consistency: as the sample size n increases, the probability that the statistic/estimator is close to the parameter being estimated approaches 1
CENTRAL LIMIT THEOREM - the sampling distribution of the sample mean can be approximated by a normal probability distribution as the sample size becomes large. This holds even if the population from which the sample is drawn is not normal, or its distribution is unknown, provided the population variance is finite.
ASYMPTOTIC DISTRIBUTION - an estimator’s sampling distribution in large samples
ASYMPTOTIC PROPERTIES OF AN ESTIMATOR - properties of an estimator in large samples
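
Several of these definitions can be illustrated by simulation. The sketch below repeatedly draws samples from an exponential population (clearly non-normal, with mean 1 and standard deviation 1) and looks at the sampling distribution of the sample mean. The choice of population, the sample sizes, the number of replications, the seed, and the helper name sample_means are all arbitrary assumptions made for illustration.

```r
set.seed(206)   # arbitrary seed so the run is repeatable

# Means of `reps` samples of size n from an Exponential(rate = 1) population
sample_means <- function(n, reps = 5000) {
  replicate(reps, mean(rexp(n, rate = 1)))
}

means_5   <- sample_means(5)
means_100 <- sample_means(100)

# Unbiasedness: the average of the sample means is close to the
# population mean of 1 for both sample sizes
print(c(n5 = mean(means_5), n100 = mean(means_100)))

# Consistency: the spread of the sample mean shrinks as n grows
# (theory for this population: 1 / sqrt(n))
print(c(n5 = sd(means_5), n100 = sd(means_100)))

# Central limit theorem: for n = 100 the histogram of sample means
# is close to the normal curve with mean 1 and sd 1 / sqrt(100)
hist(means_100, breaks = 40, freq = FALSE,
     main = "Sample means, n = 100", xlab = "sample mean")
curve(dnorm(x, mean = 1, sd = 1 / sqrt(100)), add = TRUE, lwd = 2, col = "red")
```

Repeating the histogram with means_5 shows a visibly right-skewed shape, which is why the theorem is stated for large samples.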