Saturday, March 19, 2011

Applying the Standard Normal Distribution (Z) to Real Data

In a previous post, I introduced the concept of the standard normal variable Z and how it relates to probability. Here I will demonstrate how to translate actual data to Z and make probability interpretations (under the assumption of normality).

Example 1: Assume we are interested in tire mileage, and we know that tire mileage x is normally distributed with population mean μ = 36,500 and standard deviation σ = 5,000.


What is the probability that a tire (x) will last more than 40,000 miles?

To solve this problem, we translate our real-world data to data that is distributed standard normal; that is, we convert x = 40,000 to a value of z and base our probability on z.

Step 1: Calculate Z 

z =  (x-μ)/σ = (40,000-36,500)/5000 = .70

Step 2: Find the corresponding value from the z-table

For z =.70 the value from the table is .7580

Step 3: Interpret and Calculate Probability

The probability of a value > 40,000 is the same as finding a z > .70. Based on the rules we defined for positive z-values, we know that the probability of a value greater than .70 = 1-.7580 = .2420.  This implies that the probability that Z will exceed .70 and that our original data (x =mileage) will exceed 40,000 is 24.2 %.
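The table lookup can be cross-checked in R. This is a minimal sketch, assuming base R's pnorm function (which returns the area to the left of a value under the normal curve); the variable names are illustrative and not part of the original example.

```r
# Example 1: P(x > 40,000) for tire mileage x ~ Normal(mu = 36,500, sigma = 5,000)
mu    <- 36500
sigma <- 5000
x     <- 40000

# Step 1: convert x to a z-score
z <- (x - mu) / sigma

# Step 2: pnorm(z) is the area to the left of z, i.e. the z-table value
p_left <- pnorm(z)

# Step 3: the area to the right of z is 1 minus the table value
p_right <- 1 - p_left

round(c(z = z, table_value = p_left, p_greater = p_right), 4)
```

The rounded results should agree with the hand calculation: a table value of about .7580 and an upper-tail probability of about .2420.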


Example 2: Assume we are interested in tire mileage, and we know that tire mileage x is normally distributed with population mean μ = 36,500 and standard deviation σ = 5,000.


What is the probability that a tire (x) will last less than 30,100 miles?

To solve this problem, we translate our real-world data to data that is distributed standard normal; that is, we convert x = 30,100 to a value of z and base our probability on z.

Step 1: Calculate Z


z =  (x-μ)/σ = (30,100-36,500)/5000 = -1.28

Step 2: Find the corresponding value from the z-table 

For z = -1.28 the value from the table (looked up at 1.28) is .8997 ~ .90

Step 3: Interpret and Calculate Probability 

The probability of a value of z < -1.28 and the probability that x (mileage) will be less than 30,100 is 1-.90 = .10 or 10%.
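For a negative z, the "1 minus the table value" rule can be compared with a direct lower-tail calculation. The sketch below assumes base R's pnorm, which accepts negative z-values directly; the variable names are hypothetical.

```r
# Example 2: P(x < 30,100) for tire mileage x ~ Normal(36,500, 5,000)
z <- (30100 - 36500) / 5000   # z-score, about -1.28

# Table rule for a negative z: look up |z| and subtract the table value from 1
table_value  <- pnorm(abs(z))
p_table_rule <- 1 - table_value

# pnorm handles negative z directly, so the lower tail can also be read off at once
p_direct <- pnorm(z)

round(c(z = z, table_value = table_value,
        p_table_rule = p_table_rule, p_direct = p_direct), 4)
```

Both routes give the same lower-tail area, roughly 10%, because the standard normal curve is symmetric around zero.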




Example 3: Assume we are interested in tire mileage, and we know that tire mileage x is normally distributed with population mean μ = 36,500 and standard deviation σ = 5,000.


What is the probability that a tire (x) will last less than 50,000 miles?


Step 1: Calculate Z



z =  (x-μ)/σ = (50,000-36,500)/5000 =  2.7


Step 2: Find the corresponding value from the z-table 


For z = 2.7 the value from the table is .9965 


Step 3: Interpret and Calculate Probability 

The probability of z being less than 2.7, or of x (tire mileage) being less than 50,000, is 99.65%. (Remember, for a positive z, the probability of getting a lower value of z is the value taken directly from the standard normal table.)
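In software the standardizing step can be left to the function itself. As a small sketch, assuming base R's pnorm and its mean and sd arguments, the same probability can be computed on the original mileage scale and compared with the z-score route:

```r
# Example 3: P(x < 50,000), supplying the mean and standard deviation directly
p_direct <- pnorm(50000, mean = 36500, sd = 5000)

# Same probability via the z-score, as in the steps above
z       <- (50000 - 36500) / 5000
p_via_z <- pnorm(z)

round(c(z = z, p_direct = p_direct, p_via_z = p_via_z), 4)
```

Working through z by hand remains useful for reading a printed table; the direct call is only a convenience.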


Example 4: Assume we are interested in tire mileage, and we know that tire mileage x is normally distributed with population mean μ = 36,500 and standard deviation σ = 5,000.


What is the probability that a tire (x) will last between 30,000 and 40,000 miles?

Step 1:  Calculate P( x < 30,000)  

z =  (x-μ)/σ = (30,000-36,500)/5000 = -1.3

Based on the value from the z table, and a negative z value, the probability of getting a lower x (or the probability of mileage < 30,000) = 1-.9032 = .0968 ~ 9.7%

Step 2: Calculate P( x > 40,000)

z =  (x-μ)/σ = (40,000-36,500)/5000 =.70

From example 1 we know this turns out to be 24.2%.

Step 3:  Calculate P(30,000 <= x <= 40,000)

This is the same as finding the probability of a z between -1.3 and .70. We get this by subtracting the results of steps 1 and 2 from 1:

1 - .097 - .242 = .661 ~ 66.1%, or
100% - 9.7% - 24.2% = 66.1%
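The "1 minus both tails" calculation can be sketched in R and compared with the equivalent difference of two left-hand areas. This assumes base R's pnorm; the names A and B simply label the two tail areas from steps 1 and 2.

```r
# Example 4: P(30,000 <= x <= 40,000) for x ~ Normal(36,500, 5,000)
mu    <- 36500
sigma <- 5000

z_low  <- (30000 - mu) / sigma   # -1.3
z_high <- (40000 - mu) / sigma   #  0.7

A <- pnorm(z_low)        # step 1: area to the left of z_low
B <- 1 - pnorm(z_high)   # step 2: area to the right of z_high

# Step 3: whatever is not in either tail lies between the two values
p_between <- 1 - A - B

# Equivalent shortcut: difference of the two left-hand areas
p_shortcut <- pnorm(z_high) - pnorm(z_low)

round(c(A = A, B = B, p_between = p_between, p_shortcut = p_shortcut), 4)
```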

Using the Standard Normal Table to Calculate Probability

You may be familiar with the standard normal distribution and how it can be used to make probability interpretations by looking up values from the standard normal table  that correspond to specific values of z. These values from the table represent the area under the standard normal curve, and have a probability interpretation.

There are 3 instances in which you may be looking at z-values: 1) when z is positive, 2) when z is negative, and 3) when you are looking at a range between 2 values of z (which could be positive or negative). Below I outline these instances and the rules you will need to know to correctly use the z-table to calculate probabilities.

z = (x-μ)/σ

1) If z is positive:

a) the probability of observing a lower value of z is the area to the left of z and is the value taken directly from the standard normal table.
b) The probability of a larger value of z is the area to the right of z and = 1-(value from the table)

2) If Z is negative:

a) The probability of a lower z is the area to the left of z and  = 1-(value from table)
b) The probability of a higher value of z is the area to the right of z and = the value directly from the table.


3) The probability of getting a value between -z and z is the area between -z and z. You get this by:

a) calculating the area to the left of -z = A
b) calculating the area to the right of z = B
c) calculating 1 - A - B
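The three rules can be written as small R functions. This is only a sketch: table_value, prob_below, prob_above and prob_between are hypothetical helper names, and pnorm(abs(z)) stands in for a printed table that lists areas for positive z only.

```r
# Stand-in for a z-table that only lists areas to the left of positive z
table_value <- function(z) pnorm(abs(z))

# Rules 1a and 2a: probability of a lower value of z (area to the left)
prob_below <- function(z) {
  if (z >= 0) table_value(z) else 1 - table_value(z)
}

# Rules 1b and 2b: probability of a higher value of z (area to the right)
prob_above <- function(z) {
  if (z >= 0) 1 - table_value(z) else table_value(z)
}

# Rule 3: area between two z-values = 1 - (left tail A) - (right tail B)
prob_between <- function(z_low, z_high) {
  1 - prob_below(z_low) - prob_above(z_high)
}

# Usage with z-values of the kind used in the tire mileage examples
prob_above(0.70)
prob_below(-1.28)
prob_between(-1.3, 0.70)
```

Because pnorm accepts negative values directly, these helpers are not needed in practice; they only mirror the table rules so the logic can be checked.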

Thursday, March 10, 2011

R Code Example for Google Visualization Chart

# ------------------------------------------------------------------
# | PROGRAM NAME: googleVis_R
# | DATE: 1/12/11
# | CREATED BY: Matt Bogard
# | PROJECT FILE:
# |----------------------------------------------------------------
# | PURPOSE: Tutorial for creating Motion Charts in R with the GoogleVis package
# |
# |
# |
# |------------------------------------------------------------------
# | COMMENTS: See the following references for more details
# |
# | 1: http://blog.revolutionanalytics.com/2011/01/create-motion-charts-in-r-with-the-googlevis-package.html
# | 2: http://stackoverflow.com/questions/4646779/embedding-googlevis-charts-into-a-web-site/4649753#4649753
# | 3: http://cran.r-project.org/web/packages/googleVis/googleVis.pdf
# | 4: for more info on accessing the google API and data format requirements:
# | http://code.google.com/apis/visualization/documentation/gallery/motionchart.html#Data_Format
# |
# |
# |------------------------------------------------------------------
# | DATA USED: via google & iris data set
# |
# |
# |------------------------------------------------------------------
# | CONTENTS:
# |
# | PART 1: motion chart using googleVis data
# | PART 2: motion chart using your own data - in this case
# | the well-known iris data set (a default R data set)
# | PART 3:
# | PART 4:
# | PART 5:
# |
# |-----------------------------------------------------------------
# | UPDATES:
# |
# |
# ------------------------------------------------------------------
 
 
# *------------------------------------------------------------------*
# | set R working directory- this is where your data file will go
# | with the script for creating the visualization
# *------------------------------------------------------------------*
 
 
setwd("C:\\your directory\\R Data")
 
 
# *------------------------------------------------------------------*
# | install the googleVis package (as with any package, this only has
# | to be done for the first use)
# *------------------------------------------------------------------*
 
install.packages('googleVis')
 
 
# *------------------------------------------------------------------*
# | call the googleVis library
# *------------------------------------------------------------------*
 
 
library(googleVis)
 
# *-----------------------------------------------------------------*
# |
# |
# |
# | PART 1: motion chart using googleVis data
# |
# |
# |
# *------------------------------------------------------------------*
 
 
# *------------------------------------------------------------------*
# | create googleVis data object
# *------------------------------------------------------------------*
 
 
M <- gvisMotionChart(Fruits, "Fruit", "Year")
 
 
# *------------------------------------------------------------------*
# | look at data object- this includes the script that
# | will be used if you want to publish on your web page/blog
# *------------------------------------------------------------------*
 
 
print(M)
 
# *------------------------------------------------------------------*
# | plot the visualization-this command will open your default browser
# | and produce the visualization - this may not work depending on your
# | security and browser settings
# *------------------------------------------------------------------*
 
plot(M)
 
 
# *------------------------------------------------------------------*
# | create the data object that contains everything necessary to create the
# | chart on your web site/blog
# *------------------------------------------------------------------*
 
 
M$html$chart
 
# *------------------------------------------------------------------*
# | save the data object, which is an html file in your R
# | data directory
# *------------------------------------------------------------------*
 
cat(M$html$chart, file="tmp.html")
 
# from this point you can open the file in, say, notepad++ and copy the
# script into your blog or web page and the motion chart will be functional
 
 
 
# *-----------------------------------------------------------------*
# |
# |
# |
# | PART 2: motion chart using your own data - in this case
# | the well-known iris data set (a default R data set)
# |
# |
# |
# *------------------------------------------------------------------*
 
 
 
 
# *------------------------------------------------------------------*
# | take a look at the data
# *------------------------------------------------------------------*
 
names(iris)
print(iris)
 
 
# simulate a time variable and add it to the data set
 
iris$time <- rep(1:50, 3)
names(iris)
 
# *------------------------------------------------------------------*
# | create googleVis data object
# *------------------------------------------------------------------*
 
r <- gvisMotionChart(iris, "Species", "time")
 
# *------------------------------------------------------------------*
# | look at data object- this includes the script that
# | will be used if you want to publish on your web page/blog
# *------------------------------------------------------------------*
 
names(r)
print(r)
 
# *------------------------------------------------------------------*
# | plot the visualization-this command will open your default browser
# | and produce the visualization - this may not work depending on your
# | security and browser settings
# *------------------------------------------------------------------*
 
plot(r)
 
# *------------------------------------------------------------------*
# | create the data object that contains everything necessary to create the
# | chart on your web site/blog
# *------------------------------------------------------------------*
 
 
r$html$chart
 
# *------------------------------------------------------------------*
# | save the data object, which is an html file in your R
# | data directory
# *------------------------------------------------------------------*
 
cat(r$html$chart, file="tmp2.html") 

Visualizing Taxes and Deficits II

(flash enabled browser required)



For the simplest visualization, deselect 'trails' and select (checkbox) the variables DEFICIT, INCOME_TAX, TOTAL_REVENUE, SPENDING

For the best visualization, deselect 'trails'; under Color select 'unique colors'; for Size select 'IN_BILLIONS'; and select the variables DEFICIT, INCOME_TAX, TOTAL_REVENUE.

In any case, notice how early on, in the years following cuts in marginal income taxes, total revenues are increasing, revenues from income taxes are increasing, and the DEFICIT IS PLUNGING. All along, spending is steadily increasing. Then, about 2008, the deficit explodes, both in billions of dollars and as a percentage of GDP, as tax revenues start to plunge. Spending also increases dramatically.

It is also interesting to click on the barcode tab (at the top of the chart) and watch the relative size and position of the bars change with respect to revenue, spending, and deficits.

The data source is the CBO Budget/Historical Tables. I'd provide a link, but it moves around constantly. Just Google it and dig for it and you can find the data (or see below).

This was produced using the R googleVis package (for example code see here).
This is the format required for the R googleVis package. (I saved it as a csv file)

BUDGET_ITEM YEAR IN_BILLIONS PCT_GDP
CORP_TAX 2003 131.8 34.90604764
CORP_TAX 2004 189.4 45.88989817
CORP_TAX 2005 278.3 87.42060525
CORP_TAX 2006 353.9 142.5975397
CORP_TAX 2007 370.2 230.3657102
CORP_TAX 2008 304.3 667.9837559
CORP_TAX 2009 138.2 9.782782586
DEFICIT 2003 377.585 3.381560093
DEFICIT 2004 412.727 3.507495538
DEFICIT 2005 318.346 2.556790619
DEFICIT 2006 248.181 1.881156674
DEFICIT 2007 160.701 1.15429536
DEFICIT 2008 45.555 0.311849671
DEFICIT 2009 1412.686 9.313594409
INCOME_TAX 2003 793.7 7.108185563
INCOME_TAX 2004 809 6.875159344
INCOME_TAX 2005 927.2 7.446791422
INCOME_TAX 2006 1043.9 7.912529372
INCOME_TAX 2007 1163.5 8.357276253
INCOME_TAX 2008 1145.7 7.84296276
INCOME_TAX 2009 915.3 6.034414557
SPENDING 2003 2159.906 19.34359663
SPENDING 2004 2292.853 19.48545084
SPENDING 2005 2471.971 19.85359409
SPENDING 2006 2655.057 20.12474039
SPENDING 2007 2728.702 19.59992817
SPENDING 2008 2982.554 20.41726451
SPENDING 2009 3517.681 23.19146229
TOTAL_REVENUE 2003 131.8 1.180368977
TOTAL_REVENUE 2004 189.4 1.609586131
TOTAL_REVENUE 2005 278.3 2.235161834
TOTAL_REVENUE 2006 353.9 2.682483135
TOTAL_REVENUE 2007 370.2 2.659100704
TOTAL_REVENUE 2008 304.3 2.083105148
TOTAL_REVENUE 2009 138.2 0.911128692
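As a sketch of how data in this layout might be prepared in R, the snippet below reads a few of the rows above into a data frame and saves them as a csv file. The object names (budget_text, budget, csv_path) and the use of a temporary directory are assumptions for illustration; the commented-out call follows the gvisMotionChart(data, idvar, timevar) pattern from the code example post and is left unrun here because it needs the googleVis package.

```r
# A few rows copied from the table above: one row per budget item per year
budget_text <- "BUDGET_ITEM YEAR IN_BILLIONS PCT_GDP
INCOME_TAX 2003 793.7 7.108185563
INCOME_TAX 2004 809 6.875159344
SPENDING 2003 2159.906 19.34359663
SPENDING 2004 2292.853 19.48545084"

budget <- read.table(text = budget_text, header = TRUE,
                     stringsAsFactors = FALSE)

# Each BUDGET_ITEM / YEAR pair (id variable and time variable) should appear only once
stopifnot(!any(duplicated(budget[, c("BUDGET_ITEM", "YEAR")])))

# Save as a csv file (written to a temporary directory in this sketch)
csv_path <- file.path(tempdir(), "budget.csv")
write.csv(budget, csv_path, row.names = FALSE)

# With googleVis installed and loaded, the chart object could then be created with:
# chart <- gvisMotionChart(read.csv(csv_path), "BUDGET_ITEM", "YEAR")

str(budget)
```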

Visualizing Taxes and Deficits

There has been a lot of debate about the impact of the early decade tax cuts on economic activity and deficits.

As the chart below depicts, from 2000-2009, we saw drastic increases in revenues (nearly 30% from 2001-2007) in the face of marginal tax cuts. Any deficit that resulted would have to be attributed to expenditures or outlays, and could not be attributed to cuts in marginal tax rates. As the graph shows, outlays also increased during this period, but even more drastically by 46%!


 As the next graphic shows, early on we saw a fairly rapid increase in the budget deficit from 2002-2003, a tapering off from 2003-2004 and  a rapid decline from 2004-2007, by as much as 61%! This is very impressive given the large amounts of spending increases depicted above. If it were not for the large influx of tax revenues during this period (in the face of marginal tax cuts) the deficit likely would have been on the increase vs. the precipitous fall depicted below.



However, on the heels of the financial crisis, going into 2008 & 2009, we start to see declining revenues and unprecedented increases in spending and the deficit. From 2007-2009 we saw an increase in spending of about 28%, and an 88% increase over 2001 levels (indicated by the drastic upturn in outlays in the first graph).

But the impacts on the deficit were even more dramatic. From 2007-2008 we saw a 185% increase in the deficit, and from 2008-2009 the deficit increased by 208%! Overall, compared to the 2002 level, that was an increase in the deficit of almost 800% over 7 years. If you compare to the 2007 low, considering the drastic reductions in the deficit after the tax cuts, that is nearly an 800% increase in the deficit in just 3 years!
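These percentage changes can be reproduced from the deficit column of the budget table at the end of this post. A minimal sketch in R, where pct_change is a hypothetical helper and the deficit figures are entered as positive amounts copied from that table:

```r
# Deficits for selected years, copied from the budget table (positive = deficit)
deficit <- c("2002" = 157758, "2004" = 412727, "2007" = 160701,
             "2008" = 458555, "2009" = 1412686)

# Percentage change from one year's deficit to another's
pct_change <- function(from, to) {
  100 * (deficit[[to]] - deficit[[from]]) / deficit[[from]]
}

changes <- c(
  "2004 to 2007" = pct_change("2004", "2007"),
  "2007 to 2008" = pct_change("2007", "2008"),
  "2008 to 2009" = pct_change("2008", "2009"),
  "2002 to 2009" = pct_change("2002", "2009"),
  "2007 to 2009" = pct_change("2007", "2009")
)

round(changes, 1)
```

A negative result is a decline in the deficit; the 2004 to 2007 entry corresponds to the roughly 61% drop mentioned earlier.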

From 2004-2007 there was a rapid decline followed by a spike in the deficit during 2008 & 2009

Looking at the data, it appears that the reduction in marginal tax rates in the 2000's did not coincide with the rapid increase in the budget deficit that occurred at the end of the decade, but in fact was in step with the very rapid reduction in the budget deficit through 2007.

Most likely the increased deficits in the later years resulted from decreased revenues and increased expenditures associated with the financial crisis, not cuts in marginal tax rates. The real question becomes what was the cause of the financial crisis? There is no solid macroeconomic theory that links tax rates to business cycles, but there are many competing theories on business cycles as they relate to monetary policy or shocks to the production function.

This ad hoc analysis, however, does not prove that the effect of marginal tax cuts on the economy as a whole was positive or negative. Looking at one or two variables at a time leaves lots of room for interpretation and errors. Only by building and testing models that specify multiple relationships among variables can you truly gauge the impact of the tax cuts on the deficit and economic output. Lawrence Lindsey did this in 1987, looking specifically at revenue from income taxes paid by those earning over $200,000. Others have looked at the impact of tax cuts on economic activity in terms of multipliers, and other research has been done relating taxes, spending, and unemployment (see references below). That is the proper context in which to view the impact of tax cuts or any policy analysis.

References:

Lindsey, Lawrence B. 1987. “Individual Taxpayer Response to Tax Cuts, 1982-1984.” Journal of Public Economics 33 (July): 173-206.

“Why Do Europeans Work (Much) Less? It Is Taxes and Government Spending.” Economic Inquiry, 2008, vol. 46, issue 2, pages 197-207.

Economist Greg Mankiw gives a great review of the empirical work related to tax cuts and spending multipliers here on his blog:  http://gregmankiw.blogspot.com/2008/12/spending-and-tax-multipliers.html

Data Used: U.S. Budget Historical Tables http://www.whitehouse.gov/omb/budget/fy2009/hist.html (accessed Feb 2, 2009)


(in millions of dollars; in the last column a negative value is a deficit and a positive value is a surplus)

YEAR RECEIPTS OUTLAYS DEFICIT
2000 2,025,198 1,788,957 236,241
2001 1,991,142 1,862,906 128,236
2002 1,853,149 2,010,907 -157,758
2003 1,782,321 2,159,906 -377,585
2004 1,880,126 2,292,853 -412,727
2005 2,153,625 2,471,971 -318,346
2006 2,406,876 2,655,057 -248,181
2007 2,568,001 2,728,702 -160,701
2008 2,523,999 2,982,554 -458,555
2009 2,104,995 3,517,681 -1,412,686