Gov 2002: Problem Set 8
Problem Set Instructions
This problem set is due on April 12 at 11:59 pm Eastern time. Please upload a PDF of your solutions to Gradescope. We will accept handwritten solutions for problems 1-3, but we strongly advise you to typeset your answers in R Markdown. Problem 4 must be typeset. Please list the names of any other students you worked with on this problem set.
Question 1
Let
Compute and .
Compute , for the case that with .
Now consider a third finite-variance random variable . Suppose the following CEF is true in the population: Find the partial effects of and on .
Question 2
In this problem we will explore how centering an independent variable (subtracting off the variable’s mean) affects the interpretation of coefficients in linear projections.
Suppose that
. Let . Find the coefficients of the linear projection in terms of and . Does centering around its mean affect these parameters?
Now suppose that
. Derive an expression for the partial effect of , , and the expectation of that partial effect (where the expectation is over the distribution of and ).
A common trick with interactions is to center one of the variables for easier interpretation. Let
, where . Rewrite the linear projection as a function of instead of and relate the new coefficient on to the linear projection in (b). That is, write [ L[Y X_1, X_2] = _0 + _1 X_1 + _2 Z_2 + _3 X_1Z_2 ] and express the coefficients in terms of and . How does the coefficient obtained in part (c) relate to the average of the partial effects in (b)? (Hint: you’ll need to add and subtract certain values to obtain the new expression.)
In a sentence or two, explain the substantive interpretation of
and why using instead of might be useful. (Hint: consider a case where is assignment to some treatment and is birth year.) Does this transformation affect the interpretation of the interaction term?
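A small simulation can serve as a numerical check on the algebra in this question. The sketch below is illustrative only: the data-generating process, the coefficient values, and the names `x1`, `x2`, `z2`, `fit_raw`, and `fit_centered` are all assumptions made up for this example and are not part of the problem. It fits an interaction model with and without centering the second variable so the two sets of coefficients can be compared side by side.

```r
# Illustrative simulation only: this data-generating process is hypothetical.
set.seed(1)
n  <- 10000
x1 <- rbinom(n, 1, 0.5)                 # e.g., a binary treatment indicator
x2 <- rnorm(n, mean = 1970, sd = 10)    # e.g., birth year
y  <- 2 + 1.5 * x1 + 0.1 * x2 + 0.05 * x1 * x2 + rnorm(n)

z2 <- x2 - mean(x2)                     # center x2 at its sample mean

fit_raw      <- lm(y ~ x1 * x2)         # uncentered interaction model
fit_centered <- lm(y ~ x1 * z2)         # same model with x2 centered

# Compare the two coefficient vectors side by side.
coef(fit_raw)
coef(fit_centered)

# Sample average of the estimated partial effect of x1 in the uncentered fit.
avg_partial <- coef(fit_raw)["x1"] + coef(fit_raw)["x1:x2"] * mean(x2)
avg_partial
```

Comparing `avg_partial` with the coefficient on `x1` in `fit_centered` is a quick way to check a derivation. It does not replace the derivation itself.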
Question 3
This question highlights the importance of the assumptions we make about the population regression function.
(a)
Suppose the following linear model is true in the population for some outcome variable
Show that if this model is true and
(Note that this is the opposite of what we showed in lecture, where we saw that if we assume
(b)
With regression we don’t typically make many distributional assumptions about
Let
Question 4: Regression Analysis of Subprime Loans
This problem will guide you through thinking about the conditional expectation function, how it relates to regression, and how we can connect it back to hypothesis testing.
For this problem, we are going to use the subprime data. Recall that these are data collected by the U.S. government on all home lending transactions in Cape Coral and Fort Myers. They contain information on each loan applicant and give information on whether that applicant received a subprime loan (high.rate) as well as on the amount of the loan (loan.amount). They also contain basic demographic information such as race, gender, and income.
Assume the data represent the “truth” (i.e., an entire population). Also assume that the data in this population are distributed i.i.d. Take a sample of size 250, without replacement, from this population. Set your seed to 02138 before doing so. You will be working with this sample throughout this problem.
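The sampling step might look like the sketch below. Because the subprime data are not reproduced here, the code builds a hypothetical stand-in data frame called `population` with simulated `income` and `loan.amount` columns. The object names `population`, `idx`, and `samp` and the simulated values are assumptions for illustration, and the real data frame should be substituted.

```r
# Hypothetical stand-in for the subprime population; replace with the real data.
set.seed(1)
population <- data.frame(
  income      = rexp(5000, rate = 1 / 60),
  loan.amount = rexp(5000, rate = 1 / 150)
)

# Set the seed immediately before sampling. R reads 02138 as the number 2138.
set.seed(02138)

# Draw 250 row indices without replacement and keep those rows.
idx  <- sample(nrow(population), size = 250, replace = FALSE)
samp <- population[idx, ]

nrow(samp)
```

Sampling row indices rather than a single column keeps every variable for each sampled applicant together, which matters because later parts use several columns of the same sample.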
(a)
You care about the relationship between the variables income and loan.amount; it seems like there should be a relationship between those two, right?
As per usual, you have a friend (you really need to get some new friends) who proposes the following strategy to see if there is a relationship: create a new income variable, income.bin, that takes on four values, using the cut() function in R:
- a value if income falls into the [0, 25] percentile range (which you can find via quantile()),
- a value if it falls into the (25, 50] range,
- a value for the (50, 75] range, and
- a value for the (75, 100] range.
Note that the lower bounds are NOT inclusive, except for the first range.
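One way to build such a binned variable is sketched below on simulated data. The data frame `samp`, its simulated `income` values, and the bin labels `"Q1"` through `"Q4"` are placeholders chosen for this example, not values specified by the problem. In `cut()`, the default `right = TRUE` makes each interval closed on the right and open on the left, and `include.lowest = TRUE` closes the first interval on the left as well.

```r
# Hypothetical sample; in the problem, samp would be the 250-row subprime sample.
set.seed(1)
samp <- data.frame(income = rexp(250, rate = 1 / 60))

# Quartile cut points of income: minimum, 25th, 50th, 75th percentiles, maximum.
breaks <- quantile(samp$income, probs = c(0, 0.25, 0.5, 0.75, 1))

# (a, b] intervals by default; include.lowest = TRUE makes the first one [a, b].
samp$income.bin <- cut(
  samp$income,
  breaks         = breaks,
  include.lowest = TRUE,
  labels         = c("Q1", "Q2", "Q3", "Q4")   # placeholder labels
)

# Every observation should land in exactly one bin, with no NAs.
table(samp$income.bin, useNA = "ifany")
```

Without `include.lowest = TRUE`, the observation equal to the minimum would fall outside every interval and be coded `NA`.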
Run a regression of loan.amount on income.bin, and report the coefficients, standard errors,
(b)
Let’s compare the approach in (a) to using a regression on the original continuous variable. What is an assumption we make in the approach in (a) that we don’t make when we run a regression? We are looking for an assumption related to the fact that you have taken a continuous variable and stratified it into four categories.
(c)
Based on the results of the regression on binned income in (a), do you think the linearity assumption needed for a bivariate regression on the original continuous variable holds in this case? Why or why not?
(d)
In spite of your friend’s opinion, you decide to run a regression of loan.amount on income. Run this regression within your sample, and report the coefficients, standard errors,
(e)
Interpret the estimated coefficients. Is the interpretation consistent with the results from (a)?
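For reference, the mechanics of the two regressions in (a) and (d) could be sketched as follows. The data here are simulated under an assumed linear relationship, so the numbers say nothing about the real subprime sample. The names `samp`, `fit_bin`, and `fit_cont` are hypothetical, and `cut()` is left to generate its own default interval labels.

```r
# Hypothetical sample with an assumed linear relationship, for illustration only.
set.seed(1)
samp <- data.frame(income = rexp(250, rate = 1 / 60))
samp$loan.amount <- 50 + 1.2 * samp$income + rnorm(250, sd = 30)

# Quartile bins of income, first interval closed on the left.
samp$income.bin <- cut(
  samp$income,
  breaks         = quantile(samp$income, probs = seq(0, 1, by = 0.25)),
  include.lowest = TRUE
)

# Regression on the binned factor: intercept plus three bin contrasts.
fit_bin <- lm(loan.amount ~ income.bin, data = samp)

# Regression on the original continuous variable: intercept plus one slope.
fit_cont <- lm(loan.amount ~ income, data = samp)

# Coefficient tables with estimates, standard errors, t values, and p-values.
summary(fit_bin)$coefficients
summary(fit_cont)$coefficients
```

With a factor regressor, `lm()` treats the first bin as the reference category, so each remaining coefficient is a difference in mean loan amount relative to that bin.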