## Problem Set Instructions

This problem set is due on March 24 at 11:59 pm Eastern time. Please upload a PDF of your solutions to Gradescope. We will accept handwritten solutions, but we strongly advise you to typeset your answers in R Markdown. Please list the names of any other students you worked with on this problem set.

## Question 1

Probability distributions can be described by their *moments*, standard expressions that characterize a distribution’s shape in ways you’ve already heard of (the mean and the variance) and in more nuanced ways (the skewness, the kurtosis, etc.). Describing a population distribution (or an empirical sample distribution) in terms of its moments is very useful in social science (e.g., the distribution of income in the U.S. population is positively skewed). Specifically, the \(n\)th central moment of a random variable \(X\) is defined as \(E[(X-E[X])^n]\), but it is more common to work with the \(n\)th (raw) moment, defined as \(E[X^n]\), which drops the centering at \(E[X]\).
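To make the two definitions concrete, here is a small R sketch of their sample analogues applied to a toy vector. The helper names `raw_moment` and `central_moment` are hypothetical, introduced only for illustration. Note that `central_moment(x, 2)` divides by the number of observations, so it differs slightly from R's built-in `var()`, which divides by \(n - 1\).

```r
# Sample analogue of the nth raw moment, E[X^n]
raw_moment <- function(x, n) mean(x^n)

# Sample analogue of the nth central moment, E[(X - E[X])^n]
central_moment <- function(x, n) mean((x - mean(x))^n)

x <- c(1, 2, 3, 4)
raw_moment(x, 2)      # 7.5
central_moment(x, 2)  # 1.25

# The second central moment equals E[X^2] - E[X]^2
central_moment(x, 2) - (raw_moment(x, 2) - raw_moment(x, 1)^2)  # 0
```

The last line illustrates the identity \(E[(X-E[X])^2] = E[X^2] - E[X]^2\), which links the central and raw versions of the second moment.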

Suppose the random variable \(X\) for your population has the following first four moments: \(E[X]=1/2\), \(E[X^2]=1/2\), \(E[X^3]=3/4\), \(E[X^4]=3/2\). Suppose you take an i.i.d. sample \(\{X_1,\ldots,X_{20}\}\) of size 20 from this distribution. Let \(T=(X^2_1 + \cdots + X^2_{20})/20 = \overline{X^2}\), an estimator of the second moment.

(a) What are \(E[T]\) and \(V(T)\)? Be sure to explain why.

(b) Use the central limit theorem to approximate (in `R`) the probability that \(T\) is less than or equal to 1.
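As a generic illustration of the mechanics (deliberately using a different distribution from the one above), the central limit theorem says that an average of \(n\) i.i.d. draws with mean \(\mu\) and finite variance \(\sigma^2\) is approximately \(N(\mu, \sigma^2/n)\), so the probability that the average is at most \(q\) is approximately `pnorm(q, mean = mu, sd = sqrt(sigma2 / n))`. The sketch below wraps this in a hypothetical helper `clt_prob` and, for a toy Exponential(1) example chosen only for illustration, compares it with the exact probability, which is available because a sum of \(n\) Exponential(1) draws is Gamma(\(n\), 1).

```r
# CLT approximation to P(mean of n i.i.d. draws <= q), given the mean mu
# and variance sigma2 of a single draw.
clt_prob <- function(q, mu, sigma2, n) {
  pnorm(q, mean = mu, sd = sqrt(sigma2 / n))
}

# Toy example: Exponential(1) draws, so mu = 1 and sigma2 = 1; n = 100
approx_p <- clt_prob(1.1, mu = 1, sigma2 = 1, n = 100)

# Exact value for comparison: P(mean <= 1.1) = P(sum <= 110),
# and the sum of 100 Exponential(1) draws is Gamma(shape = 100, rate = 1).
exact_p <- pgamma(1.1 * 100, shape = 100, rate = 1)

c(approx = approx_p, exact = exact_p)
```

The two values should be close but not identical; the gap reflects the skew of the underlying distribution, which the normal approximation ignores.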

## Question 2

Suppose you sample \(n\) i.i.d. draws, where \(X_i \sim \text{Unif}(0, 1)\) for \(i = 1, \ldots, n\). Denote the sample mean by \(\bar{X}_n\).

(a) What is the approximate asymptotic distribution of \(\bar{X}_n\)?

(b) Now define a new random variable \(Z_n = \bar{X}_n^{1/2}\). What is the approximate asymptotic distribution of \(Z_n\)?
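One standard tool for questions like this is the delta method. Stated generally, for a function \(g\) that is differentiable at \(\mu\) with \(g'(\mu) \neq 0\), where \(\mu\) and \(\sigma^2\) are the mean and (finite) variance of a single draw:

\[ \sqrt{n}\,(\bar{X}_n - \mu) \xrightarrow{d} N(0, \sigma^2) \quad\Longrightarrow\quad \sqrt{n}\,\bigl(g(\bar{X}_n) - g(\mu)\bigr) \xrightarrow{d} N\bigl(0,\, g'(\mu)^2\,\sigma^2\bigr), \]

so that, for large \(n\), \(g(\bar{X}_n)\) is approximately \(N\bigl(g(\mu),\, g'(\mu)^2\sigma^2/n\bigr)\). Applying this to \(Z_n\) corresponds to taking \(g(x) = x^{1/2}\).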

(c) For each \(n \in \{3, 25, 1000\}\), run \(10000\) simulations to see how well your approximation in part (b) works as a function of \(n\). Report the ratio of the analytical variance you found in part (b) to the simulated variance of \(Z_n\), and explain your results.
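A minimal R sketch of the simulation step, assuming each replication draws a fresh sample of size \(n\) and records one value of \(Z_n\). The helper name `simulate_zn` and the seed are hypothetical choices, and the analytical variance from part (b) is left for you to supply when forming the ratio.

```r
# Simulate `reps` draws of Z_n = sqrt(mean of n i.i.d. Unif(0, 1) draws)
simulate_zn <- function(n, reps = 10000) {
  replicate(reps, sqrt(mean(runif(n))))
}

set.seed(123)
ns <- c(3, 25, 1000)

# Simulated variance of Z_n for each sample size
sim_var <- sapply(ns, function(n) var(simulate_zn(n)))
names(sim_var) <- paste0("n=", ns)
sim_var

# ratio <- analytic_var / sim_var  # analytic_var: your part (b) variances, one per n
```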

## Question 3

In class we learned that if a sequence of random variables has means converging to some constant and variances going to zero as \(n\to\infty\), then the sequence converges in probability to that constant. But this is a sufficient condition, not a necessary one. To see this, consider the sequence of random variables \(X_n\) with the probability distribution:

\[ X_n = \begin{cases} 0 & \text{with probability } 1 - 1/n \\ n & \text{with probability } 1/n \\ \end{cases} \]

(a) Find \(\mathbb{E}[X_n]\).

(b) Find \(\text{Var}(X_n)\). Does the variance of the sequence grow or shrink as \(n\) grows?

(c) Use the definition of convergence in probability to show that \(X_n \xrightarrow{p} 0\).
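A quick simulation can make the contrast concrete, as a numerical illustration rather than a substitute for the proof. The sketch below uses a hypothetical helper `draw_xn` to draw from the distribution of \(X_n\) above for several values of \(n\), and reports the empirical proportion of draws with \(|X_n| > \varepsilon\) (for an arbitrarily chosen \(\varepsilon = 0.5\)) alongside the empirical variance.

```r
# Draw m realizations of X_n: 0 with probability 1 - 1/n, n with probability 1/n
draw_xn <- function(m, n) {
  sample(c(0, n), size = m, replace = TRUE, prob = c(1 - 1/n, 1/n))
}

set.seed(42)
eps <- 0.5
for (n in c(10L, 100L, 1000L)) {
  x <- draw_xn(1e5, n)
  # Empirical P(|X_n| > eps) and empirical variance for this n
  cat(sprintf("n = %4d: P(|X_n| > %.1f) ~ %.4f, Var(X_n) ~ %.1f\n",
              n, eps, mean(abs(x) > eps), var(x)))
}
```

As \(n\) grows, the tail proportion should shrink while the empirical variance does not.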

## Question 4

This problem will use the `subprime` data from last week’s problem set to walk you through a very common inference problem: testing whether the difference between two population values is nonzero. To begin, download `subprime.csv` and load it into R.

We are going to be interested primarily in the `loan.amount` variable, the amount that each loan recipient received. Suppose a lawsuit has been filed in U.S. District Court by a group of Fort Myers women who claim that women in the area were loaned less money than men. The defendants, a group of local mortgage lenders, are vigorously denying these claims, and the case is now advancing to trial. Having heard about your expertise in this area, the federal judge hearing the case has brought you in to provide expert testimony. Your task in this problem is to assist the judge in her determination.

(a) Suppose you were only able to interview \(100\) male and \(100\) female loan recipients at random, so that the observations can be treated as i.i.d. To simulate this in R, set the seed to 02138 and draw \(100\) observations at random from the male subset of the `subprime` data and \(100\) observations at random from the female subset. These \(200\) observations constitute your sample. Calculate (1) the average loan amount (`loan.amount`) for women in your sample, (2) the average loan amount for men, and (3) and (4) the sample standard deviation for each. Report these results in a nicely formatted table.

Let \(\mu_{m}\) and \(\mu_{w}\) be the population average loan amounts for men and women, respectively. Let \(\sigma^2_{m}\) and \(\sigma^2_{w}\) be the population variances in loan amount for men and women, respectively. Denote the sample average loan amounts for men by \(\bar{X}_{m}\) and for women by \(\bar{X}_{w}\).
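For the sampling and summary step described above, here is a minimal R sketch. It assumes `subprime.csv` has been read into a data frame `subprime`; the grouping column is not named in this problem, so `sex` and its values `"Male"`/`"Female"` in the usage lines are hypothetical placeholders to replace with the actual column and coding. The helper names `draw_group` and `summarise_loans` are also hypothetical. Because `set.seed()` fixes a single random stream, the draws depend on the order of the sampling calls (men are drawn first here, following the instructions) and on the R version, since the default algorithm used by `sample()` changed in R 3.6.0.

```r
# Draw `size` rows at random (without replacement) from the rows of `df`
# for which the logical vector `in_group` is TRUE.
draw_group <- function(df, in_group, size) {
  idx <- which(in_group)
  # Index via sample.int() so a single eligible row is not treated as 1:k
  df[idx[sample.int(length(idx), size)], , drop = FALSE]
}

# Sample mean and sample standard deviation of loan.amount for each group
summarise_loans <- function(men, women) {
  data.frame(
    group = c("Men", "Women"),
    mean_loan_amount = c(mean(men$loan.amount), mean(women$loan.amount)),
    sd_loan_amount = c(sd(men$loan.amount), sd(women$loan.amount))
  )
}

# Usage (commented out; `sex`, "Male", and "Female" are hypothetical):
# subprime <- read.csv("subprime.csv")
# set.seed(02138)
# men   <- draw_group(subprime, subprime$sex == "Male", 100)
# women <- draw_group(subprime, subprime$sex == "Female", 100)
# knitr::kable(summarise_loans(men, women), digits = 2)
```

In an R Markdown document, `knitr::kable()` renders the resulting data frame as a formatted table.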

(b) What is the expected value of the sampling distribution of \(\bar{X}_{m} - \bar{X}_{w}\)? What is the variance of the sampling distribution of \(\bar{X}_{m} - \bar{X}_{w}\)?

(c) Compute and report your sample difference in average loan amounts between men and women. Recall that for large samples, the sampling distribution of a mean or difference-in-means is approximately normal. Suppose that we know the true population variances are \(\sigma^2_{m} = 32381.57\) and \(\sigma^2_{w} = 19097.95\). Using the normal approximation, what is the probability that we would observe a difference-in-means at least as extreme as the one in our sample if the true population difference-in-means \(\mu_{m} - \mu_{w}\) equals 0? Note that by “at least as extreme,” we mean a value at least as far from \(0\) as the value we observe, that is, \(P(|\bar{X}_{m} - \bar{X}_{w}| \ge |\alpha|)\), where \(\alpha\) is our observed value and \(|\cdot|\) is the absolute value operator.

Hint: In R you can get the probability that a normally distributed random variable takes on a value less than or equal to some value `q` using the command `pnorm(q, mean, sd)`, where `mean` is the mean of the normal distribution and `sd` is its standard deviation.
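Following the hint, the two-sided probability \(P(|D| \ge |\alpha|)\) for a normal random variable \(D\) with mean 0 can be written with `pnorm` alone by adding the two tails. The sketch below uses a hypothetical helper `two_sided_prob`; `sd_diff` stands for the standard deviation of the sampling distribution of the difference-in-means, which you derive yourself, and the numbers in the example are placeholders rather than values from `subprime`.

```r
# P(|D| >= |obs|) when D ~ N(0, sd_diff^2): sum of the lower and upper tails
two_sided_prob <- function(obs, sd_diff) {
  lower <- pnorm(-abs(obs), mean = 0, sd = sd_diff)     # P(D <= -|obs|)
  upper <- 1 - pnorm(abs(obs), mean = 0, sd = sd_diff)  # P(D >= |obs|)
  lower + upper  # equals 2 * pnorm(-abs(obs), 0, sd_diff) by symmetry
}

# Placeholder inputs: an observed difference 1.96 standard deviations from 0
two_sided_prob(obs = 19.6, sd_diff = 10)  # approximately 0.05
```

Taking `abs(obs)` inside the helper means the result is the same whether the observed difference is positive or negative.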

(d) Comment on your result in (c). Given what we observe in our sample, is it plausible that there is no difference in loan amounts between men and women? A common threshold for “rejecting” our assumed hypothesis that \(\mu_{m} - \mu_{w} = 0\) is that a sample at least as extreme as ours would occur with probability \(.05\) or less if that hypothesis were true (that is, a very unlikely sample). Would we reject the hypothesis that there is no difference in average loan amounts between men and women?