Principles of Statistics
Paper 1, Section II, J
Let be random variables with joint probability density function in a statistical model .
(a) Define the Fisher information . What do we mean when we say that the Fisher information tensorises?
(b) Derive the relationship between the Fisher information and the derivative of the score function in a regular model.
(c) Consider the model defined by and
where are i.i.d. random variables, and is a known constant. Compute the Fisher information . For which values of does the Fisher information tensorise? State a lower bound on the variance of an unbiased estimator in this model.
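Although the question's formulas are not reproduced above, the defining identity I(θ) = E_θ[(score)²] lends itself to a quick numerical check. The sketch below is a hypothetical illustration using a N(θ, 1) location model (not the model of the question) and estimates the Fisher information by Monte Carlo.

```python
import numpy as np

def score_normal_location(x, theta):
    """Score d/dtheta log f(x; theta) for the N(theta, 1) density."""
    return x - theta

def monte_carlo_fisher_info(theta, n_sims=100_000, rng=None):
    """Estimate I(theta) = E_theta[score^2] by simulation."""
    rng = np.random.default_rng(rng)
    x = rng.normal(loc=theta, scale=1.0, size=n_sims)
    return np.mean(score_normal_location(x, theta) ** 2)

# For N(theta, 1) the true Fisher information is 1 for every theta.
print(monte_carlo_fisher_info(theta=2.0))  # approximately 1.0
```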
Paper 2, Section II, J
Let be i.i.d. random observations taking values in with a continuous distribution function . Let for each .
(a) State the Kolmogorov-Smirnov theorem. Explain how this theorem may be used in a goodness-of-fit test for the null hypothesis , with continuous.
(b) Suppose you do not have access to the quantiles of the sampling distribution of the Kolmogorov-Smirnov test statistic. However, you are given i.i.d. samples with distribution function . Describe a test of with size exactly .
(c) Now suppose that are i.i.d. taking values in with probability density function , with . Define the density estimator
Show that for all and all ,
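As a sketch of the calibration-by-simulation idea in part (b) (the question's exact notation and level are not reproduced here), the hypothetical Python snippet below computes the Kolmogorov–Smirnov statistic of the sample against the null distribution function and compares it with statistics recomputed on fresh samples drawn under the null, rejecting when the observed statistic is among the largest.

```python
import numpy as np
from scipy.stats import norm

def ks_statistic(x, cdf):
    """sup_t |F_n(t) - F_0(t)|, evaluated at the order statistics."""
    x = np.sort(x)
    n = len(x)
    f0 = cdf(x)
    upper = np.arange(1, n + 1) / n - f0
    lower = f0 - np.arange(0, n) / n
    return max(upper.max(), lower.max())

def monte_carlo_ks_test(x, cdf, sampler, n_sim=999, alpha=0.05, rng=None):
    """Reject H0: F = F_0 when the observed statistic is among the top
    alpha fraction of statistics simulated under F_0."""
    rng = np.random.default_rng(rng)
    d_obs = ks_statistic(x, cdf)
    d_sim = np.array([ks_statistic(sampler(len(x), rng), cdf)
                      for _ in range(n_sim)])
    p_value = (1 + np.sum(d_sim >= d_obs)) / (n_sim + 1)
    return d_obs, p_value, p_value <= alpha

# Example: test whether data are standard normal.
rng = np.random.default_rng(0)
data = rng.normal(size=100)
print(monte_carlo_ks_test(data, norm.cdf, lambda n, r: r.normal(size=n)))
```

With n_sim chosen so that alpha * (n_sim + 1) is an integer, this Monte Carlo test has size exactly alpha, which is the point of part (b).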
Paper 3, Section II, J
Let iid for some known and some unknown . [The gamma distribution has probability density function
and its mean and variance are and , respectively.]
(a) Find the maximum likelihood estimator for and derive the distributional limit of . [You may not use the asymptotic normality of the maximum likelihood estimator proved in the course.]
(b) Construct an asymptotic -level confidence interval for and show that it has the correct (asymptotic) coverage.
(c) Write down all the steps needed to construct a candidate to an asymptotic -level confidence interval for using the nonparametric bootstrap.
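A minimal Python sketch of the bootstrap steps requested in part (c), assuming for illustration that the parameter of interest is the rate of a Gamma distribution with known shape and that the percentile method is used; the estimator, level and numbers below are stand-ins rather than the question's own.

```python
import numpy as np

def bootstrap_percentile_ci(x, estimator, level=0.95, n_boot=2000, rng=None):
    """Percentile bootstrap confidence interval for a statistic estimator(x)."""
    rng = np.random.default_rng(rng)
    n = len(x)
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        resample = x[rng.integers(0, n, size=n)]   # sample with replacement
        estimates[b] = estimator(resample)
    lo, hi = np.quantile(estimates, [(1 - level) / 2, (1 + level) / 2])
    return lo, hi

# Illustration with Gamma(a, lambda) data, a known: with the shape known,
# a / mean(x) estimates the rate (used here only as a stand-in statistic).
a = 3.0
rng = np.random.default_rng(1)
x = rng.gamma(shape=a, scale=1 / 2.0, size=200)    # true rate = 2
print(bootstrap_percentile_ci(x, lambda s: a / s.mean()))
```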
Paper 4, Section II, J
Suppose that , and suppose the prior on is a gamma distribution with parameters and . [Recall that has probability density function
and that its mean and variance are and , respectively.]
(a) Find the -Bayes estimator for for the quadratic loss, and derive its quadratic risk function.
(b) Suppose we wish to estimate . Find the -Bayes estimator for for the quadratic loss, and derive its quadratic risk function. [Hint: The moment generating function of a Poisson distribution is for , and that of a Gamma distribution is for .]
(c) State a sufficient condition for an admissible estimator to be minimax, and give a proof of this fact.
(d) For each of the estimators in parts (a) and (b), is it possible to deduce using the condition in (c) that the estimator is minimax for some value of and ? Justify your answer.
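For reference, the standard Gamma–Poisson conjugate update (written here in the shape–rate parameterisation, which may differ from the convention recalled in the question): if $X_1,\dots,X_n \mid \theta$ are i.i.d. $\mathrm{Poisson}(\theta)$ and $\theta \sim \Gamma(\alpha,\beta)$, then

$$
\pi(\theta \mid X_1,\dots,X_n) \;\propto\; \theta^{\alpha+\sum_{i=1}^n X_i-1}\,e^{-(\beta+n)\theta},
\qquad
\theta \mid X \;\sim\; \Gamma\Big(\alpha+\sum_{i=1}^n X_i,\; \beta+n\Big),
$$

so the Bayes estimator under quadratic loss (the posterior mean) is $\big(\alpha+\sum_{i=1}^n X_i\big)/(\beta+n)$.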
Paper 1, Section II, J
State and prove the Cramér-Rao inequality for a real-valued parameter . [Necessary regularity conditions need not be stated.]
In a general decision problem, define what it means for a decision rule to be minimax.
Let be i.i.d. from a distribution, where . Prove carefully that is minimax for quadratic risk on .
Paper 2, Section II, J
Consider from a distribution with parameter . Derive the likelihood ratio test statistic for the composite hypothesis
where is the parameter space constrained by .
Prove carefully that
where is a chi-squared distribution with one degree of freedom.
Paper 3, Section II, J
Let , let be a probability density function on and suppose we are given a further auxiliary conditional probability density function , on from which we can generate random draws. Consider a sequence of random variables generated as follows:
For and given , generate a new draw .
Define
where .
(i) Show that the Markov chain has invariant measure , that is, show that for all (measurable) subsets and all we have
(ii) Now suppose that is the posterior probability density function arising in a statistical model with observations and a prior distribution on . Derive a family such that in the above algorithm the acceptance probability is a function of the likelihood ratio , and for which the probability density function has covariance matrix for all .
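The construction above is the Metropolis–Hastings algorithm. The following generic Python sketch (with the target, proposal sampler and proposal density supplied by the user; the Gaussian random-walk example at the end is purely illustrative) shows how the acceptance step is typically implemented on the log scale.

```python
import numpy as np

def metropolis_hastings(log_target, propose, log_q, x0, n_steps, rng=None):
    """Generic Metropolis-Hastings sampler.

    log_target(x): log of the (possibly unnormalised) target density pi.
    propose(x, rng): draws a candidate y from q(. | x).
    log_q(y, x): log q(y | x), the proposal density.
    """
    rng = np.random.default_rng(rng)
    x = np.asarray(x0, dtype=float)
    chain = [x.copy()]
    for _ in range(n_steps):
        y = propose(x, rng)
        # log of pi(y) q(x | y) / (pi(x) q(y | x))
        log_alpha = (log_target(y) + log_q(x, y)) - (log_target(x) + log_q(y, x))
        if np.log(rng.uniform()) < log_alpha:   # accept with prob min(1, alpha)
            x = y
        chain.append(x.copy())
    return np.array(chain)

# Hypothetical example: sample from a standard normal target using a
# Gaussian random-walk proposal (symmetric, so log_q cancels anyway).
log_target = lambda x: -0.5 * np.sum(x ** 2)
propose = lambda x, rng: x + 0.5 * rng.normal(size=x.shape)
log_q = lambda y, x: -np.sum((y - x) ** 2) / (2 * 0.5 ** 2)
chain = metropolis_hastings(log_target, propose, log_q, np.zeros(2), 5000, rng=0)
print(chain.mean(axis=0), chain.var(axis=0))
```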
Paper 4, Section II, J
Consider drawn from a statistical model , with non-singular Fisher information matrix . For , define likelihood ratios
Next consider the probability density functions of normal distributions with corresponding likelihood ratios given by
Show that for every fixed , the random variables converge in distribution as to
[You may assume suitable regularity conditions of the model without specification, and results on uniform laws of large numbers from lectures can be used without proof.]
Paper 1, Section II, J
In a regression problem, for a given fixed, we observe such that
for an unknown and random such that for some known .
(a) When and has rank , compute the maximum likelihood estimator for . When , what issue is there with the likelihood maximisation approach and how many maximisers of the likelihood are there (if any)?
(b) For any fixed, we consider minimising
over . Derive an expression for and show it is well defined, i.e., there is a unique minimiser for every and .
Assume and that has rank . Let and note that for some orthogonal matrix and some diagonal matrix whose diagonal entries satisfy . Assume that the columns of have mean zero.
(c) Denote the columns of by . Show that they are sample principal components, i.e., that their pairwise sample correlations are zero and that they have sample variances , respectively. [Hint: the sample covariance between and is .]
(d) Show that
Conclude that prediction is the closest point to within the subspace spanned by the normalised sample principal components of part (c).
(e) Show that
Assume for some . Conclude that prediction is approximately the closest point to within the subspace spanned by the normalised sample principal components of part (c) with the greatest variance.
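A numerical check of the SVD picture used in parts (c)–(e), under the assumed notation $X = UDV^\top$ with singular values $d_1 \ge \dots \ge d_p$ and ridge parameter $\lambda$: the ridge fitted values are the projections of $y$ onto the left singular vectors (the normalised sample principal components), shrunk by factors $d_j^2/(d_j^2+\lambda)$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, lam = 50, 5, 2.0
X = rng.normal(size=(n, p))
X -= X.mean(axis=0)                      # centre the columns
y = rng.normal(size=n)

# Ridge solution and fitted values computed directly ...
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
fit_direct = X @ beta_ridge

# ... and via the SVD X = U diag(d) V^T: the fit is a shrunken projection
# onto the left singular vectors (the normalised sample principal components).
U, d, Vt = np.linalg.svd(X, full_matrices=False)
shrink = d ** 2 / (d ** 2 + lam)
fit_svd = U @ (shrink * (U.T @ y))

print(np.allclose(fit_direct, fit_svd))  # True
```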
Paper 2, Section II, J
(a) We consider the model and an i.i.d. sample from it. Compute the expectation and variance of and check they are equal. Find the maximum likelihood estimator for and, using its form, derive the limit in distribution of .
(b) In practice, Poisson-looking data show overdispersion, i.e., the sample variance is larger than the sample expectation. For and , let ,
Show that this defines a distribution. Does it model overdispersion? Justify your answer.
(c) Let be an i.i.d. sample from . Assume is known. Find the maximum likelihood estimator for .
(d) Furthermore, assume that, for any converges in distribution to a random variable as . Suppose we wanted to test the null hypothesis that our data arises from the model in part (a). Before making any further computations, can we necessarily expect to follow a normal distribution under the null hypothesis? Explain. Check your answer by computing the appropriate distribution.
[You may use results from the course, provided you state them clearly.]
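A quick illustration of the mean–variance comparison in parts (a) and (b) (hypothetical, since the overdispersed family of part (b) is not reproduced above): Poisson data have sample mean close to sample variance, whereas a Gamma-mixed Poisson, one standard overdispersed model, has variance exceeding its mean.

```python
import numpy as np

rng = np.random.default_rng(0)
n, mu = 10_000, 4.0

# Poisson data: the MLE of the mean is the sample mean, and mean ~ variance.
x_pois = rng.poisson(lam=mu, size=n)
print(x_pois.mean(), x_pois.var(ddof=1))          # both close to 4

# Gamma-mixed Poisson (negative binomial): a standard overdispersed model.
theta = rng.gamma(shape=2.0, scale=mu / 2.0, size=n)   # E[theta] = mu
x_over = rng.poisson(lam=theta)
print(x_over.mean(), x_over.var(ddof=1))          # variance exceeds the mean
```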
Paper 3, Section II, J
We consider the exponential model , where
We observe an i.i.d. sample from the model.
(a) Compute the maximum likelihood estimator for . What is the limit in distribution of ?
(b) Consider the Bayesian setting and place a prior for with density
where is the Gamma function satisfying for all . What is the posterior distribution for ? What is the Bayes estimator for the squared loss?
(c) Show that the Bayes estimator is consistent. What is the limiting distribution of ?
[You may use results from the course, provided you state them clearly.]
Paper 4, Section II, J
We consider a statistical model .
(a) Define the maximum likelihood estimator (MLE) and the Fisher information
(b) Let and assume there exist a continuous one-to-one function and a real-valued function such that
(i) For i.i.d. from the model for some , give the limit in almost sure sense of
Give a consistent estimator of in terms of .
(ii) Assume further that and that is continuously differentiable and strictly monotone. What is the limit in distribution of ? Assume too that the statistical model satisfies the usual regularity assumptions. Do you necessarily expect for all ? Why?
(iii) Propose an alternative estimator for with smaller bias than if for some with .
(iv) Further to all the assumptions in (iii), assume that the MLE for is of the form
What is the link between the Fisher information at and the variance of ? What does this mean in terms of the precision of the estimator and why?
[You may use results from the course, provided you state them clearly.]
Paper 1, Section II,
A scientist wishes to estimate the proportion of presence of a gene in a population of flies of size . Every fly receives a chromosome from each of its two parents, each carrying the gene with probability or the gene with probability , independently. The scientist can observe if each fly has two copies of the gene A (denoted by AA), two copies of the gene (denoted by BB) or one of each (denoted by AB). We let , and denote the number of each observation among the flies.
(a) Give the probability of each observation as a function of , denoted by , for all three values , or .
(b) For a vector , we let denote the estimator defined by
Find the unique vector such that is unbiased. Show that is a consistent estimator of .
(c) Compute the maximum likelihood estimator of in this model, denoted by . Find the limiting distribution of . [You may use results from the course, provided that you state them clearly.]
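For orientation (with notation assumed here: $N_{AA}, N_{AB}, N_{BB}$ for the genotype counts among the $n$ flies and $p$ the per-chromosome probability of gene A), the genotype probabilities and the allele-counting maximum likelihood estimator take the familiar Hardy–Weinberg form

$$
\mathbb{P}(AA)=p^2,\qquad \mathbb{P}(AB)=2p(1-p),\qquad \mathbb{P}(BB)=(1-p)^2,
\qquad
\hat p_{\mathrm{MLE}}=\frac{2N_{AA}+N_{AB}}{2n},
$$

i.e. the observed fraction of A chromosomes among the $2n$ chromosomes in the sample.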
Paper 2, Section II,
We consider the model of a Gaussian distribution in dimension , with unknown mean and known identity covariance matrix . We estimate based on one observation , under the loss function
(a) Define the risk of an estimator . Compute the maximum likelihood estimator of and its risk for any .
(b) Define what an admissible estimator is. Is admissible?
(c) For any , let be the prior . Find a Bayes optimal estimator under this prior with the quadratic loss, and compute its Bayes risk.
(d) Show that is minimax.
[You may use results from the course provided that you state them clearly.]
Paper 3, Section II, K
In the model of a Gaussian distribution in dimension , with unknown mean and known identity covariance matrix , we estimate based on a sample of i.i.d. observations drawn from .
(a) Define the Fisher information , and compute it in this model.
(b) We recall that the observed Fisher information is given by
Find the limit of , where is the maximum likelihood estimator of in this model.
(c) Define the Wald statistic and compute it. Give the limiting distribution of and explain how it can be used to design a confidence interval for .
[You may use results from the course provided that you state them clearly.]
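A Python sketch of the Wald construction in this Gaussian-mean setting (the dimension, sample size and level below are illustrative assumptions): the statistic $n\lVert\bar X-\theta\rVert^2$ is compared with a chi-squared quantile, and the empirical coverage of the resulting confidence ball can be checked by simulation.

```python
import numpy as np
from scipy.stats import chi2

def wald_covers(theta, n, level=0.95, rng=None):
    """Does the Wald confidence ball {t : n * ||xbar - t||^2 <= chi2 quantile}
    contain the true mean theta?"""
    rng = np.random.default_rng(rng)
    d = len(theta)
    x = rng.normal(loc=theta, scale=1.0, size=(n, d))
    xbar = x.mean(axis=0)
    wald = n * np.sum((xbar - theta) ** 2)      # ~ chi2_d exactly here
    return wald <= chi2.ppf(level, df=d)

theta = np.array([1.0, -2.0, 0.5])
covers = [wald_covers(theta, n=50, rng=seed) for seed in range(2000)]
print(np.mean(covers))   # close to 0.95
```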
Paper 4, Section II,
Let be an unknown function, twice continuously differentiable with for all . For some , we know the value and we wish to estimate its derivative . To do so, we have access to a pseudo-random number generator that gives i.i.d. uniform over , and a machine that takes input and returns , where the are i.i.d. .
(a) Explain how this setup allows us to generate independent , where the take value 1 or with probability , for any .
(b) We denote by the output . Show that for some independent
(c) Using the intuition given by the least-squares estimator, justify the use of the estimator given by
(d) Show that
Show that for some choice of parameter , this implies
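The exact setup of parts (b)–(d) is not reproduced above, but the following hypothetical Python sketch conveys the least-squares idea: regress noisy evaluations of $f$ at perturbed points $x + h e_i$, with $e_i = \pm 1$, on the perturbations themselves, and observe the bias–variance tradeoff in the step size $h$.

```python
import numpy as np

def slope_estimate(f, x, h, n, sigma=1.0, rng=None):
    """Least-squares slope of noisy evaluations f(x + h*e) + eps on h*e,
    with e = +/-1 equally likely: an estimate of f'(x)."""
    rng = np.random.default_rng(rng)
    e = rng.choice([-1.0, 1.0], size=n)
    y = f(x + h * e) + sigma * rng.normal(size=n)
    z = h * e
    return np.sum(z * y) / np.sum(z ** 2)

f = np.cos                    # so f'(1) = -sin(1) ~ -0.841
for h in (0.01, 0.1, 1.0):
    ests = [slope_estimate(f, 1.0, h, n=200, rng=s) for s in range(500)]
    print(h, np.mean(ests), np.std(ests))  # small h: large spread; large h: curvature bias
```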
Paper 1, Section II,
For a positive integer , we want to estimate the parameter in the binomial statistical model , based on an observation .
(a) Compute the maximum likelihood estimator for . Show that the posterior distribution for under a uniform prior on is , and specify and . [The p.d.f. of is given by
(b) (i) For a risk function , define the risk of an estimator of , and the Bayes risk under a prior for .
(ii) Under the loss function
find a Bayes optimal estimator for the uniform prior. Give its risk as a function of .
(iii) Give a minimax optimal estimator for the loss function given above. Justify your answer.
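For reference, the standard Beta–Binomial update behind part (a) (prior parameters written $\alpha, \beta$ here; a uniform prior is the case $\alpha=\beta=1$): for $X \sim \mathrm{Bin}(n,\theta)$,

$$
\theta \mid X = x \;\sim\; \mathrm{Beta}(x+\alpha,\; n-x+\beta),
\qquad
\mathbb{E}[\theta \mid X = x] = \frac{x+\alpha}{n+\alpha+\beta}.
$$

The question's loss function is not reproduced above; whatever it is, the Bayes optimal estimator in part (ii) is derived from this posterior (under quadratic loss it would be the posterior mean).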
Paper 2, Section II,
We consider the problem of estimating in the model , where
Here is the indicator of the set , and is known. This estimation is based on a sample of i.i.d. , and we denote by the ordered sample.
(a) Compute the mean and the variance of . Construct an unbiased estimator of taking the form , where , specifying .
(b) Show that is consistent and find the limit in distribution of . Justify your answer, citing theorems that you use.
(c) Find the maximum likelihood estimator of . Compute for all real . Is unbiased?
(d) For , show that has a limit in for some . Give explicitly the value of and the limit. Why should one favour using over ?
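A simulation comparing the two estimators of parts (a)–(c), assuming for illustration the $\mathrm{Uniform}(0,\theta)$ model this question appears to describe (the unbiased moment estimator $2\bar X$ and the sample maximum as MLE):

```python
import numpy as np

theta, n, reps = 3.0, 50, 5000
rng = np.random.default_rng(0)

est_moment, est_mle = [], []
for _ in range(reps):
    x = rng.uniform(0.0, theta, size=n)
    est_moment.append(2 * x.mean())      # unbiased: E[2 * Xbar] = theta
    est_mle.append(x.max())              # MLE: biased downwards, but ...

# ... its mean-squared error is much smaller (O(1/n^2) versus O(1/n)).
mse = lambda e: np.mean((np.array(e) - theta) ** 2)
print(mse(est_moment), mse(est_mle))
```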
Paper 3, Section II,
We consider the problem of estimating an unknown in a statistical model where , based on i.i.d. observations whose distribution has p.d.f. .
In all the parts below you may assume that the model satisfies necessary regularity conditions.
(a) Define the score function of . Prove that has mean 0.
(b) Define the Fisher Information . Show that it can also be expressed as
(c) Define the maximum likelihood estimator of . Give without proof the limits of and of (in a manner which you should specify). [Be as precise as possible when describing a distribution.]
(d) Let be a continuously differentiable function, and another estimator of such that with probability 1. Give the limits of and of (in a manner which you should specify).
Paper 4, Section II,
For the statistical model , where is a known, positive-definite matrix, we want to estimate based on i.i.d. observations with distribution .
(a) Derive the maximum likelihood estimator of . What is the distribution of ?
(b) For , construct a confidence region such that .
(c) For , compute the maximum likelihood estimator of for the following parameter spaces:
(i) .
(ii) for some unit vector .
(d) For , we want to test the null hypothesis against the composite alternative . Compute the likelihood ratio statistic and give its distribution under the null hypothesis. Compare this result with the statement of Wilks' theorem.
Paper 1, Section II,
Derive the maximum likelihood estimator based on independent observations that are identically distributed as , where the unknown parameter lies in the parameter space . Find the limiting distribution of as .
Now define
and find the limiting distribution of as .
Calculate
for the choices and . Based on the above findings, which estimator of would you prefer? Explain your answer.
[Throughout, you may use standard facts of stochastic convergence, such as the central limit theorem, provided they are clearly stated.]
Paper 2, Section II,
(a) State and prove the Cramér-Rao inequality in a parametric model , where . [Necessary regularity conditions on the model need not be specified.]
(b) Let be i.i.d. Poisson random variables with unknown parameter . For and define
Show that for all values of .
Now suppose is an estimator of with possibly nonzero bias . Suppose the function is monotone increasing on . Prove that the mean-squared errors satisfy
Paper 3, Section II, J
Let be i.i.d. random variables from a distribution, , and consider a Bayesian model for the unknown parameter, where is a fixed constant.
(a) Derive the posterior distribution of .
(b) Construct a credible set such that
(i) for every , and
(ii) for any ,
where denotes the distribution of the infinite sequence when drawn independently from a fixed distribution.
[You may use the central limit theorem.]
Paper 4, Section II, J
Consider a decision problem with parameter space . Define the concepts of a Bayes decision rule and of a least favourable prior.
Suppose is a prior distribution on such that the Bayes risk of the Bayes rule equals , where is the risk function associated to the decision problem. Prove that is least favourable.
Now consider a random variable arising from the binomial distribution , where . Construct a least favourable prior for the squared risk . [You may use without proof the fact that the Bayes rule for quadratic risk is given by the posterior mean.]
Paper 1, Section II, J
Consider a normally distributed random vector modelled as where is the identity matrix, and where . Define the Stein estimator of .
Prove that dominates the estimator for the risk function induced by quadratic loss
Show however that the worst case risks coincide, that is, show that
[You may use Stein's lemma without proof, provided it is clearly stated.]
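A Monte Carlo sketch of the domination claim, using the standard James–Stein form $\hat\theta_{JS} = \big(1 - (d-2)/\lVert X\rVert^2\big)X$ for $X \sim N(\theta, I_d)$, $d \ge 3$ (which may differ in constants from the definition given in lectures):

```python
import numpy as np

def risks(theta, reps=20_000, rng=None):
    """Monte Carlo quadratic risks of the MLE X and the James-Stein estimator."""
    rng = np.random.default_rng(rng)
    d = len(theta)
    X = theta + rng.normal(size=(reps, d))
    norm2 = np.sum(X ** 2, axis=1)
    js = (1 - (d - 2) / norm2)[:, None] * X
    risk_mle = np.mean(np.sum((X - theta) ** 2, axis=1))
    risk_js = np.mean(np.sum((js - theta) ** 2, axis=1))
    return risk_mle, risk_js

d = 10
print(risks(np.zeros(d)))        # JS risk well below d = 10 at theta = 0
print(risks(np.full(d, 20.0)))   # both risks close to d: worst cases coincide
```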
Paper 2, Section II, J
Consider a random variable arising from the binomial distribution , . Find the maximum likelihood estimator and the Fisher information for .
Now consider the following priors on :
(i) a uniform prior on ,
(ii) a prior with density proportional to ,
(iii) a prior.
Find the means and modes of the posterior distributions corresponding to the prior distributions (i)-(iii). Which of these posterior decision rules coincide with ? Which one is minimax for quadratic risk? Justify your answers.
[You may use the following properties of the distribution. Its density is proportional to , its mean is equal to , and its mode is equal to
provided either or .
You may further use the fact that a unique Bayes rule of constant risk is a unique minimax rule for that risk.]
Paper 3, Section II, J
Define what it means for an estimator of an unknown parameter to be consistent.
Let be a sequence of random real-valued continuous functions defined on such that, as converges to in probability for every , where is non-random. Suppose that for some and every we have
and that has exactly one zero for every . Show that as , and deduce from this that the maximum likelihood estimator (MLE) based on observations from a model is consistent.
Now consider independent observations of bivariate normal random vectors
where and is the identity matrix. Find the MLE of and show that the MLE of equals
Show that is not consistent for estimating . Explain briefly why the MLE fails in this model.
[You may use the Law of Large Numbers without proof.]
Paper 4, Section II,
Given independent and identically distributed observations with finite mean and variance , explain the notion of a bootstrap sample , and discuss how you can use it to construct a confidence interval for .
Suppose you can operate a random number generator that can simulate independent uniform random variables on . How can you use such a random number generator to simulate a bootstrap sample?
Suppose that and are cumulative probability distribution functions defined on the real line, that as for every , and that is continuous on . Show that, as ,
State (without proof) the theorem about the consistency of the bootstrap of the mean, and use it to give an asymptotic justification of the confidence interval . That is, prove that as where is the joint distribution of
[You may use standard facts of stochastic convergence and the Central Limit Theorem without proof.]
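A Python sketch of the recipe asked for above: a uniform draw $U$ on $[0,1)$ becomes a uniformly distributed index $\lfloor nU\rfloor \in \{0,\dots,n-1\}$, $n$ such indices give one bootstrap sample, and repeating this yields a percentile-type interval for the mean (the exact form of the interval in the question is not reproduced here).

```python
import numpy as np

def bootstrap_sample(x, uniforms):
    """Turn n i.i.d. Uniform[0,1) draws into a bootstrap resample of x."""
    n = len(x)
    indices = np.floor(n * uniforms).astype(int)   # uniform on {0, ..., n-1}
    return x[indices]

def bootstrap_mean_ci(x, n_boot=2000, level=0.95, rng=None):
    rng = np.random.default_rng(rng)
    means = np.array([bootstrap_sample(x, rng.uniform(size=len(x))).mean()
                      for _ in range(n_boot)])
    return np.quantile(means, [(1 - level) / 2, (1 + level) / 2])

x = np.random.default_rng(0).exponential(size=100)
print(x.mean(), bootstrap_mean_ci(x))
```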
Paper 1, Section II, J
State without proof the inequality known as the Cramér-Rao lower bound in a parametric model . Give an example of a maximum likelihood estimator that attains this lower bound, and justify your answer.
Give an example of a parametric model where the maximum likelihood estimator based on observations is biased. State without proof an analogue of the Cramér-Rao inequality for biased estimators.
Define the concept of a minimax decision rule, and show that the maximum likelihood estimator based on in a model is minimax for estimating in quadratic risk.
Paper 2, Section II, J
In a general decision problem, define the concepts of a Bayes rule and of admissibility. Show that a unique Bayes rule is admissible.
Consider i.i.d. observations from a model. Can the maximum likelihood estimator of be a Bayes rule for estimating in quadratic risk for any prior distribution on that has a continuous probability density on ? Justify your answer.
Now model the as i.i.d. copies of , where is drawn from a prior that is a Gamma distribution with parameters and (given below). Show that the posterior distribution of is a Gamma distribution and find its parameters. Find the Bayes rule