Paper 2, Section II, J

(i) Explain what the Moran model and the infinite alleles model are. State Ewens' sampling formula for the distribution of the allelic frequency spectrum $\left(a_{1}, \ldots, a_{n}\right)$ in terms of $\theta$, where $\theta=N u$, with $u$ denoting the mutation rate per individual and $N$ the population size.

Let $K_{n}$ be the number of allelic types in a sample of size $n$. Give, without justification, an expression for $\mathbb{E}\left(K_{n}\right)$ in terms of $\theta$.

(ii) Let $K_{n}$ and $\theta$ be as above. Show that for $1 \leqslant k \leqslant n$ we have that

$$\mathbb{P}\left(K_{n}=k\right)=C \frac{\theta^{k}}{\theta(\theta+1) \cdots(\theta+n-1)}$$

for some constant $C$ that does not depend on $\theta$.

Show that, given $\left\{K_{n}=k\right\}$, the distribution of the allelic frequency spectrum $\left(a_{1}, \ldots, a_{n}\right)$ does not depend on $\theta$.

Show that the value of $\theta$ which maximises $\mathbb{P}\left(K_{n}=k\right)$ is the one for which $k=\mathbb{E}\left(K_{n}\right)$.
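
The last claim can be sanity-checked numerically; the following is a minimal sketch and not part of the question. It assumes two standard facts that the question does not state: that the constant $C$ is the unsigned Stirling number of the first kind $c(n,k)$, and that $\mathbb{E}\left(K_{n}\right)=\sum_{i=0}^{n-1} \theta /(\theta+i)$. The function names `stirling_unsigned`, `prob_k`, `expected_k` and `mle_theta` are hypothetical, chosen for illustration only.

```python
def stirling_unsigned(n):
    """Row n of the unsigned Stirling numbers of the first kind, c(n, k) for k = 0..n."""
    row = [1]  # c(0, 0) = 1
    for m in range(1, n + 1):
        new = [0] * (m + 1)
        for k in range(m + 1):
            # Recurrence: c(m, k) = c(m-1, k-1) + (m-1) * c(m-1, k)
            a = row[k - 1] if k >= 1 else 0
            b = row[k] if k < m else 0
            new[k] = a + (m - 1) * b
        row = new
    return row


def prob_k(n, k, theta):
    """P(K_n = k), assuming C = c(n, k) (assumption, see lead-in)."""
    denom = 1.0
    for i in range(n):
        denom *= theta + i  # theta (theta + 1) ... (theta + n - 1)
    return stirling_unsigned(n)[k] * theta**k / denom


def expected_k(n, theta):
    """E(K_n) = sum_{i=0}^{n-1} theta / (theta + i) (assumed standard formula)."""
    return sum(theta / (theta + i) for i in range(n))


def mle_theta(n, k, lo=1e-9, hi=1e9):
    """Solve expected_k(n, theta) = k by bisection on a log scale; needs 1 < k < n."""
    for _ in range(200):
        mid = (lo * hi) ** 0.5  # geometric midpoint
        if expected_k(n, mid) < k:
            lo = mid  # expected_k is increasing in theta
        else:
            hi = mid
    return (lo * hi) ** 0.5


if __name__ == "__main__":
    n, k = 10, 4
    theta_hat = mle_theta(n, k)
    print(theta_hat, expected_k(n, theta_hat), prob_k(n, k, theta_hat))
```

Bisection is used because $\mathbb{E}\left(K_{n}\right)$ is increasing in $\theta$, so the equation $k=\mathbb{E}\left(K_{n}\right)$ has a unique root for $1<k<n$. Comparing `prob_k` at `theta_hat` with its values at nearby $\theta$ gives a quick check that the root is indeed a maximiser.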