Consider a Yule process starting with a single individual; that is, suppose $X(0)=1$. Let $T_i$ denote the time it takes the process to go from a population of size $i$ to one of size $i+1$.
(a) Argue that $T_i, i=1, \ldots, j$, are independent exponentials with respective rates $i \lambda$.
(b) Let $X_1, \ldots, X_j$ denote independent exponential random variables each having rate $\lambda$, and interpret $X_i$ as the lifetime of component $i$. Argue that
$\max \left(X_1, \ldots, X_j\right)$ can be expressed as
$$
\max \left(X_1, \ldots, X_j\right)=\varepsilon_1+\varepsilon_2+\cdots+\varepsilon_j
$$
where $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_j$ are independent exponentials with respective rates $j \lambda$, $(j-1) \lambda, \ldots, \lambda$.
Hint: Interpret $\varepsilon_i$ as the time between the $(i-1)$st and the $i$th failure.
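As a numerical sanity check on the decomposition in (b), and not a substitute for the argument, the following sketch compares a Monte Carlo estimate of $E[\max(X_1,\ldots,X_j)]$ with $\sum_{k=1}^{j} 1/(k\lambda)$, the mean of $\varepsilon_1+\cdots+\varepsilon_j$. The function names, trial count, and seed are illustrative choices, not part of the exercise.

```python
import random


def mean_max_exponentials(j, lam, n_trials=100_000, seed=0):
    """Monte Carlo estimate of E[max(X_1, ..., X_j)], X_i iid exponential(lam)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        # Lifetimes of the j components; the maximum is the time of the last failure.
        total += max(rng.expovariate(lam) for _ in range(j))
    return total / n_trials


def mean_sum_of_spacings(j, lam):
    """Exact E[eps_1 + ... + eps_j], where the rates are j*lam, (j-1)*lam, ..., lam."""
    return sum(1.0 / (k * lam) for k in range(1, j + 1))


if __name__ == "__main__":
    j, lam = 5, 2.0
    print("simulated mean of max:", mean_max_exponentials(j, lam))
    print("sum of 1/(k*lam):     ", mean_sum_of_spacings(j, lam))
```

Agreement of the two means is only a necessary condition for the identity in (b); the distributional statement still needs the memoryless-property argument suggested by the hint.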
(c) Using (a) and (b), argue that
$$
P\left\{T_1+\cdots+T_j \leqslant t\right\}=\left(1-e^{-\lambda t}\right)^j
$$
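The formula in (c) can be checked numerically. The sketch below draws $T_1,\ldots,T_j$ as independent exponentials with rates $\lambda, 2\lambda, \ldots, j\lambda$, as stated in (a), and compares the empirical frequency of $\{T_1+\cdots+T_j \leqslant t\}$ with $(1-e^{-\lambda t})^j$. The function names, parameter values, and seed are assumptions made for illustration.

```python
import math
import random


def empirical_cdf_sum(j, lam, t, n_trials=100_000, seed=1):
    """Estimate P{T_1 + ... + T_j <= t}, with T_i independent exponential(i*lam)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        total = sum(rng.expovariate(i * lam) for i in range(1, j + 1))
        if total <= t:
            hits += 1
    return hits / n_trials


def closed_form_cdf(j, lam, t):
    """The value claimed in part (c): (1 - exp(-lam*t))**j."""
    return (1.0 - math.exp(-lam * t)) ** j


if __name__ == "__main__":
    j, lam, t = 4, 1.0, 1.5
    print("empirical:  ", empirical_cdf_sum(j, lam, t))
    print("closed form:", closed_form_cdf(j, lam, t))
```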
(d) Use (c) to obtain that
$$
P_{1 j}(t)=\left(1-e^{-\lambda t}\right)^{j-1}-\left(1-e^{-\lambda t}\right)^j=e^{-\lambda t}\left(1-e^{-\lambda t}\right)^{j-1}
$$
and hence, given $X(0)=1$, $X(t)$ has a geometric distribution with parameter $p=e^{-\lambda t}$.
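The geometric law in (d) can also be observed by direct simulation. The following sketch simulates a Yule process from $X(0)=1$ by drawing successive holding times with rate $n\lambda$ while the population is $n$, then compares the empirical distribution of $X(t)$ with $e^{-\lambda t}(1-e^{-\lambda t})^{j-1}$. The helper names `simulate_yule`, `empirical_pmf`, and `geometric_pmf`, as well as the trial count and seed, are hypothetical and chosen only for this illustration.

```python
import math
import random


def simulate_yule(lam, t, rng, start=1):
    """Return X(t) for a Yule process with per-individual birth rate lam and X(0)=start."""
    n = start
    clock = rng.expovariate(n * lam)  # time of the first birth
    while clock <= t:
        n += 1
        clock += rng.expovariate(n * lam)  # holding time in state n has rate n*lam
    return n


def empirical_pmf(lam, t, n_trials=100_000, seed=2):
    """Empirical distribution of X(t) given X(0)=1."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_trials):
        n = simulate_yule(lam, t, rng)
        counts[n] = counts.get(n, 0) + 1
    return {n: c / n_trials for n, c in counts.items()}


def geometric_pmf(j, lam, t):
    """P_{1j}(t) as given in part (d)."""
    p = math.exp(-lam * t)
    return p * (1.0 - p) ** (j - 1)


if __name__ == "__main__":
    lam, t = 1.0, 1.0
    pmf = empirical_pmf(lam, t)
    for j in range(1, 6):
        print(j, pmf.get(j, 0.0), geometric_pmf(j, lam, t))
```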
(e) Now conclude that
$$
P_{i j}(t)=\binom{j-1}{i-1} e^{-\lambda t i}\left(1-e^{-\lambda t}\right)^{j-i}
$$
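One way to see why (e) is plausible: starting from $i$ individuals, the population at time $t$ is the sum of $i$ independent copies of the process started from one individual, each geometric by (d). The sketch below checks, deterministically and without simulation, that the $i$-fold convolution of the geometric distribution agrees with the stated formula for $P_{ij}(t)$. The function names and the truncation level `j_max` are assumptions made for this illustration.

```python
import math


def geometric_pmf(n, p):
    """P{G = n} for a geometric random variable on {1, 2, ...} with parameter p."""
    return p * (1.0 - p) ** (n - 1) if n >= 1 else 0.0


def p_ij(i, j, lam, t):
    """The formula from part (e); zero when j < i."""
    if j < i:
        return 0.0
    p = math.exp(-lam * t)
    return math.comb(j - 1, i - 1) * p ** i * (1.0 - p) ** (j - i)


def convolve_geometrics(i, j_max, p):
    """dist[n] = P{G_1 + ... + G_i = n} for n = 0..j_max, G_k iid geometric(p)."""
    dist = [0.0] * (j_max + 1)
    dist[0] = 1.0  # the empty sum equals 0 with probability 1
    for _ in range(i):
        new = [0.0] * (j_max + 1)
        for s, ps in enumerate(dist):
            if ps == 0.0:
                continue
            for n in range(1, j_max - s + 1):
                new[s + n] += ps * geometric_pmf(n, p)
        dist = new
    return dist


if __name__ == "__main__":
    i, lam, t, j_max = 3, 0.7, 1.2, 30
    conv = convolve_geometrics(i, j_max, math.exp(-lam * t))
    worst = max(abs(conv[j] - p_ij(i, j, lam, t)) for j in range(j_max + 1))
    print("largest discrepancy:", worst)
```

Entries up to `j_max` are unaffected by the truncation, because a sum equal to $n$ only involves summands no larger than $n$.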