Proofs of convergence of random variables

This article is supplemental for “Convergence of random variables” and provides proofs for selected results.

Several results will be established using the portmanteau lemma: A sequence {X_n} converges in distribution to X if and only if any one of the following equivalent conditions is met:

A. $\mathbb {E} [f(X_{n})]\to \mathbb {E} [f(X)]$ for all bounded, continuous functions $f$ ;
B. $\mathbb {E} [f(X_{n})]\to \mathbb {E} [f(X)]$ for all bounded, Lipschitz functions $f$ ;
C. $\limsup \operatorname {Pr} (X_{n}\in C)\leq \operatorname {Pr} (X\in C)$ for all closed sets $C$ .

Convergence almost surely implies convergence in probability

X_{n}\ {\overset {\mathrm {as} }{\rightarrow }}\ X\quad \Rightarrow \quad X_{n}\ {\overset {p}{\rightarrow }}\ X

Proof: If $\{X_{n}\}$ converges to $X$ almost surely, it means that the set of points $O=\{\omega \mid \lim X_{n}(\omega )\neq X(\omega )\}$ has measure zero. Now fix $\varepsilon >0$ and consider a sequence of sets

A_{n}=\bigcup _{m\geq n}\left\{\left|X_{m}-X\right|>\varepsilon \right\}

This sequence of sets is decreasing ( $A_{n}\supseteq A_{n+1}\supseteq \ldots$ ) towards the set

A_{\infty }=\bigcap _{n\geq 1}A_{n}.

The probabilities of this sequence are also decreasing, so $\lim \operatorname {Pr} (A_{n})=\operatorname {Pr} (A_{\infty })$ ; we shall show now that this number is equal to zero. Now for any point $\omega$ outside of $O$ we have $\lim X_{n}(\omega )=X(\omega )$ , which implies that $\left|X_{n}(\omega )-X(\omega )\right|<\varepsilon$ for all $n\geq N$ for some $N$ . In particular, for such $n$ the point $\omega$ will not lie in $A_{n}$ , and hence won't lie in $A_{\infty }$ . Therefore, $A_{\infty }\subseteq O$ and so $\operatorname {Pr} (A_{\infty })=0$ .

Finally, by continuity from above,

\operatorname {Pr} \left(|X_{n}-X|>\varepsilon \right)\leq \operatorname {Pr} (A_{n})\ {\underset {n\to \infty }{\rightarrow }}0,

which by definition means that $X_{n}$ converges in probability to $X$ .
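
As an illustration only (it is not part of the proof), take Ω = [0, 1) with the uniform measure, X_n(ω) = ω^n and X = 0; these choices are assumptions made for the example. Then X_n(ω) → 0 for every ω, the events {|X_m − X| > ε} shrink as m grows, so A_n = {ω : ω^n > ε} and Pr(A_n) = 1 − ε^{1/n}. The sketch below evaluates this closed form; the function name is hypothetical.

```python
def prob_tail_event(n: int, eps: float) -> float:
    """Pr(A_n) for X_n(w) = w**n on [0, 1) with the uniform measure and X = 0.

    {w : w**m > eps} is the interval (eps**(1/m), 1), and these intervals
    shrink as m grows, so the union over m >= n is the interval for m = n.
    """
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie strictly between 0 and 1")
    return 1.0 - eps ** (1.0 / n)


if __name__ == "__main__":
    # The printed probabilities decrease towards zero as n grows.
    for n in (1, 10, 100, 1000):
        print(n, prob_tail_event(n, eps=0.1))
```

Since Pr(|X_n − X| > ε) ≤ Pr(A_n), the same values bound the probabilities that appear in the definition of convergence in probability.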

Convergence in probability does not imply almost sure convergence in the discrete case

If X_n are independent random variables assuming value one with probability 1/n and zero otherwise, then X_n converges to zero in probability but not almost surely. This can be verified using the Borel–Cantelli lemmas.
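
The Borel–Cantelli argument can be made concrete with exact arithmetic. By independence, the probability that X_m = 0 for every m with n ≤ m ≤ N is the product of the factors (1 − 1/m), which telescopes to (n − 1)/N and tends to zero as N → ∞ for each fixed n ≥ 2. Hence ones keep occurring with probability one, even though Pr(X_n = 1) = 1/n → 0. A minimal sketch, with the hypothetical helper name `prob_all_zero`:

```python
from fractions import Fraction


def prob_all_zero(n: int, big_n: int) -> Fraction:
    """Exact Pr(X_m = 0 for all n <= m <= big_n), X_m independent Bernoulli(1/m)."""
    prob = Fraction(1)
    for m in range(n, big_n + 1):
        prob *= 1 - Fraction(1, m)  # independence: multiply Pr(X_m = 0)
    return prob


if __name__ == "__main__":
    # For n = 2 the product telescopes to 1 / big_n.
    for big_n in (10, 100, 1000):
        print(big_n, prob_all_zero(2, big_n))
```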

Convergence in probability implies convergence in distribution

X_{n}\ {\xrightarrow {p}}\ X\quad \Rightarrow \quad X_{n}\ {\xrightarrow {d}}\ X,

Proof for the case of scalar random variables

Lemma. Let X, Y be random variables, let a be a real number and ε > 0. Then

\operatorname {Pr} (Y\leq a)\leq \operatorname {Pr} (X\leq a+\varepsilon )+\operatorname {Pr} (|Y-X|>\varepsilon ).

Proof of lemma:

{\begin{aligned}\operatorname {Pr} (Y\leq a)&=\operatorname {Pr} (Y\leq a,\ X\leq a+\varepsilon )+\operatorname {Pr} (Y\leq a,\ X>a+\varepsilon )\\&\leq \operatorname {Pr} (X\leq a+\varepsilon )+\operatorname {Pr} (Y-X\leq a-X,\ a-X<-\varepsilon )\\&\leq \operatorname {Pr} (X\leq a+\varepsilon )+\operatorname {Pr} (Y-X<-\varepsilon )\\&\leq \operatorname {Pr} (X\leq a+\varepsilon )+\operatorname {Pr} (Y-X<-\varepsilon )+\operatorname {Pr} (Y-X>\varepsilon )\\&=\operatorname {Pr} (X\leq a+\varepsilon )+\operatorname {Pr} (|Y-X|>\varepsilon )\end{aligned}}

Shorter proof of the lemma:

We have

{\begin{aligned}\{Y\leq a\}\subset \{X\leq a+\varepsilon \}\cup \{|Y-X|>\varepsilon \}\end{aligned}}

for if $Y\leq a$ and $|Y-X|\leq \varepsilon$ , then $X\leq a+\varepsilon$ . Hence by the union bound,

{\begin{aligned}\operatorname {Pr} (Y\leq a)\leq \operatorname {Pr} (X\leq a+\varepsilon )+\operatorname {Pr} (|Y-X|>\varepsilon ).\end{aligned}}
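
The lemma can be sanity-checked by exhaustive enumeration on a small discrete example. The sketch below assumes, purely for illustration, that X and Y are independent and uniform on {0, 1, 2, 3}; the lemma itself needs no such assumption. All names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Assumed joint distribution: X and Y independent, each uniform on {0, 1, 2, 3}.
VALUES = range(4)
PMF = {(x, y): Fraction(1, 16) for x, y in product(VALUES, VALUES)}


def prob(event) -> Fraction:
    """Exact probability of {(x, y) : event(x, y)} under PMF."""
    return sum((p for (x, y), p in PMF.items() if event(x, y)), Fraction(0))


def lemma_holds(a: Fraction, eps: Fraction) -> bool:
    """Check Pr(Y <= a) <= Pr(X <= a + eps) + Pr(|Y - X| > eps)."""
    lhs = prob(lambda x, y: y <= a)
    rhs = prob(lambda x, y: x <= a + eps) + prob(lambda x, y: abs(y - x) > eps)
    return lhs <= rhs


if __name__ == "__main__":
    a_grid = [Fraction(k, 2) for k in range(-2, 9)]
    eps_grid = [Fraction(k, 2) for k in range(1, 7)]
    print(all(lemma_holds(a, e) for a in a_grid for e in eps_grid))
```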

Proof of the theorem: Recall that in order to prove convergence in distribution, one must show that the sequence of cumulative distribution functions of X_n converges to F_X at every point where F_X is continuous. Let a be such a point. For every ε > 0, due to the preceding lemma, we have:

{\begin{aligned}\operatorname {Pr} (X_{n}\leq a)&\leq \operatorname {Pr} (X\leq a+\varepsilon )+\operatorname {Pr} (|X_{n}-X|>\varepsilon )\\\operatorname {Pr} (X\leq a-\varepsilon )&\leq \operatorname {Pr} (X_{n}\leq a)+\operatorname {Pr} (|X_{n}-X|>\varepsilon )\end{aligned}}

So, we have

\operatorname {Pr} (X\leq a-\varepsilon )-\operatorname {Pr} \left(\left|X_{n}-X\right|>\varepsilon \right)\leq \operatorname {Pr} (X_{n}\leq a)\leq \operatorname {Pr} (X\leq a+\varepsilon )+\operatorname {Pr} \left(\left|X_{n}-X\right|>\varepsilon \right).

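Taking the lower and upper limits as n → ∞ (the limit itself is not yet known to exist), and using that Pr(|X_n − X| > ε) → 0, we obtain:

F_{X}(a-\varepsilon )\leq \liminf _{n\to \infty }\operatorname {Pr} (X_{n}\leq a)\leq \limsup _{n\to \infty }\operatorname {Pr} (X_{n}\leq a)\leq F_{X}(a+\varepsilon ),

where F_X(a) = Pr(X ≤ a) is the cumulative distribution function of X. This function is continuous at a by assumption, and therefore both F_X(a−ε) and F_X(a+ε) converge to F_X(a) as ε → 0⁺. Taking this limit shows that the lower and upper limits coincide, so the limit exists and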

\lim _{n\to \infty }\operatorname {Pr} (X_{n}\leq a)=\operatorname {Pr} (X\leq a),

which means that {X_n} converges to X in distribution.
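
The two-sided bound used in this proof can be traced on a simple example. The sketch below assumes X is uniform on (0, 1) and X_n = X + 1/n, so that |X_n − X| = 1/n exactly; both the example and the function names are illustrative assumptions, not part of the proof.

```python
def cdf_uniform(t: float) -> float:
    """CDF of X ~ Uniform(0, 1)."""
    return min(max(t, 0.0), 1.0)


def sandwich(n: int, a: float, eps: float) -> tuple[float, float, float]:
    """Return (lower bound, Pr(X_n <= a), upper bound) for X_n = X + 1/n."""
    shift = 1.0 / n
    tail = 1.0 if shift > eps else 0.0   # Pr(|X_n - X| > eps), since |X_n - X| = 1/n
    middle = cdf_uniform(a - shift)      # Pr(X + 1/n <= a)
    lower = cdf_uniform(a - eps) - tail
    upper = cdf_uniform(a + eps) + tail
    return lower, middle, upper


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(n, sandwich(n, a=0.5, eps=0.05))
```

Once 1/n ≤ ε the tail term vanishes and the middle value is trapped between F_X(a − ε) and F_X(a + ε), which is the situation exploited when ε is sent to zero.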

Proof for the generic case

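The implication for a random vector X_n follows from the property proved later on this page (convergence in probability to a sequence converging in distribution implies convergence to the same distribution), applied with the constant sequence X in place of X_n in the statement of that property and with the sequence that converges in probability in place of Y_n.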

Convergence in distribution to a constant implies convergence in probability

X_{n}\ {\xrightarrow {d}}\ c\quad \Rightarrow \quad X_{n}\ {\xrightarrow {p}}\ c,

provided c is a constant.

Proof: Fix ε > 0. Let B_ε(c) be the open ball of radius ε around point c, and B_ε(c)^c its complement. Then

\operatorname {Pr} \left(|X_{n}-c|\geq \varepsilon \right)=\operatorname {Pr} \left(X_{n}\in B_{\varepsilon }(c)^{c}\right).

The set B_ε(c)^c is closed, so by the portmanteau lemma (part C), if X_n converges in distribution to c, then the limsup of the latter probability must be less than or equal to Pr(c ∈ B_ε(c)^c), which equals zero because c lies in B_ε(c). Therefore,

{\begin{aligned}\lim _{n\to \infty }\operatorname {Pr} \left(\left|X_{n}-c\right|\geq \varepsilon \right)&\leq \limsup _{n\to \infty }\operatorname {Pr} \left(\left|X_{n}-c\right|\geq \varepsilon \right)\\&=\limsup _{n\to \infty }\operatorname {Pr} \left(X_{n}\in B_{\varepsilon }(c)^{c}\right)\\&\leq \operatorname {Pr} \left(c\in B_{\varepsilon }(c)^{c}\right)=0\end{aligned}}

which by definition means that X_n converges to c in probability.
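
For a concrete instance, suppose (as an assumption for this example) that X_n is normally distributed with mean c and variance 1/n, which converges in distribution to the constant c. Then Pr(|X_n − c| ≥ ε) has a closed form in terms of the complementary error function, and the sketch below shows it shrinking to zero. The function name is hypothetical.

```python
import math


def prob_far_from_c(n: int, eps: float) -> float:
    """Pr(|X_n - c| >= eps) when X_n ~ Normal(mean c, variance 1/n).

    (X_n - c) * sqrt(n) is standard normal, and for standard normal Z,
    Pr(|Z| >= t) = erfc(t / sqrt(2)).
    """
    return math.erfc(eps * math.sqrt(n) / math.sqrt(2.0))


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(n, prob_far_from_c(n, eps=0.1))
```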

Convergence in probability to a sequence converging in distribution implies convergence to the same distribution

|Y_{n}-X_{n}|\ {\xrightarrow {p}}\ 0,\ \ X_{n}\ {\xrightarrow {d}}\ X\ \quad \Rightarrow \quad Y_{n}\ {\xrightarrow {d}}\ X

Proof: We will prove this theorem using the portmanteau lemma, part B. As required in that lemma, consider any bounded function f (i.e. |f(x)| ≤ M) which is also Lipschitz:

\exists K>0,\forall x,y:\quad |f(x)-f(y)|\leq K|x-y|.

Take some ε > 0 and majorize the expression |E[f(Y_n)] − E[f(X_n)]| as

{\begin{aligned}\left|\operatorname {E} \left[f(Y_{n})\right]-\operatorname {E} \left[f(X_{n})\right]\right|&\leq \operatorname {E} \left[\left|f(Y_{n})-f(X_{n})\right|\right]\\&=\operatorname {E} \left[\left|f(Y_{n})-f(X_{n})\right|\mathbf {1} _{\left\{|Y_{n}-X_{n}|<\varepsilon \right\}}\right]+\operatorname {E} \left[\left|f(Y_{n})-f(X_{n})\right|\mathbf {1} _{\left\{|Y_{n}-X_{n}|\geq \varepsilon \right\}}\right]\\&\leq \operatorname {E} \left[K\left|Y_{n}-X_{n}\right|\mathbf {1} _{\left\{|Y_{n}-X_{n}|<\varepsilon \right\}}\right]+\operatorname {E} \left[2M\mathbf {1} _{\left\{|Y_{n}-X_{n}|\geq \varepsilon \right\}}\right]\\&\leq K\varepsilon \operatorname {Pr} \left(\left|Y_{n}-X_{n}\right|<\varepsilon \right)+2M\operatorname {Pr} \left(\left|Y_{n}-X_{n}\right|\geq \varepsilon \right)\\&\leq K\varepsilon +2M\operatorname {Pr} \left(\left|Y_{n}-X_{n}\right|\geq \varepsilon \right)\end{aligned}}

(here 1_{...} denotes the indicator function; the expectation of the indicator function is equal to the probability of the corresponding event). Therefore,

{\begin{aligned}\left|\operatorname {E} \left[f(Y_{n})\right]-\operatorname {E} \left[f(X)\right]\right|&\leq \left|\operatorname {E} \left[f(Y_{n})\right]-\operatorname {E} \left[f(X_{n})\right]\right|+\left|\operatorname {E} \left[f(X_{n})\right]-\operatorname {E} \left[f(X)\right]\right|\\&\leq K\varepsilon +2M\operatorname {Pr} \left(|Y_{n}-X_{n}|\geq \varepsilon \right)+\left|\operatorname {E} \left[f(X_{n})\right]-\operatorname {E} \left[f(X)\right]\right|.\end{aligned}}

If we take the limit in this expression as n → ∞, the second term will go to zero since {Y_n−X_n} converges to zero in probability; and the third term will also converge to zero, by the portmanteau lemma and the fact that X_n converges to X in distribution. Thus

\lim _{n\to \infty }\left|\operatorname {E} \left[f(Y_{n})\right]-\operatorname {E} \left[f(X)\right]\right|\leq K\varepsilon .

Since ε was arbitrary, we conclude that the limit must in fact be equal to zero, and therefore E[f(Y_n)] → E[f(X)], which again by the portmanteau lemma implies that {Y_n} converges to X in distribution. QED.
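
The key inequality |E[f(Y_n)] − E[f(X_n)]| ≤ Kε + 2M Pr(|Y_n − X_n| ≥ ε) holds under any probability measure, including the empirical measure of a paired sample, so it can be checked numerically. The sketch below assumes the test function f(x) = min(|x|, 1), for which K = 1 and M = 1, and simulated data chosen only for illustration.

```python
import random


def f(x: float) -> float:
    """Bounded (M = 1) and Lipschitz (K = 1) test function."""
    return min(abs(x), 1.0)


def bound_check(xs, ys, eps, K=1.0, M=1.0):
    """Return (lhs, rhs) of |E f(Y) - E f(X)| <= K*eps + 2*M*Pr(|Y - X| >= eps),
    with expectations taken under the empirical measure of the paired samples."""
    n = len(xs)
    lhs = abs(sum(f(y) for y in ys) / n - sum(f(x) for x in xs) / n)
    far = sum(1 for x, y in zip(xs, ys) if abs(y - x) >= eps) / n
    return lhs, K * eps + 2.0 * M * far


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
    ys = [x + rng.gauss(0.0, 0.05) for x in xs]  # Y is close to X with high probability
    print(bound_check(xs, ys, eps=0.1))
```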

Convergence of one sequence in distribution and another to a constant implies joint convergence in distribution

X_{n}\ {\xrightarrow {d}}\ X,\ \ Y_{n}\ {\xrightarrow {p}}\ c\ \quad \Rightarrow \quad (X_{n},Y_{n})\ {\xrightarrow {d}}\ (X,c)

provided c is a constant.

Proof: We will prove this statement using the portmanteau lemma, part A.

First we want to show that (X_n, c) converges in distribution to (X, c). By the portmanteau lemma this will be true if we can show that E[f(X_n, c)] → E[f(X, c)] for any bounded continuous function f(x, y). So let f be an arbitrary bounded continuous function, and consider the function of a single variable g(x) := f(x, c). This function is also bounded and continuous, and therefore by the portmanteau lemma applied to the sequence {X_n} converging in distribution to X, we have E[g(X_n)] → E[g(X)]. The latter statement is the same as “E[f(X_n, c)] → E[f(X, c)]”, and therefore (X_n, c) converges in distribution to (X, c).

Secondly, consider |(X_n, Y_n) − (X_n, c)| = |Y_n − c|. This expression converges in probability to zero because Y_n converges in probability to c. Thus we have demonstrated two facts:

{\begin{cases}\left|(X_{n},Y_{n})-(X_{n},c)\right|\ {\xrightarrow {p}}\ 0,\\(X_{n},c)\ {\xrightarrow {d}}\ (X,c).\end{cases}}

By the property proved earlier, these two facts imply that (X_n, Y_n) converges in distribution to (X, c).

Convergence of two sequences in probability implies joint convergence in probability

X_{n}\ {\xrightarrow {p}}\ X,\ \ Y_{n}\ {\xrightarrow {p}}\ Y\ \quad \Rightarrow \quad (X_{n},Y_{n})\ {\xrightarrow {p}}\ (X,Y)

Proof:

{\begin{aligned}\operatorname {Pr} \left(\left|(X_{n},Y_{n})-(X,Y)\right|\geq \varepsilon \right)&\leq \operatorname {Pr} \left(|X_{n}-X|+|Y_{n}-Y|\geq \varepsilon \right)\\&\leq \operatorname {Pr} \left(|X_{n}-X|\geq \varepsilon /2\right)+\operatorname {Pr} \left(|Y_{n}-Y|\geq \varepsilon /2\right)\end{aligned}}

where the last step follows by the pigeonhole principle (if the sum of two terms is at least ε, then at least one of them is at least ε/2) and the sub-additivity of the probability measure. Each of the probabilities on the right-hand side converges to zero as n → ∞ by definition of the convergence of {X_n} and {Y_n} in probability to X and Y respectively. Taking the limit we conclude that the left-hand side also converges to zero, and therefore the sequence {(X_n, Y_n)} converges in probability to (X, Y).
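
The union bound in this proof holds sample by sample, so it can be verified on simulated differences. In the sketch below, `dx` and `dy` stand for X_n − X and Y_n − Y; the Gaussian noise model and the function name are assumptions made only for the illustration.

```python
import math
import random


def joint_bound(dx, dy, eps):
    """Empirical Pr(|(dx, dy)| >= eps) and the bound
    Pr(|dx| >= eps/2) + Pr(|dy| >= eps/2), using the Euclidean norm."""
    n = len(dx)
    joint = sum(1 for u, v in zip(dx, dy) if math.hypot(u, v) >= eps) / n
    bound = (
        sum(1 for u in dx if abs(u) >= eps / 2)
        + sum(1 for v in dy if abs(v) >= eps / 2)
    ) / n
    return joint, bound


if __name__ == "__main__":
    rng = random.Random(0)
    for scale in (1.0, 0.1, 0.01):  # smaller scale mimics larger n
        dx = [rng.gauss(0.0, scale) for _ in range(10_000)]
        dy = [rng.gauss(0.0, scale) for _ in range(10_000)]
        print(scale, joint_bound(dx, dy, eps=0.2))
```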

References

van der Vaart, Aad W. (1998). Asymptotic statistics. New York: Cambridge University Press. ISBN 978-0-521-49603-2.