Jointly distributed random variables¶
Random vector
Let \({\cal S}\) be a sample space with a probability \(P\). A function \((X, Y): {\cal S} \rightarrow \mathbb{R}^2\) is called a (2-dimensional) random vector.
Any random vector determines a new probability \(Q\) on \(\mathbb{R}^2\). For \(A \subset \mathbb{R}^2\) we set
\[ Q(A) = P((X, Y) \in A). \]
In words, \(Q(A)\) is the probability \((X(s), Y(s))\) is in \(A\).
Jointly discrete
If the set of possible values \((X,Y)\) can take is discrete (i.e. the range is discrete), we say \((X, Y)\) are jointly discrete and are determined by their (joint) probability mass function
\[ (x,y) \mapsto P((X,Y)=(x,y)) = P(\{s:(X(s),Y(s))=(x,y)\}) = Q(\{(x,y)\}). \]
Jointly continuous
If \(Q\) is such that for any \([a,b] \times [c,d]\)
\[ Q([a,b] \times [c,d]) = \int_a^b \int_c^d f(x,y) \; dy \; dx \]
then we say \((X,Y)\) are jointly continuous with (joint) probability density function \(f\).
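A quick numerical check of this definition, as a sketch: assume the joint density \(f(x,y) = e^{-x-y}\) for \(x, y > 0\) (it integrates to 1), so \(Q([a,b] \times [c,d])\) has the closed form \((e^{-a}-e^{-b})(e^{-c}-e^{-d})\).

```python
import numpy as np
from scipy.integrate import dblquad

# assumed joint density f(x, y) = exp(-x - y) for x, y > 0;
# dblquad integrates func(y, x) with y as the inner variable
f = lambda y, x: np.exp(-x - y)

a, b, c, d = 0.0, 1.0, 0.0, 2.0
Q, _ = dblquad(f, a, b, c, d)  # Q([a, b] x [c, d])
exact = (np.exp(-a) - np.exp(-b)) * (np.exp(-c) - np.exp(-d))
print(Q, exact)  # both approximately 0.5466
```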
Joint distribution function
The function \(F_{(X,Y)}:\mathbb{R}^2 \to [0,1]\)
\[ (x,y) \mapsto F_{(X,Y)}(x,y) = P(X \leq x, Y \leq y) \]
is the (joint) distribution function of \((X,Y)\).
Note that
\[ Q([a,b] \times [c,d]) = F(b,d) - F(a,d) - F(b,c) + F(a,c). \]
Marginal distribution
The marginal cdf of \(X\) is the function
\[ x \mapsto F_X(x) = P(X \leq x) = F_{(X,Y)}(x,\infty). \]
Similarly, the marginal cdf of \(Y\) is the function
\[ y \mapsto F_Y(y) = P(Y \leq y) = F(\infty,y). \]
Special cases
When \((X,Y)\) are jointly discrete, then
\[ F_X(x) = \sum_{t \leq x} p_X(t) \]
where
\[ p_X(x) = \sum_y p(x,y) \]
is the (marginal) pmf of \(X\).
When \((X,Y)\) are jointly continuous, then
\[ F_X(x) = \int_{-\infty}^x f_X(t) \; dt \]
where
\[ f_X(t) = \int_{-\infty}^{\infty} f(t,y) \; dy \]
is the (marginal) pdf of \(X\).
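Numerically, this marginalization can be checked with the assumed density \(f(x,y) = e^{-x-y}\) (\(x, y > 0\)) from the sketch above, whose marginal is \(f_X(t) = e^{-t}\):

```python
import numpy as np
from scipy.integrate import quad

def f_X(t):
    # integrate the assumed joint density exp(-t - y) over y to get the marginal at t
    val, _ = quad(lambda y: np.exp(-t - y), 0, np.inf)
    return val

print(f_X(1.0), np.exp(-1.0))  # both approximately 0.3679
```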
Expectation
Let \(h:\mathbb{R}^2 \rightarrow \mathbb{R}\). When \((X,Y)\) are jointly discrete
\[ E[h(X,Y)] = \mu_{h(X,Y)} = \sum_{(x,y)} h(x,y) p(x,y). \]
When \((X,Y)\) are jointly continuous
\[ E[h(X,Y)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} h(x,y) \cdot f(x,y) \; dy \; dx. \]
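The same assumed density \(f(x,y) = e^{-x-y}\) gives a quick numerical version of this double integral with \(h(x,y) = xy\):

```python
import numpy as np
from scipy.integrate import dblquad

# E[XY] = double integral of x * y * f(x, y); dblquad's integrand takes (y, x)
integrand = lambda y, x: x * y * np.exp(-x - y)
E_XY, _ = dblquad(integrand, 0, np.inf, 0, np.inf)
print(E_XY)  # approximately 1.0, since here X and Y are independent Exp(1)
```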
Example¶
- Suppose we roll two fair dice. Let \(X\) be the first roll and \(Y\) the second roll.
- The pair \((X,Y)\) is jointly discrete with pmf that assigns weight \(1/36\) to every pair of integers in
\[ \left\{(i,j): 1 \leq i \leq 6, 1 \leq j \leq 6\right\} \]
- Computing the expected value of the sum:
\[\begin{split} \begin{aligned} E[X+Y] &= \sum_{x,y} (x+y) p(x,y) \\ &= \sum_{x,y} x\, p(x,y) + \sum_{x,y} y\, p(x,y) \\ &= \sum_x x\, p_X(x) + \sum_y y\, p_Y(y) \\ &= \frac{7}{2} + \frac{7}{2} \\ &= 7 \end{aligned} \end{split}\]
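A brute-force check of this expectation over the 36 equally likely outcomes:

```python
import itertools

# joint pmf: weight 1/36 on each pair (i, j) with 1 <= i, j <= 6
pairs = itertools.product(range(1, 7), repeat=2)
E_sum = sum((x + y) / 36 for x, y in pairs)
print(E_sum)  # 7.0
```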
What determines \(X\)?¶
- Random vectors can exhibit behavior beyond the jointly discrete and jointly continuous cases.
- For example, if \(X\) is a continuous random variable, then \(s \mapsto (X(s), X^2(s))\) is a random vector that is neither jointly continuous nor jointly discrete.
- This random vector "lives" on the 1-dimensional graph
\[ \{(x,x^2): x \in \mathbb{R} \}. \]
- Still, \(Q\) makes sense: for any \(A \subset \mathbb{R}^2\) we can ask for
\[ P((X,X^2) \in A). \]
- For this reason, \(Q\) or \(F\) is often the right tool to fully describe a random vector (or even a random variable).
- Random variables can also be neither continuous nor discrete but a mix of the two: take the cdf \(F_D\) of a discrete random variable \(D\) and the cdf \(F_C\) of a continuous random variable \(C\), and define \(F\) as
\[ x \mapsto F(x) = \frac{1}{2} F_C(x) + \frac{1}{2} F_D(x) \]
- It turns out that \(F\) is the cdf of a random variable that has neither a pmf nor a pdf. You can realize \(F\) by first drawing independent random variables \((D, C)\) with cdfs \((F_D, F_C)\) and then flipping a fair coin: if it lands heads, the new random variable is the \(C\) you drew; otherwise it is \(D\) (see the sketch after this list).
- This discussion illustrates that the notion of probability we introduced encompasses many objects besides jointly discrete and jointly continuous random vectors.
- Nevertheless, many of the natural examples we work with are jointly continuous or jointly discrete.
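Here is a minimal simulation of the coin-flip construction above, assuming for concreteness that \(D\) is a fair die roll and \(C\) is standard normal:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

D = rng.integers(1, 7, size=n)    # discrete part: fair die roll
C = rng.standard_normal(n)        # continuous part: N(0, 1)
heads = rng.random(n) < 0.5       # fair coin flip

Z = np.where(heads, C, D)         # heads -> C, tails -> D
# Z has cdf F = 0.5 * F_C + 0.5 * F_D; check, e.g., P(Z <= 3)
print((Z <= 3).mean())            # approximately 0.5 * 0.9987 + 0.5 * 0.5 = 0.7493
```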
More than 2 dimensions¶
A function \(X=(X_1,\dots,X_n):{\cal S} \rightarrow \mathbb{R}^n\) is an \(n\)-dimensional random vector.
- If \(X\) is discrete it has pmf
\[ p_X(x_1,\dots,x_n) = P(X_1=x_1, X_2=x_2, \dots, X_n=x_n). \]
- If \(X\) is continuous it has pdf \(f\) with
\[ P(a_1 \leq X_1 \leq b_1, \dots, a_n \leq X_n \leq b_n) = \int_{a_1}^{b_1} \dots \int_{a_n}^{b_n} f(x_1,\dots,x_n) \; dx_n \; \dots \; dx_1 \]
- It is determined by its distribution function (whether continuous, discrete, or neither)
\[ F(x_1, \dots, x_n) = P(X_1 \leq x_1, \dots, X_n \leq x_n) \]
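Even when no pmf or pdf is available, \(F\) can be estimated by simulation. A sketch, assuming three independent standard normal coordinates:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100_000, 3))   # assumed: 3 independent N(0, 1) coordinates

x = np.array([0.0, 1.0, -0.5])
F_hat = np.all(X <= x, axis=1).mean()   # Monte Carlo estimate of F(0, 1, -0.5)
print(F_hat)                            # approximately 0.5 * 0.841 * 0.309 = 0.130
```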
Conditional distributions¶
Conditional distribution
If \((X,Y)\) is a discrete random vector, then the conditional pmf of \(Y \mid X\) is (for any \(x\) with \(p_X(x) > 0\))
\[ p_{Y|X}(y|x) = P(Y=y|X=x) = \frac{p_{(X,Y)}(x,y)}{p_X(x)}. \]
If \((X,Y)\) is a continuous random vector, then the conditional pdf of \(Y \mid X\) is (for any \(x\) with \(f_X(x) > 0\))
\[ f_{Y|X}(y|x) = \frac{f(x,y)}{f_X(x)}. \]
Either way, we can talk about the conditional distribution of \(Y|X=x\). This distribution changes with \(x\)!
Conditional expectation
Let \(h(Y)\) be a random variable. If \((X,Y)\) is a discrete random vector, we define
\[ E[h(Y)|X=x] = \sum_y h(y) \, p_{Y|X}(y|x). \]
If \((X,Y)\) is a continuous random vector, we define
\[ E[h(Y)|X=x] = \int_{-\infty}^{\infty} h(y) f_{Y|X}(y|x) \; dy. \]
These expectations change with \(x\)!
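A small discrete illustration, using a hypothetical joint pmf on a \(2 \times 3\) grid of values:

```python
import numpy as np

# hypothetical joint pmf p(x, y): rows index x in {0, 1}, columns index y in {0, 1, 2}
p = np.array([[0.10, 0.20, 0.10],
              [0.05, 0.25, 0.30]])
ys = np.array([0, 1, 2])

x = 1
p_X = p.sum(axis=1)                      # marginal pmf of X
p_Y_given_X = p[x] / p_X[x]              # conditional pmf of Y | X = x
E_Y_given_X = (ys * p_Y_given_X).sum()   # E[Y | X = x], taking h(y) = y
print(p_Y_given_X, E_Y_given_X)          # [1/12, 5/12, 1/2] and 17/12
```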
Independent random variables
If \((X,Y)\) is a discrete random vector and \(X\) is independent of \(Y\), then
\[ p_{(X,Y)}(x,y) = p_X(x) \cdot p_Y(y). \]
If \((X,Y)\) is a continuous random vector and \(X\) is independent of \(Y\), then
\[ f_{(X,Y)}(x,y) = f_X(x) \cdot f_Y(y). \]
Covariance and correlation
For a random vector \((X,Y)\) we define their covariance as
\[ \text{Cov}(X,Y) = E[(X-E[X])(Y-E[Y])] = E[XY] - E[X] \cdot E[Y] \]
If \(X\) and \(Y\) are independent, then \(\text{Cov}(X,Y)=0\) (the converse does not hold in general).
We define their correlation as
\[ \text{Cor}(X,Y) = \rho_{X,Y} = \frac{\text{Cov}(X,Y)}{\sqrt{\text{Var}(X) \cdot \text{Var}(Y)}} \]
and \(-1 \leq \text{Cor}(X,Y) \leq 1\).
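Both facts are easy to check by simulation; below, the first pair is independent (sample covariance near 0) and the second is linearly dependent with correlation \(2/\sqrt{5} \approx 0.894\):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal(100_000)
Y = rng.standard_normal(100_000)   # independent of X

print(np.cov(X, Y)[0, 1])          # near 0: independence implies zero covariance
Z = 2 * X + rng.standard_normal(100_000)
print(np.corrcoef(X, Z)[0, 1])     # about 0.894, inside [-1, 1]
```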
Building a joint distribution¶
- The idea of conditional distributions can simplify or suggest models of different processes or experiments.
Joint distribution from marginal and conditional
If \((X,Y)\) is continuous then the joint pdf is
\[ f_{(X,Y)}(x,y) = f_{Y|X}(y|x) \cdot f_X(x) \]
In words, if we specify a marginal distribution for \(X\) and a conditional one for \(Y|X\) then we form a joint distribution for \((X,Y)\).
Note that \(f_{Y|X}\) specifies more than just one distribution for \(Y\); it specifies a (possibly) different distribution for each \(x\). The function \(f_{Y|X}\) is sometimes called a conditional kernel for \(Y|X\).
Example 1¶
Let \(X\) have marginal density
\[\begin{split} f_X(x) = \begin{cases} 1 & 0 \leq x \leq 1 \\ 0 & \text{otherwise.} \end{cases} \end{split}\]
and set
\[\begin{split} f_{Y|X}(y|x) = \begin{cases} x e^{-x \cdot y} & y > 0 \\ 0 & \text{otherwise.} \end{cases} \end{split}\]
The joint density is
\[\begin{split} f_{(X,Y)}(x,y) = \begin{cases} x e^{-x \cdot y} & 0 < x < 1, y > 0 \\ 0 & \text{otherwise.} \end{cases} \end{split}\]
The marginal \(f_X\) is the Uniform density on \([0,1]\), while for a fixed value of \(x\), the density \(f_{Y|X}(\cdot \mid x)\) is Exponential with expected value (mean) \(1/x\).
This joint distribution describes the process:
- Draw (i.e. generate) \(X\) uniformly in \([0,1]\).
- Having drawn \(X = x\), draw \(Y\) from an Exponential distribution with mean \(1/x\).
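This two-step recipe translates directly into a simulation (a sketch; numpy's exponential sampler is parametrized by its mean, the scale):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

X = rng.uniform(0, 1, size=n)        # step 1: X ~ Uniform[0, 1]
Y = rng.exponential(scale=1 / X)     # step 2: Y | X = x ~ Exponential with mean 1/x

# sanity check: E[Y | X = x] = 1/x, so for X near 0.5 the draws of Y average near 2
near_half = np.abs(X - 0.5) < 0.01
print(Y[near_half].mean())           # approximately 2.0
```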
Example 2¶
- The previous example had a sort of temporal ordering to \((X,Y)\): we first drew \(X\) and then \(Y\). We can apply this idea in other examples as well.
- Going back to our polling example, suppose \(X\) is the true approval rating (this is a model, of course) and \(Y\) is the number of voters, out of a simple random sample of size 150, who approve.
- This example is neither jointly discrete nor jointly continuous: \(X\) has a pdf while \(Y\) has a pmf.
- Let's take \(X\) to also be uniform on \([0,1]\).
- Given a particular value \(x\), \(Y\) will be the number of approvals when drawing from a world where exactly a proportion \(x\) of the population approves.
- Therefore,
\[ p_{Y|X}(y|x) = \binom{150}{y} x^y (1-x)^{150-y}. \]
- We can also ask: what is \(f_{X|Y}(x|y)\)?
- Based on how we defined conditional densities earlier, a natural candidate is the joint density/mass function of \((X,Y)\) divided by that of \(Y\). The binomial coefficients cancel, leaving
\[ f_{X|Y}(x|y) = \frac{p_{Y|X}(y|x) \cdot f_X(x)}{p_Y(y)} = \frac{x^y (1-x)^{150-y}}{\int_0^1 t^y (1-t)^{150-y} \; dt}, \qquad 0 \leq x \leq 1. \]
```python
import numpy as np
import matplotlib.pyplot as plt

# grid of approval ratings, avoiding the endpoints 0 and 1
X = np.linspace(0 + 1e-6, 1 - 1e-6, 101)

# unnormalized conditional density of X given Y = 80% approvals
Y = int(0.80 * 150)
plt.plot(X, np.exp(Y * np.log(X) + (150 - Y) * np.log(1 - X)))
plt.axvline(0.80, c='k', linestyle='--')
```

```python
# same picture for Y = 20% approvals
Y = int(0.20 * 150)
plt.plot(X, np.exp(Y * np.log(X) + (150 - Y) * np.log(1 - X)))
plt.axvline(0.20, c='k', linestyle='--')
```

```python
# and for Y = 51% approvals
Y = int(0.51 * 150)
plt.plot(X, np.exp(Y * np.log(X) + (150 - Y) * np.log(1 - X)))
plt.axvline(0.51, c='k', linestyle='--')
```
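Note that the normalizing integral in the denominator is a Beta function, so \(f_{X|Y}(\cdot \mid y)\) is exactly the Beta\((y+1,\ 150-y+1)\) density. A quick check with scipy that its mode sits at the dashed line \(y/150\):

```python
import numpy as np
from scipy.stats import beta

y = int(0.80 * 150)
x = np.linspace(1e-6, 1 - 1e-6, 101)

# normalized conditional density f_{X|Y}(x | y) is Beta(y + 1, 150 - y + 1)
post = beta.pdf(x, y + 1, 150 - y + 1)
print(x[np.argmax(post)])  # mode near y / 150 = 0.80, matching the dashed line
```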
Source: https://web.stanford.edu/class/stats110/notes/Chapter2/Joint-distributed.html