Variance and Standard Deviation

Variance

Definition

The variance is a measure of the spread, or dispersion, of a set of data. There are two types: the population variance, usually denoted by $\sigma^2$, and the sample variance, usually denoted by $s^2$.
Population Variance

The population variance is the variance of the whole population. To calculate the population variance, use the formula \[\sigma^2=\frac{1}{N}\sum\limits_{i=1}^N (x_i-\mu)^2\] where $N$ is the size of the population consisting of $x_1, x_2, \ldots, x_N$ and $\mu$ is the population mean.

Sample Variance

Usually we only have a sample; the sample variance is the variance of this sample. Given a sample of data of size $n$, the sample variance is calculated using \[s^2=\frac{1}{n-1}\sum\limits_{i=1}^n (x_i-\bar{x})^2 \text{,}\] where $x_1, x_2, \ldots, x_n$ are the observations and $\bar{x}$ is the sample mean.

Make sure you know when to use each one. The population variance requires every value in the population, whereas the sample variance requires only a subset of it. For example, if we take ten words at random from this page to calculate the variance of their length, a sample variance would be needed. To find the population variance, the length of every word on the page would be needed.

Variance of a Random Variable

For a random variable $X$, the variance is defined as \[\mathrm{Var}[X] = \mathrm{E}[(X- \mathrm{E}[X])^2 ]\text{.}\] However, this calculation can take a lot of time, as it involves calculating the difference between each element of the sample space and the mean (which is equal to $\mathrm{E}[X]$ and abbreviated as $\mu$), squaring this difference and then finding the expected value of this new set of squared differences.
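The steps just described (subtract the mean, square, take the expected value) can be sketched in Python. This is only an illustration: the helper name `variance_by_definition` and the fair six-sided die are assumptions chosen for the example, not part of the original text. Exact fractions are used so that no rounding error appears.

```python
from fractions import Fraction


def variance_by_definition(pmf):
    """Return Var[X] = E[(X - E[X])^2] for a discrete distribution.

    `pmf` maps each possible value x to its probability P[X = x].
    """
    # E[X]: probability-weighted sum of the values
    mean = sum(p * x for x, p in pmf.items())
    # E[(X - E[X])^2]: probability-weighted sum of the squared differences
    return sum(p * (x - mean) ** 2 for x, p in pmf.items())


# Hypothetical example: a fair six-sided die
die = {face: Fraction(1, 6) for face in range(1, 7)}
die_variance = variance_by_definition(die)
print(die_variance)  # 35/12
```

Even for six outcomes this requires a subtraction and a squaring for every value, which is why a shorter formula is useful.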
If we expand the formula for the variance, using the linearity of expectation and the fact that $\mathrm{E}[X]$ is a constant, we see \begin{align} \mathrm{Var}[X] &= \mathrm{E}[(X - \mathrm{E}[X])^2 ] \\ &= \mathrm{E}[X^2 - 2X \mathrm{E}[X] + (\mathrm{E}[X])^2] \\ &= \mathrm{E}[X^2] - 2 \mathrm{E}[X]\mathrm{E}[X] + ( \mathrm{E}[X])^2 \\ &= \mathrm{E}[X^2] - 2 (\mathrm{E}[X])^2 + ( \mathrm{E}[X])^2 \\ & = \mathrm{E}[X^2] - (\mathrm{E}[X])^2\text{.} \end{align} So now we have an alternative formula, \[\mathrm{Var}[X] = \mathrm{E}[X^2]- (\mathrm{E}[X])^2\text{.}\]

Variance of a discrete random variable

Given a discrete random variable $X$ over a sample space $S$, we can calculate the variance in one of the following ways: \begin{align} \mathrm{Var}[X] &= \sum\limits_{x\in S} \mathrm{P}[X=x](x - \mu)^2\text{,} \\ \mathrm{Var}[X] &= \sum\limits_{x\in S} \{ \mathrm{P}[X=x]\cdot x^2 \} - \mu ^2\text{.} \end{align}

Variance of a continuous random variable

Given a continuous random variable $X$ over a sample space $S$ with probability density function $f(x)$, we can calculate the variance in one of the following ways: \begin{align*} \mathrm{Var}[X] &= \int\limits_{x\in S} f(x)\cdot (x - \mu)^2 \mathrm{d} x\text{,} \\ \mathrm{Var}[X] &= \int\limits_{x\in S} (f(x)\cdot x^2 )\mathrm{d} x - \mu ^2\text{.} \end{align*} These results are obtained by combining the two formulae for the variance with the formulae for the expected value in each case.

Note: In the discrete case the mean is $\mu = \displaystyle \sum\limits_{x\in S} \mathrm{P}[X=x] \cdot x$, whereas in the continuous case it is $\displaystyle \mu = \int\limits_{x\in S} f(x)\cdot x \mathrm{d} x$.

Standard Deviation

Definition

The standard deviation, often denoted by $\sigma$, is the positive square root of the variance. Data sets with a small standard deviation are tightly grouped around the mean, whereas a larger standard deviation indicates the data is more spread out.

Population Standard Deviation

The population standard deviation is the standard deviation of the entire population and is denoted by $\sigma$.
It is given by the formula \[\sigma = \sqrt{\frac{1}{N}\sum\limits_{i=1}^{N} (x_i - \mu)^2}\] where $N$ is the size of the population consisting of $x_1, x_2, \ldots, x_N$ and $\mu$ is the population mean.

Sample Standard Deviation

The sample standard deviation, often represented by $s$, is calculated using the formula \[s= \sqrt{ \frac{1}{n-1} \sum\limits_{i=1}^n (x_i-\bar{x})^2}\] where $n$ is the number of observations obtained in the sample, $x_1, x_2, \ldots, x_n$ are the obtained observations and $\bar{x}$ is the sample mean. To understand why $\frac{1}{n-1}$ is used rather than $\frac{1}{n}$, see degrees of freedom.

Worked Example

The lengths, in seconds, of the thirteen songs on an album are \[128, 219, 316, 189, 512, 98, 155, 110, 468, 177, 203, 73, 252\text{.}\] Calculate the standard deviation.

Solution

First calculate the mean. \begin{align} \mu &= \frac{1}{N}\sum\limits_{i=1}^Nx_i \\ &= \frac{1}{13}( 128+ 219+ 316+ 189+ 512+ 98+ 155+ 110+ 468+177 + 203 + 73 + 252) \\ &=\frac{1}{13} (2900) \\ &= 223.0769\text{.} \end{align} Because we have the length of every song on the album, we calculate the population standard deviation. This is done using the formula \[\sigma = \sqrt{\frac{1}{N}\sum\limits_{i=1}^N (x_i-\mu)^2 } \text{.}\] So the squared distance from the mean of each value needs to be calculated.
\begin{align} (x_1-\mu)^2 &= (128-223.0769)^2 = (-95.0769)^2 = 9039.6169 \\ (x_2-\mu)^2 &=(219-223.0769)^2 = (-4.0769)^2 = 16.6211 \\ (x_3-\mu)^2 &=(316-223.0769)^2= (92.9231)^2 = 8634.7025 \\ (x_4-\mu)^2 &=(189-223.0769)^2= (-34.0769)^2 = 1161.2351 \\ (x_5-\mu)^2 &=(512-223.0769)^2= (288.9231)^2 = 83476.5577 \\ (x_6-\mu)^2 &=(98-223.0769)^2 = (-125.0769)^2 = 15644.2309 \\ (x_7-\mu)^2 &=(155-223.0769)^2 = (-68.0769)^2 = 4634.4643 \\ (x_8-\mu)^2 &= (110-223.0769)^2= (-113.0769)^2= 12786.3853 \\ (x_9-\mu)^2 &=(468-223.0769)^2 = (244.9231)^2 = 59987.3249 \\ (x_{10}-\mu)^2 &= (177-223.0769)^2 = (-46.0769)^2 =2123.0807 \\ (x_{11}-\mu)^2 &= (203-223.0769)^2 = (-20.0769)^2 = 403.0819\\ (x_{12}-\mu)^2 &= (73-223.0769)^2 = (-150.0769)^2 = 22523.0759 \\ (x_{13}-\mu)^2 &= (252-223.0769)^2 = (28.9231)^2 = 836.5457 \end{align} These thirteen squared distances sum to $221266.9229$. So, by substituting into \[\sigma = \sqrt{\frac{1}{N}\sum\limits_{i=1}^N (x_i-\mu)^2}\] we obtain \[\sigma = \sqrt{\frac{221266.9229}{13}} = \sqrt{17020.5325} \approx 130.4628\text{.}\]

Video Example

Dr. Lee Fawcett calculates the standard deviation of a set of data.

Workbook

This workbook produced by HELM is a good revision aid, containing key points for revision and many worked examples.
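The worked example on the song lengths can be checked numerically. The following is a minimal sketch in Python; the variable names are illustrative choices rather than anything prescribed above. It follows the same steps as the solution (mean, squared distances, division by $N$, square root) and then compares the result with the standard library's `statistics.pstdev`, which also divides by $N$.

```python
import math
import statistics

# Song lengths in seconds, taken from the worked example
song_lengths = [128, 219, 316, 189, 512, 98, 155, 110, 468, 177, 203, 73, 252]

n = len(song_lengths)
mean = sum(song_lengths) / n

# Squared distance of each value from the mean
squared_deviations = [(x - mean) ** 2 for x in song_lengths]

# Population standard deviation: divide by N, then take the square root
population_sd = math.sqrt(sum(squared_deviations) / n)

print(round(mean, 4))           # 223.0769
print(round(population_sd, 2))  # 130.46

# Cross-check against the standard library.
# statistics.pstdev divides by N; statistics.stdev divides by n - 1.
library_population_sd = statistics.pstdev(song_lengths)
library_sample_sd = statistics.stdev(song_lengths)
```

If the thirteen songs were instead treated as a sample from a larger collection, `statistics.stdev` would be the appropriate function, and it returns a slightly larger value because it divides by $n-1$.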
External Resources
Is population variance equal to sample variance?
Not necessarily for any single sample. However, the sample variance is, on average, equal to the population variance, which makes it an unbiased estimator of the population variance.
Is there a difference between the variance of the population and variance of the sampling distribution of the sample means?
Yes. The variance of the sampling distribution of the mean is the population variance divided by $n$, the sample size (the number of scores used to compute each mean). Thus, the larger the sample size, the smaller the variance of the sampling distribution of the mean.
Is population variance always greater than sample variance?
No. The sample variance could be smaller than, equal to, or larger than the true value of the population variance.
How is sample variance different from population?
There is only one small difference between the two formulas. When we calculate the population variance, we divide by $N$ (the population size). When we calculate the sample variance, we divide by $n-1$ (the sample size minus 1).
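The answers above can be illustrated with a small exact calculation. The sketch below is an assumption-laden toy example: the population `[2, 4, 6, 8]`, the sample size of 2 and the helper names are all hypothetical, and samples are drawn with replacement (every ordered pair is equally likely). Under those assumptions, averaging the sample variance over every possible sample recovers the population variance, and the variance of the sample means is the population variance divided by the sample size.

```python
from fractions import Fraction
from itertools import product


def mean(values):
    """Arithmetic mean as an exact fraction."""
    return Fraction(sum(values), len(values))


def population_variance(values):
    """Divide the sum of squared deviations by N."""
    mu = mean(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


def sample_variance(values):
    """Divide the sum of squared deviations by n - 1."""
    x_bar = mean(values)
    return sum((x - x_bar) ** 2 for x in values) / (len(values) - 1)


population = [2, 4, 6, 8]  # hypothetical population
n = 2                      # hypothetical sample size

# Every ordered sample of size n, drawn with replacement
samples = list(product(population, repeat=n))

sigma_squared = population_variance(population)
average_s_squared = mean([sample_variance(s) for s in samples])
variance_of_sample_means = population_variance([mean(s) for s in samples])

print(sigma_squared)             # 5
print(average_s_squared)         # 5
print(variance_of_sample_means)  # 5/2
```

Individual samples still give sample variances that are smaller than, equal to, or larger than the population variance; only their average matches it. For sampling without replacement from a small finite population the relationships differ, so this check should not be read as a general proof.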