Are the population variance and sample variance the same value for the same set of data?

Variance and Standard Deviation

Variance

Definition

The variance is a measure of the spread or dispersion within a set of data. There are two types: the population variance, usually denoted by $\sigma^2$, and the sample variance, usually denoted by $s^2$.

Population Variance

The population variance is the variance of the population. To calculate the population variance, use the formula \[\sigma^2=\frac{1}{N}\sum\limits_{i=1}^N (x_i-\mu)^2\] where $N$ is the size of the population consisting of $x_1, x_2, \ldots, x_N$ and $\mu$ is the population mean.

Sample Variance

Usually we only have a sample; the sample variance is the variance of this sample. Given a sample of data $x_1, x_2, \ldots, x_n$ of size $n$ with sample mean $\bar{x}$, the sample variance is calculated using \[s^2=\frac{1}{n-1}\sum\limits_{i=1}^n (x_i-\bar{x})^2 \text{.}\]

Make sure you know when to make this distinction. To use the population variance you need all of the data available whereas to use the sample variance you only need a proportion of it. For example, if we take ten words at random from this page to calculate the variance of their length, a sample variance would be needed. To find the population variance, the length of every word on the page would be needed.
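The two formulas can be compared side by side with a minimal Python sketch. The function names and the list of word lengths below are invented for illustration; they are not data from this page.

```python
def population_variance(values):
    """Divide the summed squared deviations from the mean by N."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def sample_variance(values):
    """Divide the summed squared deviations from the mean by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


# Hypothetical word lengths, invented for illustration.
word_lengths = [2, 4, 4, 4, 5, 5, 7, 9]

pop_var = population_variance(word_lengths)  # treats the list as the whole population
samp_var = sample_variance(word_lengths)     # treats the list as a sample

print(pop_var, samp_var)
```

The only difference between the two functions is the divisor, so on the same list the sample formula always returns the larger value (unless every value is identical).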

Variance of a Random Variable

For a random variable $X$, the variance is defined as follows:

\[\mathrm{Var}[X] = \mathrm{E}[(X- \mathrm{E}[X])^2 ]\text{.}\]

However, this calculation can take a lot of time as it involves calculating the difference between each element of the sample space and the mean (which is equal to $\mathrm{E}[X]$ and abbreviated as $\mu$), squaring this difference and then finding the expected value of this new set of squared differences.

If we expand the formula for the variance and use the linearity of expectation (noting that $\mathrm{E}[X]$ is a constant), we see \begin{align} \mathrm{Var}[X] &= \mathrm{E}[(X - \mathrm{E}[X])^2 ] \\ &= \mathrm{E}[X^2 - 2X \mathrm{E}[X] + (\mathrm{E}[X])^2] \\ &= \mathrm{E}[X^2] - 2 \mathrm{E}[X]\mathrm{E}[X] + ( \mathrm{E}[X])^2 \\ &= \mathrm{E}[X^2] - 2 (\mathrm{E}[X])^2 + ( \mathrm{E}[X])^2 \\ & = \mathrm{E}[X^2] - (\mathrm{E}[X])^2\text{.} \end{align}

So now we have an alternative formula, \[\mathrm{Var}[X] = \mathrm{E}[X^2]- (\mathrm{E}[X])^2\text{.}\]

Variance of a discrete random variable

Given a discrete random variable $X$ over a sample space $S$, we can calculate the variance in one of the following ways: \begin{align} \mathrm{Var}[X] &= \sum\limits_{x\in S} \mathrm{P}[X=x](x - \mu)^2\text{,} \\ \mathrm{Var}[X] &= \sum\limits_{x\in S} \{ \mathrm{P}[X=x]\cdot x^2 \} - \mu ^2\text{.} \end{align}
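As a quick check that the two expressions agree, here is a small sketch for a fair six-sided die. The die is an assumed example, not one taken from the text above, and exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction

# Fair six-sided die: each face has probability 1/6 (an assumed example).
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# Mean: sum of P[X = x] * x over the sample space.
mu = sum(p * x for x, p in pmf.items())

# First form: sum of P[X = x] * (x - mu)^2.
var_definition = sum(p * (x - mu) ** 2 for x, p in pmf.items())

# Second form: sum of P[X = x] * x^2, minus mu^2.
var_shortcut = sum(p * x ** 2 for x, p in pmf.items()) - mu ** 2

print(mu, var_definition, var_shortcut)
```

Both forms give the same fraction; the second needs only one pass over the sample space once $\mu$ is known.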

Variance of a continuous random variable

Given a continuous random variable $X$ over a sample space $S$ with probability density function $f(x)$, we can calculate the variance in one of the following ways: \begin{align*} \mathrm{Var}[X] &= \int\limits_{x\in S} f(x)\cdot (x - \mu)^2 \mathrm{d} x\text{,} \\ \mathrm{Var}[X] &= \int\limits_{x\in S} (f(x)\cdot x^2 )\mathrm{d} x - \mu ^2\text{.} \end{align*}

These results are obtained by combining the definition of variance with the formulae for the expected value for both cases.

Note: In the discrete case the mean $\mu = \displaystyle \sum\limits_{x\in S} \mathrm{P}[X=x] \cdot x$ whereas in the continuous case $\displaystyle \mu = \int\limits_{x\in S} f(x)\cdot x \mathrm{d} x$.
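
As an illustration of the continuous case, suppose (as an assumed example) that $X$ is uniformly distributed on $[0,1]$, so that $f(x)=1$ for $0\le x\le 1$. Then \begin{align*} \mu &= \int_0^1 x \,\mathrm{d} x = \frac{1}{2}\text{,} \\ \mathrm{Var}[X] &= \int_0^1 \left(x-\frac{1}{2}\right)^2 \mathrm{d} x = \left[\frac{1}{3}\left(x-\frac{1}{2}\right)^3\right]_0^1 = \frac{1}{24}+\frac{1}{24} = \frac{1}{12}\text{,} \\ \mathrm{Var}[X] &= \int_0^1 x^2 \,\mathrm{d} x - \mu^2 = \frac{1}{3}-\frac{1}{4} = \frac{1}{12}\text{.} \end{align*} Both forms give the same value, as expected.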

Standard Deviation

Definition

The standard deviation, often denoted by $\sigma$, is the positive square root of the variance. Data sets with a small standard deviation are tightly grouped around the mean, whereas a larger standard deviation indicates the data is more spread out.

Population Standard Deviation

The population standard deviation is the standard deviation of the entire population and is often denoted by $\sigma$. It is given by the formula \[\sigma = \sqrt{\frac{1}{N}\sum\limits_{i=1}^{N} (x_i - \mu)^2}\] where $N$ is the size of the population consisting of $x_1, x_2, \ldots, x_N$ and $\mu$ is the population mean.

Sample Standard Deviation

The sample standard deviation, often represented by $s$, is calculated using the formula \[s= \sqrt{ \frac{1}{n-1} \sum\limits_{i=1}^n (x_i-\bar{x})^2}\] where $n$ is the number of observations obtained in the sample, $x_1, x_2, \ldots, x_n$ are the obtained observations and $\bar{x}$ is the sample mean. To understand why $\frac{1}{n-1}$ is used rather than $\frac{1}{n}$ see degrees of freedom.

Worked Example

The lengths, in seconds, of the thirteen songs on an album are \[128, 219, 316, 189, 512, 98, 155, 110, 468, 177, 203, 73, 252\text{.}\] Calculate the standard deviation.

Solution

First calculate the mean. \begin{align} \mu &= \frac{1}{N}\sum\limits_{i=1}^Nx_i \\ &= \frac{1}{13}( 128+ 219+ 316+ 189+ 512+ 98+ 155+ 110+ 468+177 + 203 + 73 + 252) \\ &=\frac{1}{13} (2900) \\ &= 223.0769\text{.} \end{align}

Because we have the lengths of every song on the album, we calculate the population standard deviation. This is done using the formula \[\sigma = \sqrt{\frac{1}{N}\sum\limits_{i=1}^N (x_i-\mu)^2 } \text{.}\]

So the squared deviation from the mean needs to be calculated for each value.

\begin{align} (x_1-\mu)^2 &= (128-223.0769)^2 = (-95.0769)^2 = 9039.6169 \\ (x_2-\mu)^2 &=(219-223.0769)^2 = (-4.0769)^2 = 16.6211 \\ (x_3-\mu)^2 &=(316-223.0769)^2= (92.9231)^2 = 8634.7025 \\ (x_4-\mu)^2 &=(189-223.0769)^2= (-34.0769)^2 = 1161.2351 \\ (x_5-\mu)^2 &=(512-223.0769)^2= (288.9231)^2 = 83476.5577 \\ (x_6-\mu)^2 &=(98-223.0769)^2 = (-125.0769)^2 = 15644.2309 \\ (x_7-\mu)^2 &=(155-223.0769)^2 = (-68.0769)^2 = 4634.4643 \\ (x_8-\mu)^2 &= (110-223.0769)^2= (-113.0769)^2= 12786.3853 \\ (x_9-\mu)^2 &=(468-223.0769)^2 = (244.9231)^2 = 59987.3249 \\ (x_{10}-\mu)^2 &= (177-223.0769)^2 = (-46.0769)^2 =2123.0807 \\ (x_{11}-\mu)^2 &= (203-223.0769)^2 = (-20.0769)^2 = 403.0819\\ (x_{12}-\mu)^2 &= (73-223.0769)^2 = (-150.0769)^2 = 22523.0759 \\ (x_{13}-\mu)^2 &= (252-223.0769)^2 = (28.9231)^2 = 836.5457 \end{align}

These thirteen squared deviations sum to $221266.9229$. So, by substituting into \[\sigma = \sqrt{\frac{1}{N}\sum\limits_{i=1}^N (x_i-\mu)^2}\] we obtain \[\sigma = \sqrt{\frac{221266.9229}{13}} \approx 130.4628\text{.}\]
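
The hand calculation can be cross-checked with Python's standard `statistics` module. This is only a sketch for verification; the variable names are chosen here for illustration. `pstdev` divides by $N$, while `stdev` divides by $n-1$ and is shown only for contrast.

```python
import statistics

# Song lengths in seconds, copied from the worked example.
song_lengths = [128, 219, 316, 189, 512, 98, 155, 110, 468, 177, 203, 73, 252]

mean = statistics.mean(song_lengths)
population_sd = statistics.pstdev(song_lengths)  # divides by N
sample_sd = statistics.stdev(song_lengths)       # divides by n - 1

print(round(mean, 4), round(population_sd, 4), round(sample_sd, 4))
```

Because the list holds every song on the album, `population_sd` is the figure that matches the worked example. `sample_sd` would be appropriate only if the thirteen songs were a sample from a larger collection.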

Video Example

Dr. Lee Fawcett calculates the standard deviation of a set of data.

Workbook

This workbook produced by HELM is a good revision aid, containing key points for revision and many worked examples.

  • Descriptive statistics including work on standard deviation and variance.

External Resources

  • Variance and standard deviation at MIT

Is population variance equal to sample variance?

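The sample variance, on average, is equal to the population variance: it is an unbiased estimator, so its average over repeated random samples equals $\sigma^2$, although any single sample will usually give a different value. Let us understand the sample variance formula with the help of an example. Answer: Sample Mean = 142.4 cm, Sample Variance = 66.3 cm$^2$.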
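
The "on average" claim can be checked exactly on a small case. The sketch below assumes a population of the six faces of a die and lists every ordered sample of size 2 drawn with replacement; the population and sample size are illustrative choices, not taken from the example quoted above.

```python
from fractions import Fraction
from itertools import product

# Assumed toy population: the six faces of a die.
population = [1, 2, 3, 4, 5, 6]
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N  # population variance


def sample_variance(sample):
    """Sample variance with the n - 1 divisor, in exact fractions."""
    n = len(sample)
    xbar = Fraction(sum(sample), n)
    return sum((x - xbar) ** 2 for x in sample) / (n - 1)


# Every ordered sample of size 2 drawn with replacement, all equally likely.
samples = list(product(population, repeat=2))
all_s2 = [sample_variance(s) for s in samples]
mean_s2 = sum(all_s2) / len(all_s2)

print(sigma2, mean_s2, min(all_s2), max(all_s2))
```

Individual samples give values both below and above $\sigma^2$, including zero when both draws are equal, yet their average matches the population variance exactly.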

Is there a difference between the variance of the population and variance of the sampling distribution of the sample means?

That is, the variance of the sampling distribution of the mean is the population variance divided by $N$, where $N$ here denotes the sample size (the number of scores used to compute a mean). Thus, the larger the sample size, the smaller the variance of the sampling distribution of the mean.
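
A small exact check of this relationship, again assuming a toy population of die faces and independent draws with replacement (both are assumptions made for illustration). In the code, `n` is the sample size.

```python
from fractions import Fraction
from itertools import product

# Assumed toy population: the six faces of a die.
population = [1, 2, 3, 4, 5, 6]
mu = Fraction(sum(population), len(population))
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)


def variance_of_sample_means(n):
    """Exact variance of the mean over all ordered size-n samples drawn with replacement."""
    means = [Fraction(sum(s), n) for s in product(population, repeat=n)]
    centre = sum(means) / len(means)
    return sum((m - centre) ** 2 for m in means) / len(means)


for n in (1, 2, 3):
    # The two printed fractions agree for each sample size.
    print(n, variance_of_sample_means(n), sigma2 / n)
```

Listing every possible sample keeps the check exact; with a large population a random simulation would be the practical alternative.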

Is population variance always greater than sample variance?

No. The sample variance depends on which observations happen to be drawn, so it could be smaller than, equal to, or larger than the true value of the population variance. It is neither always smaller nor always larger, and it can be zero (when every sampled value is identical).

How is sample variance different from population?

Notice that there's only one tiny difference between the two formulas. When we calculate population variance, we divide by $N$ (the population size). When we calculate sample variance, we divide by $n-1$ (the sample size minus 1).
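
This gives a direct answer for the case where both formulas are applied to the same $n$ values. The mean is then the same in both, so the sums of squared deviations coincide and only the divisor differs: \[s^2 = \frac{1}{n-1}\sum\limits_{i=1}^n (x_i-\bar{x})^2 = \frac{n}{n-1}\cdot\frac{1}{n}\sum\limits_{i=1}^n (x_i-\bar{x})^2 = \frac{n}{n-1}\,\sigma^2\text{,}\] where $\sigma^2$ here means the population formula evaluated on those $n$ values. For the thirteen song lengths in the worked example, $\sigma^2 \approx 17020.53$, so treating them as a sample would give $s^2 = \frac{13}{12}\times 17020.53 \approx 18438.91$. The two values therefore differ whenever the data are not all identical, and the gap shrinks as $n$ grows.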