What two aspects of the data determine which measure of central tendency to use?

A measure of central tendency is an important aspect of quantitative data. It is an estimate of a “typical” value. Maria may be asked for the typical number of children seen per month.

Three of the many ways to measure central tendency are the mean, median and mode.

There are other measures, such as a trimmed mean, that we do not discuss here.

Mean

The mean is the average of the data.

NOTE: At this point, we are going to start using some basic notation to represent numbers as we present formulas and ways of calculating. When you read "Let (some confusing symbols) represent", we are trying to convey the formula in a "generic" way. If this gets confusing, skim over the formulas and pay more attention to the detailed example below!

Let \(x_1, x_2, \ldots, x_n\) be our sample. (As per the previous note, all we are doing is having \(x_1, x_2, \ldots, x_n\) represent numbers. We could just as easily have illustrated this with real values such as 1, 2, 3, 4, and 5.)

The sample mean is usually denoted by \(\bar{x}\). (If you are following this correctly, for the values 1, 2, 3, 4, and 5, \(\bar{x}\) would be 3!)

\(\bar{x}=\sum_{i=1}^n \dfrac{x_i}{n}=\dfrac{1}{n}\sum_{i=1}^n x_i\)

where n is the sample size and \(x_i\) are the measurements. One may need to use the sample mean to estimate the population mean since usually only a random sample is drawn and we don't know the population mean.
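For readers who find code easier to follow than summation notation, the formula can be sketched in a few lines of Python. This is only an illustration: the function name `sample_mean` is our own choice, and the sample is the 1 through 5 example used above.

```python
def sample_mean(values):
    """Return the sample mean: the sum of the observations divided by n."""
    n = len(values)  # n is the sample size
    if n == 0:
        raise ValueError("sample_mean requires at least one observation")
    return sum(values) / n  # (1/n) * (x_1 + x_2 + ... + x_n)


sample = [1, 2, 3, 4, 5]  # the x_i values
x_bar = sample_mean(sample)
print(x_bar)  # 3.0
```

The `sum(values)` call plays the role of \(\sum_{i=1}^n x_i\), and dividing by `n` gives \(\bar{x}\).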

Is this notation confusing you? Don't let it get to you. If it is not intuitive, focus on the concept of what the formula is doing. In this example, we are adding all of the numbers (the summation is represented by the capital Greek letter sigma, \(\sum\)) and dividing by the total number of observations.

Quite simply, Maria would calculate the average number of children seen per month.

The sample mean (\(\bar{x}\)) is a statistic, and the population mean (\(\mu\)) is a parameter.

Note on Notation

What if we say we used \(y_i\) for our measurements instead of \(x_i\)? Is this a problem? No. The formula would simply look like this: \(\bar{y}=\sum_{i=1}^n \dfrac{y_i}{n}=\dfrac{1}{n}\sum_{i=1}^n y_i\)

The formulas are exactly the same. The letters that you select to denote the measurements are up to you. For instance, many textbooks use \(y\) instead of \(x\) to denote the measurements. The point is to understand how the calculation expressed in the formula works. In this case, the formula calculates the mean by summing all of the observations and dividing by the number of observations. There is some notation that you will come to see as standard; for example, n will always denote the sample size. We will make a point of letting you know what these are. However, when it comes to the variables, the labels can (and do) vary.

Median

The median is the middle value of the ordered data. Maria might be asked to report the median if she had one or two months with extremely large or small numbers of children seen at the agency.

The most important step in finding the median is to first order the data from smallest to largest.

Steps to finding the median for a set of data:

  1. Arrange the data in increasing order, i.e. smallest to largest.
  2. Find the location of the median in the ordered data by \(\frac{n+1}{2}\), where n is the sample size.
  3. The value that represents the location found in Step 2 is the median.

Note on Odd or Even Sample Sizes
If the sample size is an odd number then the location point will produce a median that is an observed value. If the sample size is an even number, then the location will require one to take the mean of two numbers to calculate the median. The result may or may not be an observed value as the example below illustrates.
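The three steps, together with the odd/even note, can be sketched in Python as follows. The function name `median_by_location` and the small data sets are hypothetical and chosen only for illustration; the location is 1-based, as in Step 2, so it has to be shifted by one to index a Python list.

```python
def median_by_location(values):
    """Find the median using the (n + 1) / 2 location rule."""
    ordered = sorted(values)       # Step 1: arrange from smallest to largest
    n = len(ordered)
    if n == 0:
        raise ValueError("median_by_location requires at least one observation")
    location = (n + 1) / 2         # Step 2: 1-based location of the median
    if location.is_integer():
        # Odd n: the location points at an observed value.
        return ordered[int(location) - 1]
    # Even n: the location falls halfway between two observations,
    # so take the mean of the values on either side of it.
    lower = ordered[int(location) - 1]
    upper = ordered[int(location)]
    return (lower + upper) / 2


print(median_by_location([7, 1, 5]))     # 5
print(median_by_location([7, 1, 5, 3]))  # 4.0
```

In the second call the median, 4.0, is not one of the observed values, which is the situation the note describes.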

Mode

The mode is the value that occurs most often in the data. It is important to note that there may be more than one mode in a dataset. For Maria, the mode would be the number of children seen per month that occurs most often across the months.
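Because a data set can have more than one mode, a sketch of the calculation should return every value tied for the highest count. The helper `modes` below and the example values are hypothetical, used only to illustrate the idea.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs most often, in sorted order."""
    counts = Counter(values)          # how many times each value occurs
    if not counts:
        return []
    highest = max(counts.values())    # the largest frequency
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([3, 5, 5, 7, 7, 9]))       # [5, 7]
print(modes(["red", "blue", "red"]))   # ['red']
```

The second call shows that counting works for labels as well as for numbers.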

Example 1-2: SAT Data

From an SAT data set, we get the following participation rates for the nine South Atlantic states (Region is SA): 74, 79, 65, 75, 71, 74, 64, 73, and 20. In order to find the median we must first rank the data from smallest to largest:

20, 64, 65, 71, 73, 74, 74, 75, 79

To find the middle point, we take the number of observations plus one and divide by two. Mathematically, with n as the total number of observations, this is:

\(\dfrac{n+1}{2}=\dfrac{9+1}{2}=5\)

Returning to the ordered data, the fifth observation is 73. Thus the median of this distribution is 73. The interpretation of the median is that 50% of the observations fall at or below this value and 50% fall at or above this value. In this example, 50% of the observations are at or below 73 and 50% are at or above 73.

If another value were observed, say 88, this would bring the number of observations to ten. Using the formula above, the middle point would be at 5.5 (10 plus 1, divided by 2). Here we would find the median by taking the average of the fifth and sixth observations, which are 73 and 74. The new median for these ten observations would be 73.5. As you can see, the median is not always an observed value of the data set.
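The median calculations in this example can be checked with Python's standard `statistics` module. The variable names are our own; the data are the nine participation rates given above, plus the additional value 88.

```python
import statistics

# Participation rates for the nine South Atlantic states
rates = [74, 79, 65, 75, 71, 74, 64, 73, 20]

print(sorted(rates))             # [20, 64, 65, 71, 73, 74, 74, 75, 79]
median_nine = statistics.median(rates)
print(median_nine)               # 73

# Adding a tenth observation makes the sample size even
rates_with_88 = rates + [88]
median_ten = statistics.median(rates_with_88)
print(median_ten)                # 73.5
```

`statistics.median` sorts the data itself and averages the two middle values when the sample size is even.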

To find the mean, we simply add all of the numbers and then divide this total by the number of values summed. Mathematically, where again n is the number of observations, this is:

\(\bar{x}=\dfrac{\sum^n_{i=1}x_i}{n}=\dfrac{74+79+65+75+71+74+64+73+20}{9}=\dfrac{595}{9}\approx 66.11\)
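The same arithmetic can be reproduced in Python as a quick check. This is a minimal sketch with variable names of our own choosing.

```python
# Participation rates for the nine South Atlantic states
rates = [74, 79, 65, 75, 71, 74, 64, 73, 20]

total = sum(rates)   # the numerator: the sum of all observations
n = len(rates)       # the denominator: the number of observations
mean = total / n

print(total, n, round(mean, 2))  # 595 9 66.11
```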

What are the main measures of central tendency?

There are three main measures of central tendency: the mode, the median and the mean. Each of these measures describes a different indication of the typical or central value in the distribution.

What is the mode and with what type of data is it most appropriate?

The mode is the least used of the measures of central tendency, but it is the only one that can be used with nominal data. For this reason, the mode is the best measure of central tendency when dealing with nominal data. It can also be reported for ordinal and numerical data.

What are the measures of central tendency and what are they used for?

Measures of central tendency are used to describe what is typical for a set of data. Mean, median, and mode are the three measures of central tendency. The mean requires numerical data, and the median requires data that can at least be ordered; the mean is more sensitive to outliers than the median.
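The difference in sensitivity to outliers can be illustrated with the nine SAT participation rates from Example 1-2, where 20 is far below the other values. The sketch below, with variable names of our own choosing, compares the mean and the median with and without that value.

```python
import statistics

rates = [74, 79, 65, 75, 71, 74, 64, 73, 20]
without_outlier = [r for r in rates if r != 20]  # drop the extreme value

# How far each measure moves when the outlier is removed
mean_shift = statistics.mean(without_outlier) - statistics.mean(rates)
median_shift = statistics.median(without_outlier) - statistics.median(rates)

print(round(mean_shift, 2))  # 5.76
print(median_shift)          # 0.5
```

Removing a single extreme observation moves the mean by almost six points, while the median moves by only half a point.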

Which measure of central tendency best describes the data and why?

The mean is generally considered the best measure of central tendency and is the most frequently used one. However, there are situations where the other measures are preferred: the median when the data contain extreme values, and the mode when the data are nominal.