What is a Normal Distribution in Statistics?
A normal distribution, also known as the bell curve or Gaussian distribution, is a theoretical symmetrical distribution that describes how the values of a variable are distributed and allows comparisons among scores. The data in a normal distribution is symmetrically distributed, with most values clustering around the central region. A normal distribution is a continuous probability distribution, so it deals with continuous variables such as height, weight, and IQ scores.
Properties of a Normal Distribution
A normal distribution has two main parameters: the mean and the standard deviation. Statisticians use the mean as a measure of central tendency; it determines the location of the peak of the distribution. The standard deviation is a measure of dispersion; it determines the width of the distribution. Because the distribution is symmetric, the mean, median, and mode all have the same value, and half the data falls below the mean while the other half falls above it.
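The symmetry described above can be checked empirically. The sketch below draws a large normal sample (the mean of 50 and standard deviation of 10 are arbitrary illustrative choices) and confirms that the mean and median nearly coincide and that about half of the values fall below the mean:

```python
# For a symmetric normal sample, the mean and median nearly coincide
# and roughly half of the values fall below the mean.
import random
from statistics import mean, median

random.seed(42)
sample = [random.gauss(50, 10) for _ in range(100_000)]  # mu = 50, sigma = 10

m = mean(sample)
print(f"mean   = {m:.2f}")
print(f"median = {median(sample):.2f}")
print(f"share below mean = {sum(x < m for x in sample) / len(sample):.3f}")
```

With 100,000 draws the sample mean and median both land very close to 50, and the share below the mean is close to 0.5, as the symmetry argument predicts.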
A normal distribution follows the empirical rule, which gives the percentage of values that fall within specific distances from the mean. The empirical rule states that 68% of the data falls within one standard deviation (±1) of the mean, 95% falls within two standard deviations (±2), and 99.7% falls within three standard deviations (±3).
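The empirical rule can be verified directly from the standard normal cumulative distribution function, which has the closed form Φ(z) = ½(1 + erf(z/√2)). The following sketch computes the coverage within one, two, and three standard deviations using only the standard library:

```python
# Verify the 68-95-99.7 empirical rule from the standard normal CDF:
# Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))
import math

def normal_cdf(z):
    """Cumulative probability of the standard normal distribution at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

for k in (1, 2, 3):
    # Probability of a value falling within k standard deviations of the mean
    coverage = normal_cdf(k) - normal_cdf(-k)
    print(f"within ±{k} sd: {coverage:.4f}")
# within ±1 sd ≈ 0.6827, ±2 sd ≈ 0.9545, ±3 sd ≈ 0.9973
```

The exact values are about 68.27%, 95.45%, and 99.73%, which the rule rounds to 68-95-99.7.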
Standard Normal Distribution
A normal distribution can be standardized to have a mean of 0 and a standard deviation of 1, producing the standard normal distribution, also known as the z-distribution. To standardize a normal distribution, the values in the distribution are converted to z-scores using the formula z = (x − μ) / σ, where z is the z-score, x is the value being standardized, μ is the mean, and σ is the standard deviation. When the area under the curve is scaled to a total of 1, the distribution becomes a probability density function rather than a frequency distribution. Normal distributions are converted to the standard normal distribution to compare values from distributions with different means and standard deviations and to find probabilities across different populations.
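The standardization formula is a one-line computation. The sketch below applies it to IQ scores, assuming the common convention of a mean of 100 and a standard deviation of 15:

```python
# Convert a raw score x to a z-score: z = (x - mu) / sigma.
def z_score(x, mu, sigma):
    """Number of standard deviations x lies from the mean mu."""
    return (x - mu) / sigma

# Example: IQ scores, assuming mean 100 and standard deviation 15
z = z_score(130, mu=100, sigma=15)
print(z)  # 2.0 -> an IQ of 130 lies two standard deviations above the mean
```

Because z-scores are unit-free, a z of 2.0 on an IQ test can be compared directly with a z of 2.0 on a height measurement, even though the raw scales differ.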
Skewness and Kurtosis
Skewness measures the lack of symmetry in a distribution, where most values are gathered at one side of the scale, while kurtosis measures the pointiness of the peak and the thickness of the tails. In a normal distribution, skewness and excess kurtosis are both zero, and as a distribution deviates from the normal curve, these values move above or below zero. A distribution is positively skewed when the values cluster at the lower end with the tail stretching toward the higher positive end. Positive excess kurtosis (leptokurtic) means the tails are heavier than in a normal distribution, while negative excess kurtosis (platykurtic) means the tails are flatter or thinner than in a normal distribution.
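Both statistics can be computed from standardized moments: skewness is the mean of the cubed z-scores, and excess kurtosis is the mean of the fourth-power z-scores minus 3 (so that a normal distribution scores zero). A minimal sketch, using a simulated normal sample:

```python
# Skewness and excess kurtosis from standardized moments.
# For a normal distribution both should be close to zero.
import math
import random

def skewness(data):
    """Third standardized moment: mean of cubed z-scores."""
    n = len(data)
    m = sum(data) / n
    s = math.sqrt(sum((x - m) ** 2 for x in data) / n)
    return sum(((x - m) / s) ** 3 for x in data) / n

def excess_kurtosis(data):
    """Fourth standardized moment minus 3, so a normal sample scores ~0."""
    n = len(data)
    m = sum(data) / n
    s = math.sqrt(sum((x - m) ** 2 for x in data) / n)
    return sum(((x - m) / s) ** 4 for x in data) / n - 3.0

random.seed(0)
normal_sample = [random.gauss(0, 1) for _ in range(100_000)]
print(f"skewness        = {skewness(normal_sample):.3f}")
print(f"excess kurtosis = {excess_kurtosis(normal_sample):.3f}")
# Both values come out near 0 for a normal sample.
```

A right-skewed sample (for example, exponentiated normal values) would instead give a clearly positive skewness.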
Test for Normality
The Shapiro-Wilk and the Kolmogorov-Smirnov tests are used in statistics to check whether data follows a normal distribution. With small samples, these tests have little power to detect departures from normality, so a larger sample gives more reliable results. A significant p-value (p < .05) from these tests indicates that the data does not follow a normal distribution. If the data follows a normal distribution, parametric methods are used for analysis; if the data is non-normal, it can often be made closer to normal with a transformation, such as a logarithmic transformation.
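A minimal sketch of this workflow using SciPy's Shapiro-Wilk test (assuming scipy is installed): it generates right-skewed log-normal data, confirms the test rejects normality, then applies a logarithmic transformation, after which the data is normal by construction and the test no longer rejects.

```python
# Normality check with the Shapiro-Wilk test, before and after
# a logarithmic transformation of right-skewed data.
import math
import random
from scipy import stats

random.seed(1)
# Log-normal data: exponentiated normal draws, heavily right-skewed
skewed = [math.exp(random.gauss(0, 1)) for _ in range(500)]
_, p_raw = stats.shapiro(skewed)
print(f"raw data:        p = {p_raw:.4f}")  # p < .05 -> reject normality

# After a logarithmic transformation the data is normal by construction
transformed = [math.log(x) for x in skewed]
_, p_log = stats.shapiro(transformed)
print(f"log-transformed: p = {p_log:.4f}")  # larger p -> consistent with normality
```

Note that `scipy.stats.shapiro` is recommended for moderate sample sizes; for very large samples even trivial departures from normality produce significant p-values, so the test result should be read alongside plots such as histograms or Q-Q plots.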
Summary
A normal distribution is widely used in real-life situations, such as comparing weight, height, test scores, and IQ scores, because it is a continuous probability distribution that can take on any value. The mean determines the location of the peak and the standard deviation determines the width of the distribution, which together give the curve its symmetric shape. Because not all data follows a normal distribution, it is important to test for normality using tests such as the Shapiro-Wilk and the Kolmogorov-Smirnov tests.