The normal distribution is one of the most widely used probability distributions in statistics. It is symmetric and bell-shaped, and it is fully characterized by its mean (the center) and standard deviation (the spread). Normally distributed data follow the 68-95-99.7 rule: approximately 68% of values fall within one standard deviation of the mean, about 95% within two, and about 99.7% within three.
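The 68-95-99.7 rule can be checked numerically. The sketch below assumes the standard identity \(P(|Z| < k) = \operatorname{erf}(k/\sqrt{2})\) for a standard normal variable \(Z\); the helper name `coverage_within` is made up for illustration.

```python
import math


def coverage_within(k: float) -> float:
    """Probability that a normal value lies within k standard deviations of the mean.

    Uses the identity P(|Z| < k) = erf(k / sqrt(2)) for a standard normal Z.
    """
    return math.erf(k / math.sqrt(2))


if __name__ == "__main__":
    for k in (1, 2, 3):
        # Prints roughly 68.27%, 95.45%, and 99.73%
        print(f"within {k} sd: {coverage_within(k):.2%}")
```

The rounded figures 68, 95, and 99.7 in the rule are these values to the nearest convenient digit.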
A common convention for identifying potential outliers in a normal distribution is to flag data points lying more than three standard deviations from the mean. Such points are rare (about 0.27% probability for both tails combined, so less than 0.3%) and sit in the extreme tails of the distribution. To find the probability that an observation is an outlier under this convention, use the cumulative distribution function (CDF) to calculate the probability of a data point falling beyond these limits.
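In practice, this involves standardizing each observation \(X\) as \(Z = (X - \mu)/\sigma\), where \(\mu\) is the mean and \(\sigma\) the standard deviation, and then computing:
- \(P(Z > 3 \text{ or } Z < -3) = 1 - (F(3) - F(-3))\). By symmetry, \(F(-3) = 1 - F(3)\), so this equals \(2(1 - F(3)) \approx 0.0027\).
- Here \(F\) is the CDF of the standard normal distribution: \(F(x)\) is the probability that \(Z\) takes a value less than or equal to \(x\).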
- Using statistical software or a standard z-table helps find these probabilities efficiently.
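As a minimal sketch of the calculation above, the following uses only Python's standard library. It assumes the usual expression of the standard normal CDF through the error function, \(F(z) = \tfrac{1}{2}\left(1 + \operatorname{erf}(z/\sqrt{2})\right)\). The function names `normal_cdf` and `outlier_probability` are illustrative, not part of any library.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF F(z) = P(Z <= z), written via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))


def outlier_probability(k: float = 3.0) -> float:
    """P(Z > k or Z < -k) = 1 - (F(k) - F(-k)) for a standard normal Z."""
    return 1.0 - (normal_cdf(k) - normal_cdf(-k))


if __name__ == "__main__":
    p = outlier_probability(3.0)
    # Roughly 0.0027, i.e. about 0.27% of observations
    print(f"P(|Z| > 3) = {p:.4f}")
```

The threshold `k` is left as a parameter because three standard deviations is a convention rather than a fixed rule; passing `k=2` gives the probability of falling outside two standard deviations instead.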