Box

A box plot is based on a dataset’s quartiles: the three values that divide the sorted data into four equal parts.

  • The first quartile (Q1) 
    • about 25% of the elements are less than the first quartile and about 75% of the elements are greater than it. 
  • The second quartile (Q2) 
    • sits in the middle, dividing the data in half. Q2 is also known as the median. 
  • The third quartile (Q3) 
    • about 75% of the elements are less than the third quartile and about 25% of the elements are greater than it.

In a box and whiskers plot, the ends of the box and its center line mark the locations of these three quartiles.

Diagram showing how box and whiskers are derived from a set of data.
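
As a minimal sketch of how the three quartiles could be computed, the snippet below uses Python's standard statistics module. The function name quartiles and the choice of the "inclusive" interpolation method are assumptions made for illustration; several quartile conventions exist, and they can give slightly different Q1 and Q3 values on small samples.

```python
import statistics


def quartiles(data):
    """Return (Q1, Q2, Q3) for a sequence of numbers.

    Assumption: the "inclusive" method, which interpolates linearly
    between neighbouring data points. Other conventions exist.
    """
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return q1, q2, q3


sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]
q1, q2, q3 = quartiles(sample)
print("Q1:", q1, "Q2 (median):", q2, "Q3:", q3)
```

For this sample the box would run from Q1 to Q3, with the center line drawn at Q2.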

Whiskers

The distance between Q1 and Q3 is known as the interquartile range (IQR), and it determines how long the whiskers extending from the box can be. Each whisker extends to the furthest data point on its side of the box that lies within 1.5 times the IQR of the box edge (below Q1 for the lower whisker, above Q3 for the upper whisker). 
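
The whisker rule can be sketched roughly as follows. The function name whisker_ends, the parameter k, and the "inclusive" quartile method are illustrative assumptions, not part of any particular plotting library.

```python
import statistics


def whisker_ends(data, k=1.5):
    """Return (lower_whisker, upper_whisker) for a sequence of numbers.

    Each whisker ends at the furthest data point that still lies
    within k * IQR of the box edge (Q1 or Q3).
    Assumption: quartiles use the "inclusive" interpolation method.
    """
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_limit = q1 - k * iqr
    upper_limit = q3 + k * iqr
    # Keep only the points inside the limits; the whiskers stop at
    # actual data values, not at the limits themselves.
    inside = [x for x in data if lower_limit <= x <= upper_limit]
    return min(inside), max(inside)


sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
print(whisker_ends(sample))
```

Note that the whiskers end at real data points, so they are often shorter than the full 1.5 times IQR allowance.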


Outliers

Any data point that lies more than 1.5 times the IQR beyond the edge of the box (below Q1 or above Q3) is considered an outlier, and is marked with a dot. 
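
A minimal sketch of flagging outliers under this rule is shown below. The function name find_outliers, the parameter k, and the "inclusive" quartile method are assumptions chosen for illustration.

```python
import statistics


def find_outliers(data, k=1.5):
    """Return the data points lying more than k * IQR beyond the box.

    Assumption: quartiles use the "inclusive" interpolation method.
    """
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_limit = q1 - k * iqr
    upper_limit = q3 + k * iqr
    return [x for x in data if x < lower_limit or x > upper_limit]


sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
print(find_outliers(sample))
```

In a plot, each value returned here would be drawn as an individual dot beyond the whisker.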


When a data distribution is symmetric, you can expect the median to sit at the center of the box: the distance between Q1 and Q2 should be about the same as the distance between Q2 and Q3. Outliers should appear about evenly on both sides of the box. If a distribution is skewed, the median will sit off-center, closer to one end of the box. You may also find an imbalance in the whisker lengths, where one side is short with no outliers and the other has a long tail with many more outliers.

Data shape can affect the way a box and whiskers plot looks.
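
One rough way to check where the median sits inside the box is to compare the two halves of the box numerically. The sketch below is only an illustration: the function name box_halves and the "inclusive" quartile method are assumptions, and the comparison is a quick heuristic rather than a formal test of symmetry.

```python
import statistics


def box_halves(data):
    """Return (Q2 - Q1, Q3 - Q2), the widths of the two halves of the box.

    Similar widths suggest a roughly symmetric middle of the data;
    a much wider upper half suggests a skew toward larger values,
    and a much wider lower half suggests the opposite.
    Assumption: quartiles use the "inclusive" interpolation method.
    """
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return q2 - q1, q3 - q2


symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]
skewed = [1, 2, 2, 3, 3, 4, 6, 9, 15]
print(box_halves(symmetric))
print(box_halves(skewed))
```

For the skewed sample, the upper half of the box is wider than the lower half, so the median would be drawn closer to Q1.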

Sources:

https://en.wikipedia.org/wiki/Box_plot

https://chartio.com/learn/charts/box-plot-complete-guide/#interpreting-a-box-and-whiskers