Comparing Data Sets
Comparing data sets involves examining both the central tendency (where most values cluster) and the spread (how much values vary) of two or more collections of numerical data. The process involves calculating measures such as the mean, median, and range to determine which set has higher typical values and which shows more consistent behaviour. This statistical skill belongs to the Key Stage 3 statistics content of the National Curriculum in England (often taught around Year 8). There, learners describe, interpret and compare distributions of a single variable using averages and measures of spread.
Why it matters
Comparing data sets underpins critical decisions across industries and research. Medical researchers compare treatment outcomes between patient groups. For example, one drug might show a mean recovery time of 8 days (range 3-12) and another 10 days (range 2-18). The first drug is then both quicker on average and more consistent (a range of 9 days versus 16). Sports analysts evaluate how consistently players perform, for example a footballer averaging 2.3 goals per match with minimal variation versus another with the same average but erratic scoring patterns. Businesses compare customer satisfaction scores across different branches, identifying locations with both higher ratings and more reliable service quality. Marketing teams analyse conversion rates between website designs, seeking options that perform better and more predictably. Quality control departments compare manufacturing batches, prioritising processes that produce items closer to target specifications. These comparisons feed into high-stakes strategic and financial decisions, which makes statistical literacy essential for GCSE students progressing to A-Level Mathematics and beyond.
How to compare data sets
- Compare averages (mean, median) to see which set is 'higher'.
- Compare spread (range, IQR) to see which set is more consistent.
- Use the same type of average for a fair comparison.
- Back up comparisons with specific values.
Example: Set A: median 12, range 8. Set B: median 15, range 3 → B is higher and more consistent.
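The checklist above can be sketched in Python as a minimal illustration. It assumes the raw data values are available. The helper names `summarise` and `compare_sets` and the sample values are hypothetical, chosen so the sets match the summary example (Set A: median 12, range 8; Set B: median 15, range 3). The sketch applies the same average (median) and the same measure of spread (range) to both sets, and it quotes the values it relies on.

```python
from statistics import median


def summarise(values):
    """Return (median, range) for a list of numbers."""
    return median(values), max(values) - min(values)


def compare_sets(a, b):
    """Hypothetical helper: compare two data sets by median and range.

    Uses the same average and the same measure of spread for both sets,
    and backs up each conclusion with the actual values.
    """
    med_a, rng_a = summarise(a)
    med_b, rng_b = summarise(b)

    # Which set has the higher typical value?
    if med_a > med_b:
        higher = f"A has the higher median ({med_a} vs {med_b})"
    elif med_b > med_a:
        higher = f"B has the higher median ({med_b} vs {med_a})"
    else:
        higher = f"the medians are equal ({med_a})"

    # Which set is more consistent? (smaller range = less spread)
    if rng_a < rng_b:
        consistent = f"A is more consistent (range {rng_a} vs {rng_b})"
    elif rng_b < rng_a:
        consistent = f"B is more consistent (range {rng_b} vs {rng_a})"
    else:
        consistent = f"the ranges are equal ({rng_a})"

    return f"{higher}; {consistent}"


# Hypothetical data: A has median 12, range 8; B has median 15, range 3
set_a = [8, 10, 12, 14, 16]
set_b = [13, 14, 15, 16, 16]
print(compare_sets(set_a, set_b))
# B has the higher median (15 vs 12); B is more consistent (range 3 vs 8)
```

Swapping `median` for `mean` would work equally well, provided both sets use the same measure.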
Worked examples
Two factories produce widgets. Factory A averages 11 per day, Factory B averages 18 per day. Which produces more on average?
Answer: Factory B
- Compare the means → 18 > 11 — Factory B's average (18) is greater than Factory A's average (11).
Team A scores: 10, 10, 5, 7, 8 (mean=8). Team B scores: 8, 8, 8, 8, 8 (mean=8). Which is more consistent?
Answer: Team B
- Compare the spread → Range A = 10 - 5 = 5, Range B = 8 - 8 = 0: all of Team B's scores are the same, so Team B has zero spread.
- Conclusion → Team B is more consistent — Less spread means more consistency.
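The Team A and Team B figures can be checked in Python using the standard `statistics` module. Both means come out at 8. The ranges (5 and 0) show why Team B counts as more consistent. The helper name `data_range` is illustrative and is not part of any library.

```python
from statistics import mean


def data_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)


team_a = [10, 10, 5, 7, 8]
team_b = [8, 8, 8, 8, 8]

# Same mean for both teams, so the spread decides which is more consistent
for name, scores in (("Team A", team_a), ("Team B", team_b)):
    print(f"{name}: mean = {mean(scores)}, range = {data_range(scores)}")
```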
Compare ranges: Set A = {5, 7, 10, 11} range=6, Set B = {7, 8, 9, 11} range=4. Which is more spread out?
Answer: Set A
- Compare the ranges → Range A = 11 - 5 = 6, Range B = 11 - 7 = 4, so Range A (6) > Range B (4).
- Conclusion → Set A is more spread out — A larger range means more spread.
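The range comparison can be sketched as a short Python check. It uses the values from the worked example. The helper name `data_range` is hypothetical. The values are stored as lists rather than Python sets, because a Python `set` would silently drop any repeated values in real data.

```python
def data_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)


# Values from the worked example, stored as lists to keep any repeats
set_a = [5, 7, 10, 11]
set_b = [7, 8, 9, 11]

range_a = data_range(set_a)  # 11 - 5 = 6
range_b = data_range(set_b)  # 11 - 7 = 4

# A larger range means more spread
if range_a > range_b:
    verdict = "Set A is more spread out"
elif range_b > range_a:
    verdict = "Set B is more spread out"
else:
    verdict = "Both sets have the same range"

print(f"Range A = {range_a}, Range B = {range_b}: {verdict}")
```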
Common mistakes
- Comparing only averages whilst ignoring spread — concluding that two football teams are equally good when both average 2 goals per match, despite one team scoring consistently (2, 2, 2, 2) and another erratically (0, 1, 3, 4).
- Using different measures for comparison — stating that Set A (median 15) outperforms Set B (mean 12) without recognising that median and mean measure central tendency differently.
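Mixing measures can mislead because the mean and the median of the same data can differ substantially. The Python sketch below computes both averages for one hypothetical, deliberately skewed data set. The single large value pulls the mean well above the median. Quoting a mean for one set and a median for another can therefore create, or hide, a difference that is not really there.

```python
from statistics import mean, median

# Hypothetical, deliberately skewed data: one value is far above the rest
data = [1, 2, 3, 4, 40]

print("mean:", mean(data))      # (1 + 2 + 3 + 4 + 40) / 5 = 10
print("median:", median(data))  # middle value of the sorted list = 3
```

A fair comparison computes the same average for every set being compared.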
- Misinterpreting range direction: claiming that a temperature range of 12°C indicates more consistency than a range of 8°C, when a larger range actually shows greater variation.