Higher Applications of Mathematics

Scatter plots and correlation in RStudio

Graphing and measuring relationships.

Before you start

  • Know the statistical question you are trying to answer.
  • Check that variables are named clearly and measured in suitable units.
  • Be ready to write an interpretation, not just copy RStudio output.

RStudio lesson

Key idea

  • This topic focuses on examining relationships between two numerical variables. In Higher Applications, RStudio is used as a practical tool to calculate, graph and test ideas from data.
  • A strong answer shows what command was used, what output was produced, and what that output means in context. Use careful language such as 'This suggests...' or 'There is evidence to suggest...'.

Key commands and skills

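  • plot(x, y) draws a scatter plot of y against x; cor(x, y) calculates the correlation coefficient r.
  • names(data) lists the column names in the data frame.
  • summary(data) gives summary statistics for each column.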
  • Use comments in scripts with # to explain your steps.
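
The commands above can be sketched together in one short, commented script. This is a minimal illustration only: the data frame study and its values are invented here, so the result will not match the simulated output later in this lesson.

```r
# Hypothetical data: invented values for illustration only
hours <- c(1, 2, 3, 4, 5, 6, 7, 8)          # weekly revision hours
score <- c(42, 50, 47, 58, 63, 61, 72, 75)  # assessment score
study <- data.frame(hours, score)

names(study)     # check the exact column names
summary(study)   # minimum, quartiles, median, mean and maximum for each column

# Scatter plot: hours on the x-axis, score on the y-axis
plot(study$hours, study$score)

# Correlation coefficient (Pearson by default), always between -1 and 1
r <- cor(study$hours, study$score)
r
```

Storing the result in r makes it easy to reuse the value later, for example when rounding it for a written conclusion.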

Technology output practice

Output interpretation preview

Read the simulated output, pick out the key value, then turn it into a written conclusion. This is a learning preview, not a real RStudio environment.

Context

Correlation output

A pupil investigates the relationship between weekly revision hours and assessment score.

Simulated output

> cor(study$hours, study$score)
[1] 0.82

  • r: 0.82. A strong positive correlation.
  • Direction: positive. Higher revision hours tended to go with higher scores.
  • Limit: association. Correlation alone does not prove that one variable caused the other.

What it means

The value is close to 1, so the association is strong and positive. The context still matters: other factors may also affect score.

What to write

There is a strong positive relationship between weekly revision hours and assessment score. Pupils who revised for longer tended to score higher, but this does not prove that longer revision caused the higher scores.

Weak answer: The correlation is 0.82.

Watch out

Many pupils stop after copying r. Add direction, strength and context, and avoid claiming proof of cause.
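
One way to practise turning r into direction and strength is a small helper function. This is a sketch only: describe_r is a made-up name, and the cut-offs of 0.7 and 0.4 are assumed rules of thumb, not values set by this lesson. Check which boundaries your teacher or course notes use.

```r
# Hypothetical helper: the 0.7 and 0.4 cut-offs are assumptions, not fixed rules
describe_r <- function(r) {
  if (r == 0) {
    return("no linear correlation")
  }
  direction <- if (r > 0) "positive" else "negative"
  size <- abs(r)
  strength <- if (size >= 0.7) {
    "strong"
  } else if (size >= 0.4) {
    "moderate"
  } else {
    "weak"
  }
  paste(strength, direction, "correlation")
}

describe_r(0.82)   # "strong positive correlation"
describe_r(-0.3)   # "weak negative correlation"
```

The function only supplies direction and strength. The context and the caution about cause still have to be written by you.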

Worked examples

Walkthrough 1

Run the command

A pupil is using RStudio to examine the relationship between two numerical variables in a small, school-friendly data set.

  1. Load or identify the data frame.
  2. Check the exact column names with names(data).
  3. Run the key command: plot(x, y); cor(x, y)

The output should be checked against the variables and the original question.
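
The three steps above might look like this in a script. The values are invented and typed inline so that the sketch runs on its own. In a real project the data would more likely be read from a file with read.csv() and your own file name.

```r
# Step 1: load the data frame (invented values, typed inline for illustration)
study <- read.csv(text = "hours,score
1,42
2,50
3,47
4,58
5,63
6,61
7,72
8,75")

# Step 2: check the exact column names before using them
names(study)

# Step 3: run the key commands, with labels so the graph can be read in context
plot(study$hours, study$score,
     xlab = "Weekly revision hours",
     ylab = "Assessment score",
     main = "Assessment score against revision hours")

r <- cor(study$hours, study$score)
r
```

Adding xlab, ylab and main is optional, but a labelled graph is easier to refer to in a written interpretation.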

Walkthrough 2

Read the output

RStudio has produced numerical or graphical output.

  1. Find the key value, graph feature or p-value.
  2. Check the unit and variable name.
  3. Avoid copying every line of output into the conclusion.

Correlation describes strength and direction, but does not prove causation.
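
If a p-value is needed as well as r, one option is cor.test(), which is not among the commands listed earlier in this lesson. The sketch below assumes invented data and shows how to pick out only the key values instead of copying the whole output.

```r
# Hypothetical data: invented values for illustration only
hours <- c(1, 2, 3, 4, 5, 6, 7, 8)          # weekly revision hours
score <- c(42, 50, 47, 58, 63, 61, 72, 75)  # assessment score

# cor.test() reports the correlation together with a test of whether it differs from 0
result <- cor.test(hours, score)

# Pick out the key values only
r <- unname(result$estimate)   # the correlation coefficient
p <- result$p.value            # the p-value for the test

round(r, 2)   # r rounded to 2 decimal places for the write-up
p
```

Rounding r to two decimal places is usually enough for a conclusion. The full printed output can stay in the script or an appendix.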

Walkthrough 3

Write the interpretation

The result must be used in a project conclusion.

  1. Start with a cautious phrase such as 'This suggests...'.
  2. Refer to the context and variables.
  3. Mention a limitation if the data set is small, biased or observational.

The conclusion should be clear, cautious and linked to evidence.
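
As an optional extra, the rounded value can be dropped into a sentence with sprintf() so the number in the conclusion always matches the output. This sketch uses r = 0.82 from the simulated output in this lesson. The wording is one possible phrasing, not a required template.

```r
r <- 0.82   # value taken from the simulated output in this lesson

# Build a cautious sentence: cautious phrase, context, evidence, limitation
conclusion <- sprintf(
  "This suggests a strong positive relationship between %s and %s (r = %.2f). %s",
  "weekly revision hours",
  "assessment score",
  r,
  "Because the data are observational, this does not show that revising for longer caused the higher scores."
)

cat(conclusion, "\n")
```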

Watch out

  • Misspelling a data frame or column name.
  • Forgetting brackets or quotation marks in a command.
  • Copying output without explaining what it means.
  • Claiming causation from correlation.
  • Using strong language such as 'proves' when the data only provide evidence of a relationship.
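
The first mistake in this list is easy to miss, because a misspelled column name used with $ gives NULL instead of an error message. A quick check before running cor() can catch it. The sketch below uses invented data, and hrs is a deliberately wrong name.

```r
# Hypothetical data: invented values for illustration only
study <- data.frame(
  hours = c(1, 2, 3, 4, 5, 6, 7, 8),
  score = c(42, 50, 47, 58, 63, 61, 72, 75)
)

# A misspelled column name is not in the data frame...
"hrs" %in% names(study)    # FALSE: the column is called hours, not hrs

# ...and asking for it with $ quietly returns NULL
is.null(study$hrs)         # TRUE

# Check that every column you plan to use really exists
wanted <- c("hours", "score")
all(wanted %in% names(study))   # TRUE, so it is safe to continue
```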

Next step

Move into practice

Use the learning notes to read output tables carefully, then try a variety of questions on interpreting summary, correlation, regression and test output.
