# Showing correlation between two data sets

1. I am looking to show the relationship between two data sets, traffic flow and accidents, to see whether they are positively or negatively related.

I have visualised two graphs...
Accidents against time
Flow against time

I was wondering how I could visualise some relationship between these two data sets?

The main issue I see is that the metrics for quantifying accidents and flow are different. I need to program this calculation in Python, so this is a problem.

I have tried using % change as the y axis and time as the x axis, where every data point is the percentage change relative to the initial year plus the percentage change accumulated so far. This way I can have a line for flow and a line for accidents on the same graph.
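A minimal Python sketch of this indexing idea follows. It takes the simplest reading of the description, namely each year's percentage change relative to the first year, which may not be exactly the calculation used in the post. The function names and all numbers are placeholders made up for illustration, not real data. The plotting helper assumes matplotlib is installed and imports it only when called.

```python
def percent_change_from_base(values):
    """Return each value as a percentage change relative to the first value."""
    base = values[0]
    if base == 0:
        raise ValueError("baseline value must be non-zero")
    return [100.0 * (v - base) / base for v in values]


def plot_indexed(years, series):
    """Plot several indexed series on one shared % axis (requires matplotlib)."""
    import matplotlib.pyplot as plt  # imported lazily so the maths runs without it

    fig, ax = plt.subplots()
    for label, values in series.items():
        ax.plot(years, values, marker="o", label=label)
    ax.axhline(0, color="grey", linewidth=0.5)  # the baseline year sits at 0 %
    ax.set_xlabel("Year")
    ax.set_ylabel("% change since first year")
    ax.legend()
    return fig


# Placeholder numbers, invented for illustration only (NOT real data).
years = [2000, 2001, 2002, 2003]
flow = [20000, 21000, 22000, 21000]   # average daily vehicle count
accidents = [50, 55, 45, 60]          # accidents per year

flow_pct = percent_change_from_base(flow)
accidents_pct = percent_change_from_base(accidents)
```

Calling `plot_indexed(years, {"Flow": flow_pct, "Accidents": accidents_pct})` would then draw both lines on one graph. Because both series are expressed in percent, the difference in units no longer matters.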

Hope that made sense. Any ideas on how best to approach this would be much appreciated.

Thank you.
2. (In reply to posthumus)
What you have written is a bit vague - so can we have some more details? So, what exactly is the data that you have and how is it recorded?
3. (In reply to Gregorius)
Hi sorry,

The first data set has the number of accidents per year.

The second data set is a measure of the average daily vehicle count for a particular year.

The years range from 2000 to 2015 for both data sets.

I would like to show the relationship between the two (for example, whether a higher average vehicle count correlates with an increase or decrease in accidents), and was wondering what the best way to show this is.

Thanks.
4. (In reply to posthumus)
Let me check - you have 16 data points (years 2000 through 2016) comprising an average daily vehicle count for each year and number of accidents for each year for a particular region of a particular country?
5. (In reply to Gregorius)
Correct (except no data for 2016)

At the moment I have two separate graphs with 15 points, I'd like to correlate these to provide some analysis/insight.

Thanks
6. (In reply to posthumus)
Then the simplest thing to do is to plot accident rate (y axis) against flow (x axis). Although these do have different units, it sounds as if you are looking for an empirical relationship between these quantities. This will, at least, give you a qualitative feel for the relationship.
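A rough Python sketch of such a plot is given here, with one point per year. The helper names, the dictionary layout and the numbers are all assumptions made for illustration, not the poster's actual data or code. The plotting function assumes matplotlib is available and labels each point with its year so the time ordering is not lost entirely.

```python
def pair_by_year(flow_by_year, accidents_by_year):
    """Return (year, flow, accidents) tuples for years present in both data sets."""
    shared = sorted(set(flow_by_year) & set(accidents_by_year))
    return [(y, flow_by_year[y], accidents_by_year[y]) for y in shared]


def scatter_accidents_vs_flow(points):
    """Scatter accidents (y) against flow (x), one labelled point per year."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for drawing

    fig, ax = plt.subplots()
    ax.scatter([p[1] for p in points], [p[2] for p in points])
    for year, x, y in points:
        # Small offset so the year label does not sit on top of the marker.
        ax.annotate(str(year), (x, y), textcoords="offset points", xytext=(4, 4))
    ax.set_xlabel("Average daily vehicle count")
    ax.set_ylabel("Accidents per year")
    return fig


# Placeholder values, invented for illustration only (NOT real data).
flow_by_year = {2000: 20000, 2001: 21000, 2002: 22000}
accidents_by_year = {2001: 55, 2002: 45, 2003: 60}

points = pair_by_year(flow_by_year, accidents_by_year)
```

Pairing on shared years first guards against one data set having a year the other lacks, so every plotted point really does combine both measurements for the same year.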

If you want to go on and do something statistical then it would be a good idea (if you have access to the underlying data sources) to un-group it - having everything averaged over a year loses a lot of detail; and for data like this there's no reason to suppose that the relationship will be as simple as linear.
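As one possible first statistical step (a suggestion added here, not something prescribed in this reply), Pearson's correlation coefficient could be compared with Spearman's rank correlation. Pearson measures linear association, while Spearman only asks whether one quantity tends to rise or fall as the other rises, so it does not assume a straight-line relationship. The sketch below uses only the standard library; the function names and the sample numbers are illustrative assumptions.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the run of tied values
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Toy numbers with a curved but always-increasing relationship (not real data).
x = [1, 2, 3, 4]
y = [1, 4, 9, 16]
r_linear = pearson(x, y)
r_rank = spearman(x, y)
```

With only around 16 yearly points, either coefficient is a rough summary at best, which is consistent with the advice above to un-group the data where possible.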
7. (In reply to Gregorius)

If I plot accident rate against flow, each point would represent a particular year. Will this not take away from the fact that I want to consider the relationship over time?
8. (In reply to posthumus)
If you want to show what's happening over time as well, then how about a plot of accident rate versus time, with the plotting point size scaled by the traffic flow rate?

I've attached a couple of examples. In the first, the flow rate and the accident rate are approximately linearly increasing over time (which would probably correspond to what most people would consider a very likely scenario). In the second plot, flow rate reaches a maximum in the middle of the time period and then declines. In both cases accident rate is linear with flow rate. I haven't attempted to pretty these up, but they should give some sort of idea of what can be done.
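The two scenarios described above can be approximated in Python roughly as follows. This is a reconstruction under stated assumptions, not the code behind the attached plots: the flow formulas, the `accidents_per_vehicle` constant, the size range and all function names are invented for illustration, and the data is synthetic and noise-free. The plotting helper assumes matplotlib is installed.

```python
def marker_sizes(flow, min_size=20.0, max_size=300.0):
    """Rescale flow values linearly into a range of marker areas."""
    lo, hi = min(flow), max(flow)
    if hi == lo:
        return [(min_size + max_size) / 2] * len(flow)
    return [min_size + (max_size - min_size) * (f - lo) / (hi - lo) for f in flow]


def accidents_from_flow(flow, accidents_per_vehicle=0.002):
    """Synthetic accident counts that are exactly linear in flow (assumption)."""
    return [accidents_per_vehicle * f for f in flow]


def bubble_plot(years, accidents, flow):
    """Accidents against time, with marker size scaled by flow."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for drawing

    fig, ax = plt.subplots()
    ax.scatter(years, accidents, s=marker_sizes(flow), alpha=0.6)
    ax.set_xlabel("Year")
    ax.set_ylabel("Accidents per year")
    return fig


years = list(range(2000, 2016))  # 2000 to 2015 inclusive
mid = (len(years) - 1) / 2

# Scenario 1 (synthetic): flow rises linearly over the period.
flow_rising = [20000 + 500 * i for i in range(len(years))]
# Scenario 2 (synthetic): flow peaks mid-period, then declines.
flow_peaked = [27500 - 100 * (i - mid) ** 2 for i in range(len(years))]

accidents_rising = accidents_from_flow(flow_rising)
accidents_peaked = accidents_from_flow(flow_peaked)
```

Calling `bubble_plot(years, accidents_peaked, flow_peaked)` would show larger markers in the middle of the period, where the synthetic flow is highest, so time, accidents and flow all appear on a single graph.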

BTW, Tufte's book "The Visual Display of Quantitative Information" is very useful for this sort of thing.
Attached Images

Updated: March 29, 2016