# Showing correlation between two data sets

#1
I am looking to show the relationship between two data sets, traffic flow and accidents, in order to see whether they are related in some way (negatively or positively).

I have plotted two graphs:

- Accidents against time
- Flow against time

How could I visualise the relationship between these two data sets?

The main issue I see is that accidents and flow are, of course, measured in different units. I need to program this calculation in Python, so this is a problem.

I have tried plotting percentage change on the y axis against time on the x axis, where every point is the cumulative percentage change relative to the first year. This way I can have a line for flow and a line for accidents on the same graph.
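The cumulative percentage-change idea can be read two ways: compare every year directly with the first year, or keep a running total of year-on-year changes. The two differ because percentages compound (+10% followed by +10% is a 21% total rise, not 20%). A minimal sketch of both, assuming each series is a plain Python list ordered by year; the function names are hypothetical and the numbers are toy values, not real data.

```python
def pct_change_from_baseline(values):
    """Return the % change of each value relative to the first (baseline) value."""
    base = values[0]
    if base == 0:
        raise ValueError("baseline value must be non-zero")
    return [100.0 * (v - base) / base for v in values]


def sum_of_yearly_pct_changes(values):
    """Running total of year-on-year % changes (this does not account for compounding)."""
    total = 0.0
    out = [0.0]
    for prev, cur in zip(values, values[1:]):
        total += 100.0 * (cur - prev) / prev
        out.append(total)
    return out


# Toy numbers for illustration only (not real traffic or accident data).
flow = [100, 110, 121]
accidents = [50, 45, 54]
print(pct_change_from_baseline(flow))       # [0.0, 10.0, 21.0]
print(pct_change_from_baseline(accidents))  # [0.0, -10.0, 8.0]
print(sum_of_yearly_pct_changes(flow))      # [0.0, 10.0, 20.0]
```

Both baseline-indexed lists are unitless, so they can be drawn as two lines against the years on one set of axes (for example with matplotlib's `plt.plot`).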

Hope that made sense. Any ideas on how best to approach this would be much appreciated.

Thank you.
2 years ago
#2
(Original post by posthumus)
What you have written is a bit vague, so could you give some more details? What exactly is the data that you have, and how is it recorded?
#3
(Original post by Gregorius)
Hi, sorry.

The first data set has the number of accidents per year.

The second data set is the average daily vehicle count for each year.

The years range from 2000 to 2015 for both data sets.

I would like to show the relationship between the two (for example, whether a higher average vehicle count correlates with more or fewer accidents) and was wondering what the best way to show this is.

Thanks.
2 years ago
#4
(Original post by posthumus)
Let me check: you have 16 data points (years 2000 through 2016), each comprising the average daily vehicle count and the number of accidents for that year, for a particular region of a particular country?
#5
(Original post by Gregorius)
Correct (except no data for 2016)

At the moment I have two separate graphs with 15 points each, and I'd like to correlate these to provide some analysis and insight.
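To correlate the two series, each accident count has to be paired with the flow value for the same year. A minimal sketch, assuming each data set is stored as a dict mapping year to value (a hypothetical layout): it keeps only the years present in both and computes Pearson's correlation coefficient r by hand. r runs from -1 to +1 and only measures the strength of a *linear* association. On Python 3.10+, `statistics.correlation(x, y)` returns the same Pearson value. The numbers below are toy values, not real data.

```python
import math


def align_by_year(a, b):
    """Return (years, a_values, b_values) for the years present in both dicts."""
    years = sorted(set(a) & set(b))
    return years, [a[y] for y in years], [b[y] for y in years]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when a series is constant")
    return cov / math.sqrt(var_x * var_y)


# Toy values only; 2016 has no accident figure here, so it is dropped.
flow = {2013: 1000, 2014: 1100, 2015: 1250, 2016: 1300}
accidents = {2013: 40, 2014: 43, 2015: 49}
years, x, y = align_by_year(flow, accidents)
print(years)  # [2013, 2014, 2015]
print(round(pearson_r(x, y), 3))
```

With only around 15 yearly points, r is a rough summary; two series that both simply trend over time can show a high r without one driving the other.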

Thanks
2 years ago
#6
(Original post by posthumus)
Then the simplest thing to do is to plot accident rate (y axis) against flow (x axis). Although these have different units, it sounds as if you are looking for an empirical relationship between these quantities, so this will at least give you a qualitative feel for the relationship.
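A sketch of that scatter plot, assuming matplotlib is installed (it is imported only inside the plotting function) and that the data are three parallel lists. Labelling each point with its year and joining the points in year order is an addition not in the suggestion above; it keeps some of the time information visible on the same plot. The function names and output filename are hypothetical.

```python
def order_by_year(years, flow, accidents):
    """Sort three parallel lists by year so a connecting line follows time."""
    rows = sorted(zip(years, flow, accidents))
    return [r[0] for r in rows], [r[1] for r in rows], [r[2] for r in rows]


def plot_accidents_vs_flow(years, flow, accidents, path="accidents_vs_flow.png"):
    """Scatter accidents (y) against flow (x), labelling each point with its year."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display needed
    import matplotlib.pyplot as plt

    years, flow, accidents = order_by_year(years, flow, accidents)
    fig, ax = plt.subplots()
    # Markers for each year, joined by a faint line in year order.
    ax.plot(flow, accidents, "-o", color="grey", alpha=0.6)
    for yr, x, y in zip(years, flow, accidents):
        ax.annotate(str(yr), (x, y), textcoords="offset points", xytext=(4, 4), fontsize=8)
    ax.set_xlabel("Average daily vehicle count")
    ax.set_ylabel("Accidents per year")
    fig.savefig(path)
    plt.close(fig)
    return path
```

Calling `plot_accidents_vs_flow(years, flow, accidents)` with the real 2000 to 2015 series would write the figure to a PNG file.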

If you want to go on and do something statistical, then it would be a good idea (if you have access to the underlying data sources) to un-group the data: averaging everything over a year loses a lot of detail, and for data like this there's no reason to suppose that the relationship will be as simple as linear.
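If the relationship may be monotonic but not linear, one common option (not something raised in this thread) is Spearman's rank correlation: replace each value by its rank, then compute Pearson's r on the ranks. A minimal sketch, with tied values given their average rank; the function names are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rho is undefined when a series is constant")
    return cov / math.sqrt(vx * vy)


# A curved but strictly increasing relationship: the ranks agree perfectly.
flow = [1, 2, 3, 4, 5]
accidents = [1, 8, 27, 64, 125]
print(spearman_rho(flow, accidents))  # 1.0
```

Pearson's r on the same raw values would come out below 1, because the cubic curve is not a straight line.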
#7
(Original post by Gregorius)

If I plot accident rate against flow, each point would represent a particular year. Won't this lose the fact that I want to consider the relationship over time?
2 years ago
#8
(Original post by posthumus)

If you want to show what's happening over time as well, then how about a plot of accident rate versus time, with the size of each plotted point scaled by the traffic flow rate?
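A sketch of that plot, assuming matplotlib is installed (imported only inside the plotting function). In matplotlib's `scatter`, the `s` argument is the marker area in points², so mapping flow linearly onto `s` makes the area, not the diameter, track flow. The 20 to 400 area range and the function names are arbitrary, hypothetical choices.

```python
def marker_sizes(flow, min_area=20.0, max_area=400.0):
    """Map flow values linearly onto marker areas (points^2) for scatter's s."""
    lo, hi = min(flow), max(flow)
    if hi == lo:
        # Constant flow: give every marker the same mid-range area.
        return [(min_area + max_area) / 2] * len(flow)
    span = max_area - min_area
    return [min_area + span * (f - lo) / (hi - lo) for f in flow]


def plot_accidents_over_time(years, accidents, flow, path="accidents_bubbles.png"):
    """Accidents against year, with each marker's area scaled by that year's flow."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter(years, accidents, s=marker_sizes(flow), alpha=0.7)
    ax.set_xlabel("Year")
    ax.set_ylabel("Accidents per year")
    ax.set_title("Marker area scaled by average daily vehicle count")
    fig.savefig(path)
    plt.close(fig)
    return path
```

Because the smallest flow still gets a visible marker (`min_area`), areas are scaled by flow but not strictly proportional to it; setting `min_area=0` would make the smallest year invisible.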

I've attached a couple of examples. In the first, the flow rate and the accident rate both increase approximately linearly over time (which is probably what most people would consider the most likely scenario). In the second plot, the flow rate reaches a maximum in the middle of the time period and then declines. In both cases the accident rate is a linear function of the flow rate. I haven't attempted to pretty these up, but they should give some idea of what can be done.
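The attached plots are not part of this page. The two scenarios described can be approximated with synthetic data, as a sketch: the years, slopes, parabola shape and the accident formula `a + b * flow` below are all made-up assumptions, chosen only so that accidents are linear in flow in both cases. The function name is hypothetical.

```python
def synthetic_scenarios(start=2000, end=2015, a=5.0, b=0.01):
    """Made-up flow series for two scenarios, with accidents = a + b * flow in both."""
    years = list(range(start, end + 1))
    mid = (start + end) / 2
    # Scenario 1: flow rises steadily over the whole period.
    rising = [1000.0 + 50.0 * (y - start) for y in years]
    # Scenario 2: flow peaks in the middle of the period, then declines (a parabola).
    peaked = [1400.0 - 10.0 * (y - mid) ** 2 for y in years]
    scenarios = {}
    for name, flow in (("rising", rising), ("peaked", peaked)):
        accidents = [a + b * f for f in flow]
        scenarios[name] = (years, flow, accidents)
    return scenarios


for name, (years, flow, accidents) in synthetic_scenarios().items():
    print(name, years[0], years[-1], max(flow))
```

Each `(years, flow, accidents)` tuple can then be drawn as accidents against year with the marker size scaled by flow, which makes the difference between the two scenarios visible even though the accident/flow relationship is the same.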

BTW, Tufte's book "The Visual Display of Quantitative Information" is very useful for this sort of thing.