Showing correlation between two data sets

posthumus
Thread starter, 2 years ago
#1
I am looking to show the relationship between two datasets, traffic flow and accidents, in order to show whether they have some kind of relationship (negative or positive).

I have visualised two graphs...
Accidents against time
Flow against time

I was wondering how I could visualise some relationship between these two data sets?

The main issue I see here is that the metrics for quantifying accidents and flow are different. I need to program this calculation in Python, so this is a problem.

I have tried using % change as the y axis and time as the x axis, where each point is the percentage change relative to the initial year plus the percentage change accumulated so far. This way I can have a line for flow and a line for accidents on the same graph.
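
For reference, a minimal Python sketch of this indexing idea, assuming each series is a plain list ordered by year. It uses the simpler variant in which every point is the percentage change relative to the first year. The function names and the numbers are placeholders for illustration, not the real data, and the plotting helper assumes matplotlib is installed.

```python
def percent_change_from_first(values):
    """Return each value as a percentage change relative to the first value."""
    base = values[0]
    if base == 0:
        raise ValueError("first value must be non-zero to compute a percentage change")
    return [100.0 * (v - base) / base for v in values]


def plot_percent_change(years, flow, accidents):
    """Draw both indexed series on one set of axes (assumes matplotlib is installed)."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(years, percent_change_from_first(flow), marker="o", label="Average daily flow")
    ax.plot(years, percent_change_from_first(accidents), marker="s", label="Accidents per year")
    ax.axhline(0, color="grey", linewidth=0.8)  # reference line: no change since first year
    ax.set_xlabel("Year")
    ax.set_ylabel("% change since first year")
    ax.legend()
    return fig


# Placeholder values for illustration only, not real data.
years = [2000, 2001, 2002, 2003]
flow = [20000, 21000, 22000, 21000]
accidents = [50, 55, 45, 60]

flow_pct = percent_change_from_first(flow)
accidents_pct = percent_change_from_first(accidents)
```

Because both series become unitless percentages, they can share one y axis. Calling `plot_percent_change(years, flow, accidents)` returns a figure that can be saved with `fig.savefig(...)`.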

Hope that made sense. Any ideas on how best to approach this would be much appreciated.

Thank you.
Gregorius
2 years ago
#2
What you have written is a bit vague - so can we have some more details? So, what exactly is the data that you have and how is it recorded?
posthumus
Thread starter, 2 years ago
#3
Hi sorry,

The first data set has the number of accidents per year.

The second data set is a measure of the average daily vehicle count for a particular year.

The years range from 2000 to 2015 for both data sets.

I would like to show the relationship between the two (for example, whether a higher average vehicle count correlates with an increase or decrease in accidents), and was wondering what the best way to show this is.

Thanks.
Gregorius
2 years ago
#4
Let me check - you have 16 data points (years 2000 through 2016), comprising an average daily vehicle count for each year and the number of accidents for each year, for a particular region of a particular country?
posthumus
Thread starter, 2 years ago
#5
Correct (except no data for 2016)

At the moment I have two separate graphs with 15 points; I'd like to correlate these to provide some analysis/insight.

Thanks
Gregorius
2 years ago
#6
Then the simplest thing to do is to plot accident rate (y axis) against flow (x axis). Although these do have different units, it sounds as if you are looking for an empirical relationship between these quantities. This will, at least, give you a qualitative feel for the relationship.
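
A rough Python sketch of that scatter plot, assuming the two datasets are held as dictionaries keyed by year (an assumption about the data layout, not something stated in the thread). The helper names and values are hypothetical, and the plotting function assumes matplotlib is installed. Labelling each point with its year keeps some of the time information visible.

```python
def align_by_year(flow_by_year, accidents_by_year):
    """Return (years, flow, accidents) lists for years present in both datasets."""
    years = sorted(set(flow_by_year) & set(accidents_by_year))
    flow = [flow_by_year[y] for y in years]
    accidents = [accidents_by_year[y] for y in years]
    return years, flow, accidents


def scatter_accidents_vs_flow(years, flow, accidents):
    """Scatter accidents (y) against flow (x), one labelled point per year."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter(flow, accidents)
    for year, x, y in zip(years, flow, accidents):
        # Put the year next to each point so the time ordering stays readable.
        ax.annotate(str(year), (x, y), textcoords="offset points", xytext=(5, 5))
    ax.set_xlabel("Average daily vehicle count")
    ax.set_ylabel("Accidents per year")
    return fig


# Placeholder values for illustration only, not real data.
flow_by_year = {2000: 20000, 2001: 21000, 2002: 22000}
accidents_by_year = {2001: 55, 2002: 45, 2003: 60}

years, flow, accidents = align_by_year(flow_by_year, accidents_by_year)
```

Only years present in both datasets are kept, so a missing year in one series does not shift the pairing.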

If you want to go on and do something statistical then it would be a good idea (if you have access to the underlying data sources) to un-group it - having everything averaged over a year loses a lot of detail; and for data like this there's no reason to suppose that the relationship will be as simple as linear.
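
As a first statistical step, here is a hedged sketch that computes two correlation coefficients in plain Python: Pearson's r, which measures linear association, and Spearman's rank correlation, which only assumes a monotonic relationship and so is less tied to linearity. The function names and data are illustrative placeholders. With only a handful of annual points, and with both series possibly trending over time, either number should be read cautiously.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Placeholder values for illustration only, not real data.
flow = [1, 2, 3, 4, 5]
accidents = [1, 4, 9, 16, 25]  # increases with flow, but not linearly

r = pearson(flow, accidents)
rho = spearman(flow, accidents)
```

In this made-up example the relationship is monotonic but curved, so `rho` is at its maximum while `r` falls short of 1.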
posthumus
Thread starter, 2 years ago
#7
Thank you for your response.

If I plot accident rate against flow, each point would represent a particular year. Will this not take away from the fact that I want to consider the relationship over time?
Gregorius
2 years ago
#8
If you want to show what's happening over time as well, then how about a plot of accident rate versus time, with the plotting point size scaled by the traffic flow rate?

I've attached a couple of examples. In the first, the flow rate and the accident rate are approximately linearly increasing over time (which would probably correspond to what most people would consider a very likely scenario). In the second plot, flow rate reaches a maximum in the middle of the time period and then declines. In both cases accident rate is linear with flow rate. I haven't attempted to pretty these up, but they should give some sort of idea of what can be done.

BTW, Tufte's book "The Visual Display of Quantitative Information" is very useful for this sort of thing.
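
A minimal sketch of the kind of plot described above, with accidents against time and marker size scaled by flow. This is not the code behind the attached examples. The helper names, the size range and the data are assumptions for illustration, and the plotting function assumes matplotlib is installed.

```python
def scale_sizes(values, min_size=20.0, max_size=300.0):
    """Linearly map values onto marker areas between min_size and max_size."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values equal: use a single mid-range size for every point.
        return [(min_size + max_size) / 2] * len(values)
    return [min_size + (v - lo) * (max_size - min_size) / (hi - lo) for v in values]


def plot_accidents_sized_by_flow(years, accidents, flow):
    """Plot accidents against year, with marker area reflecting traffic flow."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    # In matplotlib, the `s` argument of scatter is the marker area in points squared.
    ax.scatter(years, accidents, s=scale_sizes(flow), alpha=0.6)
    ax.set_xlabel("Year")
    ax.set_ylabel("Accidents per year")
    ax.set_title("Marker size scaled by average daily flow")
    return fig


# Placeholder values for illustration only, not real data.
years = [2000, 2001, 2002]
accidents = [50, 55, 60]
flow = [10, 20, 30]

sizes = scale_sizes(flow)
```

Rescaling to a fixed size range keeps the smallest flow visible and stops the largest from swamping the plot. The trade-off is that marker area is then no longer proportional to flow.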