
Showing correlation between two data sets

    posthumus (Thread Starter):
    I am looking to show the relationship between two datasets, traffic flow and accidents, to see whether they are related (negatively or positively).

    I have visualised two graphs...
    Accidents against time
    Flow against time

    I was wondering how I could visualise some relationship between these two data sets?

    The main issue I see here is that the metrics for quantifying accidents and flow are different. I need to program this calculation in Python, so this is a problem.

    I have tried plotting % change on the y axis against time on the x axis, where each point is the percentage change relative to the initial year plus the percentage change accumulated so far. This way I can have a line for flow and a line for accidents on the same graph.

    Hope that made sense. Any ideas on how best to approach this would be much appreciated.

    Thank you.
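A minimal sketch of the percentage-change idea described in the post above, assuming the simplest reading: every value is expressed as a percentage change from the first year, so both series share one y axis. The function names and the numbers are made up for illustration and are not the poster's data. The plotting helper assumes matplotlib is installed.

```python
def percent_change_from_base(values):
    """Return each value as a percentage change relative to the first value."""
    base = values[0]
    if base == 0:
        raise ValueError("the first (base) value must be non-zero")
    return [100.0 * (v - base) / base for v in values]


def plot_indexed(years, flow_pct, accidents_pct):
    """Draw both indexed series on one set of axes (needs matplotlib)."""
    import matplotlib.pyplot as plt  # imported here so the helper above works without it

    fig, ax = plt.subplots()
    ax.plot(years, flow_pct, marker="o", label="Traffic flow")
    ax.plot(years, accidents_pct, marker="o", label="Accidents")
    ax.axhline(0, color="grey", linewidth=0.8)  # the base-year level
    ax.set_xlabel("Year")
    ax.set_ylabel("% change since first year")
    ax.legend()
    return fig


# Hypothetical numbers, purely for illustration.
years = [2000, 2001, 2002, 2003]
flow = [20000, 21000, 22000, 21000]
accidents = [400, 380, 420, 440]

flow_pct = percent_change_from_base(flow)
accidents_pct = percent_change_from_base(accidents)
```

Calling `plot_indexed(years, flow_pct, accidents_pct)` and then `matplotlib.pyplot.show()` would display the two lines together. Both lines start at 0% by construction, so only their later movement is comparable.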
    Gregorius (replying to posthumus):
    What you have written is a bit vague - so can we have some more details? So, what exactly is the data that you have and how is it recorded?
    posthumus (Thread Starter, replying to Gregorius):
    Hi sorry,

    The first data set has the number of accidents per year.

    The second data set is a measure of the average daily vehicle count for a particular year.

    The years range from 2000 to 2015 for both data sets.

    I would like to show the relationship between the two (such as whether a higher average vehicle count correlates with an increase or decrease in accidents), and was wondering what the best way to show this is.

    Thanks.
    Gregorius (replying to posthumus):
    Let me check - you have 16 data points (years 2000 through 2016), each comprising an average daily vehicle count and a number of accidents for that year, for a particular region of a particular country?
    posthumus (Thread Starter, replying to Gregorius):
    Correct (except no data for 2016)

    At the moment I have two separate graphs with 15 points, I'd like to correlate these to provide some analysis/insight.

    Thanks
    Gregorius (replying to posthumus):
    Then the simplest thing to do is to plot accident rate (y axis) against flow (x axis). Although these do have different units, it sounds as if you are looking for an empirical relationship between these quantities. This will, at least, give you a qualitative feel for the relationship.
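The scatter plot suggested above could be sketched roughly as follows. This is only an illustration: the function names are hypothetical, the plotting helper assumes matplotlib is installed, and the Pearson coefficient is an addition that summarises only the linear part of the relationship, which is not something the reply itself asks for.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def plot_accidents_vs_flow(flow, accidents, years):
    """Scatter accidents (y) against flow (x), one labelled point per year."""
    import matplotlib.pyplot as plt  # imported here so pearson_r works without it

    fig, ax = plt.subplots()
    ax.scatter(flow, accidents)
    for f, a, yr in zip(flow, accidents, years):
        # Label each point with its year so the time ordering stays visible.
        ax.annotate(str(yr), (f, a), textcoords="offset points", xytext=(5, 5))
    ax.set_xlabel("Average daily vehicle count")
    ax.set_ylabel("Accidents per year")
    return fig
```

The different units are not a problem here: each axis keeps its own scale, and the correlation coefficient is unit-free.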

    If you want to go on and do something statistical then it would be a good idea (if you have access to the underlying data sources) to un-group it - having everything averaged over a year loses a lot of detail; and for data like this there's no reason to suppose that the relationship will be as simple as linear.
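Since the relationship need not be linear, one possible first statistical step is a rank correlation, which only assumes the relationship is monotonic. The sketch below uses Spearman's rho, computed as the Pearson correlation of the ranks. This is a swapped-in illustration rather than a method named in the thread, and the function names are hypothetical. With only yearly totals, and with both series possibly trending over time, any coefficient should be read cautiously.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation applied to the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sxx = sum((a - mean_x) ** 2 for a in rx)
    syy = sum((b - mean_y) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)
```

For example, `spearman_rho(flow, accidents)` on the yearly series gives a value between -1 and 1 whose sign indicates whether accidents tend to rise or fall as flow rises.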
    posthumus (Thread Starter, replying to Gregorius):
    Thank you for your response.

    If I plot accident rate against flow, each point would represent a particular year. Will this not take away from the fact that I want to consider the relationship over time?
    Gregorius (replying to posthumus):
    If you want to show what's happening over time as well, then how about a plot of accident rate versus time, with the plotting point size scaled by the traffic flow rate?

    I've attached a couple of examples. In the first, the flow rate and the accident rate are approximately linearly increasing over time (which would probably correspond to what most people would consider a very likely scenario). In the second plot, flow rate reaches a maximum in the middle of the time period and then declines. In both cases accident rate is linear with flow rate. I haven't attempted to pretty these up, but they should give some sort of idea of what can be done.

    BTW, Tufte's book "The Visual Display of Quantitative Information" is very useful for this sort of thing.
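A rough sketch of the "point size scaled by flow" plot described above, assuming matplotlib is installed. The function names and the size range are hypothetical choices, and this is not the code behind the attached examples.

```python
def scale_marker_sizes(values, min_size=20.0, max_size=300.0):
    """Map values linearly onto marker areas between min_size and max_size."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values equal: use one mid-range size for every point.
        return [(min_size + max_size) / 2.0] * len(values)
    span = hi - lo
    return [min_size + (v - lo) / span * (max_size - min_size) for v in values]


def plot_accidents_over_time(years, accidents, flow):
    """Plot accidents against year, with marker area encoding traffic flow."""
    import matplotlib.pyplot as plt  # imported here so the scaler works without it

    fig, ax = plt.subplots()
    ax.plot(years, accidents, color="grey", linewidth=0.8)  # join points in time order
    # In matplotlib, `s` is the marker area in points squared.
    ax.scatter(years, accidents, s=scale_marker_sizes(flow), alpha=0.6)
    ax.set_xlabel("Year")
    ax.set_ylabel("Accidents per year")
    ax.set_title("Marker size is proportional to traffic flow")
    return fig
```

Rescaling to a fixed size range keeps the smallest points visible, but it means marker area shows relative rather than absolute flow, so the plot should say so in its title or caption.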
    [Attached images: two example plots, not reproduced here]
Updated: March 29, 2016