Hello, I have been revising my understanding of statistics, specifically bivariate data and the AQA A Level Mathematics Large Data Set (LDS). I have found the question below, which I am hopelessly struggling with.
I have attached a copy of the data set here.
Using the data for the purchased quantities of food in the East Midlands from the LDS, plot a scatter diagram to investigate any correlation between purchased quantities of butter and margarine using data from 2006-2014.
Plot butter on the x-axis. Give your conclusions, including comments on the suitability of data.
So I have attempted to plot the scatter diagram. My first query is this: does the question intend both sets of data to be placed on one axis (I have plotted them both on the x-axis), does it require two separate diagrams to investigate whether there is any correlation, or a single diagram? I understand that in a scatter diagram the independent variable is plotted on the x-axis and the dependent variable on the y-axis. Since I did not think that the purchased quantities of butter and margarine depend on each other, I took them both to be independent variables and plotted them on the x-axis.
Moreover, I have attempted to draw regression lines (lines of best fit) for each data set to better evaluate the distribution of the data, but I do not think that I have done so accurately enough.
In the first diagram I have attached, I believe that for the purchased quantities of butter the variables increase together, thereby exhibiting a positive correlation, so presumably the correlation coefficient satisfies r > 0.
Similarly, for the purchased quantities of margarine the variables predominantly increase together and exhibit a positive correlation. However, this correlation is not as strong as for butter and includes more outlying data points.
Moreover, the purchased quantities of butter exceed those of margarine throughout 2006-2014, although there is a notable decline in the purchase of butter in 2009.
In terms of the suitability of the data, I believe that it is an extensive sample, as it is divided among the regions of the UK and then averaged to give the purchased quantities per week, which is a very regular basis as opposed to, say, a month or a year. The data extends from 2006-2014, which is a moderate time period over which to evaluate any changes in trend, although it could be extended back to an earlier date to broaden the data set and search for any further outliers. In this sense, the data is limited, as it only concerns purchased quantities from 2006-2014 and excludes any earlier or more recent data.
However, should I instead plot the purchased quantities of margarine on the x-axis and the remaining variable, the purchased quantities of butter, on the y-axis, since a scatter diagram is intended to show each pair of data values as a single point on the graph and to exhibit the type and strength of the relationship between the two variables?
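To make the idea of "each pair of data values as a single point" concrete, here is a minimal sketch of how the two yearly series might be paired up before plotting. The function name `pair_by_year` and the dictionary layout are assumptions made for illustration, and the numbers are placeholders, not values from the LDS. Butter is taken as x because the question asks for butter on the x-axis.

```python
def pair_by_year(butter, margarine):
    """Pair two {year: quantity} series into x and y lists.

    Only years present in BOTH series are kept, so every point
    on the scatter diagram is one (butter, margarine) pair.
    """
    years = sorted(set(butter) & set(margarine))
    xs = [butter[y] for y in years]      # x-axis: butter
    ys = [margarine[y] for y in years]   # y-axis: margarine
    return years, xs, ys


# Placeholder values for illustration only; NOT taken from the LDS.
butter_demo = {2006: 1.0, 2007: 2.0, 2008: 3.0}
margarine_demo = {2006: 6.0, 2007: 5.0, 2008: 4.0}

years, xs, ys = pair_by_year(butter_demo, margarine_demo)
points = list(zip(xs, ys))  # one (x, y) point per year
```

With the real LDS values in place of the placeholders, `xs` and `ys` could then be passed to a plotting call such as matplotlib's `plt.scatter(xs, ys)`, giving a single diagram with nine points for 2006-2014 rather than two separate series against one axis.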
I have also done so and attached the graph here. In this case, I believe that the two variables exhibit moderate positive correlation, especially discernible in the last three points on the diagram. However, would it be more suitable to state that, as a whole, the bivariate data is uncorrelated, i.e. that the correlation coefficient is r = 0?
I really want to improve my plotting of scatter diagrams and my interpretation of data. How could I correct or improve my answer here? Clearly I am rather confused, but I am trying to evaluate the information given comprehensively. I would be very grateful for any response.
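One way to check a judgement such as "moderate positive" or "zero correlation" is to compute the product moment correlation coefficient r directly from the paired values. The sketch below is a plain-Python illustration, not a prescribed method; the function name `pearson_r` is an assumption, and the inputs shown in the usage note are made-up numbers rather than LDS data.

```python
import math


def pearson_r(xs, ys):
    """Product moment correlation coefficient for paired data."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and products about the means.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return s_xy / math.sqrt(s_xx * s_yy)
```

For example, `pearson_r([1, 2, 3, 4], [1, 3, 2, 4])` evaluates to 0.8 (up to floating-point rounding). A value near 0 would suggest little linear correlation, while values near 1 or -1 suggest strong positive or negative correlation. With only nine yearly points, a few points can change r noticeably, which is worth mentioning when commenting on the suitability of the data.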