# Statistical Analysis with a very small sample size

#1
My data are non-linear and non-monotonic, but normally distributed. I only have data from 9 participants (we were expecting around 50). I am testing whether some vitamins have an impact on cognitive function. I have data from an online food diary and have averaged each participant's vitamin consumption over the days on which they recorded their food intake. I also have their average cognitive scores for those days. I cannot do a parametric test as my data are not linear, and cannot do a non-parametric test as my data are not monotonic. I assume this is due to the small amount of data? I have made scatter plots of all of the variables against each other (e.g. Vit C against standard reaction time), and there is no linearity. Would it be correct to just 'explore' the data and describe it, rather than attempt a statistical analysis? And if so, how can I describe/explore the data besides using scatter plots?
1 year ago
#2
(Replying to contigo98, post #1)
You're using hypothesis-test language, but you obviously don't have enough data to really confirm anything. You don't state how many variables you have, but with so few data points you are likely to get spurious associations between some variable pairs. If you just explore different variable pairings at random, you're almost doomed to find something which appears interesting but is likely spurious.

How do you know all the relationships are not (approximately) linear and monotonic?
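To put a rough number on the "doomed to find something" point, here is a minimal simulation sketch. It assumes 9 participants, 3 x-variables and 3 y-variables that are all pure, unrelated noise, and counts how often at least one of the 9 pairings still looks "significant". The function names and the normal-noise assumption are illustrative, not taken from the study.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def chance_of_spurious_hit(n_subjects=9, n_x=3, n_y=3,
                           r_crit=0.666, n_sims=5000, seed=0):
    """Fraction of simulated studies with at least one |r| above r_crit.

    r_crit = 0.666 is the two-sided 5% critical value of |r| for n = 9
    (t = 2.365 on 7 degrees of freedom); change it if n_subjects changes.
    All variables are independent noise, so every "hit" is spurious.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        xs = [[rng.gauss(0, 1) for _ in range(n_subjects)] for _ in range(n_x)]
        ys = [[rng.gauss(0, 1) for _ in range(n_subjects)] for _ in range(n_y)]
        if any(abs(pearson_r(x, y)) > r_crit for x in xs for y in ys):
            hits += 1
    return hits / n_sims


print("one pairing:  ", chance_of_spurious_hit(n_x=1, n_y=1))
print("nine pairings:", chance_of_spurious_hit())
```

With a single pairing the false-alarm rate should sit near the nominal 5%; with nine pairings it comes out well above that, which is why any "interesting" pair found by browsing should be treated as a hypothesis rather than a finding.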
#3
(Replying to mqb2766, post #2)
I have tested them in SPSS, and that is how I know they are not (approximately) linear and monotonic. I am wary of doing a statistical test just 'because', as this would not demonstrate in my dissertation that I understand the appropriate use of statistical tests. My dependent variables are SRTM, CRTM and VIGRT (acronyms for different scores on the cognitive tests, such as reaction time and vigilance), and my independent variables are beta-carotene, B12 and folate. So I am looking at the impact of these vitamins on cognitive function.
1 year ago
#4
(Replying to contigo98, post #3)
Depending on how much time you have, I'd do some form of statistical analysis, even if it's just to make speculative hypotheses about relationships (or the lack of them). You have 3*3 possible x-y "correlations", so it's not too much to eyeball. You could also look at the correlation among the independent variables, as well as the correlation among the dependent variables: do the cognitive tests measure different things? Depending on how you've collected the data, you could possibly analyse by subject as well, rather than aggregating their data.

None of this would really get over the lack of data problem, but might "pad out" the analysis section.
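The 3*3 grid, plus the correlations within each set of variables, could be tabulated along these lines. This is only a sketch: the values are random placeholders standing in for the 9 participant averages, and the helper names are made up for illustration.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_table(rows, cols):
    """Map (row_name, col_name) -> r for every pairing of the two dicts."""
    return {(rn, cn): pearson_r(rv, cv)
            for rn, rv in rows.items()
            for cn, cv in cols.items()}


# Placeholder data: random numbers, NOT real measurements.
rng = random.Random(1)
n_participants = 9
vitamins = {name: [rng.gauss(0, 1) for _ in range(n_participants)]
            for name in ("beta_carotene", "B12", "folate")}
scores = {name: [rng.gauss(0, 1) for _ in range(n_participants)]
          for name in ("SRTM", "CRTM", "VIGRT")}

xy = correlation_table(vitamins, scores)    # the 3*3 x-y grid
xx = correlation_table(vitamins, vitamins)  # do the vitamins move together?
yy = correlation_table(scores, scores)      # do the tests measure the same thing?

for (vitamin, score), r in xy.items():
    print(f"{vitamin:>13} vs {score:<5}  r = {r:+.2f}")
```

High values in `yy` would suggest the three cognitive scores are largely measuring the same thing, in which case the nine x-y correlations are not nine independent pieces of evidence.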
1 year ago
#5
I only have data from 9 participants (were expecting around 50). I am testing to see if some vitamins have an impact on cognitive function. I have data from an online food diary and have averaged each pp's vitamin consumption for the days that they recorded their food intake. Additionally, I have their average cognitive scores for those days too.
Let me check my understanding of your set-up: you have measured two parameters (vitamin consumption and cognitive score) on those days that the subject actually recorded their food intake. You have then averaged each of these measures per individual subject, and you wish to discern whether there is a relationship between the two averaged measures, and if so, what the form of that relationship is.

My data is non-linear, non-monotonic but is normally distributed.
So, if you plot average cognitive function versus average vitamin intake, you get something that neither looks like it arose from a straight line relationship, nor from a monotonic relationship. When you make a histogram of each variable separately, you get something that looks normal (or doesn’t fail one of the standard tests for normality).

I cannot do a parametric test as my data is not linear and cannot do a non-parametric test as my data is not monotonic.
Telling whether a relationship is derived from a linear function or from a monotone function is rather tricky with so few points. (Remember that ordinary linear regression only requires the regression line to be a straight line, and monotone regression only requires the regression function to be monotone; conditional on these relationships the data itself can look pretty wild!)

This is due to the small amount of data I assume?
The small amount of data (once you’ve done the averaging) is a key problem here. Problems about the relationship between variables usually require much more data than this. The rule of thumb in linear regression is at least ten observations per coefficient estimated, and you have two coefficients here (intercept and slope), so about twenty observations against your nine. Anything non-linear needs many more.
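One way to see how little nine points can say is to look at the width of a confidence interval for a correlation. The sketch below uses the standard Fisher z approximation, which assumes roughly bivariate-normal data; the observed value r = 0.5 is a made-up example, not a result from this study.

```python
import math


def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for a Pearson correlation.

    Uses the Fisher z-transform: atanh(r) is roughly normal with
    standard error 1 / sqrt(n - 3). Requires n > 3 and |r| < 1.
    """
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


for n in (9, 50):
    low, high = fisher_ci(0.5, n)
    print(f"n = {n:2d}: r = 0.50, 95% CI about ({low:+.2f}, {high:+.2f})")
```

With n = 9 the interval runs from about -0.25 to about +0.87, so even a sample correlation of 0.5 is compatible with no relationship at all; with the planned n = 50 it narrows to roughly +0.26 to +0.68.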

I have done scatter plots for all of the variables against each other (e.g. Vit C against standard reaction time etc.), and there is no linearity. Would it be correct to just 'explore' the data and describe it, rather than try and do statistical analysis? And if so, how can I describe/explore the data besides using scatter plots?
One thing you might like to explore is to plot the relationship between your two basic variables within each subject (so don’t average them out). If you get some sense of structure within subjects, you might then use a random intercept (or random intercept and slope) model to bring the observations from all subjects together.
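As a rough illustration of looking within subjects, the sketch below keeps the daily records un-averaged, subtracts each subject's own means, and pools what is left into a single slope. This is the simpler "within-subject centering" estimate, not the random-intercept model itself (fitting that would need a mixed-model routine such as the one in statsmodels). The data are simulated under stated assumptions: 9 subjects, 3 days each, a different baseline per subject and a common within-subject slope of -2.

```python
import random
from collections import defaultdict


def within_subject_slope(records):
    """Pooled slope of y on x after removing each subject's own means.

    records: iterable of (subject_id, x, y) tuples, one per subject-day.
    Differences in baseline between subjects drop out, so only
    day-to-day variation within a subject contributes to the slope.
    """
    by_subject = defaultdict(list)
    for subject_id, x, y in records:
        by_subject[subject_id].append((x, y))

    sxy = sxx = 0.0
    for obs in by_subject.values():
        mean_x = sum(x for x, _ in obs) / len(obs)
        mean_y = sum(y for _, y in obs) / len(obs)
        sxy += sum((x - mean_x) * (y - mean_y) for x, y in obs)
        sxx += sum((x - mean_x) ** 2 for x, _ in obs)
    return sxy / sxx


# Simulated placeholder data, NOT real measurements.
rng = random.Random(42)
records = []
for subject in range(9):
    baseline = rng.gauss(500, 50)          # subject-specific level
    for day in range(3):
        intake = rng.gauss(10, 1)          # hypothetical daily intake
        score = baseline - 2.0 * intake + rng.gauss(0, 1)
        records.append((subject, intake, score))

print(f"pooled within-subject slope: {within_subject_slope(records):.2f}")
```

Because each subject's baseline is subtracted out, large differences between participants no longer swamp the day-to-day pattern, which is the same intuition behind adding a random intercept per subject.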

However, the most worrying thing to me is that you say you “were expecting around 50” responses. Getting only 9 responses means that there’s been some pretty heavy selection process going on! Is it likely that your 9 observations are in any sense a random sample from the underlying population? I doubt it. This is often a problem with small sample sizes; any inference you make from it is unlikely to generalize to the population.

My strongest piece of advice (I’m sorry to say) is “start again”.
#6
(Replying to Gregorius, post #5)
Hi,
Thank you so much for your lengthy response.
I am unsure what you mean by plotting the relationship between my two basic variables. What I have done so far is average each of my 9 participants' intakes of the 3 vitamins stated above over the 3 days of the food diary (so 9 mean values per vitamin), and likewise average their scores on each of the 3 parts of the cognitive tests over the 3 consecutive days of testing. So each participant has 3 average vitamin intakes and 3 average cognitive test scores, and my data set has 9 average values for each vitamin and for each 'section' of the cognitive score.

I have done scatter plots for each vitamin against each cognitive test score (so 9 in total, e.g. beta-carotene against SRTM, CRTM and VIGRT, and the same for the other 2 vitamins). These scatter plots do not show linearity.

I would ideally start over; however, since this is a dissertation, we are constrained on time and are not allowed to re-do our study! It is more a case of dealing with our data as best as possible. I have been advised to analyse/explore/describe the data because the sample is so small, but I am unsure how to do this without a statistical test!
Last edited by contigo98; 1 year ago
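For describing the data without a formal test, a per-variable summary table is one common starting point alongside the scatter plots. The sketch below uses only Python's standard library; the variable names come from the thread, but the numbers are random placeholders rather than real scores.

```python
import random
import statistics


def describe(values):
    """Basic descriptive statistics for one variable."""
    ordered = sorted(values)
    return {
        "n": len(ordered),
        "mean": statistics.mean(ordered),
        "sd": statistics.stdev(ordered),      # sample SD (n - 1 denominator)
        "median": statistics.median(ordered),
        "min": ordered[0],
        "max": ordered[-1],
    }


# Placeholder data: 9 random values per variable, NOT real measurements.
rng = random.Random(7)
data = {name: [rng.gauss(300, 30) for _ in range(9)]
        for name in ("SRTM", "CRTM", "VIGRT")}

for name, values in data.items():
    summary = describe(values)
    print(name, {key: round(val, 1) for key, val in summary.items()})
```

Reporting the median and range next to the mean and SD is useful with only nine values, since a single unusual participant can move the mean noticeably.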