I am doing research to find out whether there is a correlation between Twitter sentiment and sales, and I'm doing it for two different companies. They are in the same industry and are direct competitors. The time span is quarterly, from Q1 2014 to Q4 2021 (28 data points each). I found the quarterly sales figures easily because they are public companies. For Twitter, I collected the tweets with the Twitter Premium API v1.1 full-archive search. I won't go into the sentiment analysis method here.

My H0 is "there is no correlation between Twitter sentiment and sales" and my H1 is "there is a correlation between Twitter sentiment and sales".

For company A, my p-value is < 0.05 and for company B, my p-value is > 0.05. How should I write the conclusion when I reject H0 for one company and fail to reject H0 for the other?

Thank you.


Is this a university project, or ....

Must admit, I'd be very wary about reading too much into any relatively simple analysis of 28 data points, given the number of variables that affect sales; trying to correlate sales changes with Twitter posts is going to be problematic at best. Your data could easily be affected by:

* the COVID pandemic period at the end of the series

* competition: if the companies are competitors, do one company's sales affect the other's?

* how would you recognise an A company versus a B company if they give different results?

* ....

never mind all the other explanatory variables in real-world data, which could completely change the meaning you read into a correlation (conditional dependence/independence).
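To make the conditional-dependence point concrete, here is a minimal Python sketch of a first-order partial correlation: the correlation between sentiment and sales after removing the linear effect of a third variable. The third variable (called `driver` here, standing in for something like advertising spend) and the toy numbers are hypothetical; they were constructed so that sentiment and sales are related only through that driver.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(x, y, z):
    """Correlation of x and y after removing the linear effect of z from both."""
    r_xy, r_xz, r_yz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical toy data: each series is the driver plus noise unrelated to it.
driver = [1, 2, 3, 4]
sentiment = [2, 1, 2, 5]
sales = [0, 5, 0, 5]

print(round(pearson_r(sentiment, sales), 3))          # raw correlation
print(round(partial_r(sentiment, sales, driver), 3))  # after controlling for the driver
```

Here the raw correlation is 1/3, but it drops to (up to rounding) zero once the driver is controlled for. With real data the catch is that the candidate driver has to be identified and measured in the first place.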


Original post by reyjusuf

Yes, it's a dissertation.

Can you provide any more info regarding what you did / how you resolved problems like the above? Even a scatter plot of the data would be useful.


Basically for each company I did the following:

Collect quarterly sales data from their websites

Collect tweets for each of the corresponding quarters (once per quarter, 28 times) using the hashtag #(name of the company) as the keyword. I used the Twitter full_archive_search Premium API: https://developer.twitter.com/en/docs/twitter-api/premium/search-api/quick-start/premium-full-archive.

I performed sentiment analysis using Multinomial Naive Bayes (I previously tested multiple methods and found Multinomial NB to have the highest accuracy) to classify the tweets as positive, negative or neutral.
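For readers unfamiliar with the classifier, here is a minimal standard-library Python sketch of multinomial naive Bayes with add-one (Laplace) smoothing. It is not the poster's model: the whitespace tokenisation, the smoothing parameter `alpha`, the class name `TinyMultinomialNB` and the six training tweets are all illustrative assumptions. In practice a library implementation such as scikit-learn's `MultinomialNB` would normally be used.

```python
import math
from collections import Counter


class TinyMultinomialNB:
    """Multinomial naive Bayes over word counts, with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        # Vocabulary shared across classes; its size appears in the smoothing denominator.
        self.vocab = set().union(*self.word_counts.values())
        label_counts = Counter(labels)
        self.log_prior = {c: math.log(label_counts[c] / len(labels)) for c in self.classes}
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        tokens = [t for t in doc.lower().split() if t in self.vocab]  # drop unseen words
        v = len(self.vocab)

        def log_posterior(c):
            # log P(c) + sum of log P(word | c), up to a constant shared by all classes
            denom = self.totals[c] + self.alpha * v
            return self.log_prior[c] + sum(
                math.log((self.word_counts[c][t] + self.alpha) / denom) for t in tokens
            )

        return max(self.classes, key=log_posterior)


# Hypothetical labelled tweets; a real training set would be far larger.
train_docs = [
    "love this great phone",
    "great service love it",
    "terrible battery hate it",
    "awful service hate this",
    "new store opens today",
    "store hours today",
]
train_labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

model = TinyMultinomialNB().fit(train_docs, train_labels)
print(model.predict("love the great service"))  # positive
```

The "naive" part is the assumption that words occur independently given the class, which is what lets the per-word log-probabilities simply be added.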

For each quarter, I counted the number of positive, negative and neutral tweets and scored the quarter using the formula (Positive - Negative)/(Positive + Negative + Neutral). This score is what I correlate with sales, and I used Excel's CORREL function to find the correlation coefficient.
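The quarterly score and the correlation step can be sketched in Python as below. `pearson_r` computes the same Pearson coefficient that Excel's CORREL returns; the tweet counts and sales figures in the usage lines are made-up placeholders, not the poster's data.

```python
import math


def sentiment_score(positive, negative, neutral):
    """Quarterly score: (Positive - Negative) / (Positive + Negative + Neutral), in [-1, 1]."""
    total = positive + negative + neutral
    if total == 0:
        raise ValueError("no tweets in this quarter")
    return (positive - negative) / total


def pearson_r(x, y):
    """Pearson correlation coefficient, equivalent to Excel's CORREL."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two equal-length series with at least 3 points")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical (positive, negative, neutral) counts and sales for four quarters.
counts = [(120, 40, 90), (150, 30, 70), (90, 60, 100), (130, 35, 85)]
sales = [410.0, 455.0, 380.0, 430.0]

scores = [sentiment_score(*c) for c in counts]
print(scores)
print(round(pearson_r(scores, sales), 3))
```

Note that the score ignores tweet volume: a quarter with 10 tweets and one with 10,000 tweets can get the same score.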

For company 1, r = 0.39.

For company 2, r = 0.24.

I needed to see whether these are significant, so I tested them using this:

https://opentextbc.ca/introstatopenstax/chapter/testing-the-significance-of-the-correlation-coefficient/

Since I have 28 data points, the degrees of freedom are n - 2 = 26, so from the t-table my t-value needs to be above 1.706 (one-tailed test at the 5% significance level).

For company 1, t is 2.18 and for company 2, t is 1.27.
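The significance test above can be checked numerically. This is a standard-library Python sketch, assuming the standard statistic t = r·√(n - 2)/√(1 - r²) with n - 2 degrees of freedom. Because the standard library has no t-distribution CDF, the tail probability is obtained here by integrating the Student t density with Simpson's rule; that numerical approach is a choice made for this sketch, and `scipy.stats.t.sf` would give the same tail probability directly. Two-tailed p-values are printed alongside for comparison.

```python
import math


def t_from_r(r, n):
    """Test statistic for H0: rho = 0; it has n - 2 degrees of freedom under H0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def t_pdf(x, df):
    """Student's t density; log-gamma keeps the normalising constant stable."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def upper_tail_p(t, df, steps=2000):
    """P(T >= t) = 0.5 - integral of the density from 0 to t (Simpson's rule, even steps)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


n = 28
for name, r in [("Company 1", 0.39), ("Company 2", 0.24)]:
    t = t_from_r(r, n)
    p_one = upper_tail_p(t, n - 2)           # H1: rho > 0
    p_two = 2 * upper_tail_p(abs(t), n - 2)  # H1: rho != 0
    print(f"{name}: t = {t:.2f}, one-tailed p = {p_one:.3f}, two-tailed p = {p_two:.3f}")
```

With the rounded coefficients this gives t of about 2.16 and 1.26 (the quoted 2.18 and 1.27 presumably come from the unrounded r values). Company 1's one-tailed p is below 0.05 and company 2's is above, matching the comparison against 1.706; company 1's two-tailed p is also below 0.05, since 2.16 exceeds the two-tailed critical value of about 2.056 for 26 degrees of freedom.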

Why one-tailed?

Because I want to test whether Twitter sentiment has a POSITIVE correlation with sales (a negative correlation has no practical use).


I'm not familiar with how you've developed/used the naive Bayes classifier to encode the tweets each quarter, but my reservations were about a simple correlation of tweets and sales. Whether or not you factor in other explanatory variables, being able to distinguish the companies in the first place, the assumptions behind linear correlation (have you eyeballed the scatter plots and the time series you're trying to correlate? are there any outliers with high leverage on the correlation values?) etc. mean that I'd probably spend more time explaining those things in the conclusion rather than discussing the difference in the results. If you're claiming some form of significance between tweet sentiment and sales, how can you have any confidence that it's not due to some other effect, such as an advertising push? One well-known example is the correlation between ice cream sales and murders.
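One simple way to check the leverage point is to recompute r with each quarter left out in turn; a quarter whose removal changes r a lot deserves explicit discussion. This Python sketch uses made-up numbers with one deliberately extreme quarter; it is a diagnostic, not a formal test.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def leave_one_out_r(x, y):
    """r recomputed with each point dropped in turn (one value per point)."""
    return [pearson_r(x[:i] + x[i + 1:], y[:i] + y[i + 1:]) for i in range(len(x))]


# Hypothetical quarterly sentiment scores and sales; the last quarter is an outlier.
sentiment = [0.10, 0.20, 0.15, 0.12, 0.18, 0.90]
sales = [100, 98, 101, 99, 100, 150]

full = pearson_r(sentiment, sales)
for i, r_without in enumerate(leave_one_out_r(sentiment, sales)):
    print(f"without quarter {i + 1}: r = {r_without:+.2f} (change {r_without - full:+.2f})")
```

Here r is about 0.99 with all six quarters but turns negative when the last one is dropped, so the apparent correlation rests on a single point.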

https://slate.com/news-and-politics/2013/07/warm-weather-homicide-rates-when-ice-cream-sales-rise-homicides-rise-coincidence.html

You don't say whether you're predicting the actual sales value or some form of percentage rise/fall. If it's the latter, perhaps you could combine the data to represent some form of generic company model? But given what you've described, if I were writing a conclusion, I would spend more time justifying/analysing the data (does it have an underlying linear structure? what are the noise/errors like? are there many outliers in the time series or residual correlation plots? why did you not look at other explanatory variables for sales change in the analysis?) and honestly spend little time discussing the final significances, as it will be easy to argue about them. State the figures and say they're a bit inconclusive, but if there are outliers (for instance), see if you can tell a story around them / discuss their leverage on the solution.
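If the analysis were switched to percentage rise/fall, the sales series could be converted to quarter-on-quarter changes before correlating, and the two companies' changes could then be pooled as a crude "generic company". A Python sketch with hypothetical numbers; pairing each change with the sentiment of the quarter in which it ends is an assumption, and a lagged alignment would be an equally reasonable choice.

```python
import math


def pct_change(values):
    """Quarter-on-quarter percentage change; the result is one element shorter."""
    return [100.0 * (curr - prev) / prev for prev, curr in zip(values, values[1:])]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data for two companies: sentiment score and sales per quarter.
sent_a, sales_a = [0.30, 0.35, 0.20, 0.40, 0.38], [200.0, 210.0, 198.0, 220.0, 221.0]
sent_b, sales_b = [0.10, 0.05, 0.15, 0.12, 0.20], [150.0, 147.0, 155.0, 154.0, 160.0]

# Pair each change with the sentiment of the quarter it ends in (drop each first quarter).
x = sent_a[1:] + sent_b[1:]
y = pct_change(sales_a) + pct_change(sales_b)
print(f"pooled r on {len(x)} points: {pearson_r(x, y):+.2f}")
```

Pooling assumes both companies respond to sentiment in the same way; if their baseline sentiment levels differ, subtracting each company's mean score before pooling would be a simple refinement.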
