#60095 | AsPredicted

'Correlating Twitter Text Sentiment with DerStandard.at Online Survey'
(AsPredicted #60095)


Created:       03/05/2021 06:19 AM (PT)

This is an anonymized version of the pre-registration.  It was created by the author(s) to use during peer-review.
A non-anonymized version (containing author names) should be made available by the authors when the work it supports is made public.

1) Have any data been collected for this study already?
It's complicated. We have already collected some data but explain in Question 8 why readers may consider this a valid pre-registration nevertheless.

2) What's the main question being asked or hypothesis being tested in this study?
Based on our pilot study using text data from DerStandard livetickers, we predict the following for data from Twitter:
1) There is a positive correlation between large-scale aggregates of affective expressions in tweets from Austria and the responses to an online survey on derstandard.at.
2) The combination of novel deep-learning and traditional word-count sentiment measures is a better predictor of self-reported affect than either of them alone.
3) We predict that 1) and 2) hold both for levels computed with a 3-day rolling window and for inter-day changes (computed without a rolling window).
4) We predict that the correlation between affective expressions in tweets and survey responses is higher for positive than for negative sentiment.

3) Describe the key dependent variable(s) specifying how they will be measured.
As we already did for the DerStandard liveticker postings, we use a baseline period from 2020-03-16 until 2020-04-20. We baseline-correct all text sentiment measures against their average level in that period by subtracting the baseline value and dividing by it. We analyse two sets of variables:
1) Responses to an online survey displayed on the website of derstandard.at from 11.11.2020 to 30.11.2020. One question, “Wie war der letzte Tag?”/“How was your last day?”, with four answer options:
Schlecht/Bad
Eher schlecht/Somewhat Bad
Eher gut/Somewhat Good
Gut/Good
2) Text sentiment measures: We select all tweets by Austrian users from 09.11.2020 to 30.11.2020 (beginning two days before the start of the survey because of the rolling window, see below). We will calculate three types of sentiment scores:
* LIWC score: Based on the LIWC dictionaries (word lists) for positive and negative affect, adapted for Twitter and DerStandard as in our previous study (https://www.frontiersin.org/articles/10.3389/fdata.2020.00032/full). We will calculate the fraction of all postings that contain at least one term from the dictionary.
* GS score: We will use the model “GermanSentiment” (GS) to classify postings as negative, positive, or neutral, and use the fraction of positive postings to predict the proportion of positive survey responses. GS is a BERT-type deep-learning model fine-tuned on German texts that include social media postings and newspaper articles.
* Combined sentiment score: We compute an aggregate sentiment measure that is the average of LIWC positive minus negative affect and GS positive minus negative.
We take a three-day rolling window because the survey question asks about emotions experienced on the previous day: responses can be influenced by emotions experienced on the day of the question and also, when a user responds just after midnight before going to bed, by emotions from two days before.
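A minimal sketch of how these measures could be computed, assuming tweet-level labels are already available in a pandas DataFrame; the file name and the columns date, liwc_pos, liwc_neg, and gs_label are hypothetical placeholders, not the registered pipeline:

```python
import pandas as pd

# Hypothetical input: one row per tweet, with a datetime column "date",
# 0/1 indicators "liwc_pos"/"liwc_neg" (tweet contains at least one dictionary term),
# and "gs_label" in {"positive", "negative", "neutral"} from GermanSentiment.
tweets = pd.read_csv("tweets_austria.csv", parse_dates=["date"])

# Daily fractions of tweets in each category
daily = tweets.groupby(tweets["date"].dt.normalize()).agg(
    liwc_pos=("liwc_pos", "mean"),
    liwc_neg=("liwc_neg", "mean"),
    gs_pos=("gs_label", lambda s: (s == "positive").mean()),
    gs_neg=("gs_label", lambda s: (s == "negative").mean()),
)

# Baseline correction: relative deviation from the mean over the baseline period
# (assumes the daily series also covers 2020-03-16 to 2020-04-20)
baseline = daily.loc["2020-03-16":"2020-04-20"].mean()
corrected = (daily - baseline) / baseline

# Three-day rolling window: the current day and the two preceding days
rolled = corrected.rolling(window=3).mean()

# Combined score: average of LIWC (positive minus negative) and GS (positive minus negative)
rolled["combined"] = ((rolled["liwc_pos"] - rolled["liwc_neg"])
                      + (rolled["gs_pos"] - rolled["gs_neg"])) / 2
```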

4) How many and which conditions will participants be assigned to?
No conditions (observational research)

5) Specify exactly which analyses you will conduct to examine the main question/hypothesis.
Pearson correlations (significance level p=0.05) of the proportion of positive survey responses with each sentiment measure: the combined sentiment score, the LIWC scores for positive and negative affect separately, and the GS scores for positive and negative sentiment separately. In addition, we will regress the proportion of positive survey responses on the same variables. We perform all analyses both on baseline-corrected levels and on changes from one time period to the next.
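A sketch of these tests with scipy and statsmodels, assuming a merged daily DataFrame df with a hypothetical column survey_pos (proportion of positive survey responses) and the sentiment columns from the sketch above; whether the regressions are run per measure or jointly is left open here, so the simple per-measure specification below is only illustrative:

```python
from scipy.stats import pearsonr
import statsmodels.api as sm

sentiment_cols = ["combined", "liwc_pos", "liwc_neg", "gs_pos", "gs_neg"]

def run_tests(data):
    for col in sentiment_cols:
        # Pearson correlation of positive survey responses with this measure
        r, p = pearsonr(data["survey_pos"], data[col])
        print(f"{col}: r = {r:.2f}, p = {p:.3f}")
        # Simple regression on the same variable (illustrative specification)
        X = sm.add_constant(data[[col]])
        fit = sm.OLS(data["survey_pos"], X).fit()
        print(fit.params, fit.pvalues)

# Baseline-corrected levels
run_tests(df)
# Changes from one time period to the next
run_tests(df[["survey_pos"] + sentiment_cols].diff().dropna())
```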

6) Describe exactly how outliers will be defined and handled, and your precise rule(s) for excluding observations.
Twitter: We will only analyze tweets in German from users in Austria (as determined by Brandwatch) from the same time periods as in our existing analysis of DerStandard liveticker comments. We exclude retweets, as well as tweets posted by accounts (1) classified as organizations by Brandwatch, (2) with fewer than 100 followers, or (3) with more than 5000 followers, to exclude spam and mass media accounts (see the sketch below).
No exclusions for the survey.
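A sketch of the Twitter exclusion rules, assuming the Brandwatch export is loaded into a pandas DataFrame; the file name and the columns language, country, is_retweet, account_type, and followers are hypothetical and depend on the actual export format:

```python
import pandas as pd

tweets = pd.read_csv("brandwatch_export.csv")

# Keep German-language tweets from users located in Austria (as determined by Brandwatch)
mask = (tweets["language"] == "de") & (tweets["country"] == "Austria")

# Exclude retweets
mask &= ~tweets["is_retweet"]

# Exclude accounts classified as organizations by Brandwatch
mask &= tweets["account_type"] != "organisational"

# Exclude accounts with fewer than 100 or more than 5000 followers
# (accounts with exactly 100 or exactly 5000 followers are kept here)
mask &= tweets["followers"].between(100, 5000)

filtered = tweets[mask]
```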

7) How many observations will be collected or what will determine sample size?
No need to justify decision, but be precise about exactly how the number will be determined.

All postings on Twitter that fulfill the criteria in 6).

8) Anything else you would like to pre-register?
(e.g., secondary analyses, variables collected for exploratory purposes, unusual analyses planned?)

This is an extension of the analysis of survey data that has been collected by derstandard.at and analysed by David Garcia and Max Pellert. The Twitter data has not yet been analysed with LIWC or GS, nor aggregated for the survey time period. None of the analyses planned above have been conducted yet.
We plan to run additional robustness checks:
* When calculating LIWC scores, we check whether the results stay very similar when 1) we calculate scores at the tweet level and then weight them by the number of tokens, or 2) we concatenate all tweets and calculate one score from the combined text (see the sketch after this list).
* We want to compare the standard LIWC dictionaries to an expanded version of a dictionary published in a data repository connected to https://www.nature.com/articles/s41467-020-18349-0, by running the same analyses as above with it.
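A sketch of the two LIWC aggregation variants from the first robustness check; liwc_token_fraction and tokenize are hypothetical helpers standing in for the actual dictionary matching and tokenisation:

```python
def tweet_level_score(tweet_texts, liwc_token_fraction, tokenize):
    # Variant 1: score each tweet separately, then weight by its number of tokens
    scores = [liwc_token_fraction(t) for t in tweet_texts]
    weights = [len(tokenize(t)) for t in tweet_texts]
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)

def concatenated_score(tweet_texts, liwc_token_fraction):
    # Variant 2: compute one score on the concatenation of all tweets
    return liwc_token_fraction(" ".join(tweet_texts))
```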