Stay Home, Be Brave? A Time Series Approach in Python

Explore Germans’ mental health during the first weeks of the COVID outbreak and learn about the importance of data quality checks

Hannah Roos
Towards Data Science



Whenever trends, trajectories, or seasonality are of interest, time series analysis is your friend. This array of techniques seems more predominant in the economic and financial world (e.g., stock prices, marketing etc.), yet psychologists have become increasingly interested in adopting it as well. At first sight, this is not surprising, because psychological processes are inherently time-bound, on a scale from minutes to years (e.g., the immediate pain after social exclusion or the cognitive decline across a whole lifespan). At the same time, humans are a particularly hard nut to crack when it comes to forecasting. But what if we simply wanted to explore the dynamics of such a psychological process as it unfolded over time?

Let’s take the psychological impact of COVID-19 for instance. Try to remember the early days of disturbing news about the spread of a new virus in spring 2020. Apart from a lot of statistics nobody was really familiar with yet, vans carrying dead bodies and overcrowded hospitals were shown in the media. Political measures to “flatten the curve” were implemented and changed our social lives in fundamental ways, almost overnight. Rules were re-defined on nearly a weekly basis and public functions were almost shut down completely. We could argue that we have lived in prolonged states of emergency without the prospect of improvement, circumstances that are likely to induce depressive symptoms and anxiety. Thus, it is not far-fetched to wonder how a relaxation in corona measures may have reduced our perceived stress as well as concerns about the pandemic. On the 20th of April 2020, this is what happened in Germany: schools were opened again in baby steps and shopping was allowed in stores of up to 800 m². Now, how did this signal affect the public’s mental health on a day-to-day basis? How relieved were people, really, and did symptoms of ‘cabin fever’ persist into May?

The COVIDistress study

To find out, we will analyse data from the COVIDistress study, a collaborative international open-science project that aims to measure experiences, behaviour, and attitudes during the early corona crisis across the globe. The survey included 48 languages and a total sample size of 173,426 participants who responded between the 30th of March and the 30th of May 2020. The authors suggest that younger age, being a woman, a lower level of education, being single, staying with more children, and living in a country or area with a more severe COVID‐19 situation were associated with significantly higher levels of stress (Kowal et al., 2020). As a social species, we are wired to connect with other people, but only on a voluntary basis: even if being cut off from society may evoke feelings of loneliness, a lack of privacy can likewise be painful. While it is valuable to acknowledge which groups seem more susceptible to the consequences of the quarantine, I was wondering whether we could trace the trajectory of perceived stress over the course of the study and test its relationship with the onset of political measures. Not an easy task, as we will find out — but more on that later.

Measure psychological impact of the corona lockdown

Perceived stress was measured with the PSS-10, a 10-item questionnaire developed by Cohen (1983) that evaluates the degree to which an individual has perceived life as unpredictable, uncontrollable, and overloading over the last week. To test how much people worried about the ongoing outbreak, they were asked to rate to what extent they were concerned about the consequences of the corona virus with respect to different areas (personal, own country etc.). For more detailed information about the exact wording, you can view the whole survey here. To give you a better understanding of what the terms actually mean, let me quickly explain them to you.

A little time series guide

Time series analysis often describes a systematic process (e.g., climate change, changes in stock prices etc.) that unfolds across time and is based on multiple observations that are equally spaced.

Typically, these multiple observations stem from a single source (e.g., an individual, an index), even if this does not apply to our case study, which has a more complex design. A rule of thumb says that you need at least 50 observations for a roughly accurate prediction, while more data are of course always welcome (McCleary et al., 1980).

Components

There are four sources of variability in time series data that either need to be explicitly modelled or removed by mathematical transformations (e.g., differencing) to make accurate forecasts:

  • Trend occurs when there is a clear change in the data’s level over the mid to long term. For instance, the data’s average at the beginning of the series may be higher compared to its end, which makes for a negative trend.
  • Season refers to a repeating pattern of increase or decrease that occurs consistently. It can be attributed to aspects of the calendar (e.g., months, days of the week etc.). For example, we can observe the temperature outside cycling up in the morning and falling down in the evening every day.
  • Cycles share an attribute with seasons: the reoccurrence of certain fluctuations. But unlike seasons, cycles are not of fixed duration, do not have to be attributed to aspects of the calendar and usually manifest themselves over a period longer than 2 years (e.g., business cycles).
  • Randomness describes an irregular variation that makes the trajectory jitter naturally and unsystematically. It can be attributed to noise and is the remaining variance once trend, seasons, and cycles have been removed from the data.

Concepts

Autocorrelation: Another common source of variance that deserves extra attention is called autocorrelation. It stems from the idea that the present state is somewhat influenced by previous states. Let’s say I am really ruminative in one moment, which makes it unlikely that I will switch to an easy-going, happy mode soon. In psychologist jargon, we say that prior affective states at least partly determine our current emotions. As we will see later, however, this idea only applies to emotions within a person and not between individuals, which is why longitudinal data are needed. In statistical terms, a time series shows autocorrelation if a variable is correlated with itself across a certain number of previous time points, referred to as lags. For example, the lag-2 autocorrelation is the Pearson correlation between each value and the value that occurred two time points before it. The autocorrelation coefficients across many lags form the autocorrelation function (ACF), which plays a role in model selection and evaluation.
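As a minimal sketch of the lag idea, pandas’ Series.autocorr(lag=k) computes exactly this Pearson correlation between the series and a copy of itself shifted by k time points. The persistent toy series below is made up purely for illustration:

```python
import numpy as np
import pandas as pd

# A toy series with persistence: each value carries over part of the last one
rng = np.random.default_rng(42)
values = [0.0]
for _ in range(199):
    values.append(0.8 * values[-1] + rng.normal())
series = pd.Series(values)

# Lag-k autocorrelation = Pearson correlation with the series shifted by k
lag1 = series.autocorr(lag=1)
lag2 = series.autocorr(lag=2)
print(f"lag-1: {lag1:.2f}, lag-2: {lag2:.2f}")
```

Because each value leans on its predecessor, the lag-1 coefficient comes out clearly positive, and the correlation decays as the lag grows.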

Stationarity: In real life, many time series are not stationary, which gives the trajectory its wiggly, wobbly appearance. Technically, it means that the series’ mean, variance, and/or autocorrelation structure vary over time. This makes it much harder to predict any future value because past values are not much alike. If we, however, account for the systematic patterns present in a series through mathematical transformations (e.g., differencing), we can achieve stationarity and start forecasting.
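A quick way to see differencing at work is on a simulated random walk (not the survey data): the levels are non-stationary, but taking first differences recovers the stationary increments.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# A random walk is non-stationary: its level wanders and its variance grows
walk = pd.Series(rng.normal(size=500).cumsum())

# First-order differencing recovers the (stationary) step-to-step increments
diffed = walk.diff().dropna()
print(round(diffed.mean(), 2), round(diffed.var(), 2))
```

After differencing, the mean hovers around zero and the variance stays roughly constant over the whole series, which is exactly what a forecasting model needs.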

Model selection

If forecasting future points in the series is your main goal, an ARIMA model might be right for you. ARIMA models are developed directly from the data, without the need for a theory about the circumstances under which a process may occur. ARIMA stands for autoregressive integrated moving average. The AR(p) and MA(q) terms specify the predictors that formally describe the autocorrelation present in a series, whereas I(d) describes the order of differencing that has been applied to render the series stationary.
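In Python, full ARIMA models are typically fitted with the statsmodels library. To keep this sketch dependency-free, here is only the AR(1) special case, estimated by ordinary least squares on the lagged series; the simulated data and the true coefficient of 0.7 are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
# Simulate an AR(1) process: y_t = 0.7 * y_{t-1} + noise
y = np.zeros(300)
for t in range(1, 300):
    y[t] = 0.7 * y[t - 1] + rng.normal()

# Least-squares estimate of the AR(1) coefficient:
# regress y_t on y_{t-1} (no intercept, since the process is mean-zero)
phi_hat = np.dot(y[:-1], y[1:]) / np.dot(y[:-1], y[:-1])
print(round(phi_hat, 2))  # should land near the true 0.7
```

With 300 observations the estimate lands close to the simulated coefficient; real ARIMA software adds the MA and differencing terms on top of this idea.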

We psychologists, however, are often particularly interested in exactly these systematic aspects of a series. For example, we are keen on describing the underlying trend behind changes in perceived stress during the first lockdown. Alternatively, we could try to link these stress responses to external factors, such as the severity of the corona outbreak within each country. In another scenario, we could assess the impact of critical events such as a change in political measures (e.g., the introduction of masks, the conditional re-opening of stores etc.). Thus, apart from prediction, we are interested in descriptive and explanatory models. For this purpose, we could fit a regression model first and then fit an ARIMA model to the residuals to account for any remaining autocorrelation. If you are interested in more technical details, you can find very good explanations in a paper by Jebb and colleagues (2015).

Okay — now let’s check our data first

Now we get to the fun part — coding in Python. Since the data can be downloaded online, everyone has free access to it thanks to the COVIDistress open-source research project. Responses were stored by month, so we have one file for each month. To create a single file to work with, we need to concatenate them and load a dataframe that contains all months together. This is how it can be prepared with as little effort as possible: After setting a project path, we use a list comprehension to look up all the relevant Excel files with Python’s glob library. To save some unnecessary computing time, we store the variable names that are of interest in a separate list. Then we go through all the files to look up the relevant columns and merge them together for all months using pd.concat(). The resulting dataframe can be exported to a csv file, so we only have to run this command once and can return to the data any time. To avoid trouble when running the script over and over again, a condition is included before merging all the files: the csv file with all months must not already be stored in the folder, otherwise the resulting file would include thousands of duplicate entries. There is another trick that facilitates the subsequent computation and visualisation with matplotlib later: by telling pandas to parse dates based on the RecordedDate column, dates are directly converted into the datetime64 format and can be used as an index for the dataframe.
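The pipeline described above could look roughly like this. All file names, paths, and column names are assumptions for illustration, and CSV files stand in for the study’s Excel exports (which would use pd.read_excel instead):

```python
import glob
import os

import pandas as pd

# Hypothetical setup: create two tiny monthly files so the sketch runs
project_path = "covidistress_demo"
os.makedirs(project_path, exist_ok=True)
for month, stamp in [("april", "2020-04-01"), ("may", "2020-05-01")]:
    pd.DataFrame({
        "RecordedDate": [f"{stamp} 10:00:00"],
        "Country": ["Germany"],
        "PSS10_1": [3],
    }).to_csv(os.path.join(project_path, f"data_{month}.csv"), index=False)

out_file = os.path.join(project_path, "all_months.csv")
keep_cols = ["RecordedDate", "Country", "PSS10_1"]  # assumed variable names

# Only build the merged file once, so re-runs don't stack duplicate rows
if not os.path.exists(out_file):
    files = glob.glob(os.path.join(project_path, "data_*.csv"))
    frames = [pd.read_csv(f, usecols=keep_cols) for f in files]
    pd.concat(frames, ignore_index=True).to_csv(out_file, index=False)

# parse_dates converts RecordedDate to datetime64, ready to serve as index
df = pd.read_csv(out_file, parse_dates=["RecordedDate"], index_col="RecordedDate")
print(df.index.dtype)
```

The existence check on out_file is what protects against the duplicate-entry problem mentioned above.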

We can see a preview of this dataframe by using the head() method on df. By calling dtypes on df, we get the data type of each column. This is important to verify that the data were correctly detected by pandas. Because political measures to tackle the consequences of the corona virus varied by country, we will focus on Germany and therefore subset the data accordingly.
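A minimal sketch of these checks and the subset step, assuming the country column is literally named “Country” (a guess about the export, on toy data):

```python
import pandas as pd

# Toy stand-in for the survey dataframe
df = pd.DataFrame({
    "Country": ["Germany", "France", "Germany"],
    "PSS10_1": [2, 4, 3],
})

print(df.head())   # preview the first rows
print(df.dtypes)   # verify pandas inferred each column's type correctly

# Keep only the German responses
df_de = df[df["Country"] == "Germany"]
print(len(df_de))
```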

Hold on! We have missed something.


Now here is a trap: we have a special data structure that differs substantially from a longitudinal study design. Although the study went on for several weeks and records were sampled regularly throughout, each participant was observed only once. This is called a repeated cross-sectional design, and it cannot simply be modelled with a traditional ARIMA model because past values cannot directly be linked to present or future values. Instead, ARIMA terms must be integrated into a multi-level model (MLM) (Lebo & Weber, 2015). Specifically, we can calculate the average stress level from all responses on a specific day and take it as a proxy for the population’s perceived stress level. However, we cannot tell the degree to which the respondents’ stress level on day one is associated with their stress level a week later because it was sampled from different people.

In addition, it is obvious that one’s perceived stress is not solely dependent on the state of the pandemic, but is also associated with a lot of other forces: one’s predisposition to stress, personal issues, and coping strategies, to name just a few. Since the same respondents were not asked to rate their stress levels several times, we have no chance to disentangle these individual components from other sources of variance (something that repeated-measures ANOVAs exploit, for instance). But can we at least get a sense of the overall stress levels while keeping these limitations in mind? To find out if we have enough data to get a rough estimate for each day, we count the observations per day. It turns out that nearly 80% of the dates fell under the 100-participant mark. In contrast, when we look at the international sample, only 3% of the days fall under this mark. Nonetheless, it would be far more complicated to analyse the total sample because responses are nested in countries and dates, which requires hierarchical modelling.
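The per-day count can be computed along these lines, shown here on simulated responses (the 100-participant threshold is from the text; the sample size and date range are made up):

```python
import numpy as np
import pandas as pd

# Simulated response log with a datetime index, standing in for the survey
rng = np.random.default_rng(3)
dates = pd.to_datetime("2020-03-30") + pd.to_timedelta(
    rng.integers(0, 60, size=2000), unit="D"
)
responses = pd.DataFrame({"stress": rng.integers(1, 6, size=2000)}, index=dates)

# Observations per calendar day, then the share of days under the 100 mark
per_day = responses.groupby(responses.index.date).size()
share_small = (per_day < 100).mean()
print(f"{share_small:.0%} of days have fewer than 100 responses")
```

With 2,000 simulated responses spread over 60 days, almost every day falls under the mark, mirroring the problem in the German subsample.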

We can get the frequency of observations per month by using a neat function called strftime: it converts aspects of the datetime index (e.g., day, month, second etc.) into a nicely readable string. For instance, we can get the full month name by using %B within the brackets (more codes in the documentation). To see how the response frequency to the survey changed on a daily basis, we can create a histogram by simply calling matplotlib’s histogram function on the dataframe’s index; the x-axis will be formatted automatically.
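A small sketch of both tricks, with a synthetic daily index standing in for the survey’s RecordedDate index:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, since we only save the figure
import matplotlib.pyplot as plt
import pandas as pd

idx = pd.date_range("2020-03-30", "2020-05-30", freq="D")
df = pd.DataFrame({"n": 1}, index=idx)

# strftime("%B") renders the full month name of each timestamp
months = df.index.strftime("%B")
print(df.groupby(months).size())

# Histogram over the datetime index; matplotlib formats the date axis itself
fig, ax = plt.subplots()
ax.hist(df.index, bins=20)
fig.savefig("responses_per_day.png")
```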

Disclaimer: All images are created by the author unless stated otherwise.

Histogram of the number of German participants per date; the red area represents N = 100 or fewer.

The plot just supports what we already know: even if there are roughly two weeks at the start of the study period that are backed by enough data, the interest in participation faded quickly.

Survey results

Now if we put these day-to-day variations aside for a moment, how did Germans generally handle the threats posed to their wellbeing during the first few months?

To find out, we first create two dataframes, each of which contains the relevant columns for items that are supposed to measure perceived stress as well as concerns about corona, respectively. By using the startswith() method, we can easily find all variables of the respective scale without tedious typing. Next, we create two lists to ensure that we put proper labels onto the graphs later:

  1. a list with the relevant response categories for each scale (e.g., 1 = Strongly disagree, 2 = disagree etc.) and
  2. a list that contains the statements participants were asked to rate (e.g., in the last week, how often have you been upset because of something that happened unexpectedly?, you can find the full survey here)

Then we build a function that takes the respective data subsets and statements as input and returns a dictionary with the percentage of responses that fall within each category of agreement on the scale. This way the function can be easily applied to our stress-and concerns-data later.
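A sketch of such a function, assuming a made-up column name and a 5-point scale; value_counts(normalize=True) does the percentage work:

```python
import pandas as pd

def response_shares(df, statements):
    """Percentage of responses per agreement category, keyed by statement.

    `statements` maps column names to readable item labels; both names
    here are assumptions about the survey export, not the study's code.
    """
    shares = {}
    for col, label in statements.items():
        pct = (df[col].value_counts(normalize=True).sort_index() * 100).round(0)
        shares[label] = pct.to_dict()
    return shares

# Toy data on a 5-point agreement scale
df_stress = pd.DataFrame({"pss_item_1": [1, 2, 2, 4, 5, 2, 4, 4]})
result = response_shares(
    df_stress,
    {"pss_item_1": "...upset because of something that happened unexpectedly?"},
)
print(result)
```

The returned dictionary can then be applied unchanged to both the stress and the concerns subsets.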

Here is an example of how it looks for the responses on the perceived stress scale from our German sample.

Dictionary output that contains the percentages of responses falling within categories

It turns out, for example, that 8% of the Germans never felt nervous and stressed lately while 21% of them did fairly often. However, we can get a better impression of these distributions with a neat and intuitive data visualization. Divergent stacked bar charts are a nice way to demonstrate to what extent respondents endorsed or denied specific statements. For this purpose, I used and adapted a code snippet I found on Stack Overflow (credits to eitanlees!) so that the offsets are corrected depending on whether the number of response categories is odd or even.

Stacked divergent bar chart on the perceived stress responses from German respondents in 2020

On average, most people did okay mentally during the first weeks of the pandemic — at least this is what the proportions suggest: positive statements are shifted to the left, so the majority of participants felt able to cope with difficulties most of the time. Similarly, negative responses are shifted to the right, so most people only occasionally felt a lack of control over their lives. Speaking in numbers, only 4% never felt on top of things and 3% experienced difficulties piling up very often. Nonetheless, if we translate this back to absolute numbers, 4% of the sample (N = 2732) are at least 109 participants. Consequently, the plot does not paint an exclusively positive picture, because answers vary widely, which suggests that individuals experienced the psychological consequences of the pandemic in different shades. This example shows once again that it is worthwhile to take your time and analyse the responses in depth instead of rushing to the averages right away.

If we apply the same function to responses on concerns about corona, an interesting pattern emerges…

Stacked divergent bar chart on respondents’ concerns about the consequences of the corona virus in 2020

It looks as if people differed with respect to the concerns about the consequences of the corona virus for themselves and their close friends, but mostly agreed on its impact on their country and across the globe. Perhaps this comes down to the idea that people feel like they are still the architects of their own fortune, even in difficult situations. But when it comes to the general public, the consequences of such an encompassing problem no longer lie within their hands but depend on many more forces (e.g., political decisions, economic developments etc.).

Before we can create a single score across all questions that reflects a stress score for each individual, some of the statements need to be reversed. For example, if people highly agree with the statement “In the last week, I felt confident about my ability to handle my personal problems,” it suggests that their coping ability kept their perceived stress levels low, which should be reflected in their score too. To achieve this, we can make use of a dictionary comprehension to point the program to the variables for which numbers need to be swapped. We can then use pandas’ replace() method on our df_stress dataframe and simply pass the resulting dictionary as an argument.
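A minimal sketch of the reverse-scoring step; the item names, which items count as positively worded, and the 1–5 scale are all assumptions for illustration:

```python
import pandas as pd

# Toy stress data on a 1-5 scale
df_stress = pd.DataFrame({
    "pss_4": [1, 5, 3],   # positively worded item, needs reversing
    "pss_1": [2, 2, 4],   # negatively worded item, stays as-is
})
positively_worded = ["pss_4"]

# Dictionary comprehension: one swap-map per column to be reversed
reverse_map = {col: {1: 5, 2: 4, 3: 3, 4: 2, 5: 1} for col in positively_worded}
df_scored = df_stress.replace(reverse_map)
print(df_scored)
```

Passing a nested dictionary to replace() restricts each swap-map to its own column, so the negatively worded items are left untouched.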

We can write a function called aggregate_timeseries() that includes the following steps: we sum up all scores per statement row-wise, that is, for each individual observation. Because the resulting object of this operation alone would be a Series with no name specified, we convert it to a dataframe and fix the naming. Even though this score contains what we want, a composite score that reflects the perceived stress of each person, we can break it down further to one score per day across individuals, a technique widely referred to as downsampling. For this task, we can use the resample() method to get the average of all observations that fall within a day. The study period may also include a single day for which we have no data at all, so we can use interpolate() to approximate a plausible stress score for that day. After a little formatting, our aggregate_timeseries() function can take any dataframe that includes one score per observation and turn it into a time series of daily averages.
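The function could be sketched like this, on simulated item responses (the index, item count, and scale are made up; only the sum, resample, and interpolate chain follows the steps above):

```python
import numpy as np
import pandas as pd

# Simulated survey: 60 respondents, 10 items each, spread over ~10 days
rng = np.random.default_rng(7)
idx = pd.to_datetime("2020-03-30") + pd.to_timedelta(
    rng.integers(0, 10, size=60), unit="D"
)
item_scores = pd.DataFrame(rng.integers(1, 6, size=(60, 10)), index=idx)

def aggregate_timeseries(items, name="stress"):
    # Row-wise sum: one composite score per respondent; naming the bare
    # Series lets it convert into a tidy one-column dataframe
    totals = items.sum(axis=1).rename(name).to_frame()
    # Downsample to the mean of all respondents per day, interpolating
    # any day in the middle of the period that has no responses at all
    return totals.sort_index().resample("D").mean().interpolate()

daily = aggregate_timeseries(item_scores)
print(daily.head())
```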

Now, how does the overall stress score change over time? And how does the availability of data from day to day affect the resulting trajectory? Let’s visualize both the variability in scores, represented by individual data points, and the daily average. First, we need to combine the stress and concerns responses by merging the respective dataframes. To annotate the day on which measures were relaxed in Germany (20th of April 2020), we import datetime as dt to include a dashed vertical line in a format that is compatible with the datetime index. In addition, we deal with possibly overlapping data points by setting alpha to 0.1, which makes individual points more transparent.
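A rough sketch of the annotation step, using a synthetic daily series; the relaxation date comes from the text, the rest is illustrative:

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # headless backend; we only write the file
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic daily average stress scores over the study period
rng = np.random.default_rng(5)
idx = pd.date_range("2020-03-30", "2020-05-30", freq="D")
daily_mean = pd.Series(20 + rng.normal(size=len(idx)), index=idx)

fig, ax = plt.subplots()
ax.plot(daily_mean.index, daily_mean.values, color="tab:blue")
# Individual observations would be added with scatter(..., alpha=0.1) so
# that overlapping points darken the area instead of hiding each other.
# datetime.datetime values are compatible with the datetime64 x-axis:
ax.axvline(dt.datetime(2020, 4, 20), color="black", linestyle="--",
           label="measures relaxed")
ax.legend()
fig.savefig("stress_timeline.png")
```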

Responses on psychological impact questions across the whole study period (30th of March to 30th of May 2020). Grey dots represent individual observations.

Now it is really obvious what is going on here. The higher the density of the grey scattered points, the more data we have for that specific point in time. In the first period of the study, the daily averages are thus backed by roughly enough data to work with. It appears that there was a decline in the average stress estimate. This seems even more pronounced for the concerns about corona, but that comes down to a simple technical fact: the scale is narrower, and this way the slope (e.g., from time A to time B) gets steeper more easily than if the scale were wider.

For the rest of the study period, daily averages are based on very few data points, scattered like a hint of confetti. On some days, the average is even calculated from a single observation (making the use of an average redundant). The line thus fluctuates a lot between higher and lower values, depending on the sample’s responses. Look — this nicely demonstrates the problem we generally have with using the arithmetic mean to represent the average in small samples: it is super sensitive to extreme individual values. The trajectory we see here is thus not the development of perceived stress in the German population across time but most probably noise. So, we have to reject the idea of using the data for more complex time series modelling, because the trajectory may well be completely random: each day, a new subsample is drawn to calculate a mean value which is supposed to represent the population’s psyche in the same way as the day before. It’s like comparing apples to oranges and still trying to find a connection between them.

Do data science on your data first

Don’t be disappointed that we could not run the ‘actual’ analysis — it would just have given us uninterpretable results. This is also part of the nature of exploratory analysis: if the dataset is not suited for the analysis, there is no way to make it fit. Thus, we are unable to answer the question to what extent the relaxation of corona measures affected the stress experienced by the public. But apart from all the terms around time series analysis, we have learnt a lot about the importance of checking the suitability of your data. In the end, statistics alone do not produce meaningful insights; the analyst does.

References

[1] A. Lieberoth, J. Rasmussen, S. Stoeckli, T. Tran, D.B. Ćepulić, H. Han, S. Y. Lin, J. Tuominen, G. A. Travaglino, and S. Vestergren, COVIDiSTRESS global survey network, (2020). COVIDiSTRESS global survey. DOI 10.17605/OSF.IO/Z39US, Retrieved from osf.io/z39us

[2] M. Kowal, T. Coll‐Martín, G. Ikizer, J. Rasmussen, K. Eichel, A. Studzińska, … & O. Ahmed, Who is the most stressed during the COVID‐19 pandemic? Data from 26 countries and areas. (2020), Applied Psychology: Health and Well‐Being, 12(4), 946–966

[3] R. McCleary, R. A. Hay, E. E. Meidinger and D. McDowall, Applied Time Series Analysis for the Social Sciences (1980), Sage Publications

[4] A. T. Jebb, L. Tay, W. Wang & Q. Huang, Time series analysis for psychological research: examining and forecasting change (2015), Frontiers in psychology, 6, 727

[5] M. J. Lebo & C. Weber, An effective approach to the repeated cross‐sectional design (2015), American Journal of Political Science, 59(1), 242–258
