COVID-BEHAVE dataset: measuring human behaviour during the COVID-19 pandemic | Scientific Data

Study design

The COVID-19 pandemic severely limited the options for recruiting participants, since national regulations generally prohibited close contact (social distancing rules). This situation played a major role in setting up the experiment (for example, in distributing and configuring the sensing devices for the study participants). For this reason, and to still ensure reliable data collection, we decided to recruit healthy participants who already owned the necessary devices and could follow the study instructions remotely. Additionally, the study was conducted in the Netherlands and in Greece, providing a first insight into two European countries that followed different confinement measures and allowing us to explore how these would affect the wellbeing of the residents. While the data collection took place, there was a hard lockdown in Greece followed by a strict curfew, whereas the lockdown rules imposed by the Dutch government were much softer. For instance, Dutch residents were allowed to go for a walk outdoors throughout the day, while Greek residents were required to stay at home and go out only for a valid reason (such as a visit to the doctor). Hence, the study was designed to take place during the COVID-19 lockdown, giving us the opportunity to measure human behaviour during and after the lockdown, and presumably the changes in between.

Participants were asked to use their own smartphone and activity tracker device (if any) as usual, and to answer a few online questions on a daily basis. Here, an activity tracker is any wrist-worn device (e.g., Fitbit, Garmin, Apple Watch) that tracks the performed activities or steps, as well as any pedometer mobile application that counts steps. The study was performed in uncontrolled settings, and thus no explicit instructions were given to the participants on what tasks to perform or how often.

The study started after receiving approval from the Ethics Committee EEMCS of the University of Twente (UT), with registration number ‘RP 2020–43’. The data collection nominally started on Wednesday, May 20th, 2020 (the actual date varies per participant, since some subjects started a bit later depending on their availability) and ended on Sunday, July 5th, 2020. However, some participants decided, upon request, to continue with the data collection for a few more days. Some days later, compensation (a voucher for online shopping) was given to those participants who completed the study.


The recruitment was based on convenience sampling, asking for participants who owned the necessary devices and were willing to participate in the experiment during the COVID-19 outbreak. Therefore, we did not specify any recruitment criteria that could reflect specific societal aspects (such as socioeconomic status, demographics, or participants’ education or employment). During the recruitment phase, an information brochure including a short description of the study was distributed online via the University of Twente’s social media (Facebook, Twitter, LinkedIn). Further explanations of the research study were given to the candidates, including participants’ tasks and data privacy considerations. The recruited participants had to confirm their participation by signing the informed consent form, and final instructions were given explaining how to set up the sensing devices in order to participate in the study.

Initially, more than 30 people showed interest in participating in the study. However, only 23 confirmed their participation in the data collection phase, and two of them withdrew at a later stage. In the end, 21 subjects aged from 25 to 58 years (average age 32.30 ± 7.80 years) completed the data collection period. Among the participants, 15 were women (average age 31.70 ± 7.45 years) and 6 were men (average age 34.00 ± 9.00 years). Regarding location, 5 participants were located in Greece, while the rest were living in the Netherlands during the data collection phase. It is worth mentioning that the participants included one pensioner with chronic low-back pain and two healthy students, while the rest of the subjects were healthy adults (some working remotely and some visiting their office regularly). The participants were divided into two groups (Group A and Group B) according to the devices they used for the data collection. An overview of the Group A and Group B participants can be seen in Table 1 and Table 2, respectively.

Table 1 Overview of the participants for Group A.
Table 2 Overview of the participants for Group B.

Group A consists of 12 participants (average age 34.58 ± 9.97 years). Ten of them were living in the Netherlands, while the rest were living in Greece. The collected data consist of samples acquired through the mobile app (running on Android OS devices), activity tracker (also including pedometer applications), mobile-based questions (via the mobile app), and web-based questions (via the website).

Group B consists of 9 participants (average age 29.44 ± 1.17 years), for whom the collected data include samples from the mobile app (running on iOS devices), activity tracker (also including pedometer applications), and web-based questions (via the website). Six of them were living in the Netherlands, while three were living in Greece. Furthermore, four participants from this group installed the mobile app on iOS devices, while the rest did not install the app on their smartphone device (either they did not manage to install the app by themselves, or they were sceptical due to data privacy concerns).


The data collection, handling, and management were performed via the Behaviour Analysis Framework16 (a novel framework for sensing, identifying and quantifying different domains of human behaviour in a holistic way), applying the necessary technologies to collect, store, and maintain multimodal sensor data securely on the framework’s server. To respect participants’ privacy, the collected data were anonymised and stored on a secure server at the University of Twente. All data were de-identified, and thus neither real names nor private information was stored. Furthermore, data access was limited to researchers with authorised credentials and passwords.

Before the data collection phase took place, the participants received two tutorials with detailed instructions on a) how to install the mobile app17 on their mobile Android or iOS devices and what smartphone sensors to enable/disable for the data acquisition with respect to the requirements of this study, and b) how to register and use the website18 in order to connect their activity tracker and answer the online questions.

The mobile app here refers to the AWARE app17, which was used to collect data records from users’ smartphones and upload them to the UT server. Sensor data related to the accelerometer, GPS, ambient noise, phone calls, and text messages were collected and processed via this app. The AWARE app was also used to trigger the mobile-based questions, hereinafter referred to as the Experience Sampling Method (ESM) questions. The mobile app versions used were aware-phone-armeabi-release.apk, published on 28 April 2020, for Android devices19,20 and AWARE Client V2 1.10.9, published on 29 March 2020, for iOS devices21,22.

The website here refers to the Council of Coaches (COUCH) website18. Participants used this website to answer the online questions, which are presented via a virtual coach instead of as traditional prototypical questions in order to increase response adherence23. We refer to these as the Coach-as-a-Sensor (CaaS) questions. CaaS represents any type of information that can be actively acquired during a conversation with a virtual character (e.g., an emotional coach) rather than passively through sensor data. During a virtual conversation, the user is asked some questions in order to monitor the user’s behaviour. For example, a virtual coach with expertise in emotional behaviour can ask CaaS questions about the user’s mood (e.g., “How are you feeling today? Could you describe your overall mood during the morning hours?”) when the conversation allows it. Similarly, another coach with expertise in cognitive behaviour can ask questions related to the user’s engagement with cognitive tasks (e.g., “Have you read a book today?”). These questions aim to estimate the user’s feelings, such as being sad (including answers for being very negative or negative) or happy (including answers for being positive or very positive), or whether the user was involved in cognitive tasks. Such questions are interspersed with the normal flow of the conversation and, in some cases, may be regarded as small talk or conversational connectors, thus reducing the perceived burden on the user compared to traditional ESMs. However, our aim for this study was to use both ESM and CaaS questions, as we believe they can complement each other rather than one replacing the other.
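The grouping of mood answers into coarse categories described above can be sketched as follows. This is a minimal illustration, not the coding scheme shipped with the dataset: the answer strings and the fallback category "other" are assumptions; only the pairing of very negative/negative with sad and positive/very positive with happy comes from the text.

```python
# Hypothetical answer labels. Only the grouping itself is taken from the text:
# "very negative"/"negative" count as sad, "positive"/"very positive" as happy.
MOOD_GROUPS = {
    "very negative": "sad",
    "negative": "sad",
    "positive": "happy",
    "very positive": "happy",
}


def mood_category(answer: str) -> str:
    """Collapse a CaaS mood answer into a coarse category.

    Answers outside the four known labels fall back to "other"
    (an assumption made here, not something the text specifies).
    """
    return MOOD_GROUPS.get(answer.strip().lower(), "other")


categories = [mood_category(a) for a in ["Very negative", "positive", "neutral"]]
```

Normalising case and whitespace before the lookup keeps the mapping robust to small differences in how answers were typed or exported.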

In addition to the data collected via the Behaviour Analysis Framework, we also obtained data related to exogenous variables that could play a major role when analysing human behaviour. In particular, carrying out causality studies in human behaviour requires the consideration of confounding factors that could push participants towards certain choices. For instance, weather, news, social norms, and government measures could affect users’ behaviours. Depending on the granularity of the exogenous variables, accounting for them can become a hard problem. In this work, we consider exogenous variables that refer to the weather conditions, but also to the outcomes of the pandemic outbreak in Greece and in the Netherlands. Meteorological data were obtained through the “Meteostat” repository24 and describe the weather conditions (e.g., temperature, precipitation, wind speed, and air pressure) in Athens and in Amsterdam for the period of the data collection. Moreover, a collection of data is provided via the “Our World in Data” database4, including observations related to the COVID-19 outbreak from January 2020 up to September 2020. These data refer to exogenous variables such as the number of confirmed cases and deaths, excess mortality, policy response, reproduction rate, and tests and positivity.
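One way to use these exogenous variables is to attach them to daily behaviour records by country and date. The sketch below assumes both sources have already been loaded into plain Python structures; the field names (country, date, steps, temp_avg_c, new_cases), the subject identifiers, and all values are hypothetical and are not taken from the dataset or from either repository.

```python
from datetime import date


def attach_exogenous(daily_rows, exogenous):
    """Add exogenous variables to each daily behaviour row.

    daily_rows: list of dicts with 'country' and 'date' keys.
    exogenous:  dict mapping (country, date) to a dict of variables.
    Rows without a matching entry are returned unchanged, so no
    values are invented for missing days.
    """
    merged = []
    for row in daily_rows:
        extra = exogenous.get((row["country"], row["date"]), {})
        merged.append({**row, **extra})
    return merged


# Illustrative values only.
behaviour = [
    {"subject": "sXX", "country": "NL", "date": date(2020, 5, 20), "steps": 8000},
    {"subject": "sYY", "country": "GR", "date": date(2020, 5, 20), "steps": 3000},
]
exo = {("NL", date(2020, 5, 20)): {"temp_avg_c": 18.0, "new_cases": 100}}
merged = attach_exogenous(behaviour, exo)
```

Keying on the (country, date) pair reflects that weather and pandemic indicators differ between Greece and the Netherlands on the same day.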

Data collection

Initially, the AWARE app was installed on the user’s smartphone, enabling smartphone data acquisition and the triggering of the mobile-based questions. The collected smartphone data consist of accelerometer and GPS signals for tracking physical activity, while ambient noise, location, the number of incoming/outgoing phone calls, and the number of received/sent text messages can be used to estimate how socially active the user is.

GPS coordinates are used to cluster the locations that a person visited throughout the day (e.g., home, work office, supermarket), in order to estimate how much time the subjects spent at these locations (e.g., 20 hours at home). Clusters are calculated with a DBSCAN (Density-Based Spatial Clustering of Applications with Noise) implementation, where 0.5 km is the maximum distance between two points for one to be considered in the neighbourhood of the other and 2 is the minimum cluster size (everything else is classified as noise with the label −1). Furthermore, the distance and travel speed are calculated for each detected location from the Haversine distance between points on the earth (specified in decimal degrees). GPS is sampled every 180 seconds.
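The clustering and distance calculations described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the text does not name the library used, so DBSCAN is written out by hand here, and the function names, the Earth radius constant, and the example coordinates are hypothetical. The parameters (0.5 km, minimum size 2, noise label -1, 180-second sampling) are taken from the text.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (assumed constant)


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def speed_kmh(p, q, seconds=180):
    """Average speed between two consecutive fixes taken `seconds` apart."""
    return haversine_km(p, q) / (seconds / 3600)


def dbscan(points, eps_km=0.5, min_samples=2):
    """Minimal DBSCAN over (lat, lon) points; returns one label per point, -1 is noise."""
    labels = [None] * len(points)

    def neighbours(i):
        # A point counts as its own neighbour, so min_samples includes it.
        return [j for j, q in enumerate(points)
                if haversine_km(points[i], q) <= eps_km]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # provisional noise, may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point joins as border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_samples:
                queue.extend(reachable)  # j is a core point, keep expanding
    return labels


# Hypothetical fixes: two close together, one far away.
fixes = [(52.0, 5.0), (52.001, 5.001), (52.1, 5.2)]
labels = dbscan(fixes)
```

With a GPS fix every 180 seconds, the time spent at a clustered location can then be approximated as the number of fixes carrying that label multiplied by 180 seconds, provided the recordings have no gaps.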

Apart from raw data (such as GPS), some higher-level features are also collected. These data are acquired via AWARE plugin tools or are retrieved directly from the vendor’s server (e.g., Fitbit Inc.). Users’ physical activities were detected and retrieved via the AWARE Plugin: Google Activity Recognition25, which infers the current activity as one of the following labels: walking, running, biking, being in a vehicle (car, bus), tilting, or unknown. The interval between activity inferences is on the order of seconds and varies with the duration of the ongoing activity. This information can be used to capture users’ mode of transportation or to estimate the intensity level of the performed activities (e.g., running and biking can be grouped as vigorous activities, while being in a vehicle can be grouped as a sedentary activity).
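The intensity grouping mentioned above amounts to a simple lookup. In the sketch below, only the two groupings named in the text are encoded; the label strings are hypothetical spellings, and every other label is deliberately left as "unassigned" rather than guessed.

```python
# Groupings stated in the text: running and biking are vigorous,
# being in a vehicle is sedentary. Label spellings are hypothetical.
INTENSITY = {
    "running": "vigorous",
    "biking": "vigorous",
    "in_vehicle": "sedentary",
}


def intensity(label: str) -> str:
    """Map an activity label to an intensity group.

    Labels the text does not assign to a group (e.g. walking, tilting,
    unknown) are returned as "unassigned" instead of being guessed.
    """
    return INTENSITY.get(label, "unassigned")


groups = [intensity(label) for label in ["running", "in_vehicle", "walking"]]
```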

Additionally, activity tracker data are collected, including the number of steps and the intensity of the performed activities per day (i.e., sedentary, lightly active, fairly active, and very active), which can be used to enhance the understanding of users’ physical behaviour. This information is retrieved directly from the vendor’s server (e.g., Fitbit Inc.) or is manually annotated by the user based on the user’s pedometer mobile application. The granularity of the step counts ranges from minutes to days, depending on the source. For instance, the Fitbit data count steps every minute, while the pedometer app data refer to steps per day.
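Because Fitbit reports steps per minute while pedometer apps report steps per day, comparing participants requires bringing both sources to the daily level. A minimal sketch of that aggregation follows; the record layout (timestamp, steps) and the sample values are assumptions for illustration only.

```python
from collections import defaultdict
from datetime import datetime


def daily_steps(minute_records):
    """Sum per-minute step counts into per-day totals.

    minute_records: iterable of (datetime, steps) pairs.
    Returns a dict mapping each calendar date to its total steps.
    """
    totals = defaultdict(int)
    for timestamp, steps in minute_records:
        totals[timestamp.date()] += steps
    return dict(totals)


# Illustrative per-minute records, not taken from the dataset.
records = [
    (datetime(2020, 5, 20, 8, 0), 10),
    (datetime(2020, 5, 20, 8, 1), 15),
    (datetime(2020, 5, 21, 9, 0), 7),
]
totals = daily_steps(records)
```

Days with missing minutes will yield lower totals than the wearer actually walked, so the daily sums are best read together with a check for recording gaps.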

Conversation data are collected through the microphone sensor and are used to analyse the ambient noise and estimate whether there is an ongoing conversation. This information is retrieved via the AWARE Plugin: Audio Conversations26, which detects whether or not the user is engaged in a conversation with one or more people, without storing raw audio. More specifically, the plugin follows a duty cycle of collecting audio for 1 minute followed by a 3-minute pause, since continuous audio recording would have a huge impact on battery drain; the assumption is that a real conversation can be detected within a time frame of 4 minutes.
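The consequences of this duty cycle can be worked out with a few lines of arithmetic. The sketch below assumes the cycle runs uninterrupted from a known starting minute, which is an idealisation; the function names and the start_offset parameter are hypothetical.

```python
RECORD_MIN = 1  # minutes of audio analysed per cycle (from the text)
PAUSE_MIN = 3   # minutes of pause per cycle (from the text)


def duty_cycle_stats(hours=24):
    """Return (number of listening windows, fraction of time covered)."""
    cycle = RECORD_MIN + PAUSE_MIN
    windows = hours * 60 // cycle
    coverage = RECORD_MIN / cycle
    return windows, coverage


def is_sampling(minute_of_day, start_offset=0):
    """True if the plugin would be listening during this minute.

    start_offset (the minute at which the first cycle began) is a
    hypothetical parameter; the real schedule depends on when the app started.
    """
    return (minute_of_day - start_offset) % (RECORD_MIN + PAUSE_MIN) < RECORD_MIN


windows, coverage = duty_cycle_stats()
```

Under these assumptions the microphone is analysed in at most 360 one-minute windows per day, a quarter of the time, so conversations shorter than the 3-minute pause can be missed entirely.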

Phone call data refer to the time at which an incoming or outgoing call is initiated, including its duration. Furthermore, each contact is characterised by a unique device id number, and thus more features can be extracted, for instance whether the calls took place between familiar persons (such as family members and friends) or with new, unknown contacts.

Text message data refer to the time at which an SMS was sent or received. As with the phone call feature, each contact is characterised by a unique device id number, so it can be investigated whether the SMS were exchanged between familiar persons (such as family members and friends) or with new, unknown contacts. For instance, text messages that are occasionally received from a sender to whom the user never replies could be unsolicited (e.g., advertisements) and may not reveal reliable knowledge about the user’s social behaviour.
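A simple heuristic for separating familiar contacts from one-way senders, using only the anonymised contact identifiers and message or call direction, might look like the sketch below. It is an illustration rather than a method used in the study: the event layout, the category names, and the familiar_min threshold are all assumptions.

```python
from collections import Counter


def classify_contacts(events, familiar_min=3):
    """Label each contact id from its call/SMS history.

    events: list of (contact_id, direction), direction being "in" or "out".
    familiar_min: hypothetical minimum number of exchanges for "familiar".
    """
    incoming = Counter(c for c, d in events if d == "in")
    outgoing = Counter(c for c, d in events if d == "out")
    labels = {}
    for contact in set(incoming) | set(outgoing):
        total = incoming[contact] + outgoing[contact]
        if outgoing[contact] == 0:
            # The user never replied: possibly advertisements or other noise.
            labels[contact] = "one_way_incoming"
        elif total >= familiar_min:
            labels[contact] = "familiar"
        else:
            labels[contact] = "occasional"
    return labels


# Illustrative events with made-up contact ids.
events = [("a", "in"), ("a", "out"), ("a", "in"), ("b", "in"), ("c", "out")]
contact_labels = classify_contacts(events)
```

Contacts that only ever send messages are kept in a separate category so that they can be excluded when estimating how socially active a user was.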

The smartphone audio (e.g., ambient noise and phone calls) and text messages (SMS) were only used for annotating whether a user was socially active, without examining the audio or text content (no raw audio or text content was stored). For instance, the audio decibel levels (instead of the audio content) were used for modelling the user’s social behaviour and detecting whether a conversation was taking place. In addition, to protect users’ privacy, GPS data were processed so that the actual GPS trajectories of the users (such as the coordinates of a user’s home) cannot be revealed.

Furthermore, ESM and CaaS data have been collected, by asking the related questions on a daily basis through the mobile app and through the website, respectively. These questions can be used to monitor user’s social, emotional, and cognitive behaviour changes during and after the lockdown. The related answers consist of free text, numeric values, Likert scale, yes/no queries, and multiple choices and refer to user’s physical activities, social interactions, feelings, and cognitive tasks. The sampling frequency varies per question and per participant.

ESM data consist of users’ input on the questions triggered via the mobile app, related to social, emotional, and cognitive behaviours. The questions (Q1-Q3) were triggered via a pop-up notification three times per day, at 12 pm, 5 pm, and 9 pm, referring to users’ behaviour during the morning, afternoon, and evening hours, respectively. However, these questions were only answered by the participants from Group A, using the mobile app on their Android devices. Group B participants who installed the AWARE mobile app on iOS devices could not answer the ESM mobile-based questions, since these were not supported on iOS at the time of the data collection. An overview of the ESM questions can be seen in Table 3.
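When analysing ESM answers, each response has to be tied to the part of the day it describes. The sketch below assigns an answer to the most recent prompt on the same day; that rule is an assumption made for illustration (late answers given the next morning, for example, would need separate handling), while the prompt times and periods come from the text.

```python
from datetime import time

# Prompt times and the period each prompt asks about, as described in the text.
ESM_PROMPTS = [
    (time(12, 0), "morning"),
    (time(17, 0), "afternoon"),
    (time(21, 0), "evening"),
]


def period_for_answer(answered_at):
    """Return the period covered by the most recent prompt, or None.

    answered_at: datetime.time at which the answer was submitted.
    Assumes the answer was given on the same day as the prompt.
    """
    period = None
    for prompt_time, label in ESM_PROMPTS:
        if answered_at >= prompt_time:
            period = label
    return period


example_period = period_for_answer(time(18, 30))
```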

Table 3 An overview of the ESM mobile-based questions asked through the mobile app.

CaaS data consist of users’ input on the online questions triggered via the website, related to social (Q4-Q8), emotional (Q9-Q11), and cognitive (Q12-Q16) behaviours. These questions were answered once per day based on the user’s availability and time preference, simply by logging in to the COUCH website18 and following the given instructions. An overview of these questions can be seen in Table 4.

Table 4 An overview of the CaaS web-based questions asked through the website.

A few additional questions (Q17-Q23) were asked on a weekly basis and at the end of the experiment. These questions concerned special events or situations that the participants experienced during the data collection, aiming to reflect significant changes in their behaviour. This user input contains valuable information, since it can be used to evaluate models for detecting behaviour changes that occur during the lockdown and how these changes differ when the users progressively return to their normal lifestyle. An overview of these questions can be seen in Table 5. The questions Q17 and Q18 were asked at the end of each week to track any behaviour changes on a weekly basis. Furthermore, the subjects were asked to answer the questions Q19-Q23 at the end of the data collection phase to identify possible behaviour changes during the lockdown period, and to state retrospectively whether there were any significant changes in their behaviour before and after the COVID-19 lockdown. These questions were asked according to users’ preferences, either via the AWARE app or through online Google forms (for those who did not install the AWARE app).

Table 5 An overview of the questions related to the self-reported weekly events.


The current dataset fairly represents some of the behaviours observed during and after the first wave of the COVID-19 lockdown, and the presumed changes in between. Even though we firmly believe that this study helps researchers to develop and test novel methodologies to study human behaviour individually and collectively, there are a number of limitations that need to be considered. Apart from typical cultural and ambient differences, the restrictions imposed by governments during the lockdown differed among countries and even between regions. Furthermore, the sample size is too limited to generalise conclusions, and thus the dataset should not be used to extrapolate findings to the societal level.

The strict confinement measures due to the COVID-19 outbreak hampered the recruitment phase. Therefore, the recruitment and the set-up of the experiment took place online based on convenience sampling, asking for participants who owned the necessary devices and were willing to participate in the experiment during the COVID-19 outbreak. This was a real barrier, especially for senior participants, since we could not meet them in person and help them, for example, to install the data collection app. It should also be noted that some people raised understandable concerns given the nature of the collected data, despite the fact that all data are collected and treated anonymously. Some individuals eventually agreed to participate after clarification, yet others declined the invitation because they could not talk this through in person. Consequently, the strict restrictions imposed by the COVID-19 lockdown had an inevitable impact on the recruited participants and their sample distribution in terms of common factors (e.g., participants’ age, gender, income, etc.). Therefore, the collected data should not be used to represent communities according to socioeconomic or demographic criteria.

The data collection was conducted based on users’ availability and preferences, but also based on the sensing devices that they owned. This means that different types of data were collected among participants. To ensure data quality and integrity, we divided the participants into two groups: Group A and Group B. For Group A, the data collection was conducted without any restrictions. However, the data related to Group B were not always consistent, since different types of data were collected among participants. For instance, some participants were not able to install the mobile app on their smartphones, while others did not consent to the retrieval of their smartphone data. Specifically, 5 participants (i.e., s02, s05, s08, s10, and s11) used only the activity tracker and answered the web-based questions, while only 4 participants from Group B gave consent to collect mobile data.

Another important remark is that the participants who installed the mobile app on iOS devices could not answer the ESM mobile-based questions due to limitations of the iOS app (this option was not supported at the time of the data collection). Furthermore, we noticed that there is a significant amount of missing data related to the CaaS and/or the mobile sensor recordings for some subjects (e.g., s05, s13 and s15). The former could be because subjects forgot to log in via the web browser and answer the CaaS questions on a daily basis. The latter could be because either the subjects accidentally prevented the mobile app from running in the background and operating smoothly (i.e., recording sensor data and triggering ESM questions continuously), or the device’s battery optimisation settings were not fully adjusted to permanently allow the app to run in the background, resulting in discontinuous sensor recordings.
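Before analysing a subject's sensor stream, it can help to locate such discontinuities explicitly. The following sketch flags gaps between consecutive timestamps that exceed a multiple of the expected sampling interval; the 180-second default matches the GPS sampling stated earlier, whereas the function name, the tolerance factor, and the use of Unix timestamps in seconds are assumptions.

```python
def find_gaps(timestamps, expected_s=180, tolerance=2.0):
    """Return (start, end, seconds) for every gap longer than tolerance * expected_s.

    timestamps: sorted Unix times in seconds.
    tolerance: hypothetical factor that allows for small scheduling jitter.
    """
    limit = expected_s * tolerance
    return [(a, b, b - a)
            for a, b in zip(timestamps, timestamps[1:])
            if b - a > limit]


# Illustrative stream: regular fixes followed by a one-hour silence.
gaps = find_gaps([0, 180, 360, 3960])
```

The resulting list can be used to exclude affected periods or to report per-subject data coverage alongside any derived feature.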

Depending on the smartphone manufacturer, some Android devices enforce a strict battery optimisation policy that stops apps running in the background after a certain time. This ‘sleeping mode’ function ensures that unnecessary apps running silently do not affect the battery life of the smartphone. However, this function could prevent the mobile app from registering sensor data continuously. Before the data collection started, we informed the participants on how to disable this ‘sleeping mode’ function by providing a thorough set of instructions according to their device model. Additionally, the study investigator had a private video call with some participants for further assistance. However, a few participants still found these instructions hard to follow, since they were not familiar with adjusting the settings on their device, and eventually did not manage to disable the battery optimisation for the AWARE app. Consequently, the mobile app occasionally entered the ‘sleeping mode’, and the subjects had to reopen the AWARE app so that it could run in the background and collect and upload data to the server.

While most of the collected data required a preprocessing phase to clean them and prepare them for dissemination, the mobile data (apart from the GPS coordinates, which required further processing to ensure users’ anonymity) were automatically processed and cleaned through the AWARE mobile app. For instance, physical activities and conversation data were automatically processed via the AWARE integrated plugins (i.e., Google Activity Recognition and Audio Conversations) to provide the necessary information. However, some of the automatically processed recordings might contain noise due to the nature of the technology (e.g., the transition from Wi-Fi to a cellular network could cause some inconsistencies), depending also on the smartphone manufacturer and the device permissions to record these data. For instance, the GPS data for s01 (such as the travel speed and distance features) contain some implausible values that do not always represent the actual movement. Another limitation of the GPS data is that real coordinates could not be made publicly available; therefore, we calculated the difference in longitude and latitude, row by row, with respect to the previous recording. Consequently, we recommend further cleaning of the acquired data related to GPS, physical activities, and conversation.
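The row-by-row differencing and a basic plausibility check on travel speed can be sketched as below. This is an illustration under stated assumptions rather than the released preprocessing code: the function names are hypothetical, the first row is dropped because it has no predecessor, and the 300 km/h cut-off is an arbitrary example threshold that users of the dataset should choose for themselves.

```python
def to_deltas(coords):
    """Replace absolute (lat, lon) fixes with differences from the previous row.

    The first fix has no predecessor, so the output is one row shorter.
    """
    return [(lat2 - lat1, lon2 - lon1)
            for (lat1, lon1), (lat2, lon2) in zip(coords, coords[1:])]


def flag_implausible(speeds_kmh, max_kmh=300.0):
    """Mark speed values above a hypothetical plausibility threshold."""
    return [speed > max_kmh for speed in speeds_kmh]


# Illustrative values only, not taken from the dataset.
deltas = to_deltas([(1.0, 2.0), (1.5, 2.25), (1.5, 2.0)])
flags = flag_implausible([5.0, 1200.0])
```

Summing the differences reproduces the shape of a trajectory relative to its first point, but without that starting fix the absolute coordinates are not given by the differences alone.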