To investigate whether Twitter data can be used as a proxy for surveillance of the seasonal influenza epidemic in France, at both the national and regional levels.
Social media such as Twitter are now used not only to disseminate health information but also to share and discuss personal health experiences. Building on this observation, recent studies have shown that Twitter data can be used to monitor trends in infectious diseases such as influenza. These studies were mainly carried out in the United States, where Twitter is very popular (1-4). To our knowledge, no research has been conducted in France to determine whether Twitter data can serve as a complementary data source for monitoring the seasonal influenza epidemic.
For this exploratory study, an R program was developed to collect, pre-process (geolocate and classify) and analyse Tweets related to influenza-like illness.
The Twitter Streaming API was used to collect French-language Tweets containing the terms “grippe”, “grippal” or “grippaux”, without specifying geolocation coordinates.
To identify Tweets located in France, a combination of automated filters was implemented. The following were retained:
● Tweets with geolocation information in France (GPS coordinates, country code, country, place name)
● Tweets for which the place indicated in the user’s profile matched a city, department or region of France
● Tweets with a FR-related time zone, excluding those reporting a FR time zone but a non-FR place code.
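The three retention rules above can be sketched as a single predicate. This is a minimal Python illustration, not the authors' R program: the field names (`country_code`, `profile_location`, `time_zone`), the tiny gazetteer and the time-zone label are all hypothetical, and the order in which the rules are applied is an assumption.

```python
import re

# Hypothetical, deliberately tiny gazetteer of French cities, departments and regions.
FRENCH_PLACES = {"paris", "lyon", "marseille", "gironde", "bretagne", "occitanie"}
# Hypothetical set of time-zone labels treated as FR-related.
FRENCH_TIME_ZONES = {"Paris"}


def is_located_in_france(tweet: dict) -> bool:
    """Apply the three retention rules to one Tweet (field names are assumed)."""
    country_code = tweet.get("country_code")

    # Rule 1: explicit geolocation information pointing to France.
    if country_code == "FR":
        return True

    # Rule 2: a word of the free-text profile location matches the gazetteer.
    profile = (tweet.get("profile_location") or "").lower()
    if any(token in FRENCH_PLACES for token in re.split(r"[^\w]+", profile)):
        return True

    # Rule 3: FR-related time zone, unless a non-FR place code contradicts it.
    # At this point country_code is either missing or non-FR.
    if tweet.get("time_zone") in FRENCH_TIME_ZONES:
        return country_code is None

    return False
```

For example, a Tweet with `time_zone` set to `"Paris"` and no place code is kept, whereas the same Tweet with `country_code` set to `"BE"` is dropped. Whether a non-FR place code should also override a profile match is not stated in the text; the sketch lets the profile match win.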
In a second step, a support vector machine (SVM) classifier was used to filter noise out of the database. To train the classifier, 1,500 Tweets were randomly sampled. Each of these 1,500 training Tweets was manually inspected and tagged as valid or invalid according to the likelihood that it indicated influenza-like illness. This hand-tagged training set was converted to a vector representation using term frequency-inverse document frequency (TF-IDF) scores. These TF-IDF vectors were then input to the SVM for training. To evaluate the performance of the classifier, accuracy, recall and F-measure were calculated on 1,000 randomly sampled, manually tagged Tweets.
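To make the vectorisation step concrete, the following Python sketch computes TF-IDF weights for a handful of made-up Tweets. It assumes whitespace tokenisation, relative term frequency and an unsmoothed logarithmic IDF; the abstract does not specify which TF-IDF variant or which R packages were used, and the SVM training itself is left out here.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector (term -> weight) per document.

    Assumed weighting: tf = count / document length, idf = ln(N / document frequency).
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Illustrative Tweets, not taken from the study data.
example_tweets = [
    "j'ai la grippe",
    "vaccin contre la grippe",
    "la grippe aviaire",
]
example_vectors = tfidf_vectors(example_tweets)
```

Under this weighting, a term present in every document (here "grippe") receives a weight of zero, while rarer terms such as "vaccin" carry the signal. In a full pipeline these sparse vectors would be the input features on which the SVM is trained against the hand-assigned valid/invalid tags.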
Data collected from August 8, 2016 to March 26, 2017 were compared with those of the French syndromic surveillance system SurSaUD® (OSCOUR® and SOS Médecins networks) (5) using Spearman's rank correlation coefficient.
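As an illustration of the comparison, Spearman's coefficient is the Pearson correlation computed on ranks. The Python sketch below assumes two weekly series of equal length and assigns average ranks to ties; the function names are hypothetical and no study data are used.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # mean of 1-based positions start+1..end+1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)
```

Calling `spearman_rho(weekly_tweets, weekly_visits)` on two hypothetical weekly count series returns a value between -1 and 1; a value near 1 means that the weeks rank in almost the same order in both sources. The sketch does not compute a p-value, which the significance statements in the results would additionally require.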
In accordance with the requirements of the French National Commission on Informatics and Liberty (CNIL), user account information was removed from the database, except for location variables. Usernames appearing in the text of Tweets were also deleted.
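A minimal Python sketch of this de-identification step is given below. It assumes that usernames appear in the text as `@handle` mentions and that a Tweet is a dictionary; the field names and the list of retained location variables are hypothetical, not those of the authors' database.

```python
import re

# Assumption: usernames in the text take the form "@" followed by word characters.
MENTION_PATTERN = re.compile(r"@\w+")

# Hypothetical names for the location variables that are kept.
LOCATION_FIELDS = ("country_code", "place_name", "profile_location", "time_zone")


def strip_usernames(text: str) -> str:
    """Remove @mentions, then collapse the whitespace they leave behind."""
    without_mentions = MENTION_PATTERN.sub("", text)
    return re.sub(r"\s+", " ", without_mentions).strip()


def anonymise(tweet: dict) -> dict:
    """Keep the cleaned text and location variables; drop other account fields."""
    cleaned = {"text": strip_usernames(tweet["text"])}
    for field in LOCATION_FIELDS:
        if field in tweet:
            cleaned[field] = tweet[field]
    return cleaned
```

With this pattern, `strip_usernames("@marie j'ai la grippe, merci @dr_dupont !")` yields `"j'ai la grippe, merci !"`. The pattern is intentionally simple and would also truncate e-mail addresses written in a Tweet, so a production filter would need a stricter rule.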
Over the study period, the system collected 238,244 influenza-related Tweets, of which 130,559 were located in France. After a cleaning step, 22,939 Tweets were classified by the algorithm as indicating influenza-like illness (ILI). The classifier achieved 0.739 for accuracy, 0.725 for recall and 0.732 for F-measure. Figure 1 shows that the weekly number of ILI Tweets followed the same trend as the weekly numbers of emergency department (ED) visits and physician consultations for ILI. Regardless of the data source, Spearman's correlation coefficients were positive and statistically significant at the national level and for each region of France (Table 1).
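The abstract does not give the formulas behind the classifier scores. Assuming the conventional definitions, in which the F-measure is the harmonic mean of precision and recall, the evaluation on a hand-tagged test set could be sketched in Python as follows (labels are 1 for a valid ILI Tweet and 0 otherwise; the function names are hypothetical).

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the conventional F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f(true_labels, predicted_labels):
    """Compute precision, recall and F-measure for binary labels (1 = valid)."""
    pairs = list(zip(true_labels, predicted_labels))
    true_pos = sum(1 for t, p in pairs if t == 1 and p == 1)
    false_pos = sum(1 for t, p in pairs if t == 0 and p == 1)
    false_neg = sum(1 for t, p in pairs if t == 1 and p == 0)

    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall, f_measure(precision, recall)
```

As an arithmetic check, `f_measure(0.739, 0.725)` rounds to 0.732, so the reported F-measure coincides numerically with the harmonic mean of the two other reported scores.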
This exploratory study showed that Twitter data can be used to monitor the seasonal influenza epidemic in France at the national and regional levels, as a complement to existing systems. The system needs to be improved so that the observed trends can be confirmed during the next influenza epidemic.
1. Broniatowski DA, Paul MJ, Dredze M. National and local influenza surveillance through Twitter: An analysis of the 2012-2013 influenza epidemic. PLoS One. 2013;8(12):e83672.
2. Gesualdo F, Stilo G, Agricola E, Gonfiantini MV, Pandolfi E, Velardi P, et al. Influenza-like illness surveillance on Twitter through automated learning of naïve language. PLoS One. 2013;8(12):e82489.
3. Paul MJ, Dredze M, Broniatowski D. Twitter improves influenza forecasting. PLoS Curr. 2014;6.
4. Allen C, Tsou MH, Aslam A, Nagel A, Gawron JM. Applying GIS and machine learning methods to Twitter data for multiscale surveillance of influenza. PLoS One. 2016;11(7):e0157734.
5. Ruello M, Pelat C, Caserio-Schönemann C, et al. A regional approach for the influenza surveillance in France. OJPHI. 2017;9(1):e089.