
When we search the internet for flu or Covid symptoms, or talk about our health on social networks, we leave a trail of data that epidemiologists can use to identify the emergence of a new virus or a fresh outbreak of an epidemic.
At the end of 2019, two digital disease monitoring systems, HealthMap and ProMED, detected the first warning signs coming from the city of Wuhan, China, days before the World Health Organization (WHO) raised the alarm. The HealthMap team picked up clues from a press release published online, while ProMED detected conversations on Weibo, a Chinese social network, about a pneumonia of unknown origin.
Data from search engines, social networks, chats and other online publications are a kind of "digital crumbs" for epidemiologists, in the words of John Brownstein, one of the founders of HealthMap, professor of Medicine at Harvard and Chief Innovation Officer at Boston Children's Hospital. Together they add up to a huge volume of data that provides daily clues about outbreaks of infectious diseases and other health problems.
In December 2019, sixteen days before local authorities in Wuhan announced the SARS-CoV-2 outbreak, the word 'feidian' began to appear more frequently in posts and searches on WeChat, a popular Chinese messaging application used by one billion people each month. In Mandarin Chinese, 'feidian' is the everyday term for severe acute respiratory syndrome (SARS). Until then, WeChat users had rarely typed the word, but its use rose between December 15 and 29 and climbed especially sharply on December 30, a day before the outbreak of atypical pneumonia became public.
This phenomenon was analyzed retrospectively by researchers at Xi'an Hospital in China using data from the WeChat Index, a publicly accessible service that reports how often users of the application type certain words. The researchers concluded that, with this tool and 'feidian' as a keyword, the first Covid outbreak could have been detected two weeks earlier. They also identified an increase in the use of terms such as 'SARS', 'coronavirus', 'new coronavirus', 'difficulty breathing', 'dyspnea' and 'diarrhea', although these keywords did not work as well for detecting the epidemic in advance.
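As a rough illustration of this kind of retrospective analysis, the Python sketch below flags the first day on which a keyword's daily frequency rises well above its recent baseline. The counts, the seven-day window and the threshold are invented for the example. They are not WeChat Index data, and this is not necessarily the method the researchers used.

```python
from statistics import mean, stdev

def first_alert_day(counts, window=7, threshold=3.0):
    """Return the index of the first day whose count exceeds the mean of
    the previous `window` days by `threshold` standard deviations,
    or None if no day qualifies."""
    for day in range(window, len(counts)):
        baseline = counts[day - window:day]
        mu = mean(baseline)
        # Floor the deviation at 1.0 so a nearly flat baseline does not
        # raise an alert on a trivial change.
        sigma = max(stdev(baseline), 1.0)
        if counts[day] > mu + threshold * sigma:
            return day
    return None

# Hypothetical daily counts for a single keyword (made-up numbers).
daily_counts = [12, 10, 11, 13, 12, 11, 12, 12, 13, 30, 45, 80, 200]
alert = first_alert_day(daily_counts)
print("First alert on day index:", alert)
```

A real system would also need to correct for weekly cycles and for overall growth in the platform's traffic, both of which this sketch ignores.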
A two-week lag
Epidemiological surveillance systems are essential for identifying outbreaks of newly emerging viruses or of already known diseases, and they allow countries to take measures against situations that can lead to epidemics or even to a pandemic like the current one. They are also useful for following the dynamics of seasonal infections such as the flu, which recur year after year. Traditionally, these systems work with clinical and microbiological data provided by hospitals and by laboratories that analyze patient samples for pathogens. But, whether because of a lack of resources at the local level or because of other factors, it has been estimated that there is usually a delay of one to two weeks between the moment an outbreak occurs and the moment it is reported by an official body such as the European Centre for Disease Prevention and Control (or its American or Chinese equivalents). To shorten this lag, data analysis experts have been looking to the Internet for more than a decade. During this time, initiatives such as Google Flu Trends have been developed, which aimed to predict flu epidemics from searches carried out on Google.
Information systems such as HealthMap (created in 2006) and ProMED (created in 1994) integrate large amounts of data, which they use to monitor infectious disease outbreaks and to provide real-time information to local public health agencies, the WHO and the CDC, as well as to the general public. In general, however, they still do not use information from social networks unless it comes from clearly identified public health experts. For example, HealthMap analyzes and filters information from news published on the Internet and from government sources. It also draws on citizen science projects such as Flu Near You and Outbreaks Near Me, in which millions of users voluntarily contribute their flu or Covid symptoms, or their test results, so that outbreaks of these infections can be tracked collaboratively.
Although we are still at the dawn of digital epidemiology, the sheer volume of data and the speed with which it can be transmitted and analyzed will be very useful for anticipating and tracking future epidemics, as we are already beginning to see. Even so, there are a number of limitations that must be taken into account. Chief among them are the reliability and accuracy of the data and of the models built on it, as the case of Google Flu Trends showed, and issues relating to the privacy and security of the users who contribute their data.
Reliability and limits
For models based on Internet searches to be reliable, their builders must test sets of keywords and choose those that best track the evolution of real data. For example, Francesc Pujol, an economist and professor at the University of Navarra, used Google Trends in a recent article on his blog. This publicly accessible tool shows how the popularity of search terms evolves over a given period, and Pujol used it to look for correlations between searches and flu or Covid cases. In his analysis, he takes simple terms such as 'flu', 'covid' or 'covid symptoms' in isolation and finds a series of illustrative correlations with past waves of flu and with the waves of Covid since 2020. These remain curiosities that reality may sooner or later disprove, as happened with the much more complex Google Flu Trends model (based on 45 key phrases).
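The keyword-selection step can be pictured with a small Python sketch that ranks candidate search terms by how closely their weekly search interest follows reported case counts, using the Pearson correlation. All the figures and the second search term are hypothetical, chosen only to contrast a term that tracks the cases with one that does not. This is neither Pujol's analysis nor the Google Flu Trends model.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no signal
    return cov / sqrt(var_x * var_y)

def rank_keywords(search_series, cases):
    """Sort keywords from most to least correlated with the case counts."""
    scores = {kw: pearson(series, cases) for kw, series in search_series.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical weekly case counts and search-interest values (made up).
weekly_cases = [5, 8, 20, 45, 60, 40, 18, 9]
search_interest = {
    "flu": [10, 16, 38, 90, 118, 82, 35, 20],
    "holiday flights": [30, 28, 55, 27, 29, 31, 30, 28],
}

ranking = rank_keywords(search_interest, weekly_cases)
for keyword, score in ranking:
    print(f"{keyword}: r = {score:.2f}")
```

A high correlation over past weeks only shows that a term has tracked the cases so far. It does not guarantee that the relationship will hold in the future.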
For example, some predictions may overestimate the number of cases because users search online for information about a disease after it has been reported in the news, not because they actually have symptoms. In other cases the explanation is more amusing. In 2007, there was a spike in Google searches for the word 'cholera' in the United States, not because of an outbreak of the disease but because the television presenter Oprah Winfrey had recommended Gabriel García Márquez's novel 'Love in the Time of Cholera' in her book club.
To avoid such confusion, more reliable models will be needed, ones that use artificial intelligence algorithms able to learn from large amounts of data and adapt to the complex real-world dynamics they try to predict. HealthMap already uses them: its system learns to distinguish useful information from spurious information using a database of millions of articles describing real infection outbreaks, labeled and categorized by its team of researchers.
Another limitation is that the information collected is not representative of the entire population. Depending on the search engine or social network used, groups of people may be left out of the models because of their age, sex, language or socioeconomic level, among other factors, and models built on such data will therefore be doomed to fail. On certain social networks, most contributions have been found to come from a small fraction of people, around 10%, who are the most active users. Social networks are also a source of fake news and hoaxes, which can introduce background noise into predictive models.
Another problem is the risk to users' privacy, for example when geolocation data from mobile phones is used to track cases and contacts in an epidemic, or when the 'digital crumbs' that we leave as we move around the city or the web are used. It must be guaranteed that this data is used only in aggregate, anonymous form and that the rights of users, including their privacy, are preserved.
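To make the idea of aggregate use concrete, the Python sketch below counts symptom reports per area, discards the user identifiers and suppresses any area with too few reports to publish safely. The report data, the area names and the minimum of five are assumptions made for the example. Suppressing small counts is one common safeguard and does not by itself guarantee anonymity.

```python
from collections import Counter

def aggregate_reports(reports, min_count=5):
    """Count reports per area, dropping user identifiers and hiding
    areas with fewer than `min_count` reports."""
    counts = Counter(area for _user_id, area in reports)
    return {area: n for area, n in counts.items() if n >= min_count}

# Hypothetical (user_id, area) symptom reports; all values are made up.
reports = (
    [(f"user{i}", "north district") for i in range(12)]
    + [(f"user{i}", "old town") for i in range(12, 19)]
    + [("user19", "harbour"), ("user20", "harbour")]
)

published = aggregate_reports(reports)
print(published)
```

Here the two reports from the smallest area are withheld, because publishing such a low count could make it easier to single out the individuals behind it.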