A new artificial intelligence-powered tool predicted the 2014 Ebola epidemic three months before it was officially declared, and researchers are now applying it to help predict the spread of the coronavirus in Australia.

Developed by CSIRO’s Data61 in collaboration with the University of NSW, the tool combines natural language processing, data science and statistical time series modelling to identify specific syndrome keywords, and the context in which they are mentioned, in Twitter posts. This facilitates the early detection of outbreaks despite expected daily, weekly and seasonal influences.
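The article does not spell out the statistical model, so the following is only a minimal sketch of how an alert can allow for an expected weekly pattern. It compares each day's keyword count with the same weekday in earlier weeks. The function name `weekday_alerts`, the four-week window and the threshold of three standard deviations are illustrative assumptions, not the Data61 method.

```python
from statistics import mean, stdev

def weekday_alerts(daily_counts, history_weeks=4, threshold=3.0):
    """Return indices of days whose count is unusually high for that weekday.

    daily_counts: list of keyword-tweet counts, one per consecutive day.
    A day is flagged when its count exceeds the mean of the same weekday over
    the previous `history_weeks` weeks by more than `threshold` standard
    deviations.
    """
    alerts = []
    for day in range(history_weeks * 7, len(daily_counts)):
        # Counts for the same weekday in each of the preceding weeks.
        history = [daily_counts[day - 7 * w] for w in range(1, history_weeks + 1)]
        baseline = mean(history)
        spread = max(stdev(history), 1.0)  # floor avoids a zero spread on flat data
        if daily_counts[day] > baseline + threshold * spread:
            alerts.append(day)
    return alerts

# Five weeks of made-up counts with a regular weekend peak, plus one spike.
counts = [10, 12, 11, 13, 12, 30, 28] * 5
counts[30] = 40
print(weekday_alerts(counts))
```

Because the baseline is taken per weekday, the regular weekend peak in the made-up data is not flagged, while the midweek spike is.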

The number of alerts generated for the COVID-19 keywords compared with the number of new cases per day in Australia.

In the case of the 2014 Ebola epidemic, the tool gathered all relevant Tweets in Liberia, Guinea and Sierra Leone that mentioned ‘fever’ and ‘rash’ between 2011 and 2014, obtaining an illness alert for December 2013.  

The keywords ‘fever’, ‘cough’, ‘headache’, and ‘head cold’ are currently being used to determine the spread of COVID-19 across Australia, with the tool already generating results. 

“At the moment, we are monitoring flu-like symptoms and will soon add ‘breathing difficulties’ and ‘wheezing’ to that list,” says Dr Ross Sparks, modelling and monitoring expert at Data61.

“Currently, there are lots of coughs and head colds occurring throughout Australia, but mostly from Queensland, New South Wales and Victoria,” he says. These alerts are being passed to medical professionals across Australia as part of CSIRO’s national response to the coronavirus pandemic. 

“Any early warning monitoring system can be crucial to allow hospitals and health institutions to be better prepared,” says Dr Cecile Paris, Data61’s Chief Scientist. 

Its advantage is that it can generate alerts prior to an influx in hospitals or clinics, thus providing authorities and institutions with some lead time for better preparation, and, ultimately, fewer deaths.

“We are also hoping to enhance the system to provide an early identification of when case occurrences occur at a slower pace than expected, which might indicate a flattening of the curve.”

An extension of the approach used to predict Melbourne’s thunderstorm asthma outbreak, this methodology is a world-first in using social media monitoring for the detection of the Ebola epidemic.  

Daily counts in the aggregated dataset for both Ebola symptoms.

According to Dr Aditya Joshi, co-researcher of both projects, the approach for the asthma project used a single channel for each symptom and was unable to combine different symptoms. The updated work, however, can compare two-channel variations of the approach that combine different symptoms.

“A syndrome is characterised by a collection of symptoms,” explains Dr Joshi. “Past work in social media-based epidemic intelligence either monitors symptoms in isolation, which is what we did in our thunderstorm asthma paper, or puts together tweets associated with different symptoms.”

“In this paper, we compare the two variations, isolation versus putting together tweets. Our results show that, while both work, one of them does better than the other in terms of the number of alerts that it generates. In this case, we monitor tweets for two symptoms of Ebola: an early symptom: fever; and a late symptom: rash. When monitoring tweets, we observe alerts three months before the official announcement of the Ebola epidemic.” 

For example, the Victorian asthma version of the tool would produce an alert saying, ‘The program thinks there is an outbreak of asthma because social media posts reporting breathing difficulties are more frequent than expected’, while the Ebola version would say ‘The program thinks there is an outbreak of Ebola because social media posts reporting rash are more frequent than expected AND/OR social media posts reporting fever are more frequent than expected’.
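The alert wording above can be sketched as a small function that takes one flag per symptom channel and names whichever channels fired. This is an illustration only: `describe_alert` and its `channel_flags` argument are hypothetical names, and the flags are assumed to come from a separate monitoring step.

```python
def describe_alert(illness, channel_flags):
    """Build an alert sentence from per-symptom flags.

    channel_flags: dict mapping a symptom name to True when posts about that
    symptom are more frequent than expected. Returns None if no channel fired.
    """
    fired = [symptom for symptom, flagged in channel_flags.items() if flagged]
    if not fired:
        return None
    reasons = " AND ".join(
        f"social media posts reporting {symptom} are more frequent than expected"
        for symptom in fired
    )
    return f"The program thinks there is an outbreak of {illness} because {reasons}"

# Single-channel (asthma) and two-channel (Ebola) examples.
print(describe_alert("asthma", {"breathing difficulties": True}))
print(describe_alert("Ebola", {"rash": True, "fever": False}))
```

With one channel the sentence matches the asthma wording. With two channels it lists only the symptoms that fired, which covers the AND/OR case.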

The main component of these social media monitoring and prediction tools is the combination of two fields of AI – natural language processing and statistical time series modelling – with a four-step process that ensures the tweets containing the keywords are indeed reports of health conditions and removes duplicates where an individual tweets more than once about their condition.
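The article does not list the four steps, so the sketch below covers only the duplicate-removal idea. It assumes, purely for illustration, that a duplicate means the same user posting more than once on the same day, and that each tweet arrives as a `(user_id, date, text)` tuple already judged to be a health report. The function name `daily_report_counts` is hypothetical.

```python
from collections import Counter
from datetime import date

def daily_report_counts(tweets):
    """Count at most one symptom report per user per day.

    tweets: iterable of (user_id, date, text) tuples.
    Returns a Counter mapping each date to the number of distinct users.
    """
    seen = set()
    counts = Counter()
    for user_id, day, _text in tweets:
        key = (user_id, day)
        if key in seen:
            continue  # same person tweeting again on the same day
        seen.add(key)
        counts[day] += 1
    return counts

# Made-up tweets: user u1 posts twice on the first day.
sample = [
    ("u1", date(2013, 12, 1), "I have a fever"),
    ("u1", date(2013, 12, 1), "still got this fever"),
    ("u2", date(2013, 12, 1), "woke up with a rash"),
    ("u1", date(2013, 12, 2), "fever again today"),
]
print(daily_report_counts(sample))
```

The resulting daily counts are the kind of series a time series monitoring step could then work on.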

Natural language processing, or NLP, is the ability of a computer program to process human language. The tools use NLP based on word embeddings to distinguish between reports of symptoms and unrelated mentions of the keywords.
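To give a feel for how word embeddings can separate a symptom report from an unrelated mention of the same keyword, here is a toy sketch. The two-dimensional vectors are invented for illustration, and the nearest-centroid rule is an assumed stand-in; the article does not say which classifier or embeddings the tool uses.

```python
import math

# Toy 2-d word vectors, invented for illustration only. A real system would
# load pretrained embeddings with many more dimensions.
TOY_VECTORS = {
    "i": (0.9, 0.1), "have": (0.8, 0.2), "my": (0.9, 0.2), "sick": (1.0, 0.0),
    "a": (0.5, 0.5), "fever": (0.6, 0.6),
    "bieber": (0.0, 1.0), "concert": (0.1, 0.9), "tickets": (0.1, 1.0),
}

def embed(text):
    """Average the vectors of known words; unknown words are skipped."""
    vectors = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not vectors:
        return (0.0, 0.0)
    return tuple(sum(component) / len(vectors) for component in zip(*vectors))

def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    norm = math.hypot(*u) * math.hypot(*v)
    return 0.0 if norm == 0 else sum(a * b for a, b in zip(u, v)) / norm

def centroid(texts):
    """Average embedding of a list of example texts."""
    embedded = [embed(t) for t in texts]
    return tuple(sum(component) / len(embedded) for component in zip(*embedded))

def is_symptom_report(text, report_examples, other_examples):
    """True if the text is closer to the report examples than to the others."""
    vector = embed(text)
    return cosine(vector, centroid(report_examples)) > cosine(vector, centroid(other_examples))

REPORTS = ["i have a fever", "my fever sick"]
OTHERS = ["bieber fever concert", "fever tickets"]
print(is_symptom_report("i have fever", REPORTS, OTHERS))
print(is_symptom_report("bieber fever tickets", REPORTS, OTHERS))
```

Both test sentences contain the keyword "fever", but the surrounding words pull their averaged vectors towards different groups of examples.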

“Being able to distinguish between tweets that are reports of illness versus tweets which only contain certain keywords is the greatest strength of this tool, because past work in epidemic intelligence has not been able to do this,” says Dr Joshi.

Adapted architecture using Data Aggregation used in the Ebola tool.

Traditionally, hospitals would report an influx of patients suffering from similar symptoms. However, the tweets assessed during this case study focused on early symptoms, such as a cough, which individuals would most likely not consider serious enough to warrant a visit to a doctor or hospital, but impactful enough to post about publicly on social media.

“The motivation [behind these tools] was that public health authorities and other health professionals can use these signals to support the decision to declare a public health emergency,” notes Dr Joshi.

“This could also be a useful tool for emergency services to better plan their ambulances and emergency staff if they are aware of an impending outbreak.”

 