Forecasting the Pandemic from the Lens of Data Science

The COVID-19 pandemic has had a tremendous impact on the world, from changing the way we live to driving extraordinary acts of human compassion. Beyond the medical and public health aspects, it has also had a profound economic and social impact. Because the coronavirus is novel, we do not yet know enough about it, and that has severely limited our ability to respond.

Our data science teams at Gramener built a decision support system for an Indian state government to forecast the pandemic better and act on those forecasts. The Composite Forecast Model gives decision-makers a 14-day forecast of daily counts, infrastructure requirements, and logistics requirements.

This two-part blog describes our journey through the pandemic: our effort to understand it better and to get a clearer picture of what to expect. In this first part, I explain which models we tried; the second part covers which methods worked.

Compartmental Models

Mathematical modeling in epidemiology has a long and rich history, dating back to the 1920s with the Kermack-McKendrick theory. The basic idea looks deceptively simple: divide the population into different compartments representing different stages of the disease and see how the numbers evolve over time.

SEIR Model

One of the most widely used epidemic models is the classical SEIR model, which simulates the time history of an epidemic. It models how people move between four groups: Susceptible (S), Exposed (E), Infectious (I), and Recovered (R).

  • Susceptible: The section of the population that could potentially be infected.
  • Exposed: The fraction of the population that has been infected but is not yet infectious and typically does not show symptoms yet.
  • Infectious: Those capable of transmitting the disease.
  • Recovered: Those who have become immune to the disease.

A characteristic of this model is that the sum of the four categories is equal to the total population (N) at any point in time (t):

N=S(t)+E(t)+I(t)+R(t)

As the equation implies, the model does not account for natural births and deaths in the population over the course of the epidemic.
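The conservation property can be checked directly from the textbook SEIR rate equations. The sketch below is only an illustration and not necessarily the exact formulation used in our system: the function name `seir_derivatives` and the rate names `beta` (contact rate), `sigma` (1 / incubation period), and `gamma` (1 / infectious period) are assumptions taken from the standard formulation, and the numbers are made up.

```python
def seir_derivatives(s, e, i, r, beta, sigma, gamma):
    """Textbook SEIR rates of change, with no births or deaths."""
    n = s + e + i + r
    new_exposed = beta * s * i / n   # flow S -> E
    new_infectious = sigma * e       # flow E -> I
    new_recovered = gamma * i        # flow I -> R
    ds = -new_exposed
    de = new_exposed - new_infectious
    di = new_infectious - new_recovered
    dr = new_recovered
    return ds, de, di, dr


# Illustrative values only (assumptions, not fitted to any data).
rates = seir_derivatives(s=9_900, e=60, i=30, r=10,
                         beta=0.5, sigma=0.2, gamma=0.1)
total_change = sum(rates)
print("dS, dE, dI, dR per day:", rates)
print("net change in N:", total_change)
```

Every person who leaves one compartment enters the next, so the four rates cancel (up to floating-point rounding) and N stays constant.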

Model Parameters

As an epidemiological model, SEIR depends on several disease parameters:

Reproduction Number

The basic reproduction number, usually denoted R0, is the average number of secondary infections caused by one infected individual in an entirely susceptible population. This number indicates whether the infection will spread through the population: if R0 is greater than 1, the infection spreads, and if it is less than 1, the outbreak dies out. R0 depends on the characteristics of the virus and on how people come into contact with each other, and it plays a fundamental role in determining the course of the epidemic.

Incubation Period

The incubation period is the period between exposure to an infection and the appearance of the first symptoms. The current understanding of the incubation period for COVID-19 is limited.

Infectious Period

The infectious period is the time interval during which a host is infectious. The infectious period can start before, during, or after the onset of symptoms and it may stop before or after the symptoms stop showing.

Contact Rate

Contact rate is the probability of disease transmission per contact (dimensionless), multiplied by the number of contacts per unit time.
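The definitions above fit together in a simple way. In the basic SEIR formulation without births and deaths, R0 equals the contact rate multiplied by the mean infectious period. Here is a rough sketch with hypothetical numbers; the function names and the values are assumptions for illustration, not estimates for COVID-19.

```python
def contact_rate(transmission_prob_per_contact, contacts_per_day):
    """Effective contact rate (per day): probability per contact x contacts per day."""
    return transmission_prob_per_contact * contacts_per_day


def basic_reproduction_number(beta, infectious_period_days):
    """R0 for a basic SEIR model without births and deaths."""
    return beta * infectious_period_days


# Hypothetical inputs: 5% transmission chance per contact, 10 contacts a day,
# and a 5-day infectious period.
beta = contact_rate(0.05, 10)
r0 = basic_reproduction_number(beta, 5)
print(f"contact rate = {beta:.2f} per day, R0 = {r0:.2f}")
print("epidemic grows" if r0 > 1 else "epidemic dies out")
```

Halving either the number of contacts or the transmission probability halves R0, which is why both behavior and the virus itself matter.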

Social Distancing Factor

The social distancing parameter captures the effect of lockdown or quarantine. A value of 0 means everyone is locked down and quarantined, while 1 means there is no lockdown.
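To see how these parameters interact, here is a minimal simulation sketch. It assumes the social distancing factor simply multiplies the contact rate, which is one common choice and not necessarily how our system applied it. The function name `simulate_seir`, the forward-Euler integration, and all the numbers are assumptions for illustration.

```python
def simulate_seir(n, e0, i0, beta, incubation_days, infectious_days,
                  distancing=1.0, days=365, steps_per_day=10):
    """Forward-Euler SEIR run; returns the infectious count at the end of each day."""
    sigma = 1.0 / incubation_days   # rate of leaving the Exposed group
    gamma = 1.0 / infectious_days   # rate of leaving the Infectious group
    dt = 1.0 / steps_per_day
    s, e, i, r = n - e0 - i0, float(e0), float(i0), 0.0
    infectious = [i]
    for _ in range(days):
        for _ in range(steps_per_day):
            # Assumption: distancing scales the contact rate (0 = full lockdown).
            new_exposed = distancing * beta * s * i / n * dt
            new_infectious = sigma * e * dt
            new_recovered = gamma * i * dt
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered
            r += new_recovered
        infectious.append(i)
    return infectious


# Hypothetical population and disease parameters.
common = dict(n=1_000_000, e0=100, i0=50, beta=0.5,
              incubation_days=5, infectious_days=5)
no_lockdown = simulate_seir(distancing=1.0, **common)
partial_lockdown = simulate_seir(distancing=0.6, **common)
print(f"peak infectious, no lockdown:      {max(no_lockdown):,.0f}")
print(f"peak infectious, partial lockdown: {max(partial_lockdown):,.0f}")
```

A fixed-step Euler loop keeps the sketch dependency-free; a proper ODE solver would be the safer choice for real work.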

The Practicality of Implementing the SEIR Model

Epidemic models focus on an idealized scenario, in which fitting exponential curves is a simple way to forecast the course of the epidemic.

Unfortunately, the real world is significantly more complex in a variety of ways.

The limitations of applying SEIR models to practical scenarios like COVID-19 stem from a few unrealistic assumptions and from parameters that are hard to estimate:

  1. Exposed population – It is difficult to estimate what percentage of the population has been exposed.
  2. Key parameters – There is limited data to support estimates of many key epidemiological features, such as the incubation period, infectious period, and basic reproduction number.
  3. Asymptomatic and mildly infectious cases – The model assumes that there is a single type of infectious individual. In the real world, different immune systems respond differently to the virus, so some people are completely asymptomatic or only mildly infectious.
  4. Infection Rate – Transmission is extremely variable and depends on all kinds of social behaviors, local environmental details, and political decisions. Because of this, modeling potential outcomes for coronavirus means trying out many different transmission scenarios. Even those scenarios are not exact; they are more like a range of estimates.
  5. Contact Rate – The average rate of contact is not uniform; it differs from person to person.
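The point about transmission scenarios can be made concrete by sweeping the contact rate and comparing outcomes. The sketch below is a hypothetical illustration, not our production code: the function name `peak_infectious`, the three scenario values, and every other number are assumptions.

```python
def peak_infectious(beta, n=1_000_000, e0=100.0, i0=50.0,
                    incubation_days=5.0, infectious_days=5.0,
                    days=365, steps_per_day=10):
    """Return (peak infectious count, day of peak) for one transmission scenario."""
    dt = 1.0 / steps_per_day
    s, e, i = n - e0 - i0, e0, i0
    peak, peak_day = i, 0
    for day in range(1, days + 1):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / n * dt
            new_infectious = e / incubation_days * dt
            new_recovered = i / infectious_days * dt
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered
        if i > peak:
            peak, peak_day = i, day
    return peak, peak_day


# Hypothetical low / medium / high transmission scenarios (contact rate per day).
scenarios = {"low": 0.3, "medium": 0.4, "high": 0.5}
results = {name: peak_infectious(beta) for name, beta in scenarios.items()}
for name, (peak, day) in results.items():
    print(f"{name:>6}: peak of about {peak:,.0f} infectious around day {day}")
```

Even modest changes in the assumed contact rate move both the size and the timing of the peak considerably, which is why a single forecast line is less honest than a range.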

Conclusion

To build the model, we must assemble all these parameters and account for the uncertainty in each. This gets messy quickly because the virus is novel: no one has had it before, so there is little prior data to draw on.

Our experiments with the SEIR model were like cooking a complicated dish with a multitude of ingredients and complex steps. The model did not serve our purpose of understanding and explaining the pandemic scenario accurately. Hence, we investigated other models that could potentially give us more accurate results. Stay tuned for the second part of the blog to find out what worked in the end and how we got there.
