Forecasting the Pandemic from the Lens of Data Science

The COVID-19 pandemic has had a tremendous impact on the world, from changing the way we live to driving extraordinary acts of human compassion. Beyond the medical and public health aspects, it has also had a profound economic and social impact. Because the coronavirus is novel and we do not yet know enough about it, our ability to respond has been severely limited.

Our data science teams at Gramener built a decision support system for an Indian state government to forecast the pandemic more reliably and act on those forecasts. The Composite Forecast Model gives decision-makers a 14-day forecast of daily counts and of infrastructure and logistics requirements.

This two-part blog describes our journey through the pandemic as we tried to understand it better and get a clearer picture of what to expect. In this first part, I explain which models we tried; the second part covers which methods worked.

Compartmental Models

Mathematical modeling in epidemiology has a long and rich history, dating back to the 1920s with the Kermack-McKendrick theory. The basic idea looks deceptively simple: divide the population into different compartments representing different stages of the disease and see how the numbers evolve over time.

SEIR Model

One of the most widely used epidemic models is the classical SEIR model, which simulates how an epidemic evolves over time. It models the movement of people between four groups: Susceptible (S), Exposed (E), Infectious (I), and Recovered (R).

  • Susceptible: The section of the population that could potentially be infected.
  • Exposed: The fraction of the population that has been infected but does not show symptoms yet and cannot yet transmit the disease.
  • Infectious: Those capable of transmitting the disease.
  • Recovered: Those who have become immune to the disease.

A characteristic of this model is that the sum of the four categories is equal to the total population (N) at any point in time (t):

N = S(t) + E(t) + I(t) + R(t)

Because N is held constant, the model does not account for natural births and deaths in the population during the time span of the disease.
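For reference, the four compartments are usually linked by the system of differential equations below. This is the textbook form of the SEIR model without births and deaths, not necessarily the exact variant we ran. The symbols are assumed notation: β is the transmission (contact) rate, σ is the rate of leaving the exposed state (approximated here as the inverse of the incubation period), and γ is the recovery rate (the inverse of the infectious period).

```latex
\begin{aligned}
\frac{dS}{dt} &= -\beta \, \frac{S(t)\, I(t)}{N} \\
\frac{dE}{dt} &= \beta \, \frac{S(t)\, I(t)}{N} - \sigma E(t) \\
\frac{dI}{dt} &= \sigma E(t) - \gamma I(t) \\
\frac{dR}{dt} &= \gamma I(t)
\end{aligned}
```

The four right-hand sides add up to zero, which is why the total S + E + I + R stays equal to N at every point in time.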

Model Parameters

As it is an epidemiological model, it depends on a number of disease parameters as follows:

Reproduction Number

The basic reproduction number, usually denoted R0, is the average number of secondary infections caused by one infected individual in an entirely susceptible population. It indicates whether the infection will spread through the population: if R0 is greater than 1, each case produces more than one new case on average and the outbreak grows; if it is less than 1, the outbreak dies out. R0 depends on characteristics of the virus as well as on how people in the population interact, and it plays a fundamental role in determining the course of the epidemic.

Incubation Period

The incubation period is the period between exposure to an infection and the appearance of the first symptoms. The current understanding of the incubation period for COVID-19 is limited.

Infectious Period

The infectious period is the time interval during which a host is infectious. The infectious period can start before, during, or after the onset of symptoms and it may stop before or after the symptoms stop showing.

Contact Rate

Contact rate is the probability of disease transmission per contact (dimensionless), multiplied by the number of contacts per unit time.
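As a small illustration of this definition, the sketch below computes a contact rate and relates it to the reproduction number. It assumes the standard relation from the simple compartmental model, R0 = contact rate × infectious period. The function names and the numeric inputs are hypothetical and chosen only for illustration; they are not estimates for COVID-19.

```python
def contact_rate(transmission_prob: float, contacts_per_day: float) -> float:
    """Contact rate (often written beta), in units of 1/day.

    transmission_prob: probability of transmission per contact (dimensionless).
    contacts_per_day:  average number of contacts per person per day.
    """
    return transmission_prob * contacts_per_day


def basic_reproduction_number(beta: float, infectious_period_days: float) -> float:
    """R0 under the simple-model assumption R0 = beta * infectious period."""
    return beta * infectious_period_days


# Illustrative inputs only (assumed, not fitted to any data).
beta = contact_rate(transmission_prob=0.05, contacts_per_day=10)
r0 = basic_reproduction_number(beta, infectious_period_days=5)
print(f"beta = {beta:.2f} per day, R0 = {r0:.2f}")
```

Halving either the transmission probability or the number of daily contacts halves the contact rate, and with it R0.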

Social Distancing Factor

The social distancing parameter factors in the effect of lockdown and quarantine. It ranges from 0 to 1: a value of 0 means everyone is locked down and quarantined, while 1 means there is no lockdown at all.
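To show how these parameters fit together, here is a minimal sketch of an SEIR simulation. It is not the code behind our decision support system. It assumes the textbook SEIR equations, simple fixed-step (Euler) integration, and that the social distancing factor simply scales the contact rate. The names `SEIRParams` and `simulate_seir` and all parameter values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SEIRParams:
    r0: float                # basic reproduction number
    incubation_days: float   # average time spent in the Exposed group
    infectious_days: float   # average time spent in the Infectious group
    distancing: float = 1.0  # 0 = everyone locked down, 1 = no lockdown


def simulate_seir(population, initial_infectious, params, days, steps_per_day=10):
    """Return a list of daily (S, E, I, R) tuples, starting with day 0."""
    sigma = 1.0 / params.incubation_days   # rate Exposed -> Infectious
    gamma = 1.0 / params.infectious_days   # rate Infectious -> Recovered
    # Assumption: distancing scales the contact rate implied by R0.
    beta = params.r0 * gamma * params.distancing
    dt = 1.0 / steps_per_day

    s = float(population - initial_infectious)
    e, i, r = 0.0, float(initial_infectious), 0.0
    history = [(s, e, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / population * dt
            new_infectious = sigma * e * dt
            new_recovered = gamma * i * dt
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered
            r += new_recovered
        history.append((s, e, i, r))
    return history


# Illustrative values only (assumed, not fitted to any data).
params = SEIRParams(r0=2.5, incubation_days=5, infectious_days=7, distancing=0.6)
for day, (s, e, i, r) in enumerate(simulate_seir(1_000_000, 10, params, days=14)):
    print(f"day {day:2d}: S={s:,.0f} E={e:,.1f} I={i:,.1f} R={r:,.1f}")
```

Every flow leaves one group and enters another, so S + E + I + R stays equal to the population throughout the run. Setting `distancing=0.0` stops all new infections, matching the "everyone is locked down" end of the scale.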

The Practicality of Implementing SEIR Model

Epidemic models focus on the ideal scenario of fitting exponential curves as a simple way of trying to forecast the course of the epidemic.

Unfortunately, the real world is significantly more complex in a variety of ways.

The limitations of applying SEIR models to practical scenarios like COVID-19 stem from a few unrealistic assumptions and from inputs that are hard to estimate:

  1. Exposed population: It is difficult to estimate what percentage of the population has been exposed.
  2. Disease parameters: There is limited evidence for many of the key epidemiological features, such as the incubation period, infectious period, and basic reproduction number.
  3. Asymptomatic and mildly infectious cases: The model assumes that there is a single type of infectious individual. In the real world, different immune systems respond differently to the virus, so some people are completely asymptomatic or only mildly infectious.
  4. Infection rate: Transmission is extremely variable and depends on all kinds of social behaviors, local environmental details, and political decisions. Because of this, modeling potential outcomes for coronavirus means trying out many different transmission scenarios. Even those scenarios are not exact; they are more like a range of estimates.
  5. Contact rate: The average rate of contact is not uniform; it differs from person to person.
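The idea of trying out several transmission scenarios and reading the results as a range can be sketched as follows. This is a simplified illustration under assumed inputs, not our forecasting code: it uses the textbook SEIR equations with fixed-step integration, and the function name `peak_infectious`, the scenario labels, and every parameter value are hypothetical.

```python
def peak_infectious(r0, population=1_000_000, initial_infectious=10,
                    incubation_days=5.0, infectious_days=7.0,
                    days=365, steps_per_day=10):
    """Largest number of simultaneously infectious people in one SEIR run."""
    sigma = 1.0 / incubation_days
    gamma = 1.0 / infectious_days
    beta = r0 * gamma
    dt = 1.0 / steps_per_day

    s = float(population - initial_infectious)
    e, i = 0.0, float(initial_infectious)
    peak = i
    for _ in range(days * steps_per_day):
        new_exposed = beta * s * i / population * dt
        new_infectious = sigma * e * dt
        new_recovered = gamma * i * dt
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_recovered
        peak = max(peak, i)
    return peak


# Assumed transmission scenarios; the R0 values are illustrative only.
scenarios = {"low": 1.5, "medium": 2.5, "high": 3.5}
peaks = {name: peak_infectious(r0) for name, r0 in scenarios.items()}

for name, peak in peaks.items():
    print(f"{name:>6} transmission (R0={scenarios[name]}): peak infectious ~ {peak:,.0f}")
print(f"range of estimates: {min(peaks.values()):,.0f} to {max(peaks.values()):,.0f}")
```

The output is a band rather than a single number, and the width of that band reflects how uncertain the transmission inputs are.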

Conclusion

To build the model, we must assemble all these parameters and account for the uncertainty in each of them. This gets messy because the virus is novel: no one has had it before.

Our experiments with the SEIR model were like cooking a complicated dish with a multitude of ingredients and complex steps. The model did not fare well for our purpose of understanding and explaining the pandemic scenario accurately, so we investigated other models that could potentially give us more accurate results. Stay tuned for the second part of the blog to find out what we tried and what worked in the end.

Steni Sebastian

Steni is a data scientist at Gramener and loves to explore insights from complex data sets. She dives into numbers to discover and tell insightful stories with an interactive touch.

