Spring 2020 brought with it the arrival of the celebrity statistical model. As the public tried to gauge how big a deal the coronavirus might be in March and April, it was pointed again and again to two forecasting systems: one built by Imperial College London, the other by the Institute for Health Metrics and Evaluation, or IHME, based in Seattle.
But the models yielded wildly divergent predictions. Imperial warned that the U.S. might see as many as 2 million Covid-19 deaths by the summer, while the IHME forecast was far more conservative, predicting about 60,000 deaths by August. Neither, it turned out, was very close. The U.S. ultimately reached about 160,000 deaths by the start of August.
The huge discrepancy in the forecasting figures that spring caught the attention of a then 26-year-old data scientist named Youyang Gu. The young man had a master's degree in electrical engineering and computer science from the Massachusetts Institute of Technology and another degree in mathematics, but no formal training in a pandemic-related area such as medicine or epidemiology. Still, he thought his background dealing with data models could prove useful during the pandemic.
In mid-April, while he was living with his parents in Santa Clara, Calif., Gu spent a week building his own Covid death predictor and a website to display the morbid information. Before long, his model started producing more accurate results than those cooked up by institutions with hundreds of millions of dollars in funding and decades of experience.
"His model was the only one that seemed sane," says Jeremy Howard, a renowned data expert and research scientist at the University of San Francisco. "The other models were shown to be nonsense time and again, and yet there was no introspection from the people publishing the forecasts or the journalists reporting on them. People's lives were depending on these things, and Youyang was the one person actually looking at the data and doing it properly."
The forecasting model that Gu built was, in some ways, simple. He had first considered examining the relationship among Covid tests, hospitalizations, and other factors but found that such data was being reported inconsistently by states and the federal government. The most reliable figures appeared to be the daily death counts. "Other models used more data sources, but I decided to rely on past deaths to predict future deaths," Gu says. "Having that as the only input helped filter the signal from the noise."
The novel, sophisticated twist of Gu's model came from his use of machine learning algorithms to hone his figures. After MIT, Gu spent a couple of years working in the financial industry writing algorithms for high-frequency trading systems in which his forecasts had to be accurate if he wanted to keep his job. When it came to Covid, Gu kept comparing his predictions to the eventual reported death totals and constantly tuned his machine learning software so that it would lead to ever more precise prognostications. Even though the work required the same hours as a demanding full-time job, Gu volunteered his time and lived off his savings. He wanted his data to be seen as free of any conflicts of interest or political bias.
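To make the idea of tuning a forecast against reported deaths concrete, here is a minimal sketch. It is not Gu's actual model, whose internals are not described here. It assumes a deliberately crude forecaster that projects daily deaths forward with a single growth factor, and it picks that factor by checking which candidate would have best predicted the most recent week. The function names, the candidate range, and the death counts are all hypothetical, and the data is synthetic.

```python
# Illustrative sketch only: NOT Gu's actual model. All numbers are synthetic.

def forecast(history, growth, horizon):
    """Project daily deaths forward from the last observed day,
    multiplying by a constant daily growth factor."""
    last = history[-1]
    return [last * growth ** (day + 1) for day in range(horizon)]

def mean_abs_error(predicted, actual):
    """Average absolute gap between predicted and reported daily deaths."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def tune_growth(history, holdout_days, candidates):
    """Grid search: hide the most recent days, forecast them from the
    earlier data, and keep the growth factor with the smallest error."""
    train, held_out = history[:-holdout_days], history[-holdout_days:]
    return min(
        candidates,
        key=lambda g: mean_abs_error(forecast(train, g, holdout_days), held_out),
    )

# Synthetic daily deaths, falling 2% per day from 2,000 (made-up data).
synthetic = [2000 * 0.98 ** day for day in range(40)]

# Hypothetical candidate growth factors: 0.90, 0.91, ..., 1.10.
candidates = [round(0.90 + 0.01 * i, 2) for i in range(21)]

best = tune_growth(synthetic, holdout_days=7, candidates=candidates)
next_week = forecast(synthetic, best, horizon=7)
```

Rerunning the tuning step each time new death counts arrive is the loop the paragraph above describes: compare past predictions with what was eventually reported, then adjust. Real daily counts are noisy, so a practical version would need smoothing and a richer model than one growth factor.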
While certainly not perfect, Gu's model performed well from the outset. In late April he predicted the U.S. would see 80,000 deaths by May 9. The actual death toll was 79,926. A similar late-April forecast from IHME predicted that the U.S. would not surpass 80,000 deaths through all of 2020. Gu also predicted 90,000 deaths on May 18 and 100,000 deaths on May 27, and once again got the numbers right. Where IHME expected the virus to fade away as a result of social distancing and other policies, Gu predicted there would be a second, large wave of infections and deaths as many states reopened from lockdowns.
IHME faced some criticism in March and April, when its numbers didn't match what was happening. Still, the prestigious center, based at the University of Washington and bolstered by more than $500 million in funding from the Bill & Melinda Gates Foundation, was cited on an almost daily basis during briefings by members of President Donald Trump's Administration. In April, U.S. infectious-disease chief Anthony Fauci told an interviewer that Covid's death toll "looks more like 60,000 than the 100,000 to 200,000" once expected, a prediction that reflected IHME forecasts. And on April 19, the same day Gu cautioned about a second wave, Trump pointed to IHME's 60,000-death forecast as an indicator that the fight against the virus would soon be over.
IHME officials also actively promoted their numbers. "You had the IHME on all these news shows trying to tell people that deaths would go to zero by July," Gu says. "Anyone with common sense could see we would be at 1,000 to 1,500 daily deaths for a while. I thought it was very disingenuous for them to do that."
Christopher Murray, the director of IHME, says that once the organization got a better handle on the virus after April, its forecasts radically improved.
But that spring, week by week, more people started to pay attention to Gu's work. He flagged his model to reporters on Twitter and e-mailed epidemiologists, asking them to check his numbers. Toward the end of April, the prominent University of Washington biologist Carl Bergstrom tweeted about Gu's model, and not long after that the U.S. Centers for Disease Control and Prevention included Gu's numbers on its Covid forecasting website. As the pandemic progressed, Gu, a Chinese immigrant who grew up in Illinois and California, found himself taking part in regular meetings with the CDC and teams of professional modelers and epidemiologists, as everyone tried to improve their forecasts.
Traffic to Gu's website exploded, with millions of people checking in daily to see what was happening in their states and the U.S. overall. More often than not, his predicted figures ended up hugging the line of actual death figures when they arrived a few weeks later.