“Pandemic development can and should be predicted”: FEFU student Maxim Shulga on COVID-19 Data Challenge

Maxim Shulga, a student at Far Eastern Federal University (FEFU), placed in the top ten of the COVID-19 Data Challenge, an international online competition to build a predictive model of the spread of coronavirus in different countries. About 560 young professionals from around the world took part. In an interview with RT, Shulga explained the essence of his work and spoke about the practical benefits of building such a model.

Maxim Shulga, 20, is a third-year student at the FEFU School of Natural Sciences, specializing in computer security. He became interested in programming as a child, and after entering university he began actively studying machine learning methods, including neural networks. In the competition, Maxim produced one of the most accurate forecasts of the dynamics of COVID-19 incidence worldwide. The work was based on data collected by Johns Hopkins University.

- Why did you decide to take part in this competition? Did you find out about it yourself, or did a teacher suggest it?

- I learned about the competition from the director of the FEFU School of Digital Economics. I decided to take part for several reasons. First, the topic of coronavirus is very important at the moment. Second, solving the problem required building a mathematical model using machine learning methods, that is, putting into practice the knowledge I had gained at the university.

- What was the challenge for the contestants?

- It was about forecasting numerical data: how many people would fall ill and die in different countries, as well as in a particular region of Russia. In other words, a forecast of the dynamics of the disease in the near future.

Maxim Shulga. Photos from the personal archive

- What is the essence of your work?

- To write the model for the competition, I used the TensorFlow library for the Python programming language. This library is designed for building neural networks. I used a recurrent neural network in my work. This kind of network works with sequences and is mainly applied to text analysis tasks: determining the subject of a text, generating new texts, extracting headlines from arbitrary text. Since it works with sequences and retains previous values well, it can also be used to solve this problem.

Johns Hopkins University has collected data on the number of cases and deaths over several months of the COVID-19 pandemic. This data can be divided into segments, for example, weekly ones, and the past week can then be used to predict the number of cases for the next few days.
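The weekly segmentation described here can be illustrated with a minimal sketch. It is not Shulga's code: the function name `make_windows`, the seven-day window, and the case counts are assumptions made up for the example.

```python
def make_windows(series, window=7):
    """Split a daily series into (past `window` days, following day) pairs."""
    pairs = []
    for start in range(len(series) - window):
        inputs = series[start:start + window]   # one "week" of history
        target = series[start + window]         # the day right after it
        pairs.append((inputs, target))
    return pairs


# Made-up cumulative case counts for ten days (not real data).
cases = [10, 12, 15, 19, 24, 30, 37, 45, 54, 64]
pairs = make_windows(cases, window=7)

for inputs, target in pairs:
    print(inputs, "->", target)
```

Each pair is one training example: a sequence model sees the week of values and learns to output the value for the following day.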

Because the data could be represented as such sequences, it was possible to train the model. In the end, I fed the model the number of cases for the last week and received a forecast of the incidence for the next day. I then appended that day to the data and repeated the step to extend the forecast to the following days.
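The step-by-step procedure in this answer (predict one day, append it, predict again) can be sketched as follows. This is an illustration under stated assumptions, not the competition model: `forecast` and `last_increase` are hypothetical names, and `last_increase` is a deliberately naive stand-in for the trained recurrent network.

```python
def forecast(history, predict_next, days, window=7):
    """Roll a one-day-ahead predictor forward for `days` days."""
    series = list(history)
    for _ in range(days):
        # Feed the most recent `window` values, real or already predicted.
        next_value = predict_next(series[-window:])
        series.append(next_value)
    return series[len(history):]


def last_increase(window_values):
    """Naive stand-in predictor: repeat the most recent daily increase."""
    return window_values[-1] + (window_values[-1] - window_values[-2])


# Made-up cumulative case counts for one week (not real data).
history = [10, 12, 15, 19, 24, 30, 37]
predicted = forecast(history, last_increase, days=3)
print(predicted)
```

Because each new prediction is built on earlier predictions, errors accumulate as the horizon grows, which is one reason this kind of forecast tends to be more reliable over short periods.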

- Your research is based on information from Johns Hopkins University. Why did you choose this data for analysis?

- The organizers provided it as the main data source, and one of the conditions of the competition was to use the data published on the competition's official channel.

- Who won the competition?

- When the results were tallied, the competition website displayed a ranking of all participants' forecasts by accuracy, from best to worst. The list showed participants' profile names, not their real names, so I cannot know who the winner is.

- There is an opinion that the epidemic began much earlier, back in November. Does your research give any indication of this?

- The problem I solved during the competition did not touch on this question, so I cannot answer.

But I believe that the development of a pandemic can and should be predicted. We cannot know where, when, and how a virus will mutate, but we can be prepared to cope with it.

- How do you personally, as a researcher, predict the spread of the virus?

- Mathematical forecasting models perform poorly on long-term forecasts.

The daily increase in patients depends on so many factors that a large number of random events arise and cause abrupt changes in the dynamics of the disease.

For this reason, mathematical models are better suited to forecasting over short periods, for example, a week, as in this competition.

- What practical benefits do you see from these studies?

- The practical benefit of such studies is that they draw researchers' attention to the genuinely important problem of the worldwide spread of COVID-19. Because a large number of participants are working on the problem, the chances of developing the most accurate forecasting model increase.

- What are you currently doing?

- I'm getting ready for the next contest. It will be held in six months and will be devoted to analyzing exam data, more precisely, essays from the written part of the Unified State Exam in English. This work requires collecting a large amount of data.
