(Fighting COVID-19) Big Data Tracing: U.S. COVID-19 "Patient Zero" Most Likely Appeared Around September 2019

  China News Agency, Beijing, September 22 (Reporter Sun Zifa) On September 22, the Chinese Academy of Sciences' preprint platform for scientific and technological papers (ChinaXiv) published a source-tracing result based on a new big data analysis method. It shows that the United States' COVID-19 "patient zero" most likely appeared around September 2019. The earliest date on which a first infection has a 50% probability is April 26, 2019, in Rhode Island, much earlier than January 20, 2020, the officially announced date of the first confirmed case in the United States.

  At present, tracing the source of the new coronavirus is a common challenge facing all mankind.

A series of studies have shown that the United States, Spain, France, Italy, Brazil and many other countries had been affected by the virus long before the outbreak in China.

To advance source tracing quickly and accurately, mathematicians have begun to try tracing methods based on big data analysis, working with biologists to find "patient zero".

  In the latest big data analysis work, researchers built an optimized model from published data, using infectious disease models and statistical methods, and inferred when the epidemic originated in some U.S. states and in Wuhan, Zhejiang and other places in China.

The research paper proposes that combining mathematical models with artificial intelligence techniques for qualitative and quantitative analysis of infectious diseases can reveal their epidemic patterns.

At present, there are many studies on epidemic prediction based on infectious disease models and data, but relatively few that use big data analysis to build mathematical models that infer the course of an epidemic "in reverse".

  In the paper, the researchers built a "model- and data-driven epidemic transmission model", based mainly on classic infectious disease models and statistical methods, and applied least squares estimation and kernel density estimation to obtain the model parameters.

They used the daily epidemic data released by 12 northeastern U.S. states to find the parameters of the early-stage transmission model for each of those states.

On this basis, they inferred the infection times of the first case, 50 cases and 100 cases in each state, along with the corresponding probabilities.
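
The general idea of fitting early case data and extrapolating backward can be illustrated with a minimal sketch. It is not the paper's model: it assumes simple exponential early growth, fits it by least squares on log case counts, uses bootstrap resampling to get a spread of estimates, and reads the "50% probability" day as the median of a Gaussian kernel density estimate. The data are synthetic, and all function names (`fit_log_linear`, `day_reaching`, `kde_cdf`, `kde_quantile`) are hypothetical.

```python
import math
import random


def fit_log_linear(days, cases):
    """Least squares fit of log(cases) = a + b * day; returns (a, b)."""
    logs = [math.log(c) for c in cases]
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, logs))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def day_reaching(a, b, n_cases):
    """Day (relative to day 0 of the data) when the fitted curve equals n_cases."""
    return (math.log(n_cases) - a) / b


def kde_cdf(x, samples, bandwidth):
    """CDF of a Gaussian kernel density estimate, evaluated at x."""
    z = bandwidth * math.sqrt(2.0)
    return sum(0.5 * (1.0 + math.erf((x - s) / z)) for s in samples) / len(samples)


def kde_quantile(samples, q):
    """Value below which the KDE places probability q (found by bisection)."""
    n = len(samples)
    mean = sum(samples) / n
    sd = math.sqrt(sum((s - mean) ** 2 for s in samples) / (n - 1))
    # Normal-reference rule of thumb for the bandwidth (an assumption here).
    bandwidth = max(1.06 * sd * n ** -0.2, 1e-9)
    lo = min(samples) - 10 * bandwidth
    hi = max(samples) + 10 * bandwidth
    for _ in range(100):
        mid = (lo + hi) / 2
        if kde_cdf(mid, samples, bandwidth) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Synthetic data only: the curve equals 1 case on day -20 by construction.
rng = random.Random(0)
TRUE_START = -20
GROWTH = 0.2
days = list(range(30))
cases = [math.exp(GROWTH * (d - TRUE_START) + rng.gauss(0.0, 0.1)) for d in days]

# Bootstrap: refit on resampled (day, cases) pairs to get a spread of estimates.
estimates = []
for _ in range(500):
    idx = [rng.randrange(len(days)) for _ in days]
    if len(set(idx)) < 2:
        continue  # a fit needs at least two distinct days
    a, b = fit_log_linear([days[i] for i in idx], [cases[i] for i in idx])
    estimates.append(day_reaching(a, b, 1))

median_first_day = kde_quantile(estimates, 0.5)
print(f"50% probability day of first infection: {median_first_day:.1f}")
```

Calling `day_reaching(a, b, 50)` or `day_reaching(a, b, 100)` gives the corresponding days for 50 and 100 cases under the same fitted curve. Real surveillance data would need a richer transmission model and careful treatment of under-reporting, which this sketch ignores.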

  The calculations show that, for the 12 states in the northeastern United States, the first COVID-19 infection most likely occurred around September 2019. The earliest date on which a first infection has a 50% probability is April 26, 2019, in Rhode Island, and the latest is November 30, 2019, in Delaware. Both are earlier than January 20, 2020, the officially announced date of the first confirmed case in the United States.

  In addition, to validate the new method, the research team applied the same model to public data from China to infer the infection times of the first case, 50 cases and 100 cases in Wuhan and Zhejiang.

The date on which a first infection has a 50% probability is December 20, 2019 for Wuhan and December 23, 2019 for Zhejiang Province.

Based on this inference, the COVID-19 epidemic in China most likely began to spread in late December 2019. This conclusion is broadly consistent with the results of epidemiological investigations, demonstrating that the calculation method is accurate and reliable.

  The research paper notes that, if sufficiently accurate testing data are available for the early stage of the epidemic in other countries or regions, the method can be used to infer when the epidemic originated there and to calculate, at a given probability, the infection times of the first case and of the first several cases.

(End)