Big data research on Tang poetry and Song ci yields subversive conclusions

  Big data analysis shows that the Tang Dynasty poet Bai Juyi has the largest body of work but ranks outside the top ten in influence, and that the Song ci poet with the most selected works is neither Su Shi nor Xin Qiji but Zhou Bangyan.

  Using big data to analyze Tang and Song poetry yields conclusions that may be beyond your imagination: Bai Juyi, who ranks first among Tang Dynasty poets by number of works, falls outside the top ten in influence; the poet with the most works among selected Song ci is not Su Shi or Xin Qiji but Zhou Bangyan; and the comprehensive influence index shows that Du Fu is stronger than Li Bai, and Xin Qiji stronger than Su Shi...

  These findings come from analysis by Wang Zhaopeng, chief expert of the National Social Science Fund's major project "Construction of Information Platform for the Chronicle of Tang and Song Literature" and chair professor of the School of Literature and Journalism of Sichuan University.

  Tang poetry is the first peak in the history of Chinese poetry.

The Tang Dynasty has more than 50,000 poems by more than 3,000 poets, and both figures reached an unprecedented level.

The Song Dynasty had nearly 1,500 ci lyricists, who wrote more than 21,000 ci.

  From the perspective of individual poets, who has the most works in Tang poetry and Song poetry?

According to Wang Zhaopeng's big data, Bai Juyi tops the list of Tang poets by number of works, with nearly 3,000 poems; Du Fu and Li Bai follow, each with more than 1,000 poems.

In Song ci, Xin Qiji ranks first by number of works, with more than 600 ci, followed by Su Shi and Liu Chenweng.

In Song shi poetry, Lu You leads by number of works, with more than 9,000 poems, followed by Liu Kezhuang and Yang Wanli.

  According to the ranking of the comprehensive influence index, the most influential poet of the Tang Dynasty is Du Fu, followed by Li Bai and Wang Wei, while Bai Juyi, the most prolific poet, ranks outside the top ten.

Among Song ci poets, Xin Qiji ranks first in both volume and influence, with Su Shi and Zhou Bangyan ranking second and third respectively.

Su Shi tops the influence ranking for Song shi poetry, followed by Lu You, who has the largest volume of works.

  When referring to the great masters of Tang poetry and Song ci, people often call them "Li Du" and "Su Xin". The word order seems to imply that Li is better than Du and Su is better than Xin.

However, the comprehensive influence index shows that Du Fu is higher than Li Bai and Xin Qiji is stronger than Su Shi.

What is even more surprising is that the most sought-after ci poet is neither Su nor Xin but Zhou Bangyan.

In the lists of 100 and 300 Song ci, Zhou Bangyan accounts for 15 and 40 works respectively, a share much higher than that of Su or Xin.

  Is it scientific and feasible to use objective data to measure and analyze the rather subjective appreciation of poetry?

In an exclusive interview with a reporter from Beijing Youth Daily, Wang Zhaopeng emphasized that although the data can describe the development and progress of literary history to a certain extent, it also has obvious limitations.

  Research started 30 years ago

  More than a million pieces of data accumulated

  Q: What was the original intention of the project "The World of Tang and Song Poetry in Big Data"?

  A: I have been doing quantitative analysis of Tang and Song poetry since 1992.

My original motivation was that everyone has their own idea of which Tang and Song poems are famous.

I wanted to use statistical data to analyze and measure exactly which Tang and Song poems have been regarded as famous throughout history.

  Q: So how did you use big data to measure the quality of Tang and Song poetry?

How are these figures calculated?

  A: No effective data has yet been found for evaluating and measuring the quality of Tang and Song poetry.

I am currently trying to build an indicator system for evaluating the quality of literary works, so that data can be collected against it.

This will take a relatively long time.

In addition, an evaluation index system built by one individual needs the recognition and consensus of the academic community.

  Q: Regarding an index system for literature, what is the current state of research in academia?

  A: In the era of big data, literary data needs to be classified and layered so that an indicator system for literary history data can be established, ensuring the reliability and validity of the data.

However, there are not many scholars who use big data to study Tang and Song poetry, and the big data of Tang and Song poetry shared by the academic community is also quite limited.

  From 1992 to the present, although I have accumulated more than one million pieces of data related to Tang poetry and Song poetry, it is still incomplete and uneven.

Some periods have more data and some have less; some types of data are plentiful and others scarce; some poets have more data and some have less.

We often sigh that "only when you need to use books do you regret having read too few", and this is even more true of data.

When analyzing Tang poetry and Song ci in an all-round way, I often feel that the data is not enough.

  In my opinion, the literary evaluation index system should be established with works as the center.

The influence of a writer is premised on the influence of the work.

The evaluation of a work can be divided into two dimensions: one is its relatively stable internal literary value, and the other is its dynamic external influence.

Literary value can be evaluated in terms of both content and form.

  The influence of works is measured from three levels: creators, critics, and ordinary readers.

The first is the influence on creators, including citation, transformation, imitation, adaptation, translation, etc., which reflects how far the work serves as a model and how attractive it is; the second is the degree of reputation and attention at the level of academic research; the third is the degree of circulation and awareness among ordinary readers.

Once the basic elements and structure of a work's value and influence have been determined, a calculation model is constructed. The computer then runs it on the relevant resource bases, corpora and the network, mines and extracts the relevant data, and finally calculates a score for each work.
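  The scoring step described above can be sketched roughly as a weighted sum over the three levels of influence. This is only a minimal illustration, not the project's actual model: the interview does not give the formula, so the weights, the `WorkIndicators` fields, the assumption that each indicator is already normalized to the range 0 to 1, and the sample values are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical weights for the three levels named in the interview.
# The source does not publish its weights; these are placeholders.
WEIGHTS = {"creators": 0.4, "critics": 0.3, "readers": 0.3}


@dataclass
class WorkIndicators:
    """Indicator values for one work, each assumed normalized to [0, 1]."""
    title: str
    creators: float  # citation, imitation, adaptation, translation, etc.
    critics: float   # reputation and attention in academic research
    readers: float   # circulation and awareness among ordinary readers


def influence_score(work: WorkIndicators, weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of the three influence levels."""
    return sum(weights[level] * getattr(work, level) for level in weights)


def rank_works(works: list[WorkIndicators]) -> list[tuple[str, float]]:
    """Return (title, score) pairs sorted from highest to lowest score."""
    scored = [(w.title, influence_score(w)) for w in works]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Made-up values for illustration only; not data from the project.
    sample = [
        WorkIndicators("Work A", creators=0.9, critics=0.8, readers=0.7),
        WorkIndicators("Work B", creators=0.5, critics=0.6, readers=0.9),
    ]
    for title, score in rank_works(sample):
        print(f"{title}: {score:.2f}")
```

  A weighted sum is the simplest possible choice here; how the indicators are actually normalized and combined would depend on the index system that, as Wang notes, still needs the consensus of the academic community.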

  Data cannot measure artistic content

  and aesthetic value

  Q: You mentioned in your project that, according to statistics, there were only a little over 5,000 poems in total from the Eastern Han Dynasty to the end of the Sui Dynasty, but in the Tang Dynasty the number of poems exceeded 10,000 for the first time and jumped directly to more than 50,000.

Tang poetry increased more than sevenfold compared with the poetry of the preceding eight dynasties, and the number of poets rose from more than 600 to more than 3,000, so that both poets and poems reached an unprecedented level.

Where does this data come from, and are there any important literature references?

  A: The data comes from two papers written by my old friend Professor Shang Yongliang: "Quantitative Analysis of the Distribution and Development Trend of Poetry in the Eight Dynasties" and "Quantitative Analysis of the Hierarchical Distribution and Development of Generation Groups of Well-known Poets in Tang Dynasty".

  Q: Bai Juyi's poems are the most numerous, but their influence is outside the top ten. How is this determined?

  A: It is determined by the data.

We used a variety of data to rank the influence of Tang Dynasty poets.

Bai Juyi's influence is greater in modern times than in ancient times.

His comprehensive influence is far less than that of Li Bai and Du Fu.

  Q: What is your basis for judging the quality of Tang poetry and Song poetry through big data?

  A: At present, big data can only be used to measure the influence of Tang poetry and Song ci, including their attractiveness as models for later poets, their reputation among later critics, and their popularity among ordinary readers.

At present, it is not possible to use data to measure the artistic content and aesthetic value of Tang poetry and Song poetry.

  In the early Northern Song Dynasty, the literary center

  had already moved completely to the south

  Q: Have you encountered any academic difficulties in using big data to study Tang and Song poems, and how did you overcome them?

  A: Literary research has never been data-conscious. The difficulty lies not only in where to find data, but also in what kind of data to look for.

What kind of data is useful and effective requires both theoretical support and practical testing.

On the theoretical side, we keep seeking inspiration in the theories and methods of statistics, quantitative informatics and quantitative history; in practice, we try again and again after each failure.

The most painful thing is to discover, after the database has been built and the article written, that the data sources were incomplete, so that I have to fill in the data from scratch and then rewrite the finished paper.

  Q: What other new discoveries have you made in your specific research on big data?

  A: The value of data is that it can not only confirm traditional conclusions, but also revise them, reveal new problems, and change traditional understanding.

For example, there is a famous conclusion in Chinese cultural geography that the Chinese cultural center moved gradually from the northern Central Plains to the south.

On this view, three wars, among them the Jingkang Rebellion in the Song Dynasty, pushed the cultural center southward, and only after the Jingkang Rebellion had it moved completely to the south.

Our big data shows that the literary center had already moved completely to the south by the beginning of the Northern Song Dynasty, when the number of authors in the south surpassed that in the north; there was no need to wait until the Jingkang Rebellion.

Moreover, the war was not the only factor driving the southward movement of the cultural center.

  We also found that the literary center of the Song Dynasty gradually moved to the southeast coast.

Counted by today's prefecture-level administrative divisions, Nanping in Fujian Province had the largest number of Song Dynasty authors, and Fuzhou ranked second, which is surprising.

Relatedly, in the number of Song Dynasty jinshi, Fuzhou ranked first and Nanping second.

It can be seen that education in Nanping and Fuzhou was well developed at that time, with many scholars and many poets.

Education and literature are closely linked.

  In addition, we found that Su Dongpo's poetic output peaked in Huangzhou. One third of his poems were written during his exile there, as were half of his famous works.

For example, "Nian Nujiao, Chibi Nostalgia", the top-ranked Song ci, was written in Huangzhou.

It was Huangzhou that brought Su Shi's poetry to its glory.

  Text/Reporter Zhang Enjie  Coordinator/Liu Jianghua