AI decodes "kuzushiji" cursive script, with a ramen identification technique applied too! December 2, 19:21

"Kuzushiji", a cursive style of writing, was widely used in Japan from the Heian period to the beginning of the Meiji period. Many of its characters differ greatly in shape from their modern forms, and several characters are often written joined together, so researchers have had to read them carefully one by one.

Now AI is about to enter this world of “craftsmanship”. Instead of reading the text, it uses image recognition to replace cursive kuzushiji with modern characters almost instantly. In the fall, an international competition for system development was held, and about 300 teams competed on accuracy.

(Ryo Tomita, Science and Culture Department)

Decoding kuzushiji: a job for "science" people rather than "humanities" people!?

Decoding kuzushiji has been the domain of “humanities” specialists in fields such as literature and history, but AI development is the specialty of “science” researchers who work on programming and system development every day.

I visited one of them, Kenji Doi of Fukuoka Prefecture, who took part in the international competition. Doi is an engineer at an IT company, where he helps build a system that judges from posted images whether a product listed in an online auction is genuine or fake.

He knew of kuzushiji “only to the extent of knowing they exist” and cannot read them himself. When I asked why he entered the competition, an unexpected motivation emerged.

He thought he could apply “a system that guesses the shop from a photo of ramen”.

Surprise! Applying a ramen identification technique

Doi runs a system in which AI, trained on tens of thousands of images, predicts which shop a ramen photo posted to social media was taken at, and it has become popular among fans.

The AI accurately predicts the shop name by picking out distinguishing features from limited information in the image, such as how the bowl is arranged, the color of the soup, and the material of the table, even for chain-store ramen that looks alike at first glance.

Doi thought kuzushiji could be told apart in the same way, and spent three months training an AI. As a result, in the competition it correctly deciphered over 94% of the kuzushiji presented, and he placed third, the top rank among Japanese participants.

(Kenji Doi)
“I really felt the significance of machines being able to read kuzushiji. If there are many documents that remain unread, I felt the social significance would be great if decoding advanced further.”

What are the keys to improving AI's decoding ability?

This international competition was held for the first time by the “Humanities Open Data Joint Usage Center”.

The Center had already developed a new system that uses AI to replace kuzushiji with modern characters almost instantly; the aim of the competition was to improve its accuracy further by bringing in new ideas.

At first I could not make sense of the system, but after asking the person in charge again and again, I came to understand that it has the following two distinguishing features.

(1) "Image recognition" instead of "decoding"

First, the AI recognizes, as an image, where the characters are in an old document and what they are.

Then, it compares the shape features of each recognized character with about 1 million character samples entered in advance, and replaces it with the modern character.

(2) It does not read sequentially from the beginning

Until now, it was common to read the characters in order from the beginning, but decoding sometimes stalled partway when the boundaries between characters could not be determined.

The new system therefore does not interpret the characters in order. It identifies each one from the features of its shape alone, without considering the meaning of words or the surrounding context.
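
The two features above can be illustrated with a small sketch. It is not the Center's actual system: the reference table, the three-number "shape features", and the nearest-match rule are all hypothetical stand-ins, chosen only to show that each detected character is labeled from its shape alone, so the result does not depend on the order in which characters are processed.

```python
import math

# Hypothetical reference set: modern character -> toy shape-feature vectors.
# (The article describes about 1 million registered samples; these are made up.)
REFERENCE = {
    "あ": [(0.9, 0.1, 0.3), (0.8, 0.2, 0.35)],
    "の": [(0.1, 0.9, 0.5)],
    "し": [(0.2, 0.1, 0.95)],
}


def classify(features):
    """Return the modern character whose reference sample is closest in shape."""
    best_char, best_dist = None, math.inf
    for char, samples in REFERENCE.items():
        for sample in samples:
            dist = math.dist(features, sample)  # Euclidean distance
            if dist < best_dist:
                best_char, best_dist = char, dist
    return best_char


def recognize(detections):
    """Label every detected region independently; no neighbor or word context is used."""
    return {box: classify(features) for box, features in detections}


# Toy detections: (bounding box as (x, y, width, height), shape features).
# In a real pipeline these would come from an image-recognition step.
detections = [
    ((120, 10, 30, 32), (0.85, 0.15, 0.3)),
    ((120, 45, 30, 40), (0.15, 0.85, 0.55)),
    ((120, 90, 30, 60), (0.25, 0.05, 0.9)),
]

result = recognize(detections)
for box, char in result.items():
    print(box, char)
```

Because `recognize` never looks at neighboring characters, passing the detections in reverse or shuffled order gives the same label for every box, which is the property that lets such a system keep going where sequential reading would stall at an unclear character boundary.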

According to the Center, a page of kuzushiji that takes a person at least 10 minutes can be decoded in just a few seconds.

Hundreds of millions of difficult-to-read materials

On November 11, the Center and others held a symposium titled “The time has come for AI to read kuzushiji”.

In addition to the award ceremony for the international competition, experts who have tackled the decoding of kuzushiji through various approaches took the stage, presenting the results of data collection using digital technology and of decoding projects carried out by citizens.

What I sensed at the venue was the expectation that AI will help advance the decoding of kuzushiji. One of the experts who spoke said, “Rather than reading kuzushiji in people's place, AI should play the role of training wheels on a bicycle, helping citizens read kuzushiji themselves.”

Behind this is the fact that the decoding of kuzushiji has made little headway.

The same character takes many different forms depending on the era and the writer, and multiple characters are written joined together, so reading them requires “craftsmanship”.

According to the Center, only several thousand people nationwide are estimated to be able to read kuzushiji accurately. Meanwhile, historical materials written in kuzushiji are said to survive on the scale of several hundred million items. Some are discarded because their contents cannot be read and their value goes unrecognized, and some records remain undeciphered.

(Asahi Kitamoto, Director, Humanities Open Data Joint Usage Center)
“There should still be a great deal of valuable, previously unknown information in records of disasters and recovery, but it cannot be understood unless it is read. I expect AI to help find that information, and to be an opportunity to increase the number of people interested in these fields.”

Voices of expectation from researchers of historical materials

Researchers in the field who read kuzushiji themselves also have high hopes for using AI to decipher historical materials.

One of them is Associate Professor Daito Sato of Tohoku University. As secretary general of an NPO working on the restoration and preservation of historical materials, he is also involved in rescuing old documents damaged by Typhoon No. 19.

Mr. Sato emphasizes that old documents are packed with hints that can be used in the present. For example, disaster descriptions can be used for disaster prevention and mitigation.

To that end, old documents must be read and interpreted faster.

Associate Professor Sato said that even if the use of AI spreads, verification and confirmation by experts will remain indispensable. “I hope that by using AI, we can move forward with building a basic environment for sharing information about the past with everyone.”

With the further use of AI, we may be able to discover unknown information buried in old historical materials. Such a future is steadily approaching.

Ryo Tomita, Science and Culture Department reporter. Joined in 2013. After working at the Kanazawa Bureau, he moved to the Nagasaki Bureau in 2016, where he covered war-related issues, centering on the atomic bomb, as well as cultural assets. In charge of literary arts and academics at the Science and Culture Department from summer