Nov. 11 at 19:59

An international competition was held to develop systems that use AI (artificial intelligence) to convert "kuzushiji", the cursive handwritten characters long used in Japan, into modern characters. About 300 teams competed to decipher them more accurately.

Kuzushiji were widely used from the Heian period to the beginning of the Meiji period, but because their shapes differ from modern characters and they are written in a continuous, connected hand, only a limited number of people can decipher them accurately.

For this reason, the Humanities Open Data Joint Usage Center and others developed a system that uses AI to instantly convert kuzushiji into modern characters, and then held an international competition to bring in new ideas and improve accuracy.

In the competition, teams developed new methods based on the center's system and competed on how accurately they could recognize the characters written on thousands of images. About 300 teams took part, including companies and researchers from Japan and overseas.

On the 11th, the top 10 teams were commended in Tokyo. The winning team, from China, correctly deciphered 95% of the kuzushiji.

“We were able to gather many ideas from people all over the world, including those who had never been interested in these characters,” said Asanobu Kitamoto, director of the Humanities Open Data Joint Usage Center, which hosted the competition. “We can expect research to advance further as opinions are shared.”

Hundreds of millions of items to decipher

According to the Humanities Open Data Joint Usage Center, hundreds of millions of historical materials written in kuzushiji are said to remain in Japan, but the amount that people can read is limited. As a result, a large volume of material remains undeciphered.

In some cases, materials are discarded because their contents cannot be read and their value goes unrecognized. In others, records of local history and disasters that are still unknown are left undeciphered.

Center Director Kitamoto said, “By reading records of disasters and reconstruction that were not previously known, valuable information can be obtained. It should prove useful.”

Recognized as images rather than by meaning

The deciphering system developed by the Humanities Open Data Joint Usage Center uses AI to recognize, from an image of an old document, where the characters are and which characters they are.

It compares the shape features of each recognized character with data on about 1 million characters learned in advance, and replaces the character with its modern equivalent.

Characters are not deciphered in reading order. Each one is identified solely from the features of its shape, without considering the meaning of words or the surrounding context.
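
To make the idea of shape-only recognition concrete, here is a minimal sketch in Python. It is not the center's system, whose internals the article does not describe. The prototype table, the three-number "shape features", and the function names are all hypothetical, and a simple nearest-prototype comparison stands in for whatever model is actually used. The point is only that each detected character is classified on its own, with no dictionary and no look at its neighbors.

```python
from math import dist

# Hypothetical "learned" data: one toy shape-feature vector per modern character.
# A real system would learn from about 1 million character images, not three vectors.
PROTOTYPES = {
    "あ": (0.9, 0.1, 0.4),
    "い": (0.2, 0.8, 0.1),
    "う": (0.5, 0.5, 0.9),
}


def classify_glyph(features):
    """Return the modern character whose prototype is closest in feature space."""
    return min(PROTOTYPES, key=lambda ch: dist(features, PROTOTYPES[ch]))


def transcribe(detections):
    """Classify every detected glyph independently of the others.

    detections: list of (x, y, features) tuples, where (x, y) is the position
    of the glyph on the page image. The output keeps the input order and makes
    no attempt to infer reading order or word meaning.
    """
    return [(x, y, classify_glyph(features)) for x, y, features in detections]


if __name__ == "__main__":
    # Two made-up detections from one page image.
    page = [
        (120, 40, (0.85, 0.15, 0.35)),
        (120, 90, (0.25, 0.75, 0.20)),
    ]
    for x, y, ch in transcribe(page):
        print(x, y, ch)
```

Because each character is handled in isolation, a sketch like this returns positions and labels but no sentence. Putting the characters into reading order is a separate problem.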

According to the center, a single page of kuzushiji that takes a person at least 10 minutes to transcribe can be deciphered in seconds.

At this stage, however, the AI cannot decipher every character correctly, nor can it determine the order in which the characters should be read. An expert therefore still needs to check the output to understand the meaning of the text accurately.

In this international competition, about 300 teams tried to improve accuracy based on this system, and the methods of the top teams have been published on the web.

The winners of the competition

With overseas teams occupying the top places, Kenji Doi of Fukuoka Prefecture, who developed his entry alone, took third place.

Doi is an engineer at an IT company, where he works on building a system that judges from posted images whether a product listed in an online auction is genuine or fake.

As a hobby, he also runs a system that predicts the location of ramen shown in images posted on social media, based on tens of thousands of images. He says he applied the methods cultivated through his daily work and hobby to improve the accuracy of identifying the features of kuzushiji.

The system developed by Mr. Doi correctly deciphered more than 94% of the characters.

Mr. Doi said, “I knew that kuzushiji existed, but I came to really feel the significance of making them readable by machine. I thought it would be useful for learning about them, and I felt it had great social significance.”