In April 2023, an unusual trending topic caught the attention of many internet users: "#Because the surname is too rare, the whole village collectively changed the surname to duck#". The surname pronounced "nia" in Yongsheng County, Yunnan Province drew exclamations such as "So this character really has such a pronunciation?" and "What a magical, romantic surname."

This surname represents the clan cultural tradition of the Lisu people, but it is so rare that it causes all kinds of inconvenience to those who bear it, and changing it to "duck" was a last resort. This is by no means an isolated case: "䶮" (龙 + 天), "𬱖" (由 + 页), "韡" (韦 + 华), "𬍛" (王 + 乐)... These uncommon characters are either handed down as family names or express the good wishes of parents. Yet the people who use them are rejected by information systems because of the unusual characters, and find themselves "unable to move an inch" in the digital world.

A question mark in the name - real-world troubles

If your surname or given name contains a rare character, how much trouble will you run into in daily life?

"The third character of my name could not be typed on my college entrance examination admission ticket, and it is missing from all my certificates and files. The school registration system cannot enter my full name, and in my child's school records there is a substitute symbol in the mother's name field."

"My bank cards used to be issued with pinyin and various symbols in place of the character. One year, a real-name bank card was required to apply for a tax refund, a real-name mobile phone card was required for the real-name bank card, and several carriers could not issue me a real-name mobile phone card. It was an endless loop."

"The font libraries used in the online registration systems of major hospitals differ. Some hospitals can verify my name and some cannot, so in the end I can only register at the window. Sometimes I cannot even have an examination done, because real-name authentication fails."

In a "name exchange group", members opened up and were eager to share their experiences.

A group member with "𬸣" in her name recounted her long entanglement with a rare name. Her family chose the character from a line in Li Bai's "Autumn Night in Anfu: Seeing Off Brother Meng Zanfu on His Return to the Capital" ("Hong 𬸣 feng li, not following the usual flow"), hoping that she would pursue excellence and spread her wings like a bird.

The troubles that followed were numerous: the company's payroll system did not recognize the character and could not pay her wages; WeChat often prompts her to upload ID documents because the name on her ID cannot be recognized, so she has to go through manual authentication again and again. The housing provident fund was the most troublesome: "I have worked at 3 companies, and each one reported my name to the provident fund differently, so I ended up with multiple provident fund accounts. I contacted several HR staff and the provident fund office counter countless times, and the accounts were finally merged into one. Deposits work for now, but I am not sure whether I will be able to withdraw."

While real-name authentication and online services have made life more convenient for most people, a name that "cannot be entered, recognized or displayed" has become an obstacle. Users of rare characters cannot enjoy the benefits of informatization and are instead constrained everywhere in daily life: sometimes they spend extra time going through "manual channels" to solve a problem, and sometimes they are cut off from certain services altogether.

Rare characters – a puzzle with a solution

In August 2022, the state issued the mandatory national standard "Information Technology Chinese Coding Character Set" (GB18030-2022), which officially took effect in August this year. The standard includes 87,887 Chinese characters, covering most of the rare characters used in personal names and place names in China, as well as characters used in professional fields such as literature and science and technology. Implementing this mandatory national standard can solve the most urgent rare-character problems.

GB18030-2022 is a Chinese character encoding standard that adds more than 17,000 Chinese characters to the previous version. It assigns a unique code to each Chinese character, which is regarded as giving the character its "hukou" (household registration).

Liu Huidan, senior engineer at the Spatiotemporal Data Management and Data Science Research Center of the Institute of Software, Chinese Academy of Sciences, said that Chinese character information processing works roughly as follows: the user selects a character in the input method, and the operating system uses the character's code to find its glyph in the font library and "draws" it on the screen. Rare characters that could not be used normally in the past may simply have had no place in the coding system at the time, so they could not be input, stored or output.
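The chain Liu describes (input method, character code, font lookup) hinges on the code assigned to each character. As a minimal sketch, the hypothetical helper `describe` below shows the Unicode code point and the bytes a system would store for a character. It assumes Python's built-in `gb18030` codec is an acceptable stand-in for the standard's byte encoding of these characters; it is not a conformance test for the 2022 edition.

```python
def describe(ch: str) -> dict:
    """Return the code point and encoded bytes of a single character.

    `describe` is a hypothetical helper for illustration only.
    """
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    return {
        "char": ch,
        "code_point": f"U+{ord(ch):04X}",      # the character's unique number
        "utf8": ch.encode("utf-8"),            # bytes as stored in UTF-8
        "gb18030": ch.encode("gb18030"),       # bytes as stored in GB18030
    }


if __name__ == "__main__":
    # Two characters mentioned in the article: one long-established, one rare.
    for ch in "韡𬱖":
        info = describe(ch)
        print(info["char"], info["code_point"],
              info["utf8"].hex(), info["gb18030"].hex())
```

Whether the character then appears on screen still depends on the second half of the chain: an installed font must contain a glyph for that code, which this sketch does not check.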

Beijing Peking University Founder Electronics Co., Ltd. is one of the organizations that took part in drafting GB18030-2022. According to Zhang Jianguo, general manager of the Founder font library business, some of the Chinese characters supplemented from the "General Specification Chinese Character List" are there to solve the problem of rare characters in personal names. "'𬱖' has a beautiful meaning, and some parents give their children names with it. For girls, '𬎆' and '𮧵' are more common. These characters have all been added in the new 2022 version of the standard."

Announcement on approving the issuance of two mandatory national standards, including the "Information Technology Chinese Coding Character Set", source: National Standard Information Public Service Platform

GB18030-2022 is a mandatory national standard. According to the relevant provisions of the Standardization Law of the People's Republic of China, mandatory standards must be implemented; all technical products with Chinese information processing and exchange functions that are produced, sold, imported or provided in China shall meet the requirements of the standard.

The standard also states that "products used for government services and public services shall meet the requirements of level 3", that is, products in relevant industries must support all Chinese characters specified in the document. The implementation of standard requirements in accordance with regulations can solve the problems of people with rare names and characters, so that technological achievements can truly benefit the people.

Accumulated shortcomings - complex status quo

The problem should have been solved by now, but in reality it has not been.

Liu Huidan has long been concerned with the rare-character problem. He is the initiator of the "name rare character processing platform" and the leader of the "name rare character exchange group", and the two large groups currently have a total of 6,700 members.

When he first learned about the difficulties of people with rare characters, Liu Huidan was very surprised: "On the technical side, we have worked on information processing for Chinese characters and minority languages for many years, and we did not expect that there were people who ran into difficulties in daily life because of rare characters."

This reflects the real situation: the difficulty lies not in the technology itself but in its application and promotion. According to Wen Chen, a long-time member of the exchange group, the core of the solution is the ID card system and the population information management system. Once these two and other service systems adopt the Chinese character coding of the mandatory national standard, rare characters can be exchanged across different fields.

Before the new standard was published, rare characters in personal names were stored in systems as non-standard encodings known as "PUA codes" (PUA stands for Private Use Area, a range of code points reserved for private assignment). Using PUA-coded characters was a stopgap for storing rare characters. Such a code differs from the one stipulated by the mandatory national standard: the character cannot be typed with current input methods, and it is displayed as a space, asterisk or question mark in new systems.

PUA codes, which should have left the stage of history, are still widespread. Without going through a special name-change procedure at the police station and having the identity card replaced, a PUA code cannot be changed to the official code of the new standard. An officially coded character typed during real-name authentication naturally cannot be "verified" against a PUA-coded character on record.
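Why a PUA-coded name fails verification can be illustrated with a small check. This is a sketch under stated assumptions, not any real system's logic: the function names are hypothetical, the surname is a placeholder, and U+E000 merely stands in for whatever private code a legacy system happened to assign.

```python
import unicodedata


def pua_positions(name: str) -> list:
    """Return the indexes of characters that lie in a Unicode private-use area.

    Private-use code points have the general category 'Co'.
    """
    return [i for i, ch in enumerate(name)
            if unicodedata.category(ch) == "Co"]


def names_match(stored: str, typed: str) -> bool:
    """Exact code-point comparison, as a naive real-name check might do."""
    return stored == typed


if __name__ == "__main__":
    stored = "张\ue000"   # legacy record: rare character kept as a PUA code (assumed)
    typed = "张𬱖"        # the same person typing the officially coded character
    print("PUA positions in stored name:", pua_positions(stored))
    print("PUA positions in typed name:", pua_positions(typed))
    print("names match:", names_match(stored, typed))
```

Even if both systems draw the same glyph for their respective codes, the comparison works on codes rather than shapes, so the two records are treated as different names.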

"Sometimes we know about the PUA code problem and want to take the initiative to replace it, but we still cannot," said Wen Chen, describing the experience of some members of the exchange group. Sometimes the grassroots police station has not upgraded to a font library that meets the national standard, so officially coded characters cannot be displayed at the service window. Sometimes the problem arises at the higher-level certificate production center: "The font library has not been upgraded, the officially coded data cannot be displayed at the center, and the character always comes out as a box, so the certificate cannot be made. So some 'well-wishers' manually change it to a PUA code to get the ID card made." At this point he could not help laughing helplessly.

Other service systems also have irregular coding. Take the website of the "National Professional and Technical Personnel Qualification Examination Registration Service Platform" as an example: on the registration page and in the examination information form, a "Rare Character Input" button is provided next to the "Candidate's Name" field. This shows care for users of rare characters and a strong sense of service. But the rare characters the website generates hide a problem.

The rare character input interface of the national professional and technical personnel qualification examination registration service platform

Copy the "" generated by "rare word input" on the platform to the computer document, and display it as blank; Converting the encoding of "space" through shortcut keys is also inconsistent with the official encoding. Can the test information and certificate information generated by it be "mutually recognized" with the real-name system that uses official codes? It is not yet known.

Many service systems still use the 1995 "GBK" code, which contains only 21,003 Chinese characters and no longer meets current needs. Some systems patch the gap with various PUA codes, which on the surface meets the needs of users of rare characters. In fact, different codes hidden behind the same glyph cannot be mutually recognized, which inevitably leads to verification failures.
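The coverage gap can be checked directly. The sketch below is an illustration with hypothetical helper names; it relies on Python's built-in `gbk` and `gb18030` codecs as approximations of what a legacy system and an upgraded system can store, and the sample names are made up.

```python
def can_encode(text: str, codec: str) -> bool:
    """Return True if every character in `text` can be stored in `codec`."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def unsupported_chars(text: str, codec: str) -> list:
    """List the characters of `text` that `codec` has no code for."""
    return [ch for ch in text if not can_encode(ch, codec)]


if __name__ == "__main__":
    for name in ("李韡", "李𬱖"):   # illustrative names, not real people
        for codec in ("gbk", "gb18030"):
            missing = unsupported_chars(name, codec)
            status = "ok" if not missing else "cannot store " + "".join(missing)
            print(name, codec, status)
```

A system that hits the "cannot store" case has to reject the name, substitute a symbol, or invent a private code, which is where the "GBK+PUA" patchwork described here comes from.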

In practice, the "GBK+PUA" coding combination has been used for too long and too widely, and there is still a long way to go before the after-effects of non-standard character encoding are eliminated.

Standardization of Chinese character coding – a common expectation

From their own experience, the members of the exchange group have reached a consensus: the financial industry currently handles the rare-character problem best.

As early as June 2022, the People's Bank of China published the "Financial Services Rare Word Processing Guide". According to the guide, the range of rare characters supported for input should meet current needs for rare characters in personal names and place names, and special attention should be paid to the handling of PUA-coded Chinese characters.

A person familiar with the matter said: "China Merchants Bank, China CITIC Bank, and Industrial and Commercial Bank of China have basically completed their improvements, and some banks are still working on it. Previously, some banks received rectification letters from the Science and Technology Department of the People's Bank of China over rare-character support, and they rectified the problem very quickly."

Several technicians from different banks are in the exchange group and help solve specific problems. "Even when the top-level design is done, it may not reach the endpoints. It is a long way from the head office to the grassroots level, and the bank staff helping in the group flatten that path and reach front-line users directly," Wen Chen said. Once the core system is updated, the grassroots service window only needs to download the upgrade on site, and the earlier difficulties are solved.

Since the mandatory national standard GB18030-2022 was issued, many companies have responded positively and upgraded their products to meet its requirements.

According to the "Guidelines for the Processing of Rare Characters in Information Technology" issued by the Character Set and Coding Subcommittee of the National Information Technology Standardization Technical Committee, the vendors that currently provide commercial font libraries for rare characters include China Electronics Standardization Institute, Founder Electronics, Hanyi, Zhuoma Zhiyu, etc. Input methods that support rare characters include Tencent Sogou Input Method, Zhuoma Input Method, Founder Super Large Font Library Input Method, Baidu Input Method, etc.

Beijing Peking University Founder Electronics Co., Ltd. has upgraded 28 glyphs, while "Founder Population Information Font Database Software and Its Input Method Software" has been upgraded to include all Chinese characters in the standard according to regulations.

Zhang Jianguo said: "Founder has been tracking and working on the rare-character problem all along, and we take part in drafting the national standards, so as soon as a standard is upgraded, we upgrade our products as quickly as possible."

He has also followed the financial industry's results in handling rare characters, which showed him a way out of the problem: "We are optimistic. As long as everyone pays attention to this matter, especially the regulatory departments, and issues requirements the way the People's Bank of China did, progress will speed up."

Liu Huidan shared the latest developments on the rare-character problem. As a member of the working group for Amendment No. 1 to GB18030-2022, he said that the Chinese characters newly added to the international standard and the "supplementary Chinese characters of the special font database for public security population information" will be added to the mandatory national standard as part of the amendment.

On the road to overcoming the difficulties of rare characters, people of different identities and different industries are doing their part.

For users of rare characters such as Wen Chen, the biggest demand is "information accessibility". Once the rare-character problem is solved systematically, they will be able to handle all kinds of affairs in their own names without hindrance, without worrying about being turned away from a service, and without spending extra time and energy on "special handling".

For industry practitioners such as Liu Huidan and Zhang Jianguo, solving the problem of rare characters also involves a special responsibility. Chinese characters are an important carrier of Chinese culture, and the standardized inclusion of Chinese characters also means the "recovery" of Chinese traditional culture. With the improvement of the level of informatization of Chinese characters, the written content in ancient Chinese books, the historical information retained in place names and personal names can be better inherited and protected.

Author: Yu Yi