Young people who teach AI to recognize sign language

They are using the power of technology to help more deaf people be "heard" and "understood"

  Our reporters Lei Kun, Li Ting and Liang Zi

  In May of this year, a "sign language corner" at the World Intelligence Conference drew many visitors to stop and watch.

As long as you sign in front of the camera, a semantically coherent text translation appears on the display behind you.

This "Chinese Sign Language Real-time Translation System in Complex Scenarios" (hereinafter the "sign language real-time translation system"), jointly developed by the Technical College for the Deaf and the School of Computer Science and Technology at Tianjin University of Technology, already covers application scenarios such as education, legal consultation, catering and transportation. Indoors, with sufficient and stable lighting, its recognition rate reaches up to 95%, and in some scenarios it achieves near-instant "translation in seconds".

  Wang Jianyuan is a member of the R&D team.

He grew up as a deaf child in a deaf family. He has severe hearing loss and finds it very difficult to speak. Sign language is his first and most effective way of interacting with the world.

At 22, he has never felt inferior for being deaf, and being unable to speak has not kept him from becoming a proper top student. But ask him what inconveniences his hearing impairment has caused him since childhood, and he calmly types a short answer on his phone: every aspect of life, because too few people know sign language.

  Seeing a doctor is one of the most typical examples. Without a sign language interpreter to accompany them, sign language users like Wang Jianyuan find it hard to seek treatment alone, even for a minor ailment such as a headache or a fever: most doctors do not know sign language, and you cannot ask a patient in an emergency room to describe his symptoms one by one, quickly, by handwriting or typing.

  Yuan Tiantian is Wang Jianyuan's teacher and the leader of the "Sign Language Real-time Translation System" project team.

After completing her postgraduate studies in 2006, she joined the Computer Department of the Technical College for the Deaf.

Over the past 15 years, Yuan Tiantian has lost count of how many times the college has called her at one or two o'clock in the morning, asking her to accompany a student to the doctor and interpret in sign language.

Today she is deputy dean of the college, and taking hearing-impaired students to the hospital is still a duty she shares with every teacher there.

  She is not afraid of hard work; what she fears is that her limited sign language will hold things up.

Yuan Tiantian is a hearing person (short for a person with sound hearing). She taught herself sign language after taking up her teaching post. She often jokes that she has limited talent for languages and is a "half-baked" (not proficient) sign language interpreter: "As soon as the situation a student describes gets complicated, or the signing gets too fast, I can only understand part of it."

  Hearing-impaired college students who are self-reliant and determined, doing everything they can to overcome physical inconvenience; special education teachers who are devoted to their work and practice sign language tirelessly for the sake of people with disabilities: if you wanted to tell an inspirational story, that plot would be enough.

But Yuan Tiantian and Wang Jianyuan clearly want to tell a "sci-fi story": they want to teach computers sign language, so that machines can stand in for people as on-call, timely and accurate "sign language interpreters."

  Yuan Tiantian has a computer science background, and Wang Jianyuan majors in network engineering. They believe that technology can help China's 27.8 million deaf people, and others in need, overcome the many inconveniences of daily life.

They want the sign language real-time translation system to become a bridge that lets hearing-impaired and hearing people communicate with each other without barriers.

  The goal sounds ambitious and is hard to achieve, but they decided to give it a try.

New bridge

  The "technical bridge" previously built to help deaf people communicate with hearing people is speech recognition.

  In 2007, Fu Zhiwei, former vice chairman of the Chinese Association of the Deaf, published an academic article entitled "My View of 'Accessibility for the Deaf'".

He wrote: "I hope that a machine can be developed in the future, about the size of today's miniature video cameras, where spoken words go in at one end and text is displayed on a small screen at the other... When such a machine appears, the barrier-free information environment for deaf people will be greatly improved."

  Rereading this passage more than ten years later, you find that the machine Fu Zhiwei hoped for is almost exactly the speech recognition application now common on smartphones.

The beneficiaries of speech recognition are no longer limited to hearing-impaired people: just look at how many people around you use voice input.

  The "sign language real-time translation system" developed by Yuan Tiantian's team is like a sign language version of "voice input".

The difference is that voice input takes in speech, which the machine recognizes and converts into text, while the sign language system takes in signing gestures and facial expressions, which the machine recognizes and then translates into text. The "old bridge" and the "new bridge" are similar but not the same.

  Yuan Tiantian did not expect that the existence of the "old bridge" would become an obstacle to the project team's building of the "new bridge".

  "Why do you have to do sign language translation? If deaf people can't hear, there is speech recognition now! And if some deaf people can't speak, can't they just type on a mobile phone?" Some companies had approached Yuan Tiantian about the sign language real-time translation system under development, wanting to discuss investment.

But as the talks went on, they questioned whether the project was necessary at all, feeling that sign language translation had no point beyond "doing a good deed."

Having heard this too many times, the outspoken Yuan Tiantian got a little worked up: "Then why was WeChat built in the first place? Wouldn't text messages have done the job? Why build speech recognition when people can just type? Many innovations have traditional alternatives. Does that mean technology shouldn't move forward?"

  Yuan Tiantian said that to this day, sign language remains the most natural and efficient means of expression for many hearing-impaired people, "faster than writing or typing." But requiring hearing people to learn sign language in order to promote barrier-free communication is obviously unrealistic.

People's ability to master a language is always limited by cognition, memory and other factors. "Our real-time sign language translation system is meant to solve this problem." Yuan Tiantian thinks like a typical engineer: faced with a hard problem, her first reaction is to ask whether it can be cracked at the technical level.

"I feel that once the technology gets there, a machine's memory and learning ability are far better than a person's. So if a machine can learn sign language, it will certainly be more useful than a 'half-baked' sign language interpreter like me."

  At present, the sign language real-time translation system is still in the trial stage.

Ideally, once the research results are truly deployed, hearing people will be able to communicate directly with sign language users simply by opening the sign language translation program.

  But for Yuan Tiantian and the young R&D team, it is not easy to achieve this "ideal state".

As hearing-impaired students deeply involved in the project, Wang Jianyuan and Wu Lijie of the Technical College for the Deaf are putting more and more of their energy into sign language recognition and translation technology.

The WeChat public account they opened to promote and popularize sign language across society has not been updated for a year and a half.

When they first opened the account, their idea was to open a window so that deaf people could be "seen".

Now they are busy building a bridge, wanting hearing-impaired people to be "understood".

The beauty of sign language

  Besides hearing-impaired students, the "sign language real-time translation system" project team also includes hearing people.

Whether or not they use sign language every day, they share one view: sign language is a beautiful language.

  Yuan Tiantian's intuitive feeling of the beauty of sign language comes from her students.

Yuan Tiantian graduated from Tianjin Normal University, and teaching was her only career goal, but she has no special education background. The sign language she uses every day was learned partly from books and partly picked up gradually through daily communication with hearing-impaired students.

When she first started, the college's veteran teachers praised her: "Impressive! She dares to sign with the students the moment she steps up to the podium!"

  Yuan Tiantian has the typical old-Tianjin character, warm-hearted and straightforward. She admits that when she first practiced sign language so hard, she had no lofty ideal of "contributing to the education of people with disabilities" in mind.

She simply felt that if you are going to be a teacher of hearing-impaired students, sign language is an indispensable classroom tool. "If you don't dare to sign, you will never learn it, and you will never be able to communicate with the kids without barriers, right?"

  Learning by using it in this way, she found that the natural sign language deaf people use is far more than gesturing by the book.

It has its own word order and grammar: in spoken Chinese, "extinguish" comes before "fire", but students sign "fire" first, because a fire has to start before it can be put out. It has a distinctive sense of space: the same gesture, palm down with the five fingers opening from closed to spread to mimic a light source, means a lamp when signed indoors and can mean the sun when signed outdoors. The same sentence may be signed in several different "dialect" versions by students from different places. And its means of expression are richer still: signing a sentence with a complete meaning takes not only hand gestures but also matching facial expressions and body movements...

  "It really is a beautiful language!" Yuan Tiantian said. To this day, when she chats with students in sign language, she is sometimes spellbound, feeling that they "have a glow about them."

Those who have seen Wang Jianyuan and Wu Lijie use sign language will admit that Yuan Tiantian's feelings are not exaggerated.

  Wang Jianyuan was born into a deaf family in Qingdao; his parents are both hearing impaired.

Before entering the network engineering program at the Technical College for the Deaf in 2018, he was educated entirely in schools for the deaf.

  When he was young, his father practiced sign language with him as earnestly as other parents teach their children to speak.

His father did not think of it as a "special" language. "He felt that just as Mandarin can be spoken in clear, rounded tones, sign language can be open and humorous." His parents' matter-of-fact attitude toward sign language shaped Wang Jianyuan.

He never shies away from signing in public, and he never hides that he is deaf.

In his view, sign language is sign language, not some kind of "manifestation of disability."

Deaf people can use it to express their inner thoughts, and hearing people who have mastered it can use it to communicate too. Sign language, like any other language, is a tool for breaking down barriers and communicating with one another. It can be a bridge or a bond, but it should not be the barrier itself.

  Wu Lijie is a Mongolian young man from Qinghai whose "home is on the edge of Chaka Salt Lake". Besides being fluent in sign language, he can also communicate in speech, though his voice is a little hoarse and low.

  Compared with Wang Jianyuan, who enrolled in the same year, Wu Lijie's schooling was more complicated.

In elementary school, relying on hearing aids and lip-reading, he attended a mainstream school alongside hearing children for three years, and was "first in the class all three years".

That experience honed his adaptability and his spoken expression.

Later he left Qinghai, finished high school at Wuhan Second School for the Deaf, and was admitted to Tianjin University of Technology through a separate entrance examination.

Such achievements let him hold his head high among his generation in the family, and he firmly believes that "deaf people are no worse than anyone else."

He once simply used "Deaf-Mute" as his WeChat nickname; in English, "deaf" means unable to hear.

  Whether in his coursework, in promoting sign language or in research, this bronze-skinned Mongolian young man has the drive to "take the lead": "Why can't we do what Helen Keller could do?"

Technical difficulty

  Wang Jianyuan and Wu Lijie were invited by Yuan Tiantian to join the project team.

In 2019, when they were only sophomores, they were entrusted with the important task of collecting a sign language corpus and composing sentences that follow the grammar and word order of natural sign language.

Sign language is a visual language, so this "writing" is done not by hand or keyboard but by recording video. One of the main jobs of Wang Jianyuan, Wu Lijie and the other hearing-impaired students on the team is to sign in front of the camera over and over again.

  Why them?

  Because the difficulty of real-time sign language translation stems precisely from the beauty of sign language:

  An independent grammar means the system must not only recognize the meaning of individual signs but also convert sign language word order into the Chinese word order hearing people are used to before a translation is complete. A distinctive sense of space means that highly similar signs may be translated differently in different settings, so the computer must learn to tell "the sun outdoors" from "the lamp indoors". Rich means of expression mean that sign language recognition, unlike speech recognition, cannot rely on collecting only "sound" as its "learning material": to train an artificial intelligence into a competent sign language interpreter, hand gestures, facial expressions and larger body movements must all be converted from video into data and then "taught" to the computer.

Therefore, hearing impaired students who can fully understand and demonstrate the beauty of sign language are the most suitable people to be "teachers" for artificial intelligence.
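The two steps described above, recognizing signs in sign language order and then rearranging them into spoken word order, can be pictured with a toy sketch. This is only an illustration under stated assumptions, not the team's actual system: the function names, the gloss labels and the one-entry phrase table are all hypothetical, and the recognition stage is a stand-in that pretends each video frame already carries a label.

```python
from typing import Dict, List, Tuple

# Hypothetical toy data: a gloss sequence in sign language word order
# mapped to text in spoken word order. A real system would learn this.
PHRASE_TABLE: Dict[Tuple[str, ...], str] = {
    ("FIRE", "EXTINGUISH"): "put out the fire",
}


def recognize(frames: List[str]) -> List[str]:
    """Stage 1 stand-in: pretend each video frame is already labelled with a gloss."""
    return [frame.upper() for frame in frames]


def translate(glosses: List[str]) -> str:
    """Stage 2: reorder glosses into spoken word order via the toy table.

    Falls back to the glosses joined in their original order when no entry exists.
    """
    return PHRASE_TABLE.get(tuple(glosses), " ".join(glosses).lower())


def sign_to_text(frames: List[str]) -> str:
    """Run both stages: recognition first, then reordering."""
    return translate(recognize(frames))


print(sign_to_text(["fire", "extinguish"]))
```

Keeping recognition and reordering as separate steps mirrors the point that the signs are first read in their own order and only afterwards turned into the word order a hearing reader expects.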

  Wang Jianyuan and Wu Lijie's ability to "teach machines" to learn sign language comes from "teaching people."

  Despite their young age, their sign language teaching experience can be described as rich.

As early as their freshman year, they discovered that quite a few people are prejudiced against sign language.

This is true not only of hearing people: many of their hearing-impaired classmates share the prejudice, having been influenced since childhood by views such as "using sign language means admitting you have a disability" and "you should speak like a normal person."

  So not long after enrolling, Wang Jianyuan and Wu Lijie began teaching sign language through a variety of offline and online channels: the school's sign language club, their WeChat public account and short-video platforms.

Like all language teachers, they cover grammar, vocabulary, sentence patterns... The two even began studying sign language linguistics, which goes well beyond their syllabus: in Wang Jianyuan's schoolbag, a copy of "Study on Sign Language Verbs" sat alongside the textbooks for his major, and he took it out to read whenever he had time.

Wu Lijie went as far as joining the National General Sign Language Key Teacher Training Program hosted by the China Disabled Persons' Federation, and finished first in overall score.

  At first, they did all this simply to "clear the name" of the language and to tell everyone that sign language has its own characteristics but is by no means "special". Sign language, like its users, is ordinary and normal.

They did not expect that the "extra homework" they did in order to teach people sign language would combine with their professional knowledge and become a powerful tool for teaching AI sign language.

  "Our hearing-impaired students know the grammatical structure of sign language and some of its basic elements, such as how much facial expression counts as enough and when body language is called for. They understand all of that. So they have a natural advantage in data collection and processing." Yuan Tiantian places great value on the role of deaf people in the project team. "If we asked hearing people who have never used sign language to collect the data and build the corpus, that would be really hard!"

  Because deaf people were not involved, academic circles in various countries have taken detours in sign language recognition and translation, for example by trying to build sign language corpora with data gloves.

Data gloves are a common kind of sensor whose use in gesture recognition is relatively mature. Although they are costly, using gloves to capture "gestures" seems natural, and accuracy ought to be assured.

Unfortunately, sign language is not simply gesture: leaving aside facial expressions and larger body movements, even the same hand shape may mean something different when its orientation changes slightly, and gloves cannot pick up such subtle differences.

  Realizing this, most of the sign language recognition and translation projects pursued in various countries in recent years have adopted computer vision, using cameras to collect data and build corpora. Yuan Tiantian and her team do the same.

The more common problems at this stage are, first, that too few samples are collected and the data set is not large enough.

Second, the data set gets built, but the corpus cannot be filtered and annotated to a high standard.

Put bluntly, the corpus is "hard to use".

  Yuan Tiantian said that deep learning is a bit like teaching a child to speak. The whole family has to say a word over and over, and only after enough repetitions does the child form an impression: "Oh, this word is 'mother'. Whether it comes from my father's mouth, my grandma's mouth, or my uncle's or aunt's, it is still 'mother'." In the same way, for a machine to recognize a signed sentence, enough people with different signing styles must repeat the same set of movements in front of the camera before the computer can "remember" it.

  The sign language data sets built by many teams abroad cannot support sign language translation in complex scenarios. One direct reason is that they cannot find enough people to record sign language in its natural state.

"Our team is backed by the Technical College for the Deaf. Many members are sign language users, which gives us an advantage in collecting natural sign language," Yuan Tiantian said.

  "But to be honest, we have built a data set that was hard to use, too." Yuan Tiantian bluntly calls the Signed Chinese data set the team built in 2018 a "failure". The failure was that the corpus collected that time was Signed Chinese rather than natural sign language.

Signed Chinese forms sentences by stringing signs together in the Chinese word order used by hearing people, not in sign language word order.

  "Take 'Love is our common language.' In Signed Chinese, you sign the words one by one in order, and the particle 'de' has its own gesture that must be signed too. But in the sign language deaf people use every day, the usual order is: love, us, common, language, is, with no need to sign 'de'. Even with my 'half-baked' sign language, I don't sign every 'de' when I teach. It doesn't fit the way natural sign language is expressed," Yuan Tiantian explained.

  "However accurately you translate Signed Chinese, it is useless. Deaf people don't usually sign like that!" Having learned that lesson, Yuan Tiantian said, in building the sign language data set this time they are not after quick results; they only want every item in the corpus to be authentic natural sign language.

Recognizing signs in sign language word order first and then translating according to spoken-language habits adds one more technical step, and it makes the team's research and development harder and longer. Yuan Tiantian nevertheless insisted on the "difficult but correct" path.

  "In doing this research, we are not trying to tell people how big our data set is or how impressive a paper we have published. We have only one goal: that it can be used." Yuan Tiantian said firmly, "Once this system is out, deaf people must really be able to use it."

Harvest year

  For Yuan Tiantian and her team, 2019 and 2021 are two key milestones.

In 2019, the year the project started, the "sign language real-time translation system" was selected as a new-generation artificial intelligence industry innovation project by the Ministry of Industry and Information Technology of the People's Republic of China and received 20 million in financial support.

Yuan Tiantian was delighted: on the one hand, she was gratified that the country attaches importance to barrier-free construction; on the other, data collection, technology research and development, and putting results into practice all cost money.

  2021 can be said to be a harvest year for the team.

  Yan Siyi still remembers how she felt at the beginning of this year when she got the code framework of the sign language translation system to run end to end for the first time.

"It's like assembling a machine: all the parts are in place, but because of all sorts of small problems, say a few screws that aren't tight, it just won't run properly. I was in the lab every day, tightening the screws one by one. Then one day, after all the adjustments, I pressed the switch and the machine started turning. That sense of accomplishment..."

  Yan Siyi is a second-year graduate student in the School of Computer Science at Tianjin University of Technology and a hearing member of the sign language real-time translation project team. She is mainly responsible for "behind-the-scenes" work: building the model framework for the sign language recognition algorithm.

To continue the analogy of teaching a child to speak: the preschool years are for building impressions through repetition. Once in elementary school, children start learning pinyin and grammar, and learn to assemble the words they have learned and heard into sentences and passages, following the rules summarized in textbooks, and to write them in homework books and on exam papers. Yan Siyi is one of the "teachers" who compile the "textbooks" that help the AI "classmate" summarize the rules of sign language.

  "This more technically demanding part of the work is done mainly by teachers and students from our university's School of Computer Science," Yuan Tiantian said. "Compiling teaching materials" for artificial intelligence is a very difficult process.

In sign language recognition and translation, there is very little experience to draw on, and peers elsewhere are at a similar stage.

The project team consulted the algorithm of a German natural sign language translation system for weather forecasts, sought out publicly released source code for similar systems through various channels, and then tried and revised it over and over.

  Yan Siyi and the others head into the computer lab at 8:30 in the morning. By the time they notice it is dark and time to go back to the dormitory, a glance at the watch shows it is already 10 o'clock at night.

"I think our project is meaningful, so I want to produce results quickly and put them to use." She says she is a typical science-and-engineering girl who loves computers. Her supervisor has her work on the project Monday to Friday, and on Saturdays and Sundays she can't help running the code anyway.

Groping forward bit by bit and "tightening the screws" bit by bit, on the morning the code first ran end to end, the computer "classmate" handed its human teachers an imperfect but passing sign language translation exam paper.

  In May of this year, Wang Jianyuan and Wu Lijie took their research results to the World Intelligence Conference. In October, they formed an entrepreneurial team with other students from the Technical College for the Deaf, represented Tianjin University of Technology, and won a gold medal in the Higher Education Main Track at the finals of the 7th China International "Internet +" University Student Innovation and Entrepreneurship Competition.

  Their gold-medal project is called "Whale Keyu"; its full name is "Whale Keyu Multimodal Continuous Sign Language Automatic Marking and Recognition System".

Automatic marking and recognition is the first step of sign language translation, and "Whale Keyu" grew out of the "sign language real-time translation system": they turned two years of experience in "teaching machines sign language" into a technology innovation project of their own, personally laying a "bridge foundation" for the "bridge of deaf-hearing integration" they envision.

  On the day of the finals, Yuan Tiantian praised her students one by one on WeChat Moments.

As the instructor of "Whale Keyu", she likes this gentle yet powerful name.

  The name "Whale Keyu" goes back to the story of the "52-hertz whale". In the ocean, this whale, which cannot communicate with its own kind because of the unusual frequency of its call, has been called "the loneliest creature".

But in fact, if you can crack the 52-hertz code, you will find that it is not an island: it too can sing, and it has a language of its own.

Wang Jianyuan feels that this whale resembles hearing-impaired people, "silent in the crowd, yet always longing to communicate with the outside world, longing for a response."

  The logo that several hearing-impaired young people designed for "Whale Keyu" looks like a white whale floating in a blue ocean, and also like the shape of a hand making a "finger heart" with thumb and index finger together.

"When we built the sign language corpus, we wanted to reduce the gestures of sign language to simple lines like these. It is these lines that underpin the 'Whale Keyu' system and make sign language recognition possible," Wang Jianyuan said.

  And their original intention in developing the sign language recognition system was to help the "52 Hertz Whale" gain the ability to speak and the right to be "understood"...

  (Wu Zeyun also contributed to the reporting and writing)