“I feel that part-time labeling will become easier and easier to replace.” Ximei (a pseudonym), a 27-year-old mother from Heyuan, Guangdong, is slightly pessimistic about her part-time data labeling work.

Since she began part-time data labeling in 2018, Ximei's income has fallen far below what it once was.

  Du Minxu, who lives in the small town of Bainiaohe in Guizhou, is a project team leader at the data service provider Mengdong Technology. Besides working with his team members on data labeling projects every day, he also makes time to teach himself Python.

  Both Ximei and Du Minxu are AI data annotators, a profession that is in huge demand yet remains controversial.

  The labeling industry supplies algorithms with large amounts of training data.

According to IDC statistics, the amount of data produced worldwide each year will rise sharply from 16.1ZB in 2016 to 163ZB in 2025. Of this, 80% to 90% is unstructured data, which must be cleaned and labeled into structured data before artificial intelligence can understand it.

In February 2020, "artificial intelligence trainer" officially became a new profession and was included in the national occupational classification catalog.

  The industry has a saying: "However much intelligence there is, there is that much human labor behind it."

With its low barrier to entry and large volume of repetitive work, the industry is inherently labor-intensive.

To reduce labor costs, it is mostly located in remote areas, so it is often linked with poverty alleviation through industry.

Because of its repetitive, tedious work and low wages, the industry has been called the "Foxconn of AI."

  Once machine intelligence develops to a certain stage, will this human labor be eliminated? If so, where will the displaced workers go?

Since the birth of the data labeling industry, worries and disputes have never stopped.

Now, as the data annotation industry upgrades, annotators in every form of employment face the test of upgrading their own skills.

  Low barriers to entry

  Huang Ming (a pseudonym), a former data labeling salesperson, told China Business News that because the industry is labor-intensive, employees at labeling bases are practically indistinguishable from factory workers.

In his opinion, sitting in front of a computer all day making annotations is not only tedious and repetitive but also takes a heavy toll on the body, especially the eyes, and the pay is not high.

  Interviewees who still work in data labeling see things differently from Huang Ming.

Although the job may seem boring to many people, they have found ways to fit it to their own lives and preferences.

  Two years ago, Ximei was unable to work normal hours because of her pregnancy.

She did not want to follow her husband into the aquaculture business. After quitting her job as a Taobao customer service agent, she came across a website called "Ai Biao Ke" while searching online for "part-time job." That was Ximei's first contact with the data annotation industry.

  Ai Biao Ke is a service-oriented crowdsourcing platform under iFLYTEK that connects task issuers with task-taking users. It posts part-time jobs to handle simple tasks, such as data labeling and collection, for iFLYTEK's resource department and other partners.

  When she first entered the industry in 2018, Ai Biao Ke offered some simple bounding-box and transcription-calibration projects. The hourly pay was between 25 and 40 yuan. After one month, Ximei was earning more than she had as a full-time Taobao customer service agent.

"I prefer drawing bounding boxes. I can do it while listening to music. Dialect transcription calibration really tests your patience, and I am not sensitive to sound," Ximei told the CBN reporter.

  A reporter from China Business News logged on to the Ai Biao Ke website and found tasks such as dialect transcription calibration, checking solutions to math problems, and screening rare characters. Users must apply to join a team before they can take on a task.

The sample video for one of these tasks shows that transcription calibration mainly involves adjusting the speech spectrum and tone spacing to correct the machine's transcription.

  Ximei said that as more and more people took up part-time labeling, the clients (Party A) kept lowering prices. At present, most projects on Ai Biao Ke pay at most 10 to 15 yuan per hour, and sometimes less than 10 yuan.

  As the hourly pay fell, she began joining various QQ groups to look for other outsourced jobs. She now earns about 20 yuan an hour, or 2,000 to 3,000 yuan a month.

"It's not enough to support a family, but at least I have something to do and can earn a little money," Ximei said.

  Compared with Ximei's part-time earnings, the 3,000 to 4,000 yuan a month that Du Minxu makes as a project team leader at Mengdong Technology does not seem much higher.

But as a regular employee doing data labeling for the company, Du Minxu comes across as more confident and fulfilled.

  Du Minxu had already interned at Mengdong Technology while he was studying history at Guizhou University for Nationalities and approaching graduation.

After graduating, he chose to become a data annotator at Mengdong Technology, partly out of curiosity about artificial intelligence and partly because the project team leader position he applied for would let him build management experience.

  At Mengdong Technology, Du Minxu's main work is testing projects, communicating with customers, and training the team's annotators to solve problems that arise in projects.

The workload is usually not heavy, with weekends off and only occasional overtime. The company provides free accommodation, and the accommodation for managers also comes with air conditioning, a refrigerator, and a washing machine, which gives him a secure standard of living.

  Du Minxu told CBN that attitudes vary among his colleagues doing this work at Mengdong Technology.

About 30% of the interns are interested in artificial intelligence projects and the big data industry.

However, in a labor-intensive industry, the repetitive labeling or recording work is fairly boring, so some colleagues fail to adapt and leave. The turnover rate is about 10%.

  Du Minxu said that those who leave often have a fairly simple view of the industry while doing data labeling. Some are in it only for the money and pay little attention to industry trends.

  Huang Ming later moved to a sales job at a lidar start-up.

In his view, one benefit of the data labeling industry is that it brings workers into contact with many high-end companies and offers the lowest-barrier entry into the field of artificial intelligence.

  The interviewees agree that for mothers with children, unemployed people in rural areas, and even some people with disabilities, data labeling is an acceptable job.

  Quality and rights problems grow under the crowdsourcing model

  China's data labeling industry can be traced back to 2005, when Zhu Songchun, a well-known expert in computer vision and artificial intelligence, returned from the United States to his hometown of Ezhou, Hubei, and founded the Lianhuashan Research Institute, said to be the world's earliest large-scale data labeling team at the time.

  In 2015, with the rise of artificial intelligence giants, the demand for data annotation and collection surged, and the market began to take shape in a real sense.

Many data service companies entered the expanding market as contractors (Party B), serving large Internet companies such as Baidu and Alibaba as well as AI unicorns.

  Today the data labeling industry has spread across the country in three forms: third-party data service providers, bases set up by the tech giants, and crowdsourcing platforms. Examples include Baidu's AI data labeling base in Shanxi, Mengdong Technology in Bainiaohe, Guizhou, Data Hall's bases in Hebei and Anhui, Qianji Data and Ruijin Technology in Henan, and the base in Dongtuanbao Village, Laiyuan County, Hebei.

  Zeng Yun, director of the data service division of Mengdong Technology, told a reporter from China Business News that Mengdong is an independent data service provider. It cooperates with Guizhou Shenghua Vocational College to teach and train students in data labeling through the integration of production and education, and from the large pool of interns it gradually selects regular employees and managers who are suited to data labeling work.

  The unregulated growth of data labeling began with the "crowdsourcing" model, of which Ximei's part-time work is an example.

One end of a crowdsourcing platform connects with the client companies that need projects done, and the other end connects with large numbers of volunteers (part-time workers) who have free time.

The advantage of this structure is that it can organize many part-time workers across society to do labeling, saving the client company operating costs.

  The disadvantages of crowdsourcing are just as obvious. The scattered part-time workers vary widely in professional background and ability, communication costs are high, and data confidentiality is harder to guarantee.

Part-time workers also come and go frequently, so when a client company needs to adjust its original labeling requirements, they cannot respond flexibly.

  Ximei told CBN that at the peak of the epidemic in March this year, she arranged to work for an outsourcing company. She was told at the time that her output could be worth 200 yuan a day, for a monthly income of about 5,000 to 6,000 yuan.

But once the trial actually began, the other party kept pressing her to produce faster. Later the data failed acceptance, and she had to redo the work twice over a month and a half. In the end, Ximei received only 400 yuan.

  It is understood that under the crowdsourcing model, projects and workers are often matched through WeChat or QQ groups.

Searching for "data labeling" on QQ, the reporter found data labeling groups large and small, including part-time job groups, project matching groups, and experience-sharing groups.

After joining a few of them, the reporter found the groups quite active: members often post projects or look for part-time work, and new members join every so often.

At the same time, the reporter saw occasional complaints in the groups from labelers who were owed wages by project owners.

  The crowdsourcing model is also one of the reasons why pay for large numbers of data labelers is generally low.

Huang Ming told China Business News that a project passes through multiple intermediaries, each of which takes a cut of the price difference. This raises costs for the client (Party A), while the people who actually do the labeling earn less.

  Some annotators who do well prefer to organize their own teams and find projects independently.

The more this happens, the more it reflects two things: the rapid growth of the data annotation industry, and the worsening layering of crowdsourcing intermediaries within it.

  For data annotators, the transformation and upgrading of the industry also means the transformation and upgrading of their own capabilities.

A report from the China Academy of Information and Communications Technology pointed out that at the current stage of AI application research and development, data labeling is fundamental, and that AI will continue to rely on labeled data for the next 10 years.

  As for when machines can replace manual data labeling, no one has yet given a clear answer.

However, the CBN reporter saw that in the various data labeling QQ groups, new project requests and part-time job postings still appear every day, and discussion of the data annotation industry remains lively.

  Author: Yi Bai Ling