Large models "draw the dragon", small data "dots the eyes"

Enterprises accelerate intelligent transformation

  ◎Reporter Zhai Dongdong

  Applying small data and high-quality data has a precondition: the model must be fine-tuned with small data on top of a large foundation model (pre-trained model), so that it can serve specific application scenarios more accurately.

Seen this way, small data will play a key role in how foundation models complete downstream tasks in the future.

  Wang Jinqiao

  Researcher, Institute of Automation, Chinese Academy of Sciences

  Today, big data has become "standard equipment" for artificial intelligence.

When training artificial intelligence models, making them smarter requires a large amount of diverse data.

But recently, the renowned artificial intelligence scholar Andrew Ng expressed a different view when discussing the direction of artificial intelligence over the next 10 years.

He believes that the application of small data and high-quality data may be the future trend.

  Wang Jinqiao, a researcher at the Institute of Automation of the Chinese Academy of Sciences, said that applying small data and high-quality data has a precondition: the model must be fine-tuned with small data on top of a large foundation model (pre-trained model), so that it can serve specific application scenarios more accurately.

Seen this way, small data will play a key role in how foundation models complete downstream tasks in the future.

High-quality big data is difficult to obtain in most application scenarios

  Algorithms (models), computing power, and data have become the three major elements driving the development of artificial intelligence, and data is particularly important.

In many online consumption scenarios, we are often targeted by precise AI-driven recommendations.

By analyzing consumers' habits and shopping preferences, platform systems can infer and guide their potential needs, and all of this rests on large numbers of rich data samples.

Using big data, a platform builds a dedicated model for its domain to deliver accurate recommendations.

  These experiences may be ordinary consumers' most direct impression of big data and artificial intelligence.

Andrew Ng also said in the interview that over the past 10 years, consumer-facing enterprises have accumulated very large data sets thanks to their huge user bases (sometimes reaching billions of users), enabling deep learning and bringing these enterprises considerable economic benefits.

But he also stressed that this rule does not apply to other industries.

The reason is that not all scenarios can generate rich big data samples.

  In fact, "80% to 90% of the problems in real life are small-sample problems," Wang Jinqiao said. In many application scenarios, training samples are hard to obtain and only a very small amount of data is available; defect detection is a typical case.

Defect detection means using machine vision technology to detect and identify specific defects.

This kind of detection has applications in many fields such as aerospace, railway transportation, and smart cars.

Because defective products are always rare in real production, the number of training samples available for defect detection is very small.

  Even in scenarios with abundant samples, labeling the training data is becoming more and more difficult.

According to Wang Jinqiao, the training data used for artificial intelligence today is still mostly manually annotated. For massive data sets, manual annotation often requires industry expertise, and ordinary annotators struggle to identify the regions that need to be marked.

In addition, artificial intelligence experts are needed to design an algorithm model for each application requirement; as the number of models grows, development costs keep rising.

  Andrew Ng also noted that in the consumer internet industry, only a handful of machine learning models need to be trained to serve a billion users.

In manufacturing, however, 10,000 manufacturers are building 10,000 custom models.

And to do so often requires a large number of AI experts.

  Judging from current industry trends, the foundation model may be one way to solve these problems.

Using the foundation model as a "base" and fine-tuning with small data

  "In recent years, the industry has begun to pay attention to the development of basic models or general-purpose models to solve the above problems." Wang Jinqiao said, first pre-train a model with a large amount of data.

In pre-training these models, breadth of exposure is the top priority.

During training, the model sees a wide variety of data from the field, building up the knowledge it needs to handle the situations it will encounter later.

Afterwards, scenario-specific data is used to fine-tune the model for downstream tasks.

  Take a large model in the field of natural language processing (NLP): to use it for downstream tasks such as dialogue or question answering, you only need a small amount of data from that task to fine-tune the large model and achieve good results.

Some research results also show that fine-tuning a large model requires only 5% to 10% of the data samples used by the original dedicated model, while achieving the same accuracy.
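To make this fine-tuning workflow concrete, here is a minimal sketch in Python using the Hugging Face transformers library. The checkpoint name, stand-in dataset, sample size, and hyperparameters are illustrative assumptions, not details from the article.

```python
# Hedged sketch: adapt a pre-trained foundation model to a downstream task
# with a small amount of labeled data. All names below are illustrative.
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)
from datasets import load_dataset

model_name = "bert-base-uncased"  # example pre-trained "base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# A stand-in downstream dataset; in a real scenario this would be the
# small, scenario-specific data described in the article.
dataset = load_dataset("glue", "sst2")

def tokenize(batch):
    return tokenizer(batch["sentence"], truncation=True,
                     padding="max_length", max_length=128)

encoded = dataset.map(tokenize, batched=True)
small_train = encoded["train"].shuffle(seed=42).select(range(2000))  # "small data"

args = TrainingArguments(output_dir="finetuned-model",
                         num_train_epochs=3,
                         per_device_train_batch_size=16,
                         learning_rate=2e-5)

# Fine-tune the large pre-trained model on the small downstream dataset.
Trainer(model=model, args=args, train_dataset=small_train).train()
```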

  "From a large model to a small model, it can be said that one model can do multiple tasks, which can be said to be a transformation in the current development of the industry." Wang Jinqiao said, this not only reduces the development difficulty, but also greatly reduces the development cost.

In the past, each algorithm had to be designed and trained by a deep learning expert; now it only needs to be fine-tuned on top of the large model, and model design and architecture have become relatively simple.

Small and medium-sized enterprises only need to upload their data to the large model to complete the work.

  Furthermore, with this approach, the false positive rate of the model is also reduced.

The foundation model has seen a wide variety of data and scenarios, so when handling a specific task it draws on a massive knowledge reserve and is better prepared for small, specialized applications.

  However, Ng also said in the interview that pre-training is only a small part of the puzzle. The bigger problem is providing tools that let users select the right data for fine-tuning and label it consistently.

When working with large data sets, developers' usual response is that noise does not matter: collect all the data as it is and the algorithm will average over it.

But if researchers could develop tools to flag inconsistencies in data, giving users a very targeted way to improve data quality, that would be a more efficient way to achieve high-performance systems.
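As a rough illustration of what such a data-quality tool might do, the sketch below flags examples whose inputs are identical but whose labels disagree, so annotators can review them in a targeted way; the function and sample data are hypothetical, not a tool described in the interview.

```python
# Hedged sketch: surface labeling inconsistencies so they can be fixed in a
# targeted way rather than averaged away by the learning algorithm.
from collections import defaultdict

def flag_inconsistent_labels(examples):
    """examples: list of (text, label) pairs; returns inputs with conflicting labels."""
    labels_by_text = defaultdict(set)
    for text, label in examples:
        labels_by_text[text.strip().lower()].add(label)
    # Any input annotated with more than one distinct label is flagged for review.
    return [text for text, labels in labels_by_text.items() if len(labels) > 1]

data = [
    ("scratch on housing", "defect"),
    ("Scratch on housing", "ok"),   # same input, conflicting label
    ("clean surface", "ok"),
]
print(flag_inconsistent_labels(data))  # -> ['scratch on housing']
```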

Multimodality may be the future development direction of large models

  As the "base" from which many smaller models are produced, the performance of the foundation model is particularly important.

The closer its cognitive ability comes to that of humans, the better the small models built on top of it will perform.

  When exploring the external environment, human beings rely on a variety of cognitive channels such as vision, hearing, and touch, and communicate interactively through language and other forms.

Among them, visual information accounts for about 70%, and auditory, tactile and other information accounts for about 30%.

"Similarly, to make the performance of the large model more excellent and closer to human cognitive ability, it involves the problem of data fusion in training." Wang Jinqiao pointed out that the well-known language generation model GPT-3 can generate fluent and natural language. and complete a series of NLP tasks such as question answering, translation, novel creation, and even simple arithmetic operations.

However, its main way of interacting with the outside world is text, and it lacks fusion with other modalities such as images and video.

  Each source or form of information can be called a modality.

For example, people have touch, hearing, sight, and smell; the media of information include voice, video, and text.

The human cognitive model can be said to be a multimodal collection.

  To bring the pre-training of the foundation model closer to the human cognitive model, multimodal fusion is also required.

That is, the model should acquire the ability to process and understand information from multiple modalities through machine learning methods, such as joint learning across images, video, audio, and text.

Multimodal pre-trained models are widely regarded as an exploration of the path from narrow, domain-limited artificial intelligence toward general artificial intelligence.
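One common form this kind of multimodal learning takes is contrastive image-text pre-training, sketched below in PyTorch; the encoder dimensions, temperature, and projection heads are illustrative assumptions rather than the architecture of any model named in the article.

```python
# Hedged sketch: align image and text features in a shared embedding space
# so that matching image-text pairs end up close together.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContrastiveMultimodal(nn.Module):
    def __init__(self, image_dim=2048, text_dim=768, embed_dim=512):
        super().__init__()
        # Projection heads; in practice these sit on top of pre-trained encoders.
        self.image_proj = nn.Linear(image_dim, embed_dim)
        self.text_proj = nn.Linear(text_dim, embed_dim)

    def forward(self, image_feats, text_feats, temperature=0.07):
        img = F.normalize(self.image_proj(image_feats), dim=-1)
        txt = F.normalize(self.text_proj(text_feats), dim=-1)
        logits = img @ txt.t() / temperature   # pairwise similarity scores
        targets = torch.arange(len(img))       # i-th image matches i-th text
        # Symmetric cross-entropy pulls matching pairs together, pushes others apart.
        return (F.cross_entropy(logits, targets) +
                F.cross_entropy(logits.t(), targets)) / 2

# Usage with random features standing in for encoder outputs:
model = ContrastiveMultimodal()
loss = model(torch.randn(8, 2048), torch.randn(8, 768))
loss.backward()
```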

  "In the past two years, the number of large models has shown explosive growth, and there is a trend from single-modal models to multi-modal models." Wang Jinqiao said that basic models with multi-modal capabilities can be used in specific application scenarios. The robustness is better, and the survivability of the system is stronger in abnormal and dangerous situations. In the future, the multimodal basic model may become an important direction for the development of the basic model in the future.