On the Internet, everyone discloses some personal information, to a greater or lesser degree, whether actively or passively.

If that information is mined with big data techniques, privacy may be leaked, which creates information security problems.

As the 5G era surges in, the public is increasingly confused about how to protect its privacy, and even somewhat at a loss.

So how does big data come to know your private information?

How should everyone protect themselves?

1. Big data knows both the "known" and the "unknown"

  In the era of big data, anyone may end up like the emperor in Hans Christian Andersen's fairy tale "The Emperor's New Clothes".

Big data knows what you have said, what you have done, what hobbies you have, what illnesses you have had, where you live, and who your relatives and friends are. In short, it knows almost everything you know, or can come to know it; at the very least, it will know sooner or later!

  Big data may even know things that you yourself do not know.

For example, it can discover many of your subconscious habits: where you like to stand in group photos, whether you step over a threshold with your left or right foot first, what kind of people you like to deal with, what your personality traits are, which of your friends holds views different from yours...

  Furthermore, big data may even know what will happen in the future.

For example, from information such as your "eating more and exercising less", it can infer that you may develop the "three highs" (high blood pressure, high blood sugar, and high blood lipids).

When you and many other people independently buy cold medicine, big data knows that a flu outbreak is coming!
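
The cold-medicine example can be sketched as a simple spike detector over aggregated daily sales. This is only an illustration under stated assumptions: the sales figures are invented, and the function name `flag_spikes`, the 7-day baseline window, and the threshold of three standard deviations are hypothetical choices, not something the article specifies.

```python
from statistics import mean, stdev

def flag_spikes(daily_counts, window=7, threshold=3.0):
    """Return indices of days whose count exceeds the mean of the
    preceding `window` days by more than `threshold` standard deviations."""
    flagged = []
    for i in range(window, len(daily_counts)):
        baseline = daily_counts[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            # Perfectly flat baseline: any increase counts as a spike.
            if daily_counts[i] > mu:
                flagged.append(i)
            continue
        if (daily_counts[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged

# Hypothetical city-wide cold-medicine purchases per day (made-up numbers).
sales = [100, 104, 98, 101, 97, 103, 99, 102, 100, 180, 240]
spike_days = flag_spikes(sales)
print(spike_days)
```

No single purchase reveals anything here; the signal only appears once many independent purchases are added together.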

In fact, big data has already been used to predict, successfully, World Cup results, stock fluctuations, price trends, user behavior, traffic conditions, and more.

  Of course, "you" here does not refer only to you as an individual; it includes, but is not limited to, your family, your organization, your ethnic group, and even your country.

As for all this private information, whether you already know it, do not know it, or will only learn it in the future: will it make you a hero or a coward?

That is impossible to predict.

2. Data mining is like "garbage disposal"

  What is big data?

Figuratively speaking, big data is a huge pile of miscellaneous data heaped together haphazardly.

For example, what you say on the Internet, the WeChat messages you send, and the emails you send and receive are all part of big data.

Information collected passively, without your knowledge, such as video captured by roadside cameras, routes recorded by mobile phone positioning systems, and driving navigation signals, is also a component of big data.

In addition, readings such as temperature, humidity, and speed that are collected automatically by all kinds of sensors are likewise part of big data.

In short, every person and every communication or control device, whether software or hardware, is in fact a source of big data.

  Information is extracted from big data by a technology called "big data mining", which draws on methods such as neural networks, genetic algorithms, decision trees, rough sets, covering algorithms (rules that cover the positive examples and exclude the negative ones), statistical analysis, and fuzzy sets.

The process of big data mining can be divided into eight steps: data collection, data integration, data reduction, data cleaning, data transformation, mining analysis, model evaluation, and knowledge representation.
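
The eight steps can be walked through on a toy example. The sketch below is not a real mining system: every function name and record is hypothetical, the third step is treated as data reduction, and the "mining analysis" step is reduced to a plain frequency count so that the order of the steps stays visible.

```python
from collections import Counter

def collect():
    # 1. Data collection: two hypothetical raw sources (made-up records).
    shop_log = [("alice", " Cold Medicine "), ("bob", "tissues"), ("alice", None)]
    app_log = [("carol", "cold medicine"), ("bob", "TISSUES"), ("dave", "cold medicine")]
    return shop_log, app_log

def integrate(*sources):
    # 2. Data integration: merge all sources into one list.
    return [row for source in sources for row in source]

def reduce_data(rows):
    # 3. Data reduction: keep only the field the analysis needs (the item).
    return [item for _user, item in rows]

def clean(items):
    # 4. Data cleaning: drop missing values.
    return [item for item in items if item is not None]

def transform(items):
    # 5. Data transformation: normalize whitespace and letter case.
    return [item.strip().lower() for item in items]

def mine(items):
    # 6. Mining analysis: here just a frequency count.
    return Counter(items)

def evaluate(counts, min_support=3):
    # 7. Model evaluation: keep patterns seen at least min_support times.
    return {item: n for item, n in counts.items() if n >= min_support}

def represent(patterns):
    # 8. Knowledge representation: state each pattern as a readable sentence.
    return [f"'{item}' was bought {n} times" for item, n in sorted(patterns.items())]

knowledge = represent(evaluate(mine(transform(clean(reduce_data(integrate(*collect())))))))
print(knowledge)
```

Each function consumes the output of the previous one, which mirrors how the steps feed into each other in the list above.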

  However, this lofty-sounding big data industry is almost the same thing as garbage disposal and waste recycling.

  This is no joke.

Buying up scrap and collecting garbage corresponds to "data collection"; hauling the scrap and garbage to a central processing site corresponds to "data integration"; a rough preliminary sorting of it corresponds to "data reduction"; properly cleaning and tidying it corresponds to "data cleaning"; dismantling a broken sofa into raw materials such as wood, iron, and cloth corresponds to "data transformation"; carefully working out how to sell those raw materials at a good price corresponds to "mining analysis"; continually summing up experience and settling on regular upstream sellers and downstream buyers corresponds to "model evaluation"; and finally, distilling these skills into formulas corresponds to "knowledge representation".

  Now look at the structure of the raw material.

Big data is heterogeneous, just like garbage.

If one must name an essential difference between garbage and big data, it is that garbage is physical and can be reused only a limited number of times, while big data is virtual and can be processed and reused over and over again.

For example, big data experts can pass passenger travel patterns mined from the data (the "garbage") to airlines, sell the consumption habits of a certain group to department stores, and so on.

In short, big data experts can "serve one dish many ways" and reuse the data again and again, and the more time passes, the greater its value.

In other words, big data is very valuable "junk".

3. Big data mining will never end

  Big data mining creates value on the positive side, but it also has a negative side: the risk of privacy leakage.

How is privacy leaked?

It is actually very simple. Let us first dissect how "human flesh search" violates privacy.

  A large group of netizens, for some purpose, use every resource and channel available to them to collect as much information as possible about the target person or event; they then refine that information into new information that suits their purpose and post it back online to share with others.

This completes the first "human flesh iteration".

  Then, building on the first iteration, the participants learn from one another, keep up the effort, and repeatedly cross-check, collect, process, and organize information, which produces the second "human flesh iteration".

The cycle repeats, and after many tireless iterations a vivid portrait of the target person or event emerges.

If the material making up this "satisfactory portrait" has been verified, or is at least true in its main points, the "human flesh search" has succeeded.

  One can almost conclude that if enough netizens take part in a "human flesh search" for long enough and with enough perseverance, anyone may be left with nowhere to hide.

  In fact, the so-called big data mining, in a sense, is just a special "human flesh search" automatically completed by machines.

The difference is that the purpose of this kind of search is no longer limited to discrediting or praising someone. It is broader: finding the best buyers for sellers of goods, finding patterns in certain kinds of data, finding correlations among certain things, and so on.

In short, as long as the purpose is clear, then big data mining will be useful.

  Comparing "human flesh search" with big data mining: the netizens are replaced by computers; the information the netizens collect is replaced by massive heterogeneous data in databases; the netizens' knack for spotting connections between people is replaced by corresponding intelligent algorithms; and the netizens' practice of learning from and inspiring one another is replaced by various synchronization operations.

  The iterations proceed just as before, except that the machine runs more of them and runs them faster. Each iteration is in effect a "learning" step for the machine.

The netizens' final "satisfactory portrait" is replaced by provisional mining results.

They are provisional because big data mining never ends: the results keep getting more accurate and more intelligent, and users simply pick whatever result satisfies their own criteria at any given time.

  Of course, besides the similarities, there are also many important differences between "human flesh search" and "big data mining".

For example, machines never tire, they collect more data more quickly, and they draw on a wider range of data sources.

In short, the "human flesh search" of netizens will eventually lose to the "big data mining" of machines.

4. Privacy protection and data mining: "danger" and "opportunity" coexist

  It must be admitted that, as things stand, the "destructive power" of mining privacy from big data far exceeds our ability to protect privacy in big data; in other words, in the face of big data mining, humanity is currently all but helpless.

This really was unexpected.

In the decades since the Internet was born, people have spared no effort to leave fragments of information online permanently.

Each fragment on its own is completely harmless, but no one realized, or at least no one deliberately considered, that fusing many harmless fragments together would cause endless trouble!
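
How harmless fragments combine can be shown with a small record-linkage sketch. Everything in it is invented for illustration: the three datasets, their field names, and the helper `link` are assumptions. None of the datasets alone ties a nickname to a health detail, but joining them on shared fields such as city, birth year, and postal code does.

```python
# Three hypothetical datasets; each looks harmless by itself (all values invented).
forum_posts = [
    {"nickname": "nightowl", "city": "Springfield", "birth_year": 1985},
    {"nickname": "bluesky", "city": "Shelbyville", "birth_year": 1990},
]
gym_checkins = [
    {"member_id": 17, "city": "Springfield", "birth_year": 1985, "zip": "12345"},
    {"member_id": 42, "city": "Shelbyville", "birth_year": 1972, "zip": "54321"},
]
pharmacy_orders = [
    {"zip": "12345", "birth_year": 1985, "item": "blood pressure medication"},
    {"zip": "54321", "birth_year": 1972, "item": "vitamins"},
]

def link(left, right, keys):
    """Join two lists of dicts on the shared fields named in `keys`."""
    joined = []
    for a in left:
        for b in right:
            if all(a[k] == b[k] for k in keys):
                joined.append({**a, **b})
    return joined

# Fuse the fragments step by step.
step1 = link(forum_posts, gym_checkins, keys=("city", "birth_year"))
profiles = link(step1, pharmacy_orders, keys=("zip", "birth_year"))
print(profiles)
```

In this toy data the join leaves a single fused profile, which is the kind of "portrait" that no individual fragment could have produced.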

  However, there is no need for you to worry too much.

In human history, similar situations of being on the defensive have arisen more than once.

Judging from past experience, privacy protection and data mining take turns like the figures on a "revolving lantern": by "mining" privacy, humanity gains unprecedented benefits, which in turn produce more "privacy" that needs protecting, so people have to go back and seriously study how to protect it.

As more and more privacy accumulates, "mining" it becomes more and more profitable, and a new round of "mining" begins.

Historically, humanity has on the whole held the upper hand in protecting its own privacy. Before big data mining on networks, "privacy leakage" was not a prominent problem.

  Now, however, humanity faces a thorny problem: how to protect the privacy contained in the massive fragments of information already left on the Internet?

Relying on technology alone is clearly not enough; it can even happen that the more one "protects", the more privacy one "leaks".

  Therefore, a multi-pronged approach must be taken.

For example, on the legal side, big data mining for the purpose of "human flesh search" should be prohibited; on the management side, malicious big data searches should be detected and subjected to the necessary supervision and control.

In addition, the concept of "privacy" may need to be reshaped where necessary. After all, "privacy" is itself a matter of convention that depends on time, place, nationality, culture, and so on.

  As for individual online behavior, how can you protect your privacy in the era of big data?

Or at least, how can you avoid leaving too many fragments of information containing personal privacy on the Internet?

The answer is a single word: anonymity!

As long as you do a good job of anonymity, you can protect your privacy to a certain extent.

That is to say, before big data technology emerged, protecting privacy meant hiding the private matter itself, while personal identity could be disclosed. In the era of big data, it means hiding personal identity instead, that is, anonymity.
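
One common way to hide identity in stored records is to replace the identifying field with a keyed hash. The sketch below is a minimal illustration, not a complete anonymization scheme: the record, the field name `name`, and the helpers `pseudonym` and `anonymize` are hypothetical. Strictly speaking this is pseudonymization, which is weaker than full anonymity, because the remaining fields may still be linked to a person.

```python
import hashlib
import hmac
import secrets

# Secret key held only by the data holder (generated here for the demo).
SECRET_KEY = secrets.token_bytes(32)

def pseudonym(identity: str, key: bytes = SECRET_KEY) -> str:
    """Map an identity to a stable token derived with HMAC-SHA256."""
    digest = hmac.new(key, identity.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # shortened for readability

def anonymize(record: dict, identity_field: str = "name") -> dict:
    """Return a copy of the record with the identity replaced by a token."""
    cleaned = {k: v for k, v in record.items() if k != identity_field}
    cleaned["user_token"] = pseudonym(record[identity_field])
    return cleaned

record = {"name": "Zhang San", "purchase": "cold medicine"}  # invented example
safe = anonymize(record)
print(safe)
```

The same person always maps to the same token, so the data stays usable for analysis, while the name itself no longer appears in the record.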

(Authors: Yang Yixian, Niu Xinxin, both professors of Beijing University of Posts and Telecommunications)