Is AI biased, and do algorithms make users' preferences converge? Scientists offer evidence

  You may have noticed that after you rate a movie on a film-rating website, the site starts recommending movies in a similar style. Or, to take an even more familiar example, when you search for an item on a shopping site, the recommendation page is full of similar products the next day.

  Artificial intelligence helps businesses learn customers' preferences, but at the same time it gradually develops preference biases from user feedback, nudging users' needs toward sameness. In face recognition, meanwhile, the discrimination and bias built into algorithms have already sparked plenty of controversy.

  Recently, researchers from several universities have provided evidence for both kinds of bias. Their papers are currently available on the preprint site arXiv.

Algorithmic recommendation systems amplify bias and make user preferences converge

  At its core, a recommendation system is a form of information filtering based on item content or user behavior, and many of the apps and websites we use today have one built in. If you give a movie a high score on a video site, the system recommends more movies of the same type; if you then rate those recommended movies as well, your feedback is folded back into the system. This is a feedback loop.

  Recommendation algorithms, however, suffer from popularity bias: a handful of popular items get recommended over and over, while the rest are ignored. In the movie example, films that more people like and rate highly are the popular items, and they are recommended to users far more often.
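
  To make the idea concrete, here is a minimal sketch (my own illustration, not the researchers' code) of one common way to quantify popularity bias: measure what share of all recommendation slots is taken up by the "head" items, the small fraction of items that have received the most ratings. The function and the toy data below are purely hypothetical.

from collections import Counter

def popularity_bias(ratings, recommendations, head_fraction=0.2):
    """Share of recommendation slots filled by the most-rated ('head') items.

    ratings: list of (user, item) pairs observed so far.
    recommendations: dict mapping each user to their list of recommended items.
    """
    counts = Counter(item for _, item in ratings)
    ranked = [item for item, _ in counts.most_common()]
    head = set(ranked[: max(1, int(len(ranked) * head_fraction))])
    slots = [item for recs in recommendations.values() for item in recs]
    return sum(item in head for item in slots) / len(slots)

# Toy example: item "A" dominates both the ratings and the recommendations.
ratings = [(1, "A"), (2, "A"), (3, "A"), (1, "B"), (2, "C"), (3, "D"), (4, "E")]
recs = {1: ["A", "C"], 2: ["A", "B"], 3: ["A", "E"]}
print(popularity_bias(ratings, recs))  # 0.5: half of the slots go to head items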

  Popularity bias stems partly from uneven popularity in the training data itself and partly from the recommendation algorithm. Over time the bias is reinforced: as users keep giving popular movies high scores inside the feedback loop, those movies become even more popular and even more likely to be recommended.

  To study how feedback loops amplify bias and otherwise affect recommendation systems, researchers from Eindhoven University of Technology, DePaul University, and the University of Colorado Boulder ran simulations with three recommendation algorithms on a movie dataset, reproducing the interactive process of a recommender system.

  The research data, the MovieLens 1M dataset, contains 1,000,209 ratings given by 6,040 users to 3,706 movies, on a scale of 1-5. The three recommendation algorithms used were user-based collaborative filtering (UserKNN), Bayesian personalized ranking (BPR), and MostPopular, an algorithm that simply recommends the most popular items to everyone.

  Iterating with these data and algorithms, with the system repeatedly generating recommendation lists and users repeatedly rating the items on those lists, the researchers found that over time the average popularity of recommended items rose under all three algorithms while overall diversity fell. This is evidence that the feedback loop amplifies the recommender system's bias.
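
  The feedback loop itself is easy to reproduce in miniature. The toy simulation below is a simplified sketch under my own assumptions, not the authors' experimental code: a MostPopular-style recommender suggests the most-rated items each round, simulated users rate one suggestion each, and those new ratings feed the next round. It tends to show the same pattern the paper reports, with the average popularity of recommended items creeping up while the number of distinct items being recommended shrinks.

import random
from collections import Counter

random.seed(0)
n_users, n_items, top_k, rounds = 50, 200, 10, 20

# Seed data: every user starts with a few random ratings.
ratings = {u: set(random.sample(range(n_items), 5)) for u in range(n_users)}

for step in range(rounds):
    counts = Counter(i for items in ratings.values() for i in items)
    avg_pop, recommended = [], set()
    for u in range(n_users):
        # MostPopular: the top-k most-rated items the user has not rated yet.
        recs = [i for i, _ in counts.most_common() if i not in ratings[u]][:top_k]
        recommended.update(recs)
        avg_pop.append(sum(counts[i] for i in recs) / len(recs))
        # Feedback: the user rates one recommended item, reinforcing its popularity.
        ratings[u].add(random.choice(recs))
    print(f"round {step:2d}  avg popularity of recs = {sum(avg_pop)/len(avg_pop):5.1f}"
          f"  distinct items recommended = {len(recommended)}")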

  Amplified popularity bias also distorts the system's picture of what users are interested in. Under all three algorithms, the deviation between a user's measured preference and their initial preference grew over time. In other words, the system's recommendations drift further and further from users' true preferences, and the movies it suggests no longer match your taste.
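
  One straightforward way to put a number on this drift (again an illustrative sketch, not necessarily the paper's exact metric) is to compare a user's initial genre profile with the genre profile of what the system later recommends to them, for instance using the total variation distance between the two distributions.

from collections import Counter

def genre_profile(movies, genres):
    """Normalized genre distribution for a list of movie ids."""
    c = Counter(g for m in movies for g in genres[m])
    total = sum(c.values())
    return {g: n / total for g, n in c.items()}

def preference_deviation(initial_movies, recommended_movies, genres):
    """Total variation distance between the initial and the recommended profile."""
    p = genre_profile(initial_movies, genres)
    q = genre_profile(recommended_movies, genres)
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(g, 0) - q.get(g, 0)) for g in keys)

# Toy data: the user started out liking comedies, but the loop pushes action films.
genres = {"m1": ["Comedy"], "m2": ["Comedy", "Romance"],
          "m3": ["Action"], "m4": ["Action", "Sci-Fi"]}
print(preference_deviation(["m1", "m2"], ["m3", "m4"], genres))  # 1.0 = complete drift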

  In addition, as the system's bias is magnified, users are exposed almost exclusively to highly popular items, the movies that many other people have already rated highly. Their preferences, as seen by the recommender, therefore become concentrated in a common range, a homogenization of user preferences. The bias produced by the feedback loop hits minority user groups the hardest.

  "The method of solving algorithmic bias becomes crucial. Because if not handled properly, a small deviation in the recommender system may be extremely magnified over time." The researcher wrote at the end of the paper.

The data used to train face recognition is heavily skewed

  The bias introduced by face recognition algorithms has drawn growing attention. The PULSE algorithm, for example, which sharpens blurry photos, "restored" a blurred photo of former U.S. President Barack Obama into a white face, causing a huge controversy against the backdrop of the Black Lives Matter (BLM) movement in the United States.

  A major reason for algorithmic bias and discrimination in face recognition is that the datasets used for training are themselves heavily skewed. Researchers from the University of Cambridge and Middle East Technical University found evidence of this in two datasets used for facial expression recognition.

  The two datasets are RAF-DB and CelebA. RAF-DB contains tens of thousands of images collected from the Internet, annotated with facial expressions and other attributes, while CelebA has 202,599 images of 10,177 people, each labeled with 40 attributes.
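
  Demographic skew of this kind can be audited directly from a dataset's annotation files. The sketch below assumes the standard CelebA attribute file list_attr_celeba.txt, whose first line gives the image count, second line the 40 attribute names (including "Male" and "Young"), followed by one row of +1/-1 flags per image; the path is hypothetical and the code is an illustration rather than the researchers' own.

def attribute_shares(path, attributes=("Male", "Young")):
    """Fraction of images with a positive (+1) flag for each requested attribute."""
    with open(path) as f:
        f.readline()                      # first line: number of images
        names = f.readline().split()      # second line: attribute names
        idx = {a: names.index(a) for a in attributes}
        counts = {a: 0 for a in attributes}
        n = 0
        for line in f:
            values = line.split()[1:]     # drop the image filename
            n += 1
            for a, i in idx.items():
                counts[a] += values[i] == "1"
        return {a: counts[a] / n for a in attributes}

# Hypothetical usage:
# shares = attribute_shares("celeba/list_attr_celeba.txt")
# print(1 - shares["Male"], shares["Young"])  # share of women, share of young faces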

  To gauge how skewed the two datasets are, the researchers sampled a random subset and cropped the images so that the faces were consistently oriented. They then used classifiers to measure accuracy and fairness.
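
  In practice, "measuring accuracy and fairness" here means breaking a classifier's performance down by demographic group. The following sketch is my own simplification, and the paper's exact fairness definition may differ: it computes per-group accuracy and then summarizes fairness as the worst group's accuracy divided by the best group's.

from collections import defaultdict

def per_group_accuracy(y_true, y_pred, groups):
    """Classification accuracy computed separately for each demographic group."""
    correct, total = defaultdict(int), defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    return {g: correct[g] / total[g] for g in total}

def fairness_score(acc_by_group):
    """Worst-group accuracy divided by best-group accuracy (1.0 = perfectly even)."""
    return min(acc_by_group.values()) / max(acc_by_group.values())

# Toy example: the expression classifier is noticeably less accurate for group "B".
y_true = ["happy", "sad", "happy", "sad", "happy", "sad"]
y_pred = ["happy", "sad", "happy", "happy", "sad", "sad"]
groups = ["A", "A", "A", "B", "B", "B"]
acc = per_group_accuracy(y_true, y_pred, groups)
print(acc)                  # {'A': 1.0, 'B': 0.33...}
print(fairness_score(acc))  # about 0.33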

  In theory, for the algorithm to be both accurate and fair, the classifier should produce similar results for different demographic groups throughout. That turned out not to be the case.

  In RAF-DB, the vast majority of images are of white people aged 20-39. Specifically, 77.4% of the images are of white people, 15.5% of Asians, and only 7.1% of African Americans; by gender, 56.3% are of women and 43.7% of men; by age, more than half the images are of young people aged 20-39, while those under 3 or over 70 account for less than 10%.

  To probe the dataset's bias further, the researchers used three algorithms to evaluate its accuracy and fairness. On accuracy, models trained on RAF-DB recognized minorities less accurately than whites; on fairness, the gender attribute fared relatively well at 97.3%, while race and age scored lower, at 88.1% and 77.7% respectively.

  In CelebA, women account for 61.4% of the images and men only 38.6%. By age, young people make up 75.7%, far outnumbering older people at 24.3%.

  On accuracy, models trained on CelebA reached 93.7% for young women but a lower 90.7% for older men. The dataset's fairness scores for gender and age were both good, at 98.2% and 98.1% respectively.

  Many companies already use facial recognition software to score job candidates' emotions during interviews; if the whole system is biased, it treats those candidates unfairly. The bias found in facial expression datasets also underlines the need for oversight, and how to use the law to prevent the abuse of this technology will be one of the questions the field must grapple with.

  The Paper News reporter Wang Xinxin and intern He Qingyi