When a bank suspects that something is wrong with a customer's transactions, it must file a report: a so-called Suspicious Activity Report (SAR), known in Swedish as a suspicion report.

The report is sent to the US Financial Crimes Enforcement Network, FinCEN.

The reports often follow a similar pattern, but during the project it became clear that different bank officials write them in different ways.

What all the reports have in common, however, is that a selection of the suspicious transactions is described in running text.

For all the project's members, Swedish and international alike, it was of great interest to build a database of every customer, bank and transaction mentioned in the more than 2,000 reports included in the leak.

SVT's data journalism team set out to find a way to identify the relevant text in each suspicion report and then extract its various parts from that text: the companies or individuals who sent and received the money, the account numbers, the banks that sent or received each transaction, how much money was sent and when the transactions took place.
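The fields listed above map naturally onto one record per transaction, which can then be written out as a flat table. The following is a minimal sketch assuming that a flat CSV table was the goal; the `Transaction` class, its field names and the report ID format are hypothetical illustrations, not SVT's actual schema.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import List, Optional


@dataclass
class Transaction:
    """One extracted transaction; field names are hypothetical."""
    report_id: str                      # which suspicion report it came from
    sender: Optional[str]               # company or individual sending money
    recipient: Optional[str]            # company or individual receiving money
    sender_account: Optional[str]
    recipient_account: Optional[str]
    sender_bank: Optional[str]
    recipient_bank: Optional[str]
    amount: Optional[str]               # kept as text exactly as extracted
    date: Optional[str]


def to_csv(transactions: List[Transaction]) -> str:
    """Serialize transactions to CSV; missing (None) values become empty cells."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(Transaction)])
    writer.writeheader()
    for t in transactions:
        writer.writerow(asdict(t))
    return buf.getvalue()
```

Keeping every field optional reflects that a given sentence in a report may name only some of the parts.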

Programs process the documents

SVT's data journalists chose to use machine learning, a form of artificial intelligence in which you first train a model.

To build the model, you take a small number of documents, go through them manually and mark up the text into its various components, i.e. sender, recipient, bank and so on.

The computer program then learns to find the relevant information by training on these manually annotated examples.

When the result is good enough, you let the program process the other documents.
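The article does not describe the annotation tool or format. As an illustrative sketch, hand annotations can be stored as labelled character spans, and part of the annotated documents can be held back so that "good enough" is judged on text the model has not trained on. The span format, label names, helper functions and example sentence are all hypothetical.

```python
import random
from typing import Dict, List, Tuple

# (start offset, end offset, label): one hand-marked piece of a report's text
Span = Tuple[int, int, str]


def make_span(text: str, phrase: str, label: str) -> Span:
    """Return a span covering the first occurrence of `phrase` in `text`."""
    start = text.index(phrase)  # raises ValueError if the phrase is missing
    return (start, start + len(phrase), label)


def labelled_text(text: str, spans: List[Span]) -> Dict[str, List[str]]:
    """Group the annotated substrings by label, in document order."""
    out: Dict[str, List[str]] = {}
    for start, end, label in sorted(spans):
        out.setdefault(label, []).append(text[start:end])
    return out


def train_test_split(docs: list, test_fraction: float = 0.2, seed: int = 0):
    """Shuffle reproducibly and hold back a share of documents for checking."""
    shuffled = list(docs)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical example sentence, not taken from the leaked reports
text = "ACME LTD sent USD 250,000 to BETA CORP on 12 March 2015."
spans = [
    make_span(text, "ACME LTD", "SENDER"),
    make_span(text, "USD 250,000", "AMOUNT"),
    make_span(text, "BETA CORP", "RECIPIENT"),
    make_span(text, "12 March 2015", "DATE"),
]
```

Computing offsets with `str.index` rather than typing them by hand avoids off-by-one mistakes in the annotations.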

The first thing the program learned was to find the sentences that would later be broken down into their components.

A sentence counted as relevant if it contained at least one sender or one recipient, and an amount.
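The article gives the rule for a relevant sentence but not how it was applied. A minimal sketch, assuming the hand annotations are stored as labelled character spans (a hypothetical format with hypothetical label names), shows how the rule can turn those annotations into yes/no labels for training a sentence classifier. The regex sentence splitter is deliberately naive and would, for example, break on abbreviations such as "Ltd.".

```python
import re
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, label)


def is_transaction_sentence(labels: Set[str]) -> bool:
    """The rule from the article: a sender or a recipient, plus an amount."""
    has_party = "SENDER" in labels or "RECIPIENT" in labels
    return has_party and "AMOUNT" in labels


def split_sentences(text: str) -> List[Tuple[int, int]]:
    """Naive splitter: a sentence ends at . ! or ? followed by whitespace or end of text."""
    bounds: List[Tuple[int, int]] = []
    start = 0
    for match in re.finditer(r"[.!?](?=\s|$)", text):
        bounds.append((start, match.end()))
        start = match.end()
        while start < len(text) and text[start].isspace():
            start += 1  # skip the gap before the next sentence
    if start < len(text):
        bounds.append((start, len(text)))  # trailing text without final punctuation
    return bounds


def label_sentences(text: str, spans: List[Span]) -> List[Tuple[str, bool]]:
    """Assign each span to the sentence it starts in, then apply the rule."""
    labelled = []
    for s_start, s_end in split_sentences(text):
        labels = {label for start, _, label in spans if s_start <= start < s_end}
        labelled.append((text[s_start:s_end], is_transaction_sentence(labels)))
    return labelled
```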

To improve the results, the journalists selected a sample of transactions, went back to the original documents and checked what the program had come up with.

The program was then adjusted after each such check.
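The article describes these checks only in outline. One way to make such spot checks measurable is to compare a random sample of extracted transactions with the values a journalist reads off the original report, and to track the share that match, field by field, from one round of adjustment to the next. This is an assumption about how the checks could be scored, not a description of SVT's procedure; the function names are hypothetical.

```python
import random
from typing import Dict, List, Sequence


def sample_for_review(items: Sequence[dict], k: int, seed: int = 0) -> List[dict]:
    """Draw a reproducible random sample of extracted transactions to check by hand."""
    return random.Random(seed).sample(list(items), min(k, len(items)))


def field_accuracy(predicted: List[dict], verified: List[dict],
                   field_names: Sequence[str]) -> Dict[str, float]:
    """Share of sampled transactions where each field matches the hand-checked value."""
    if not predicted or len(predicted) != len(verified):
        raise ValueError("need equally long, non-empty lists")
    scores: Dict[str, float] = {}
    for name in field_names:
        correct = sum(p.get(name) == v.get(name) for p, v in zip(predicted, verified))
        scores[name] = correct / len(predicted)
    return scores
```

Scoring per field rather than per transaction shows which component (for example amounts or dates) needs the next round of fine-tuning.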

After this step, they also taught the program to identify the various components, and here too they checked the results against the original documents and fine-tuned the program after each check.
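The article does not say how the model marked the components. A common approach for this kind of task (named-entity recognition) is to tag every word using the BIO scheme, where `B-` starts a component, `I-` continues it and `O` means the word is outside any component. The sketch below assumes that scheme and uses hypothetical label names; it converts such word-level tags back into the named parts of a transaction.

```python
from typing import List, Optional, Tuple


def bio_to_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collapse word-level BIO tags into (label, text) pairs."""
    entities: List[Tuple[str, str]] = []
    label: Optional[str] = None
    words: List[str] = []

    def flush() -> None:
        # Emit the entity currently being built, if any
        if label is not None:
            entities.append((label, " ".join(words)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label):
            flush()  # close any open entity; a stray I- tag starts a new one
            label, words = tag[2:], [token]
        elif tag.startswith("I-"):
            words.append(token)  # continue the current entity
        else:  # "O": the token is not part of any component
            flush()
            label, words = None, []
    flush()
    return entities
```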

The method helped to find relevant documents

The end result was a list of 13,500 transactions, with all the companies, banks, account numbers, amounts and dates that the program found in the suspicion reports.

In total, the program identified nearly 19,000 different banks, companies and organizations in the reports.

The program was not 100 percent accurate, but it still saved a great deal of time for journalists who wanted to compile all the transactions involving a particular company or bank.

Using the list, they could easily find the documents relevant to their investigation.
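A list like this becomes searchable once it is indexed by name. A minimal sketch, assuming each transaction is a dict with a `report_id` and the party fields shown (all hypothetical names), maps every normalized company or bank name to the reports that mention it. The normalization only folds case and whitespace, so variants such as "Acme Ltd" and "Acme Limited" would still appear as separate entries.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set

PARTY_FIELDS = ("sender", "recipient", "sender_bank", "recipient_bank")


def normalize(name: str) -> str:
    """Fold case and collapse runs of whitespace."""
    return " ".join(name.upper().split())


def build_index(transactions: Iterable[dict]) -> Dict[str, Set[str]]:
    """Map each normalized party name to the IDs of reports mentioning it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for t in transactions:
        for field in PARTY_FIELDS:
            name = t.get(field)
            if name:
                index[normalize(name)].add(t["report_id"])
    return dict(index)


def reports_for(index: Dict[str, Set[str]], name: str) -> List[str]:
    """Look up a company or bank and return its report IDs in sorted order."""
    return sorted(index.get(normalize(name), set()))
```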