NVIDIA, the world's largest maker of discrete graphics processing units (GPUs), the chips dedicated to processing graphics inside computers, has unveiled a new generation of these processors with 54 billion transistors, a new math mode, and 1.2 terabytes of memory, enabling it to perform the equivalent of five million billion calculations per second and to increase computer performance 10 times. The company said the new processor is intended to run artificial intelligence and machine learning applications rather than games, as has been usual, which observers considered a major shift in the evolution of this type of processor.
The new processor was unveiled during a keynote by the company's founder and CEO, Jensen Huang, at the GPU Technology Conference, which the company held online last week. Huang appeared from his home kitchen, by video, to present the new processors.
NVIDIA named the new generation "A100" and said it is built on a new architecture, called "Ampere", dedicated to high-performance computing. It replaces the "V100" processor, which is based on the current "Volta" architecture. The new processor contains 54 billion transistors, about 2.5 times the number in the V100.
The new processor also improves the performance of the units known as "Tensor Cores", which are vital to artificial intelligence and machine learning applications. This change raised the number of calculations, measured in floating-point operations per second, from 2.5 petaflops in the "V100" to five petaflops in the new processor. A petaflop is a measure of a computer's ability to perform one quadrillion arithmetic operations per second. A quadrillion equals one million billion, that is, a one followed by 15 zeros, so the figure achieved by this processor is five million billion calculations per second.
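The unit arithmetic in the paragraph above can be checked with a short Python sketch. It assumes only the standard definition of a petaflop as one quadrillion (10^15) floating-point operations per second; the function and variable names are illustrative and do not come from NVIDIA.

```python
# One petaflop is 10**15 floating-point operations per second (one quadrillion).
PETA = 10**15

def petaflops_to_ops_per_second(petaflops: float) -> int:
    """Convert a petaflop rating into operations per second."""
    return int(petaflops * PETA)

# Figures quoted in the article for the previous and the new processor.
previous_ops = petaflops_to_ops_per_second(2.5)
new_ops = petaflops_to_ops_per_second(5)

MILLION = 10**6
BILLION = 10**9

# "Five million billion" is the same number as five quadrillion.
print(new_ops == 5 * MILLION * BILLION)

# Ratio between the two quoted figures.
print(new_ops / previous_ops)
```

Writing the number out this way shows that "five million billion" and "five petaflops" describe the same quantity, and that the quoted figures amount to a doubling.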
NVIDIA added that the new architecture enables a new math mode called "TF32", which delivers a speedup of up to 10 times over the current math modes on processors based on the "Volta" architecture. This matters because artificial intelligence and machine learning applications perform two functions, training and inference, and rely on fast, deep training runs. Until now, two separate chips were used, one for each job, but the new processor's architecture performs both functions on a single chip, which makes inference simpler and faster and consumes less energy.
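The article does not describe what TF32 does to a number, so the following Python sketch is only an illustration. It relies on the publicly documented TF32 layout (1 sign bit, 8 exponent bits, 10 mantissa bits), and it assumes simple truncation of the extra mantissa bits, which is a simplification and may differ from the rounding the hardware applies. The helper name `to_tf32` is hypothetical.

```python
import struct

def to_tf32(x: float) -> float:
    """Approximate a value at TF32 precision.

    Assumption: the low 13 of FP32's 23 mantissa bits are simply
    truncated, leaving the 10 mantissa bits that TF32 keeps.
    Real hardware may round instead of truncating.
    """
    # Reinterpret the value as a 32-bit IEEE 754 single-precision pattern.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Keep the sign bit, 8 exponent bits, and the top 10 mantissa bits.
    bits &= 0xFFFFE000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

# A step of 2**-10 above 1.0 survives; a step of 2**-11 is lost.
print(to_tf32(1.0 + 2**-10))
print(to_tf32(1.0 + 2**-11))
```

The trade-off this illustrates is that TF32 keeps the range of a 32-bit float while carrying less precision, which is what lets the hardware process such values faster.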
NVIDIA also greatly improved memory performance, increasing the memory capacity built into the processor package and equipping it with an additional 40 gigabytes of HBM2, or high-bandwidth memory.
This step raised the processor's total memory capacity to 1.2 terabytes, a step similar to the one taken by the Japanese company Fujitsu in its new "A64FX" processor chip, in which it placed high-bandwidth memory chips on the same package as the processor chip.
NVIDIA also raised the bandwidth at which data moves between the memory and the processor to 1.6 terabytes per second. In addition, the third generation of "NVLink", the high-speed interconnect technology developed by NVIDIA, doubles the signaling rate from 25.78 gigabits per second in the current second generation to 50 gigabits per second in the new third generation. The company also halved the number of lanes needed to reach the same speed, which in turn lets it double throughput with the same number of lanes.
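The lane arithmetic described above can be sketched in a few lines of Python. The two per-lane rates are the figures quoted in the article; the lane count of 8 is a hypothetical value chosen only for illustration, and the function name is likewise an assumption.

```python
# Per-lane signaling rates quoted in the article (same unit for both generations).
GEN2_RATE = 25.78
GEN3_RATE = 50.0

# Hypothetical lane count, chosen only for illustration.
GEN2_LANES = 8

def link_throughput(rate_per_lane: float, lanes: int) -> float:
    """Aggregate throughput: per-lane rate multiplied by the number of lanes."""
    return rate_per_lane * lanes

gen2 = link_throughput(GEN2_RATE, GEN2_LANES)
# Third generation with half the lanes: roughly the same throughput.
gen3_half_lanes = link_throughput(GEN3_RATE, GEN2_LANES // 2)
# Third generation with the same lanes: roughly double the throughput.
gen3_same_lanes = link_throughput(GEN3_RATE, GEN2_LANES)

print(f"gen 2, {GEN2_LANES} lanes: {gen2:.2f}")
print(f"gen 3, {GEN2_LANES // 2} lanes: {gen3_half_lanes:.2f}")
print(f"gen 3, {GEN2_LANES} lanes: {gen3_same_lanes:.2f}")
```

Because the per-lane rate roughly doubles, half as many lanes carry about the same traffic, and the lanes freed up this way can be used to carry about twice as much.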
NVIDIA also added a new feature to the processor, called "Multi-Instance GPU", which allows the A100 to be divided into as many as seven virtual GPUs. Each gets its own share of the processor's cores, a share of the cache, and a share of the memory and memory controllers, so the processor can run seven tasks in parallel without one affecting another.