Articles

World’s Fastest AI Supercomputer by Meta

Why in the news:

  • Meta (formerly known as Facebook) says it has built a research supercomputer that is among the fastest on the planet.
  • Meta’s AI Research SuperCluster (RSC) is not completely built yet.
  • According to Meta, when RSC is completed in mid-2022, it will be the world’s fastest AI supercomputer.
  • Meta is not the only company working on supercomputers of its own.
  • Google already has its Sycamore machine, though that is a 53-qubit quantum processor rather than an AI system. Nvidia and Microsoft are also working on AI supercomputers as machine learning becomes a main avenue for these companies to explore.

About the new supercomputer:

  • RSC is meant to address a critical limit on the growth of AI: the time it takes to train a neural network.
  • Generally, training involves testing a neural network against a large data set, measuring how far it is from doing its job accurately, using that error signal to tweak the network’s parameters, and repeating the cycle until the neural network reaches the needed level of accuracy.
  • It can take weeks of computing for large networks, limiting how many new networks can be trialed in a given year. Several well-funded startups, such as Cerebras and SambaNova, were launched in part to address training times.
  • Among other things, Meta hopes RSC will help it build new neural networks that can do real-time voice translations to large groups of people, each speaking a different language.
  • Ultimately, the work done with RSC will pave the way toward building technologies for the next major computing platform—the metaverse, where AI-driven applications and products will play an important role.
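The train-measure-tweak cycle described above can be sketched in a few lines. The toy model below (a single trainable weight fit by plain gradient descent) is purely illustrative and has nothing to do with Meta’s actual stack; all names and numbers in it are invented for the example:

```python
import numpy as np

# Toy data set: the "network" must learn the relationship y = 2x.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 2.0 * x

w = 0.0      # single trainable parameter
lr = 0.5     # learning rate

for step in range(200):
    pred = w * x                     # test the model against the data set
    error = pred - y                 # measure how far it is from the target
    grad = 2 * np.mean(error * x)    # error signal w.r.t. the parameter
    w -= lr * grad                   # tweak the parameter
    if np.mean(error ** 2) < 1e-8:   # repeat until the needed accuracy
        break

print(round(w, 4))  # close to 2.0
```

Real training runs do exactly this loop, but over billions of parameters and huge data sets, which is why a single run can take weeks and why faster clusters like RSC matter.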

Important:

  • Compared to the AI research cluster Meta uses today, which was designed in 2017, RSC is a change in the number of GPUs involved, how they communicate, and the storage attached to them.
  • The old system connected 22,000 Nvidia V100 Tensor Core GPUs.
  • The new one switches over to Nvidia’s latest core, the A100, which has dominated in recent benchmark tests of AI systems.
  • At present the new system is a cluster of 760 Nvidia DGX A100 computers, with a total of 6,080 GPUs.
  • The computer cluster is bound together using an Nvidia 200-gigabit-per-second Infiniband network.
  • The storage includes 46 petabytes (46 million billion bytes) of cache storage and 175 petabytes of bulk flash storage.
  • Compared to the old V100-based system, RSC marked a 20-fold speedup in computer-vision tasks and a 3-fold boost in handling large natural-language-processing models.
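A quick back-of-the-envelope check ties the quoted figures together. The per-node GPU count below is the published DGX A100 spec (8 A100 GPUs per system); the other numbers are taken from the article:

```python
# Sanity-check the RSC figures quoted above.
nodes = 760          # Nvidia DGX A100 systems in the current cluster
gpus_per_node = 8    # each DGX A100 contains 8 A100 GPUs (Nvidia spec)
total_gpus = nodes * gpus_per_node
print(total_gpus)    # 6080, matching the stated total

cache_pb = 46        # petabytes of cache storage
bulk_pb = 175        # petabytes of bulk flash storage
total_bytes = (cache_pb + bulk_pb) * 10**15
print(total_bytes)   # 221 petabytes of storage in total
```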

What researchers say:

  • The launch of RSC also comes with a change in the way Meta uses data for research:
  • In Meta’s words: “Unlike with our previous AI research infrastructure, which leveraged only open source and other publicly available data sets, RSC also helps us ensure that our research translates effectively into practice by allowing us to include real-world examples from Meta’s production systems in model training.”
  • The researchers write that RSC will take extra precautions to encrypt and anonymize this data, to prevent any chance of leakage.
  • Those steps include isolating RSC from the larger internet, with neither inbound nor outbound connections.
  • Traffic to RSC can flow in only from Meta’s production data centers.
  • In addition, the data path between storage and the GPUs is end-to-end encrypted, and data is anonymized and subject to a review process to confirm the anonymization.
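Meta has not published the details of this pipeline, so nothing below should be read as its actual implementation. One common way to realize the anonymization step, though, is keyed pseudonymization: replacing a real identifier with an irreversible but stable token. The sketch below (hypothetical key and identifier names) shows the idea using only Python’s standard library:

```python
import hashlib
import hmac

# Hypothetical secret key; in practice this would live in a key-management
# system and be rotated, never hard-coded.
ANON_KEY = b"example-secret-key"

def pseudonymize(user_id: str) -> str:
    """Replace a real identifier with a keyed, irreversible token (HMAC-SHA256)."""
    return hmac.new(ANON_KEY, user_id.encode(), hashlib.sha256).hexdigest()

token = pseudonymize("user-12345")
print(token != "user-12345")               # True: the raw identifier is gone
print(token == pseudonymize("user-12345")) # True: stable, so records still join
```

Because the token is keyed and one-way, records can still be linked for training while the raw identifier never leaves the trusted side; encrypting the data path end to end, as the article describes, is a separate layer on top of this.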

About Meta:

  • Meta (formerly known as Facebook) is an American multinational technology conglomerate based in Menlo Park, California.
  • The company is the parent organization of Facebook, Instagram, and WhatsApp.
  • Founded: January 4, 2004
  • Founders: Mark Zuckerberg, Eduardo Saverin, Andrew McCollum, Dustin Moskovitz, and Chris Hughes