
Embracing ASIC, AMD releases accelerator card

When talking about why it acquired Xilinx at a valuation of US$35 billion, AMD once said that in the next ten years, high-performance computing will be at the core of almost all major trends that will affect the future world.

“While CPUs and GPUs will remain the key computing components of these devices, in a world where algorithms are advancing and new standards are emerging, it is critical to accelerate these emerging and changing workloads, and we foresee that the demand for adaptive computing power will continue to grow,” AMD further pointed out. “Xilinx’s leading FPGAs, adaptive SoCs, artificial intelligence engines and software expertise will empower AMD to bring a powerful portfolio of high-performance and adaptive computing solutions and help us capture a larger share of the market opportunity in cloud computing, edge computing and smart devices,” added AMD Chairman and CEO Lisa Su.

Judging from AMD's development in recent years, the company is fulfilling the commitments it made at the time of the acquisition. One example is the recently launched Alveo MA35D media accelerator, which is designed to support a new era of large-scale live interactive streaming services.

 

In the future streaming media market, the CPU is “exhausted”

According to Sean Gardner, Director of AMD Video Strategy and Market Development, the company's decision to launch this product has a lot to do with the current state of the streaming media market.

He first said that the live broadcast market is growing very rapidly, both in revenue and in infrastructure deployment. According to survey data from 2021, live content accounts for more than 70% of the global video market. Traditional broadcast streaming is supported mainly by software running on CPUs, and traditional live broadcasts mostly follow a one-to-many model. Because the number of video streams is relatively small and the delay is relatively controllable, existing network architectures are sufficient to support these live broadcast services.

He also pointed out that the next generation of live broadcast scenarios will mainly follow a many-to-many model (that is, everyone is an anchor), in which each anchor is both a data source and a receiver. Such scenarios include online watch parties, live shopping, online auctions and social streaming. They also require data processing to happen closer to users, which means moving that processing to the edge of the network.

“Processing these application scenarios at the edge means that the economic benefits that can be obtained through cloud centralization no longer exist, so it is necessary to completely change the infrastructure deployment model. In other words, the latency requirements of today's live streaming media are getting higher and the cost of deploying at the edge is also increasing, which in turn drives us to develop a new generation of live interactive streaming solutions,” Sean Gardner said. In his view, such real-time, interactive streaming scenarios require low latency and large capacity, along with a new architecture that can absorb the cost pressure these changes bring. In response to these needs, AMD has introduced a new generation of product: the AMD Alveo MA35D media accelerator.


A 5nm ASIC powers the media accelerator card

Before introducing this product in detail, we must first point out that although this is the first media accelerator under the AMD brand, it is actually a follow-up to Xilinx’s previous release.

Back in 2018, Xilinx debuted Alveo, a powerful accelerator card designed for data centers. Users are expected to achieve breakthrough performance improvements with lower latency when running real-time machine learning inferences and key data center applications such as video processing, genomics, and data analysis through Alveo. It can be seen from the relevant introduction that the Alveo series has released a variety of products for various fields, and streaming media acceleration is undoubtedly a market that it pays more attention to.

According to reports, the Alveo MA35D uses a dedicated video processing unit to accelerate the entire video pipeline. By performing all video processing functions on the video processing unit, data movement between the CPU and the accelerator is minimized, which reduces overall latency and maximizes channel density: up to 32 channels of 1080p60, 8 channels of 4Kp60 or 4 channels of 8Kp30 transcoding per card. The platform also offers ultra-low latency support for the popular H.264 and H.265 codecs and features a next-generation AV1 transcoder engine that saves up to 52% in bandwidth.

Like its predecessor, the Alveo U30, the MA35D is a dedicated video encoding card designed for the data center. As a relatively simple, single-purpose product, the MA35D aims to encode video as efficiently as possible. According to the test data, the MA35D also improves on the previous generation in several dimensions.
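The three quoted per-card configurations can be compared by total pixel throughput. The short sketch below is an illustration rather than anything from AMD; it assumes the common UHD frame sizes (3840x2160 for 4K, 7680x4320 for 8K), which the article does not state.

```python
# Pixel throughput implied by each per-card configuration quoted above.
# Frame sizes for 4K and 8K are assumed (UHD); the article does not give them.
CONFIGS = {
    "32 x 1080p60": (32, 1920, 1080, 60),
    "8 x 4Kp60": (8, 3840, 2160, 60),
    "4 x 8Kp30": (4, 7680, 4320, 30),
}


def pixel_rate(channels, width, height, fps):
    """Total pixels per second across all channels on one card."""
    return channels * width * height * fps


rates = {name: pixel_rate(*cfg) for name, cfg in CONFIGS.items()}

for name, rate in rates.items():
    print(f"{name}: {rate / 1e9:.2f} Gpixel/s")
```

Under these assumptions all three configurations work out to the same aggregate pixel rate, which suggests the quoted channel counts describe one fixed processing budget divided in different ways.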


AMD said that compared to the previous generation Alveo U30 media accelerator, the Alveo MA35D offers 4 times the channel density, twice the power efficiency per channel, and a 4-fold reduction in latency. The card also adds a number of functions and capabilities that the U30 lacked. In terms of power consumption, the card has an official TDP of 50 watts, but AMD has found that typical power consumption is closer to 35 watts, or a little over 1 watt per stream at 1080p60. That is a 66% reduction in power per stream compared to the U30, which consumes just over 3W for a single 1080p stream.
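The per-stream power figures follow from simple division. The sketch below reproduces that arithmetic; it assumes a card fully loaded with 32 streams of 1080p60 and derives the U30 figure only from the quoted 66% reduction, so it is a consistency check, not a measurement.

```python
# Per-stream power arithmetic using only figures quoted in the article.
# Assumption: the card is fully loaded with 32 x 1080p60 streams.
STREAMS_PER_CARD = 32
TYPICAL_CARD_WATTS = 35.0   # typical power reported by AMD
TDP_CARD_WATTS = 50.0       # official TDP
QUOTED_REDUCTION = 0.66     # per-stream reduction versus the U30

typical_per_stream = TYPICAL_CARD_WATTS / STREAMS_PER_CARD
tdp_per_stream = TDP_CARD_WATTS / STREAMS_PER_CARD

# U30 per-stream power implied by the 66% reduction (derived, not measured).
implied_u30_per_stream = typical_per_stream / (1 - QUOTED_REDUCTION)

print(f"MA35D typical: {typical_per_stream:.2f} W/stream")
print(f"MA35D at TDP:  {tdp_per_stream:.2f} W/stream")
print(f"Implied U30:   {implied_u30_per_stream:.2f} W/stream")
```

The implied U30 value lands slightly above 3 W per stream, which agrees with the "just over 3W" figure in the text.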

According to Sean Gardner, the Alveo MA35D owes this performance first of all to its two new dedicated video processing units (VPUs).

According to reports, the MA35D integrates two 5nm VPUs, each with its own 8GB LPDDR5 memory pool and a PCIe 5.0 x4 connection back to the host processor. In the chip layout AMD presented, four separate encoder (MP) modules that support the AV1 compression standard sit at the four corners of the chip, which gives customers maximum flexibility when deploying applications: when rolling out a new compression standard, they can keep using the old standard alongside it. AMD also stated that by performing all video processing functions on the video processing unit, data movement between the CPU and the accelerator can be minimized, thereby reducing overall latency and maximizing channel density, up to 32 channels of 1080p60, 8 channels of 4Kp60 or 4 channels of 8Kp30 transcoding per card.

AMD has also integrated AI-enabled intelligent video processing into this product. The accelerator has an integrated artificial intelligence (AI) processor and a dedicated video quality engine, which together can improve the quality of experience at lower bandwidth. The AI processor evaluates content frame by frame and dynamically adjusts encoder settings to improve perceived visual quality while minimizing bitrate. Optimization techniques include region-of-interest (ROI) encoding to keep text and faces sharp, artifact detection to correct problems in high-motion and complex scenes, and content-aware encoding that uses predictive insights for bitrate optimization.

To scale high-capacity streaming media services, it is necessary to maximize the number of channels per server and minimize power consumption and bandwidth per stream. AMD delivers up to 32 channels of 1080p60 transcoding per card at about 1 watt per stream, so a 1U rackmount server with 8 cards can deliver 256 channels, maximizing transcoding density per server, per rack and per data center.
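The server-level figure is a straightforward multiplication, sketched below. The per-card and per-server numbers come from the article; the 40-servers-per-rack value in the usage line is purely a hypothetical input for illustration, and the power estimate covers the accelerator cards only, not the host.

```python
# Density arithmetic for the configuration described above.
CHANNELS_PER_CARD = 32      # 1080p60 channels per card (from the article)
TYPICAL_CARD_WATTS = 35.0   # typical card power (from the article)


def server_channels(cards_per_server, channels_per_card=CHANNELS_PER_CARD):
    """1080p60 channels delivered by one server."""
    return cards_per_server * channels_per_card


def rack_channels(servers_per_rack, cards_per_server=8):
    """Channels per rack; servers_per_rack is a caller-supplied assumption."""
    return servers_per_rack * server_channels(cards_per_server)


def server_card_watts(cards_per_server, card_watts=TYPICAL_CARD_WATTS):
    """Typical power drawn by the accelerator cards alone (host excluded)."""
    return cards_per_server * card_watts


print(server_channels(8))     # channels in the 8-card 1U server
print(server_card_watts(8))   # card-only typical power for that server
print(rack_channels(40))      # hypothetical rack of 40 such servers
```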

“So we introduced artificial intelligence to analyze video content during the Alveo MA35D innovation process. With the Alveo MA35D’s artificial intelligence and machine learning capabilities, we can better understand the characteristics of the video, such as how complex it is and what type it is, whether a synthetic computer game or some natural content. With the insights and intelligence gained from artificial intelligence and machine learning, we can pass this dynamic content to the encoder with higher efficiency. This approach allows us to increase efficiency while reducing bandwidth and storage requirements when doing dynamic video processing,” Sean Gardner added.

It is understood that the platform can be accessed through the AMD Media Acceleration Software Development Kit (SDK), which supports the widely used FFmpeg and GStreamer video frameworks and is intended to make development easy.

 

In Sean Gardner’s view, the Alveo MA35D does not compete with AMD’s CPUs and GPUs but complements them, because each of these products has its own strengths and each is very efficient at what it does.

He points out that CPUs can provide very high-quality compression, but they are not very economical when dealing with millions of video streams. For application scenarios that require image rendering, the GPU is the best tool. Some applications need all three working together to provide a cost-effective, high-performance solution. In cloud e-sports or cloud gaming, for example, the GPU renders as much game content as possible, the Alveo MA35D handles all the low-latency, high-quality encoding, and the EPYC CPU handles all application-level system processing. Such a combination can give customers the highest density at a very favorable price point and very low power consumption.

The author believes that this 5nm VPU also marks the transition of the Alveo video encoder series to fully ASIC-based products. Xilinx is known for its programmable FPGAs, and the previous-generation Alveo U30 used hard logic for its video encoding blocks but combined it with FPGA fabric, so that product was still a hybrid of ASIC and FPGA design. The VPU on board the MA35D, by contrast, is a true ASIC, and because it has no FPGA components, the company can take full advantage of the energy efficiency of a dedicated product built on fixed-function logic.


Post time: Apr-11-2023