Intel’s latest version of oneAPI takes advantage of new Intel Xeon improvements, supports AMD and Nvidia

In its quest to make oneAPI a viable alternative to Nvidia’s CUDA for parallel-processing software development, Intel has released the 2023.1 version of its oneAPI tools. Last August in EEJournal, I wrote:

“Nvidia has something that Intel and AMD covet. No, it’s not GPUs. Intel and AMD both make GPUs. However, they don’t have Nvidia’s not-so-secret weapon that’s a close GPU companion: CUDA, the parallel programming language that allows developers to harness GPUs to accelerate general-purpose (non-graphics) algorithms. Since its introduction in 2006, CUDA has become a tremendous and so-far unrivaled competitive advantage for Nvidia because it works with Nvidia GPUs, and only with Nvidia GPUs. Understandably, neither Intel nor AMD plan to let that competitive advantage go unchallenged.”

(See “Intel oneAPI and DPC++: One Programming Language to Rule Them All (CPUs, GPUs, FPGAs, etc)”)

James Reinders published a blog in April titled “2023.1: Refining Intel® oneAPI 2023 Tools” that described many of the improvements made in this latest version of the Intel oneAPI tools. Reinders, who retired from and later rejoined Intel, is a co-author of the book “Data Parallel C++: Mastering DPC++ for Programming of Heterogeneous Systems using C++ and SYCL.” According to Reinders’s blog, improvements to the Intel oneAPI toolkit include:

  • Compiler support for automatically enabling bfloat16 (Brain Floating Point, 16 bits) when available. The bfloat16 format was developed by Google Brain, an artificial intelligence research group at Google, and is used in Google Tensor Processing Units (TPUs). It represents a wide dynamic range of numeric values as a truncated (16-bit) version of the 32-bit IEEE 754 single-precision floating-point format: it keeps all 8 exponent bits, so it approximates the dynamic range of 32-bit floating point, but it carries only an 8-bit significand (7 explicit fraction bits plus the implicit leading bit) instead of IEEE 754’s 24-bit significand. The format has become widely used for accelerating machine-learning (ML) algorithms because it reduces storage requirements and speeds execution. The 4th Gen Intel Xeon processor’s Advanced Matrix Extensions (AMX) acceleration directly supports bfloat16 dot-product calculations. (A minimal conversion sketch appears after this list.)

  • The 2023.1 oneAPI toolkit update supports new Codeplay oneAPI plugins for Nvidia and AMD GPUs. (Intel acquired Codeplay last year.) The AMD plugin now works with AMD’s ROCm 5.x driver. ROCm is AMD’s own answer to Nvidia’s CUDA. This new plugin support reinforces Intel’s plan to make oneAPI the preferred alternative for heterogeneous, parallel programming. Last October, James Reinders at Intel had this to say about Intel’s acquisition of Codeplay: “The company Codeplay became available, and Intel decided to acquire them. I was thrilled. I’ve worked with the people at Codeplay and have loved working with them. They’ve been working on Nvidia and AMD GPUs for a while, but, as a commercial company, they were always looking for someone to underwrite their work. Will a customer want it? Some of the labs sometimes gave them seed money, but not enough to fully productize their work. I hesitate a little to say, “blank check,” but they essentially now have a blank check from Intel to productize their work, and they don’t need to worry about anyone else paying for it. You should see results from this acquisition later this year. You’ll see their tools integrate with Intel’s releases of SYCL so that SYCL/DPC++ ends up being able to target all GPUs from Intel, Nvidia, and AMD. People in the know could build this sort of software using open-source tools over the last year. But let’s face it, most of us want to be as lazy as we can be. I really like just being able to download a binary with a click, install it, and have it just work, instead of building it from open-source files and reading lots of instructions to turn the files into usable tools.” (See “Intel’s Gamble on oneAPI and DPC++ for Parallel Processing and Heterogeneous Computing: An Interview with Intel’s James Reinders.”)

  • The Intel VTune Profiler can now automatically highlight opportunities to improve performance by exploiting the high-bandwidth memory (HBM) on the Intel Xeon CPU Max Series, which the company introduced early this year. The Xeon CPU Max Series incorporates 64 gigabytes of HBM2e DRAM in the package, and with this latest version of the oneAPI toolkit, Intel is making it easier for software developers to unlock that memory’s performance potential.
  • This latest version of the oneAPI toolkit delivers performance increases for photorealistic ray tracing and path guiding from the Intel Open Path Guiding Library (integrated in Blender and Chaos V-Ray) on 4th gen Intel Xeon Processors.
  • The 2023.1 version of the oneAPI toolkit includes updates for the latest CUDA headers and libraries to help software developers migrate Nvidia CUDA code to SYCL, the Khronos Group’s open, C++-based standard for heterogeneous, parallel programming. SYCL is the core programming language for the Intel oneAPI toolkit, and Intel’s DPC++ is the company’s own flavor of SYCL. (A minimal SYCL kernel sketch appears after this list.)
  • This version of the oneAPI toolkit adds support for Intel Arc GPUs to the Intel Distribution for GDB debugger on Windows. Last September, Intel acquired ArrayFire, a small team of four engineers specializing in GPU software development. Intel has a lot riding on its Arc GPU family, and with the departure of Arc chief architect Raja Koduri in March of this year, the future of the company’s GPUs has been somewhat cloudy. This latest release of the oneAPI toolkit at least signals continued support for these GPUs.
  • The Intel MPI Library improves the performance of collective operations that use GPU buffers, and it improves default process pinning on Intel processors that combine E-cores and P-cores. P-cores are x86 performance cores; E-cores are smaller, lower-power, lower-performance cores currently found on Intel Core processors, and they will appear next year on some Intel Xeon processors. (A minimal MPI collective sketch appears after this list.)
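
To make the bfloat16 item concrete, here is a minimal, self-contained C++ sketch (not taken from Intel’s toolkit) that converts an IEEE 754 binary32 value to bfloat16 by simple truncation and back again. It only illustrates the bit layout: 1 sign bit, the same 8 exponent bits as binary32, and 7 explicit fraction bits. Hardware such as AMX typically rounds to nearest-even rather than truncating, and real code would use a compiler- or library-provided bfloat16 type.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>

// Truncate a binary32 value to bfloat16 by keeping its top 16 bits:
// sign (1 bit), exponent (8 bits), and the 7 most significant fraction bits.
uint16_t float_to_bfloat16(float f) {
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));
    return static_cast<uint16_t>(bits >> 16);
}

// Expand bfloat16 back to binary32 by zero-filling the dropped fraction bits.
float bfloat16_to_float(uint16_t bf) {
    uint32_t bits = static_cast<uint32_t>(bf) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
}

int main() {
    float x = 3.14159265f;
    uint16_t bf = float_to_bfloat16(x);
    std::printf("%.8f -> 0x%04x -> %.8f\n", x, bf, bfloat16_to_float(bf));
    return 0;
}
```

The output (3.14159274 -> 0x4049 -> 3.14062500) shows the precision lost to the shorter significand, while the exponent, and therefore the dynamic range, survives intact.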
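
For the CUDA-to-SYCL item, here is a rough illustration of what SYCL 2020 code looks like: a vector-add kernel using unified shared memory. It is a sketch, not code from the toolkit’s samples. It should build with the oneAPI DPC++ compiler (for example, “icpx -fsycl vecadd.cpp”), and with the Codeplay plugins installed the same source can, in principle, be retargeted at Nvidia or AMD GPUs through additional -fsycl-targets options; consult Codeplay’s plugin documentation for the exact target names.

```cpp
#include <sycl/sycl.hpp>
#include <cstdio>

int main() {
    constexpr size_t n = 1024;

    // The default selector picks whatever device the runtime exposes
    // (an Intel CPU or GPU, or an Nvidia/AMD GPU via the Codeplay plugins).
    sycl::queue q;

    // Unified shared memory is visible to both the host and the device.
    float *a = sycl::malloc_shared<float>(n, q);
    float *b = sycl::malloc_shared<float>(n, q);
    float *c = sycl::malloc_shared<float>(n, q);
    for (size_t i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    // The same kernel source runs on any supported device.
    q.parallel_for(sycl::range<1>(n), [=](sycl::id<1> i) {
        c[i] = a[i] + b[i];
    }).wait();

    std::printf("c[0] = %f, c[%zu] = %f\n", c[0], n - 1, c[n - 1]);

    sycl::free(a, q);
    sycl::free(b, q);
    sycl::free(c, q);
    return 0;
}
```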
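
And for the MPI item, here is a bare-bones MPI_Allreduce sketch using the standard MPI C API from C++; it is generic MPI, not Intel-specific. The 2023.1 improvements target the case where the buffers handed to such collectives live in GPU memory (for example, SYCL USM device allocations) and the default pinning of ranks on hybrid P-core/E-core processors, neither of which is shown here.

```cpp
#include <mpi.h>
#include <cstdio>
#include <vector>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Each rank contributes a small vector; the collective sums them
    // element-wise and leaves the result on every rank.
    std::vector<double> local(4, static_cast<double>(rank));
    std::vector<double> global(4, 0.0);
    MPI_Allreduce(local.data(), global.data(), static_cast<int>(local.size()),
                  MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        std::printf("sum of ranks 0..%d = %.1f per element\n", size - 1, global[0]);

    MPI_Finalize();
    return 0;
}
```

Build it with your MPI compiler wrapper (for example, mpicxx) and launch it with mpirun; with a GPU-aware Intel MPI Library configuration, the same call pattern applies when the buffers are device allocations.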

With this latest release, Intel continues to put corporate energy and muscle into the oneAPI toolkit, signaling an ongoing commitment to its development. These improvements, together with the ones that preceded them, underscore Intel’s understanding that software tools are essential to unlocking the performance potential built into its latest silicon.
