
Intel’s latest version of oneAPI takes advantage of new Intel Xeon improvements, supports AMD and Nvidia

In its quest to make oneAPI a viable alternative to Nvidia’s CUDA for parallel-processing software development, Intel has released the 2023.1 version of its oneAPI tools. Last August in EEJournal, I wrote:

“Nvidia has something that Intel and AMD covet. No, it’s not GPUs. Intel and AMD both make GPUs. However, they don’t have Nvidia’s not-so-secret weapon that’s a close GPU companion: CUDA, the parallel programming language that allows developers to harness GPUs to accelerate general-purpose (non-graphics) algorithms. Since its introduction in 2006, CUDA has become a tremendous and so-far unrivaled competitive advantage for Nvidia because it works with Nvidia GPUs, and only with Nvidia GPUs. Understandably, neither Intel nor AMD plan to let that competitive advantage go unchallenged.”

(See “Intel oneAPI and DPC++: One Programming Language to Rule Them All (CPUs, GPUs, FPGAs, etc)”)

James Reinders published a blog post in April titled “2023.1: Refining Intel® oneAPI 2023 Tools” that described many of the improvements made to this latest version of the Intel oneAPI tools. Reinders, who retired from and later rejoined Intel, is an author of the book “Data Parallel C++: Mastering DPC++ for Programming of Heterogeneous Systems using C++ and SYCL.” According to Reinders’s blog, improvements to the Intel oneAPI toolkit include:

  • Compiler support for automatically enabling bfloat16 (Brain Floating Point, 16 bits) when available. The bfloat16 format was developed by Google Brain, an artificial intelligence research group at Google, and is used in Google Tensor Processing Units (TPUs). It has become widely used for accelerating machine-learning (ML) algorithms. The format is a truncated (16-bit) version of the 32-bit IEEE 754 single-precision floating-point format: it approximates the wide dynamic range of 32-bit floating-point numbers by retaining all 8 exponent bits while shortening the significand from 24 bits to 8. The bfloat16 format reduces storage requirements and speeds execution for ML applications. The 4th Gen Intel Xeon Processor’s version of Advanced Matrix Extensions (AMX) acceleration directly supports bfloat16 dot-product calculations.

  • The 2023.1 oneAPI toolkit update supports new Codeplay oneAPI plugins for Nvidia and AMD GPUs. (Intel acquired Codeplay last year.) The AMD plugin now works with AMD’s ROCm 5.x driver. ROCm is AMD’s own answer to Nvidia’s CUDA. This new plugin support reinforces Intel’s plan to make oneAPI the preferred alternative for heterogeneous, parallel programming. Last October, Intel’s James Reinders had this to say about the Codeplay acquisition: “The company Codeplay became available, and Intel decided to acquire them. I was thrilled. I’ve worked with the people at Codeplay and have loved working with them. They’ve been working on Nvidia and AMD GPUs for a while, but, as a commercial company, they were always looking for someone to underwrite their work. Will a customer want it? Some of the labs sometimes gave them seed money, but not enough to fully productize their work. I hesitate a little to say, ‘blank check,’ but they essentially now have a blank check from Intel to productize their work, and they don’t need to worry about anyone else paying for it. You should see results from this acquisition later this year. You’ll see their tools integrate with Intel’s releases of SYCL so that SYCL/DPC++ ends up being able to target all GPUs from Intel, Nvidia, and AMD. People in the know could build this sort of software using open-source tools over the last year. But let’s face it, most of us want to be as lazy as we can be. I really like just being able to download a binary with a click, install it, and have it just work, instead of building it from open-source files and reading lots of instructions to turn the files into usable tools.” (See “Intel’s Gamble on oneAPI and DPC++ for Parallel Processing and Heterogeneous Computing: An Interview with Intel’s James Reinders.”)

  • The Intel VTune Profiler can automatically highlight opportunities to improve performance by exploiting the high-bandwidth memory (HBM) on the Intel Xeon CPU Max Series, which the company introduced early this year. The Xeon CPU Max Series incorporates 64 gigabytes of HBM2e high-bandwidth DRAM in the package, and with this latest version of the oneAPI toolkit, Intel is making it easier for software developers to unlock the performance potential of that HBM2e memory.
  • This latest version of the oneAPI toolkit delivers performance increases for photorealistic ray tracing and path guiding from the Intel Open Path Guiding Library (integrated in Blender and Chaos V-Ray) on 4th gen Intel Xeon Processors.
  • The 2023.1 version of the oneAPI toolkit includes updates for the latest CUDA headers and libraries to help software developers migrate Nvidia’s CUDA code to SYCL, the Khronos Group’s C++-based programming model for heterogeneous, parallel processing. SYCL is the core programming language for the Intel oneAPI toolkit, and Intel’s DPC++ is the company’s own flavor of SYCL.
  • This version of the oneAPI toolkit adds support for Intel Arc GPUs to the Intel Distribution for GDB debugger on Windows. Last September, Intel acquired ArrayFire, a small team of four engineers who specialize in GPU software development. Intel has a lot riding on its Arc GPU family, and with the March departure of the Arc GPU’s chief architect, Raja Koduri, the future of the company’s GPUs has been somewhat cloudy. This latest release of the oneAPI toolkit at least indicates continued support for these GPUs.
  • The Intel MPI Library improves performance for collectives that use GPU buffers and for default process pinning on Intel processors with E-cores and P-cores. P-cores are x86 performance cores, and E-cores are smaller, lower-power, lower-performance cores currently found on Intel Core processors; they will arrive on some Intel Xeon processors next year.
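The bfloat16 conversion described in the first bullet above, truncating a 32-bit IEEE 754 float to its top 16 bits, can be sketched in a few lines of portable C++. This is an illustrative sketch, not Intel’s or Google’s implementation; production hardware typically applies round-to-nearest-even rather than plain truncation.

```cpp
#include <cstdint>
#include <cstring>

// Convert an IEEE 754 single-precision float to bfloat16 by keeping only
// the top 16 bits: the sign bit, all 8 exponent bits, and the 7 highest
// mantissa bits. (Real hardware usually rounds; this sketch truncates.)
uint16_t float_to_bfloat16(float f) {
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);  // type-pun without undefined behavior
    return static_cast<uint16_t>(bits >> 16);
}

// Widen a bfloat16 back to float by zero-filling the 16 truncated bits.
// Every bfloat16 value is exactly representable as a float.
float bfloat16_to_float(uint16_t b) {
    uint32_t bits = static_cast<uint32_t>(b) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof bits);
    return f;
}
```

Because the exponent field survives intact, a round trip preserves a float’s full dynamic range but only about two to three decimal digits of precision: 3.14159f comes back as 3.140625f, for example.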

With this latest release, Intel continues to put corporate energy and muscle into the oneAPI toolkit, signaling the company’s ongoing commitment to the project. These latest developments, like the improvements that preceded them, underscore Intel’s understanding that software tools are what unlock the performance potential of its newest silicon offerings.
