Intel’s latest version of oneAPI takes advantage of new Intel Xeon improvements, supports AMD and Nvidia

In its quest to make oneAPI a viable alternative to Nvidia’s CUDA for parallel-processing software development, Intel has released the 2023.1 version of its oneAPI tools. Last August in EEJournal, I wrote:

“Nvidia has something that Intel and AMD covet. No, it’s not GPUs. Intel and AMD both make GPUs. However, they don’t have Nvidia’s not-so-secret weapon that’s a close GPU companion: CUDA, the parallel programming language that allows developers to harness GPUs to accelerate general-purpose (non-graphics) algorithms. Since its introduction in 2006, CUDA has become a tremendous and so-far unrivaled competitive advantage for Nvidia because it works with Nvidia GPUs, and only with Nvidia GPUs. Understandably, neither Intel nor AMD plan to let that competitive advantage go unchallenged.”

(See “Intel oneAPI and DPC++: One Programming Language to Rule Them All (CPUs, GPUs, FPGAs, etc)”)

James Reinders published a blog post in April titled “2023.1: Refining Intel® oneAPI 2023 Tools” that describes many of the improvements made to this latest version of the Intel oneAPI tools. Reinders retired from and then rejoined Intel and is the author of the book titled “Data Parallel C++: Mastering DPC++ for Programming of Heterogeneous Systems using C++ and SYCL.” According to Reinders’s blog post, improvements to the Intel oneAPI toolkit include:

  • Compiler support for automatically enabling bfloat16 (Brain Floating Point, 16 bits) when available. The bfloat16 format was developed by Google Brain, an artificial intelligence research group at Google, and is used in Google Tensor Processing Units (TPUs). It has become widely used for accelerating machine-learning (ML) algorithms. The format is a truncated, 16-bit version of the 32-bit IEEE 754 single-precision floating-point format: it keeps the sign bit and all 8 exponent bits, which preserves the wide dynamic range of 32-bit floating-point numbers, but it offers only 8 bits of significand precision (7 stored bits plus an implicit leading bit) instead of the 24 bits of IEEE 754 single precision. The bfloat16 format reduces storage requirements and speeds execution for ML applications. The 4th Gen Intel Xeon Processor’s Advanced Matrix Extensions (AMX) acceleration directly supports bfloat16 dot-product calculations.

  • The 2023.1 oneAPI toolkit update supports new Codeplay oneAPI plugins for Nvidia and AMD GPUs. (Intel acquired Codeplay last year.) The AMD plugin now works with AMD’s ROCm 5.x driver. ROCm is AMD’s own answer to Nvidia’s CUDA. This new plugin support reinforces Intel’s plan to make oneAPI the preferred alternative for heterogeneous, parallel programming. Last October, James Reinders at Intel had this to say about Intel’s acquisition of Codeplay: “The company Codeplay became available, and Intel decided to acquire them. I was thrilled. I’ve worked with the people at Codeplay and have loved working with them. They’ve been working on Nvidia and AMD GPUs for a while, but, as a commercial company, they were always looking for someone to underwrite their work. Will a customer want it? Some of the labs sometimes gave them seed money, but not enough to fully productize their work. I hesitate a little to say, ‘blank check,’ but they essentially now have a blank check from Intel to productize their work, and they don’t need to worry about anyone else paying for it. You should see results from this acquisition later this year. You’ll see their tools integrate with Intel’s releases of SYCL so that SYCL/DPC++ ends up being able to target all GPUs from Intel, Nvidia, and AMD. People in the know could build this sort of software using open-source tools over the last year. But let’s face it, most of us want to be as lazy as we can be. I really like just being able to download a binary with a click, install it, and have it just work, instead of building it from open-source files and reading lots of instructions to turn the files into usable tools.” (See “Intel’s Gamble on oneAPI and DPC++ for Parallel Processing and Heterogeneous Computing: An Interview with Intel’s James Reinders.”)

  • The Intel VTune Profiler can automatically highlight profiles that improve performance by exploiting high-bandwidth memory (HBM) on the Intel Xeon CPU Max Series, which the company introduced early this year. The Xeon CPU Max Series incorporates 64 gigabytes of HBM2e high-bandwidth DRAM in the package. With the release of this latest version of the oneAPI toolkit, Intel is making it easier for software developers to unlock the performance potential of that HBM2e memory.
  • This latest version of the oneAPI toolkit delivers performance increases for photorealistic ray tracing and path guiding from the Intel Open Path Guiding Library (integrated in Blender and Chaos V-Ray) on 4th gen Intel Xeon Processors.
  • The 2023.1 version of the oneAPI toolkit includes updates for the latest CUDA headers and libraries to help software developers migrate Nvidia’s CUDA code to SYCL, the heterogeneous, parallel-processing version of the C++ programming language developed by the Khronos Group. SYCL is the core programming language for the Intel oneAPI toolkit. Intel’s DPC++ is the company’s special flavor of SYCL.
  • This version of the oneAPI toolkit adds support for Intel Arc GPUs to the Intel Distribution for GDB debugger on Windows. Last September, Intel acquired GPU specialist ArrayFire, a small team of four engineers who specialize in GPU software development. Intel has a lot riding on its Arc GPU family. With the departure of the Arc GPU’s chief architect Raja Koduri in March of this year, the future status of the company’s GPUs has been somewhat cloudy. This latest release of the oneAPI toolkit at least indicates some continued support for these GPUs.
  • The Intel® MPI Library enhances performance for collectives that use GPU buffers and for default process pinning on Intel CPUs that have both E-cores and P-cores. P-cores are x86 performance cores, and E-cores are smaller, lower-power, lower-performance processor cores found on Intel Core processors. E-cores will be available next year on some Intel Xeon processors.
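
To make the bfloat16 layout described in the first bullet more concrete, here is a minimal Python sketch that converts a 32-bit float to bfloat16 by simply keeping its upper 16 bits. The function names are hypothetical and chosen only for illustration. The sketch uses plain truncation to mirror the description above; real hardware and compilers may round to nearest instead, so this is not a model of what the oneAPI compiler or AMX actually does.

```python
import struct


def float32_to_bits(x: float) -> int:
    """Return the 32-bit IEEE 754 single-precision pattern of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float32(bits: int) -> float:
    """Interpret a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def to_bfloat16_bits(x: float) -> int:
    """Truncate a float32 to bfloat16 by keeping only its top 16 bits.

    Assumption: plain truncation, with no rounding.
    """
    return float32_to_bits(x) >> 16


def from_bfloat16_bits(b: int) -> float:
    """Widen a 16-bit bfloat16 pattern back to float32 by zero-filling."""
    return bits_to_float32((b & 0xFFFF) << 16)


def bfloat16_fields(b: int) -> tuple:
    """Split a bfloat16 pattern into (sign, exponent, stored mantissa)."""
    sign = (b >> 15) & 0x1       # 1 sign bit
    exponent = (b >> 7) & 0xFF   # 8 exponent bits, same as float32
    mantissa = b & 0x7F          # 7 stored significand bits
    return sign, exponent, mantissa


if __name__ == "__main__":
    for value in (1.0, 3.14159265, 1e38):
        b = to_bfloat16_bits(value)
        print(f"{value!r}: bits=0x{b:04X} fields={bfloat16_fields(b)} "
              f"round-trip={from_bfloat16_bits(b)!r}")
```

Because the exponent field is untouched, a value as large as 1e38 survives the round trip with a relative error below 2^-7, while the loss of significand bits shows up as reduced precision (pi comes back as 3.140625).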

With this latest release, Intel continues to put corporate energy and muscle into the oneAPI toolkit’s development, signaling the company’s ongoing commitment to oneAPI. Previous improvements and these latest developments underscore Intel’s understanding of the importance of developing software tools that unlock the potential performance enshrined in its latest silicon offerings.
