
Intel’s latest version of oneAPI takes advantage of new Intel Xeon improvements, supports AMD and Nvidia

In its quest to make oneAPI a viable alternative to Nvidia’s CUDA for parallel-processing software development, Intel has released the 2023.1 version of its oneAPI tools. Last August in EEJournal, I wrote:

“Nvidia has something that Intel and AMD covet. No, it’s not GPUs. Intel and AMD both make GPUs. However, they don’t have Nvidia’s not-so-secret weapon that’s a close GPU companion: CUDA, the parallel programming language that allows developers to harness GPUs to accelerate general-purpose (non-graphics) algorithms. Since its introduction in 2006, CUDA has become a tremendous and so-far unrivaled competitive advantage for Nvidia because it works with Nvidia GPUs, and only with Nvidia GPUs. Understandably, neither Intel nor AMD plan to let that competitive advantage go unchallenged.”

(See “Intel oneAPI and DPC++: One Programming Language to Rule Them All (CPUs, GPUs, FPGAs, etc)”)

In April, James Reinders published a blog post titled “2023.1: Refining Intel® oneAPI 2023 Tools” that describes many of the improvements in this latest version of the Intel oneAPI tools. Reinders, who retired from Intel and later rejoined the company, is the author of the book “Data Parallel C++: Mastering DPC++ for Programming of Heterogeneous Systems using C++ and SYCL.” According to Reinders’s blog post, improvements to the Intel oneAPI toolkit include:

  • Compiler support for automatically enabling bfloat16 (Brain Floating Point, 16 bits) when available. The bfloat16 format was developed by Google Brain, an artificial intelligence research group at Google, and is used in Google Tensor Processing Units (TPUs). It is a truncated, 16-bit version of the 32-bit IEEE 754 single-precision floating-point format, and it has become widely used for accelerating machine-learning (ML) algorithms. The format preserves the wide dynamic range of 32-bit floating-point numbers by retaining all 8 exponent bits, but it provides only 8 bits of significand precision instead of the 24 bits of IEEE 754 single precision. The bfloat16 format reduces storage requirements and speeds execution for ML applications. The 4th Gen Intel Xeon Processor’s version of Advanced Matrix Extensions (AMX) acceleration directly supports bfloat16 dot-product calculations.

  • The 2023.1 oneAPI toolkit update supports new Codeplay oneAPI plugins for Nvidia and AMD. (Intel acquired Codeplay last year.) The AMD plugin now works with AMD’s ROCm 5.x driver. ROCm is AMD’s own answer to Nvidia’s CUDA. This new plugin support reinforces Intel’s plan to make oneAPI the preferred alternative for heterogeneous, parallel programming. Last October, James Reinders at Intel had this to say about Intel’s acquisition of Codeplay: “The company Codeplay became available, and Intel decided to acquire them. I was thrilled. I’ve worked with the people at Codeplay and have loved working with them. They’ve been working on Nvidia and AMD GPUs for a while, but, as a commercial company, they were always looking for someone to underwrite their work. Will a customer want it? Some of the labs sometimes gave them seed money, but not enough to fully productize their work. I hesitate a little to say, ‘blank check,’ but they essentially now have a blank check from Intel to productize their work, and they don’t need to worry about anyone else paying for it. You should see results from this acquisition later this year. You’ll see their tools integrate with Intel’s releases of SYCL so that SYCL/DPC++ ends up being able to target all GPUs from Intel, Nvidia, and AMD. People in the know could build this sort of software using open-source tools over the last year. But let’s face it, most of us want to be as lazy as we can be. I really like just being able to download a binary with a click, install it, and have it just work, instead of building it from open-source files and reading lots of instructions to turn the files into usable tools.” (See “Intel’s Gamble on oneAPI and DPC++ for Parallel Processing and Heterogeneous Computing: An Interview with Intel’s James Reinders.”)

  • The Intel VTune Profiler can automatically highlight profiles that improve performance by exploiting high-bandwidth memory (HBM) on the Intel Xeon CPU Max Series, which the company introduced early this year. The Xeon CPU Max Series incorporates 64 gigabytes of HBM2e high-bandwidth DRAM in the package. With the release of this latest version of the oneAPI toolkit, Intel is making it easier for software developers to unlock the performance potential of that HBM2e memory.
  • This latest version of the oneAPI toolkit delivers performance increases for photorealistic ray tracing and path guiding from the Intel Open Path Guiding Library (integrated in Blender and Chaos V-Ray) on 4th Gen Intel Xeon Processors.
  • The 2023.1 version of the oneAPI toolkit includes updates for the latest CUDA headers and libraries to help software developers migrate Nvidia’s CUDA code to SYCL, the heterogeneous, parallel-processing version of the C++ programming language developed by the Khronos Group. SYCL is the core programming language for the Intel oneAPI toolkit. Intel’s DPC++ is the company’s special flavor of SYCL.
  • This version of the oneAPI toolkit adds support for Intel Arc GPUs to the Intel Distribution for the GDB debugger on Windows. Last September, Intel acquired GPU specialist ArrayFire, a small team of four engineers who specialize in GPU software development. Intel has a lot riding on its Arc GPU family. With the departure of the Arc GPU’s chief architect, Raja Koduri, in March of this year, the future status of the company’s GPUs has been somewhat cloudy. This latest release of the oneAPI toolkit at least indicates some continued support for these GPUs.
  • The Intel® MPI Library improves performance for collectives that use GPU buffers and improves default process pinning on Intel processors that combine E-cores and P-cores. P-cores are x86 performance cores, and E-cores are smaller, lower-power, lower-performance processor cores currently found on Intel Core processors. E-cores will be available next year on some Intel Xeon processors.
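
The relationship between bfloat16 and 32-bit IEEE 754 floats described in the first bullet can be illustrated with a short C++ sketch. It is not taken from the oneAPI toolkit: the function names are hypothetical, and it assumes a platform where float is a 32-bit IEEE 754 value. It also converts by plain truncation (keeping the upper 16 bits), which is the simplest way to show the format; real hardware and libraries typically round to nearest instead.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: convert a 32-bit float to bfloat16 by truncation.
// The upper 16 bits hold the sign bit, all 8 exponent bits, and the top
// 7 stored significand bits; the lower 16 significand bits are discarded.
inline std::uint16_t float_to_bfloat16_truncate(float value) {
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof(bits));  // well-defined bit copy
    return static_cast<std::uint16_t>(bits >> 16);
}

// Hypothetical helper: widen bfloat16 back to a 32-bit float by placing
// the 16 bits in the upper half and zero-filling the lower half.
inline float bfloat16_to_float(std::uint16_t b) {
    std::uint32_t bits = static_cast<std::uint32_t>(b) << 16;
    float value;
    std::memcpy(&value, &bits, sizeof(value));
    return value;
}
```

Round-tripping 3.14159265f through these two helpers yields 3.140625f: the value keeps its magnitude but retains only two or three significant decimal digits. Because the exponent field is untouched, a very large value such as 1.0e38f remains finite after conversion, which is the wide dynamic range the bullet refers to.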

With this latest release, Intel continues to put corporate energy and muscle into the oneAPI toolkit’s development, signaling the company’s ongoing commitment to oneAPI. Previous improvements and these latest developments underscore Intel’s understanding that software tools are essential to unlocking the performance potential of its latest silicon offerings.

May 9, 2023