OpenVec Portable SIMD Intrinsics
- Publisher: European Association of Geoscientists & Engineers
- Source: Conference Proceedings, Second EAGE Workshop on High Performance Computing for Upstream, Sep 2015, Volume 2015, p.1 - 5
Abstract
Today, the widest vector units found on a mass-produced processor are in the Intel Xeon Phi coprocessor, with its 512-bit vector registers. Because each register holds sixteen 32-bit floats, these units offer a theoretical single-precision peak performance gain of 16x for single-flop operations. In practice, limiting factors such as memory access latency, I/O demand, serial code sections, and global synchronization typically make the real performance improvement much lower.
In this work, we present a solution that takes advantage of vector units across various processor SIMD architectures with a single, portable source code. This is accomplished by adding a vector type and hardware intrinsics support to the C/C++ language through a header file that is compatible with gcc and with commercially available compilers in general. We hide the different hardware and compiler feature sets under a common, portable programming syntax. In addition, the implementation provides a scalar backend as a fallback for unknown architectures.
This implementation has been successfully demonstrated on multiple SIMD architectures, including Intel SSE/AVX/AVX-512/IMCI, ARM NEON, and IBM Power VSX, using only a common header file that enables the compiler to generate highly optimized code with the proper SIMD instructions for the underlying architecture.
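The header-based approach described in the abstract can be illustrated with a minimal sketch. It is not OpenVec's actual API: the `pv_` names, the choice of backends (SSE, NEON, scalar), and the `pv_saxpy` kernel are all hypothetical and chosen only for illustration. The sketch assumes the compiler predefines `__SSE__` or `__ARM_NEON` on the corresponding targets, as gcc and clang do, and falls back to a one-lane scalar backend otherwise.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical portable layer: one vector type and a few operations,
 * mapped at compile time onto whichever backend is available. */
#if defined(__SSE__)
  #include <xmmintrin.h>
  typedef __m128 pv_float;                 /* 4 x 32-bit floats */
  enum { PV_LANES = 4 };
  static inline pv_float pv_loadu(const float *p)       { return _mm_loadu_ps(p); }
  static inline void     pv_storeu(float *p, pv_float v) { _mm_storeu_ps(p, v); }
  static inline pv_float pv_set1(float x)               { return _mm_set1_ps(x); }
  static inline pv_float pv_add(pv_float a, pv_float b) { return _mm_add_ps(a, b); }
  static inline pv_float pv_mul(pv_float a, pv_float b) { return _mm_mul_ps(a, b); }
#elif defined(__ARM_NEON)
  #include <arm_neon.h>
  typedef float32x4_t pv_float;            /* 4 x 32-bit floats */
  enum { PV_LANES = 4 };
  static inline pv_float pv_loadu(const float *p)       { return vld1q_f32(p); }
  static inline void     pv_storeu(float *p, pv_float v) { vst1q_f32(p, v); }
  static inline pv_float pv_set1(float x)               { return vdupq_n_f32(x); }
  static inline pv_float pv_add(pv_float a, pv_float b) { return vaddq_f32(a, b); }
  static inline pv_float pv_mul(pv_float a, pv_float b) { return vmulq_f32(a, b); }
#else
  /* Scalar backend for unknown architectures: a "vector" of one lane. */
  typedef float pv_float;
  enum { PV_LANES = 1 };
  static inline pv_float pv_loadu(const float *p)       { return *p; }
  static inline void     pv_storeu(float *p, pv_float v) { *p = v; }
  static inline pv_float pv_set1(float x)               { return x; }
  static inline pv_float pv_add(pv_float a, pv_float b) { return a + b; }
  static inline pv_float pv_mul(pv_float a, pv_float b) { return a * b; }
#endif

/* Portable kernel written once against the pv_ layer: y = a * x + y. */
void pv_saxpy(size_t n, float a, const float *x, float *y)
{
    assert(x != NULL && y != NULL);
    const pv_float va = pv_set1(a);
    size_t i = 0;

    /* Main loop: process PV_LANES elements per iteration. */
    for (; i + PV_LANES <= n; i += PV_LANES) {
        pv_float vx = pv_loadu(x + i);
        pv_float vy = pv_loadu(y + i);
        pv_storeu(y + i, pv_add(pv_mul(va, vx), vy));
    }

    /* Remainder loop: handle the elements that do not fill a full vector. */
    for (; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

The kernel source is identical for every target; only the typedef and the inline wrappers change with the backend. Calling `pv_saxpy(6, 2.0f, x, y)` on six-element arrays exercises both the vector loop and the scalar remainder on a four-lane backend.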