vir-simd 0.4.189
Parallelism TS 2 extensions and simd fallback implementation
Type of the vir::execution::simd execution policy.
#include <simd_execution.h>
Static Public Member Functions
static constexpr simd_policy< Options..., detail::simd_policy_prefer_aligned_t > prefer_aligned ()
static constexpr simd_policy< Options..., detail::simd_policy_auto_prologue_t > auto_prologue ()
static constexpr simd_policy< Options..., detail::simd_policy_assume_matching_size_t > assume_matching_size ()
template<int N> requires (_unroll_by == 0)
static constexpr simd_policy< Options..., detail::simd_policy_unroll_by_t< N > > unroll_by ()
template<int N> requires (_size == 0)
static constexpr simd_policy< Options..., detail::simd_policy_size_t< N > > prefer_size ()
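The return types above suggest that each modifier yields a new policy type with one more option tag appended, and that unroll_by can only be applied while no unroll factor is set. The following is a minimal, self-contained sketch of that pattern. It does not use vir-simd: the namespace `sketch`, the tag types, and the `unroll_of` helper are hypothetical stand-ins, and a `static_assert` replaces the `requires` clause so that the sketch compiles as C++17.

```cpp
#include <type_traits>

namespace sketch {
// Hypothetical tag types standing in for the detail::simd_policy_*_t tags.
struct prefer_aligned_t {};
template <int N> struct unroll_by_t {};

// Extracts N from unroll_by_t<N>; every other option contributes 0.
template <class T> struct unroll_of : std::integral_constant<int, 0> {};
template <int N> struct unroll_of<unroll_by_t<N>> : std::integral_constant<int, N> {};

template <class... Options>
struct policy {
  // Summing is enough because at most one unroll_by_t tag can be present.
  static constexpr int unroll = (0 + ... + unroll_of<Options>::value);
  static constexpr bool aligned = (std::is_same_v<Options, prefer_aligned_t> || ...);

  // Each modifier returns a new policy type with one more tag appended.
  static constexpr policy<Options..., prefer_aligned_t> prefer_aligned() { return {}; }

  template <int N>
  static constexpr policy<Options..., unroll_by_t<N>> unroll_by() {
    // Stand-in for `requires (_unroll_by == 0)`: the factor can be set only once.
    static_assert(unroll == 0, "unroll_by was already applied");
    return {};
  }
};

inline constexpr policy<> simd{};  // plays the role of vir::execution::simd
}  // namespace sketch
```

With this model, `sketch::simd.prefer_aligned().unroll_by<4>()` has the type `policy<prefer_aligned_t, unroll_by_t<4>>`, and a second `unroll_by` call on that result fails to compile.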
prefer_aligned ()
inline static constexpr
Unconditionally iterate using smaller chunks until the main iteration can load (and store) chunks from/to aligned addresses. This can be more efficient if the range is large, because it avoids cache-line splits (e.g. with AVX-512, unaligned iteration leads to a cache-line split on every iteration; with AVX, on every second iteration).
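The cache-line arithmetic behind that remark can be checked with a small sketch. It assumes 64-byte cache lines (common on x86, but an assumption here), 64-byte vectors for AVX-512 and 32-byte vectors for AVX. The function `count_splits` is a hypothetical helper, not part of vir-simd.

```cpp
#include <cstddef>

// Counts how many of `iterations` consecutive vector accesses straddle a
// cache-line boundary when the first access starts `start_offset` bytes
// into a cache line. The 64-byte default line size is an assumption.
constexpr std::size_t count_splits(std::size_t start_offset, std::size_t vector_bytes,
                                   std::size_t iterations, std::size_t line_bytes = 64) {
  std::size_t splits = 0;
  for (std::size_t i = 0; i < iterations; ++i) {
    // Position of this access within its cache line.
    const std::size_t pos = (start_offset + i * vector_bytes) % line_bytes;
    // The access splits if it extends past the end of the line.
    if (pos + vector_bytes > line_bytes) ++splits;
  }
  return splits;
}
```

For a range that starts 4 bytes into a cache line, eight 64-byte accesses all split, eight 32-byte accesses split four times, and aligned accesses never split.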
auto_prologue ()
inline static constexpr
Determine from run-time information (i.e. add a branch) whether a prologue for alignment of the main chunked iteration might be more efficient.
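As an illustration of what such a run-time decision could look like, the sketch below plans an iteration from the start address and the range size. It is not the library's heuristic: `plan`, `plan_iteration`, and the `min_chunks_for_prologue` threshold are hypothetical names and values chosen only for this example.

```cpp
#include <cstddef>
#include <cstdint>

struct plan {
  std::size_t prologue;  // elements handled before the first aligned chunk
  std::size_t chunks;    // full chunks processed by the main loop
  std::size_t epilogue;  // leftover elements after the main loop
};

// `min_chunks_for_prologue` is a made-up threshold, not a vir-simd constant.
constexpr plan plan_iteration(std::uintptr_t address, std::size_t n,
                              std::size_t elem_bytes, std::size_t chunk_elems,
                              std::size_t min_chunks_for_prologue = 16) {
  const std::size_t chunk_bytes = elem_bytes * chunk_elems;
  const std::size_t misalign = address % chunk_bytes;
  std::size_t prologue = 0;
  // The run-time branch: take the prologue only if the range is misaligned,
  // can reach alignment in whole elements, and is long enough to pay off.
  if (misalign != 0 && misalign % elem_bytes == 0 &&
      n >= min_chunks_for_prologue * chunk_elems) {
    prologue = (chunk_bytes - misalign) / elem_bytes;
  }
  const std::size_t rest = n - prologue;
  return {prologue, rest / chunk_elems, rest % chunk_elems};
}
```

For 1000 floats starting 16 bytes past a 64-byte boundary and chunks of 16 elements, this plan uses a 12-element prologue, 61 aligned chunks and a 12-element epilogue. For a 40-element range at the same address it skips the prologue.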
assume_matching_size ()
inline static constexpr
Add a precondition to the algorithm that the given range size is a multiple of the SIMD width (the SIMD width itself, not the SIMD width multiplied by the unroll_by factor). This modifier is only valid without a prologue, i.e. it cannot be combined with prefer_aligned or auto_prologue. The algorithm consequently does not implement an epilogue, and all given callables are called with a single simd type (same width and ABI tag). This can reduce code size significantly.
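The effect of this precondition can be sketched without the library. In the example below, `W` and `chunk` are hypothetical stand-ins for simd::size() and a simd object, and `for_each_matching_size` is an illustrative function, not a vir-simd algorithm. Because the size is a multiple of the chunk width, the loop needs neither a prologue nor an epilogue, and the callable is only ever instantiated for one chunk type.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t W = 4;         // stand-in for simd::size()
using chunk = std::array<float, W>;  // stand-in for a simd object

template <class F>
void for_each_matching_size(float* data, std::size_t n, F f) {
  // The precondition that the modifier adds to the algorithm.
  assert(n % W == 0 && "range size must be a multiple of the chunk width");
  for (std::size_t i = 0; i < n; i += W) {
    chunk c;
    for (std::size_t k = 0; k < W; ++k) c[k] = data[i + k];  // load
    f(c);                                                    // always a full-width chunk
    for (std::size_t k = 0; k < W; ++k) data[i + k] = c[k];  // store
  }
  // No epilogue: a partial chunk cannot occur.
}
```

Calling it on 8 elements invokes the callable exactly twice, each time with a full chunk.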
unroll_by< N > ()
inline static constexpr
Iterate over the range in chunks of simd::size() * M instead of just simd::size(), where M is the unroll factor given as the template argument N. The algorithm will execute M loads (or stores) together before/after calling the user-supplied function(s). The user-supplied function may be called with M simd objects instead of one simd object. Note that prologue and epilogue will typically still call the user-supplied function with a single simd object.
Algorithms like std::count_if require a return value from the user-supplied function and therefore still call the function with a single simd (to avoid the need for returning an array or tuple of simd_mask). Such algorithms will still make use of unrolling inside their implementation.
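The calling convention for algorithms without a return value can be sketched in plain C++. Everything named here is hypothetical: `W` and `chunk` stand in for simd::size() and a simd object, and `for_each_unrolled` is not a vir-simd function. To keep the sketch short it also assumes that the range size is a multiple of W, so that no scalar tail is needed.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>

constexpr std::size_t W = 4;         // stand-in for simd::size()
using chunk = std::array<float, W>;  // stand-in for a simd object

inline chunk load(const float* p) {
  chunk c{};
  for (std::size_t k = 0; k < W; ++k) c[k] = p[k];
  return c;
}

inline void store(const chunk& c, float* p) {
  for (std::size_t k = 0; k < W; ++k) p[k] = c[k];
}

// One step over sizeof...(Is) chunks: all loads, one call, all stores.
template <class F, std::size_t... Is>
void step(float* p, F& f, std::index_sequence<Is...>) {
  chunk c[] = {load(p + Is * W)...};  // the loads are grouped together
  f(c[Is]...);                        // one call with all chunks as arguments
  (store(c[Is], p + Is * W), ...);    // the stores are grouped together
}

template <std::size_t M, class F>
void for_each_unrolled(float* data, std::size_t n, F f) {
  assert(n % W == 0);  // simplification of this sketch, not a library rule
  std::size_t i = 0;
  // Main loop: chunks of W * M elements, callable receives M chunks.
  for (; i + W * M <= n; i += W * M) step(data + i, f, std::make_index_sequence<M>{});
  // Epilogue: remaining chunks, callable receives a single chunk.
  for (; i < n; i += W) step(data + i, f, std::make_index_sequence<1>{});
}
```

A callable used with this sketch has to accept both arities, for example a generic lambda of the form `[](auto&... v) { ... }`. With 12 elements and M = 2 it is called once with two chunks and once with one chunk.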
prefer_size< N > ()
inline static constexpr
Start with chunking the range into parts of N elements, calling the user-supplied function(s) with objects of type resize_simd_t<N, simd<T>>.