vir-simd 0.4.189
Parallelism TS 2 extensions and simd fallback implementation
vir::execution::simd_policy< Options > Struct Template Reference

Type of the vir::execution::simd execution policy. More...

#include <simd_execution.h>

Static Public Member Functions

static constexpr simd_policy< Options..., detail::simd_policy_prefer_aligned_t > prefer_aligned ()
 
static constexpr simd_policy< Options..., detail::simd_policy_auto_prologue_t > auto_prologue ()
 
static constexpr simd_policy< Options..., detail::simd_policy_assume_matching_size_t > assume_matching_size ()
 
template<int N>
requires (_unroll_by == 0)
static constexpr simd_policy< Options..., detail::simd_policy_unroll_by_t< N > > unroll_by ()
 
template<int N>
requires (_size == 0)
static constexpr simd_policy< Options..., detail::simd_policy_size_t< N > > prefer_size ()
 

Detailed Description

template<typename... Options>
struct vir::execution::simd_policy< Options >

Type of the vir::execution::simd execution policy.

Member Function Documentation

◆ prefer_aligned()

template<typename... Options>
static constexpr simd_policy< Options..., detail::simd_policy_prefer_aligned_t > vir::execution::simd_policy< Options >::prefer_aligned ( )
inline static constexpr

Unconditionally iterate in smaller chunks until the main iteration can load (and store) chunks from/to aligned addresses. For large ranges this can be more efficient because it avoids cache-line splits: with AVX-512, unaligned iteration incurs a cache-line split on every iteration; with AVX, on every second iteration.
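The idea behind an alignment prologue can be sketched in plain C++ (this is an illustrative model, not vir-simd's implementation; the width W and the function name are hypothetical):

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t W = 8; // hypothetical SIMD width in elements

// Model of prefer_aligned(): take smaller (here: scalar) steps until the
// current address is chunk-aligned, then iterate in full chunks that would
// load from aligned addresses, then handle the remaining tail.
float aligned_sum(const float* data, std::size_t n)
{
  float sum = 0;
  std::size_t i = 0;
  // prologue: smaller steps until `data + i` is aligned to a full chunk
  while (i < n
         && reinterpret_cast<std::uintptr_t>(data + i) % (W * sizeof(float)) != 0)
    sum += data[i++];
  // main loop: every chunk starts at an aligned address (no cache-line splits)
  for (; i + W <= n; i += W)
    for (std::size_t j = 0; j < W; ++j)
      sum += data[i + j];
  // epilogue: remaining tail elements
  for (; i < n; ++i)
    sum += data[i];
  return sum;
}
```

The result is the same for any initial alignment of `data`; only the split between prologue and main loop changes.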

◆ auto_prologue()

template<typename... Options>
static constexpr simd_policy< Options..., detail::simd_policy_auto_prologue_t > vir::execution::simd_policy< Options >::auto_prologue ( )
inline static constexpr
Warning
This modifier's viability is still being evaluated; it may be removed.

Determine from run-time information (i.e. add a branch) whether a prologue that aligns the main chunked iteration is likely to be more efficient.
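The kind of run-time branch this adds might look as follows (a sketch under assumed heuristics; the function name and thresholds are illustrative, not vir-simd's actual logic):

```cpp
#include <cstddef>
#include <cstdint>

// Illustrative heuristic: only peel an alignment prologue when the start
// address is misaligned and the range is long enough to amortize the
// extra branch and the smaller prologue steps.
bool prologue_worthwhile(const float* data, std::size_t n)
{
  const bool misaligned = reinterpret_cast<std::uintptr_t>(data) % 64 != 0;
  return misaligned && n >= 4 * 16; // thresholds chosen for illustration only
}
```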

◆ assume_matching_size()

template<typename... Options>
static constexpr simd_policy< Options..., detail::simd_policy_assume_matching_size_t > vir::execution::simd_policy< Options >::assume_matching_size ( )
inline static constexpr

Add a precondition to the algorithm that the size of the given range is a multiple of the SIMD width (not of the SIMD width multiplied by the unroll factor). This modifier is only valid without a prologue, i.e. it cannot be combined with the prologue modifiers. Consequently the algorithm does not implement an epilogue, and all given callables are called with a single simd type (same width and ABI tag). This can reduce code size significantly.
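Conceptually, the precondition removes all tail handling, leaving a single chunked loop. A minimal plain-C++ model (W and the function name are illustrative, not part of the library):

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t W = 4; // hypothetical SIMD width in elements

// Model of assume_matching_size(): with the precondition n % W == 0 there
// is no tail, so the whole algorithm is one chunked loop and the callable
// only ever sees full-width chunks -- no epilogue code is generated.
float sum_matching(const float* data, std::size_t n)
{
  assert(n % W == 0); // the precondition this modifier adds
  float sum = 0;
  for (std::size_t i = 0; i < n; i += W)
    for (std::size_t j = 0; j < W; ++j)
      sum += data[i + j];
  return sum;
}
```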

◆ unroll_by()

template<typename... Options>
template<int N>
requires (_unroll_by == 0)
static constexpr simd_policy< Options..., detail::simd_policy_unroll_by_t< N > > vir::execution::simd_policy< Options >::unroll_by ( )
inline static constexpr

Iterate over the range in chunks of simd::size() * N instead of just simd::size(). The algorithm executes N loads (or stores) together before/after calling the user-supplied function(s), which may therefore be called with N simd objects instead of one. Note that the prologue and epilogue will typically still call the user-supplied function with a single simd object.

Algorithms like std::count_if require a return value from the user-supplied function and therefore still call the function with a single simd object (avoiding the need to return an array or tuple of simd_mask). Such algorithms still make use of unrolling internally.
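The unrolled main loop can be modeled in plain C++ as follows (a sketch, not vir-simd's implementation; W, M, and the function name are illustrative, and the epilogue is simplified to scalar steps):

```cpp
#include <cstddef>

constexpr std::size_t W = 4; // hypothetical SIMD width in elements
constexpr std::size_t M = 2; // unroll factor, as in unroll_by<2>()

// Model of unroll_by<M>(): the main loop advances by W * M elements,
// performing M chunk traversals per iteration into M independent
// accumulators (which can hide instruction latency); the epilogue falls
// back to smaller steps.
float sum_unrolled(const float* data, std::size_t n)
{
  float acc[M] = {};
  std::size_t i = 0;
  for (; i + W * M <= n; i += W * M)     // unrolled main loop
    for (std::size_t m = 0; m < M; ++m)
      for (std::size_t j = 0; j < W; ++j)
        acc[m] += data[i + m * W + j];
  for (; i < n; ++i)                     // epilogue (simplified to scalar)
    acc[0] += data[i];
  float sum = 0;
  for (float a : acc)
    sum += a;
  return sum;
}
```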

◆ prefer_size()

template<typename... Options>
template<int N>
requires (_size == 0)
static constexpr simd_policy< Options..., detail::simd_policy_size_t< N > > vir::execution::simd_policy< Options >::prefer_size ( )
inline static constexpr

Start by chunking the range into parts of N elements, calling the user-supplied function(s) with objects of type resize_simd_t<N, simd<T>>.
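As a conceptual model in plain C++ (an N-element chunk stands in for a resize_simd_t<N, simd<T>> object here; the function name is illustrative):

```cpp
#include <cstddef>
#include <numeric>

constexpr std::size_t N = 16; // preferred chunk size, as in prefer_size<16>()

// Model of prefer_size<N>(): the range is processed in parts of N elements
// (each part standing in for one resize_simd_t<N, simd<float>> object),
// with smaller steps for the remainder.
float sum_prefer_size(const float* data, std::size_t n)
{
  float sum = 0;
  std::size_t i = 0;
  for (; i + N <= n; i += N)  // one N-wide "chunk" per iteration
    sum += std::accumulate(data + i, data + i + N, 0.0f);
  for (; i < n; ++i)          // remainder
    sum += data[i];
  return sum;
}
```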


The documentation for this struct was generated from the following file: simd_execution.h