In this post I will talk about regularity and why std::regular<std::simd<int>>
needs to be false in order to preserve regularity at the level where it
matters: equational reasoning. The issue of regularity came up repeatedly when
discussing the design of std::simd for C++26. (It also came up in 2017 for
std::experimental::simd.) My goal for this post is to explore the options and
their consequences. There’s a lot more to be said, but this post is already too
long. In any case, when talking about regularity, we need to start with
“Elements of Programming”, the book that introduced the concept:
A type is regular if and only if its basis includes equality, assignment,
destructor, default constructor, copy constructor, total ordering, and
underlying type. […]
Algorithms are abstract when they can be used with different models
satisfying the same requirements, such as associativity. Code optimization
depends on equational reasoning; unless types are known to be regular, few
optimizations can be performed.
Alexander Stepanov, Paul McJones — Elements of Programming (EoP)
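To make the std::regular claim above concrete, here is a minimal sketch using a hand-rolled four-lane type instead of std::simd. Vec4, Mask4, make_vec4, and all_lanes are hypothetical names invented for this illustration. The only assumption carried over from SIMD types is that == compares element-wise and yields a mask. std::regular requires std::equality_comparable, which in turn requires the result of a == b to be convertible to bool, so a mask-returning == cannot satisfy it.

```cpp
#include <array>
#include <type_traits>
#include <utility>

// Hypothetical stand-ins for a SIMD vector and its mask type.
struct Mask4 { std::array<bool, 4> lanes; };
struct Vec4  { std::array<int, 4> lanes; };

inline Vec4 make_vec4(int a, int b, int c, int d) {
    Vec4 v{};
    v.lanes[0] = a; v.lanes[1] = b; v.lanes[2] = c; v.lanes[3] = d;
    return v;
}

// Element-wise comparison: one answer per lane, not a single bool.
inline Mask4 operator==(const Vec4& a, const Vec4& b) {
    Mask4 m{};
    for (int i = 0; i < 4; ++i) {
        m.lanes[i] = (a.lanes[i] == b.lanes[i]);
    }
    return m;
}

// Explicit reduction from a mask to a single truth value.
inline bool all_lanes(const Mask4& m) {
    return m.lanes[0] && m.lanes[1] && m.lanes[2] && m.lanes[3];
}

// The comparison result is a mask that does not convert to bool, which is
// exactly what std::equality_comparable (and thus std::regular) would need.
static_assert(
    !std::is_convertible<decltype(std::declval<Vec4>() == std::declval<Vec4>()),
                         bool>::value,
    "a mask-returning operator== is not usable as a bool");
```

With such a type, code has to say what it means: all_lanes(a == b) asks whether every lane matches, while the bare a == b keeps the per-lane answers.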
One of the major benefits of type-based vectorization is data-structure
vectorization. I’ll introduce and hopefully motivate the pattern in this post.
Why are operator?: overloads not allowed in C++? See, basically every
operator in C++ is overloadable. Sure, you can do stupid things with such
power, but that’s a general problem with humans that have power. C++ gives
power to programmers; we need to use it wisely. Apparently operator?: is not
overloadable because: “There is no fundamental reason to disallow overloading
of ?:. I just didn’t see the need to introduce the special case of
overloading a ternary operator. Note that a function overloading
expr1?expr2:expr3 would not be able to guarantee that only one of expr2 and
expr3 was executed.” [Stroustrup: C++ Style and Technique FAQ]
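The evaluation-order point in the quote can be observed with a small sketch. The function select below is a hypothetical stand-in for what an overloaded ?: would be, namely an ordinary function call; counted is a helper invented here to make evaluations visible.

```cpp
// Hypothetical stand-in for an overloaded ?:. It is an ordinary function,
// so both value arguments are evaluated before the call happens.
inline int select(bool cond, int if_true, int if_false) {
    return cond ? if_true : if_false;
}

// Returns `value` and records that it was evaluated.
inline int counted(int value, int& calls) {
    ++calls;
    return value;
}

// Built-in ?: evaluates only the operand that is selected.
inline int evaluations_with_builtin(bool cond) {
    int calls = 0;
    int result = cond ? counted(1, calls) : counted(2, calls);
    (void)result;
    return calls;
}

// The function-call form evaluates both operands, whatever `cond` is.
inline int evaluations_with_function(bool cond) {
    int calls = 0;
    int result = select(cond, counted(1, calls), counted(2, calls));
    (void)result;
    return calls;
}
```

For either value of cond, the built-in form reports one evaluation and the function form reports two.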
Bob Steagall presented his high-speed UTF-8 conversion at CppCon and C++Now
where he showed that his approach outperformed most existing conversion
algorithms. For some extra speed, he implemented a function for converting
ASCII to char16_t/char32_t using SSE intrinsics. This latter part got me
hooked, because:
- stdx::simd (my contribution to the Parallelism TS 2; note that I use
namespace stdx = std::experimental, because the latter is just way too long)
was just sent off for publication by the C++ committee and should have made
reliance on intrinsics unnecessary.
- I had no prior experience with vectorizing string operations (which is one of
the reasons my previous vector types library Vc didn’t have 8-bit integer
support). I was curious: how hard can it be?
- Bob’s presentation made it look like one needs access to special instructions
like movmskb to get good performance.
- Scalability to different vector widths is unclear. The SSE intrinsics
certainly won’t scale. But how much can performance actually scale, knowing
that the larger the vector, the lower the chance the full vector of chars is
only made up of ASCII?
- And what about newer ISA extensions such as SSE4.1, which adds instructions
for converting unsigned char to short or int? Will it help?
- Most important to me, can the code be more readable and portable and at least
as fast at the same time?
- And is there a chance for vectorization of non-ASCII code point conversions?
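For readers unfamiliar with the ASCII fast path these questions revolve around, here is a scalar sketch of the idea. It is neither Bob’s SSE code nor a stdx::simd implementation. The function name convert_ascii_prefix and the chunk size of 16 (standing in for a vector width) are assumptions made for illustration. Each chunk is tested for any byte with the high bit set; a clean chunk is widened to char16_t in one go, and the first chunk that contains a non-ASCII byte falls back to byte-wise handling.

```cpp
#include <cstddef>

// Hypothetical helper: widens the leading ASCII run of `src` into `dst`
// and returns how many characters were converted. `dst` must have room
// for at least `n` elements.
inline std::size_t convert_ascii_prefix(const char* src, std::size_t n,
                                        char16_t* dst) {
    constexpr std::size_t chunk = 16;  // assumed stand-in for a vector width
    std::size_t i = 0;

    // Chunked fast path: OR all bytes together and test the high bit once.
    while (i + chunk <= n) {
        unsigned char combined = 0;
        for (std::size_t j = 0; j < chunk; ++j) {
            combined |= static_cast<unsigned char>(src[i + j]);
        }
        if (combined & 0x80) {
            break;  // at least one non-ASCII byte in this chunk
        }
        for (std::size_t j = 0; j < chunk; ++j) {
            dst[i + j] = static_cast<unsigned char>(src[i + j]);
        }
        i += chunk;
    }

    // Tail, or the chunk containing non-ASCII data: one byte at a time.
    while (i < n && static_cast<unsigned char>(src[i]) < 0x80) {
        dst[i] = static_cast<unsigned char>(src[i]);
        ++i;
    }
    return i;
}
```

The two inner loops over a chunk are the part a vector implementation would replace with one comparison and one widening conversion per vector, which is why the chance of a whole vector being ASCII matters for scalability.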
Since at least KDE 2, the runner (krunner, per default on Alt-F2) supports
configurable prefixes that resolve to a URL that is then opened like
kioclient5 exec would open them. I created the following shortcuts: