Vir's blog — go fast with readable code!

there's too much unused parallelism on a single core

Virtual Trip Report: WG21 Kona 2025

13 Nov 2025

After the CD (committee draft) is out, WG21 is in bug-fixing mode. Think of NB (national body) comments as the tickets that force us to consider a resolution and provide an answer. Such issues can be either design issues or wording issues (where the wording does not match the stated design intent). Besides NB comments, many library issues had been filed and prioritized independently, and several of them were categorized as must-fix or should-fix before the standard is finalized.

Read on …

The story of regularity and std::simd

16 Nov 2023

In this post I will talk about regularity and why std::regular<std::simd<int>> needs to be false in order to preserve regularity at the level where it matters: equational reasoning. The issue of regularity came up repeatedly when discussing the design of std::simd for C++26. (It also came up in 2017 for std::experimental::simd.) My goal for this post is to explore the options and their consequences. There’s a lot more to be said, but this post is already too long. In any case, when talking about regularity, we need to start with “Elements of Programming”, the book that introduced the concept:

A type is regular if and only if its basis includes equality, assignment, destructor, default constructor, copy constructor, total ordering, and underlying type. […]

Algorithms are abstract when they can be used with different models satisfying the same requirements, such as associativity. Code optimization depends on equational reasoning; unless types are known to be regular, few optimizations can be performed.

Alexander Stepanov, Paul McJones — Elements of Programming (EoP)
Read on …

Data-Structure Vectorization

29 Jul 2021

One of the major benefits of type-based vectorization is data-structure vectorization. I’ll introduce and hopefully motivate the pattern in this post.

Read on …

Making the C++ conditional operator overloadable

25 Jul 2019

Why are operator?: overloads not allowed in C++? See, basically every operator in C++ is overloadable. Sure, you can do stupid things with such power, but that’s a general problem with humans who have power. C++ gives power to programmers; we need to use it wisely. Apparently operator?: is not overloadable because: “There is no fundamental reason to disallow overloading of ?:. I just didn’t see the need to introduce the special case of overloading a ternary operator. Note that a function overloading expr1?expr2:expr3 would not be able to guarantee that only one of expr2 and expr3 was executed.” [Stroustrup: C++ Style and Technique FAQ]
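
The last sentence of that quote is easy to demonstrate. The sketch below uses only hypothetical names (EvalCounter, select_value, and the two operand functions are made up for illustration). It counts how often each operand is evaluated, once with the built-in ?: and once with an ordinary function taking the same three arguments.

```cpp
#include <cassert>

// Records how often each (hypothetical) operand expression ran.
struct EvalCounter {
    int left = 0;
    int right = 0;
};

inline int left_operand(EvalCounter& c) { ++c.left; return 1; }
inline int right_operand(EvalCounter& c) { ++c.right; return 2; }

// Hypothetical stand-in for an overloaded operator?:.
// Both value arguments are evaluated before the call happens.
inline int select_value(bool cond, int if_true, int if_false) {
    return cond ? if_true : if_false;
}

// Built-in conditional operator: only the selected operand is evaluated.
inline EvalCounter with_builtin(bool cond, int& result) {
    EvalCounter c;
    result = cond ? left_operand(c) : right_operand(c);
    return c;
}

// Function call: both operands are evaluated, whatever cond is.
inline EvalCounter with_function(bool cond, int& result) {
    EvalCounter c;
    result = select_value(cond, left_operand(c), right_operand(c));
    return c;
}
```

Both variants produce the same result value; they differ only in the side effects of the operand that was not selected.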

Read on …

Vectorized conversion from UTF-8 using stdx::simd

27 May 2019

Bob Steagall presented his high-speed UTF-8 conversion at CppCon and C++Now where he showed that his approach outperformed most existing conversion algorithms. For some extra speed, he implemented a function for converting ASCII to char16_t/char32_t using SSE intrinsics. This latter part got me hooked, because:

  • stdx::simd (my contribution to the Parallelism TS 2; note that I use namespace stdx = std::experimental, because the latter is just way too long.) was just sent off for publication by the C++ committee and should have made reliance on intrinsics unnecessary.
  • I had no prior experience with vectorizing string operations (which is one of the reasons my previous vector types library Vc didn’t have 8-bit integer support). I was curious: how hard can it be?
  • Bob’s presentation made it look like one needs access to special instructions like movmskb to get good performance.
  • Scalability to different vector widths is unclear. The SSE intrinsics certainly won’t scale. But how much can performance actually scale, knowing that the larger the vector, the lower the chance the full vector of chars is only made up of ASCII?
  • And what about newer ISA extensions such as SSE4.1 which adds instructions for converting unsigned char to short or int? Will it help?
  • Most important to me, can the code be more readable and portable and at least as fast at the same time?
  • And is there a chance for vectorization of non-ASCII code point conversions?
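
To give an idea of what the ASCII fast path does, here is a scalar sketch of the idea. It is neither Bob's implementation nor the stdx::simd code from the post, and the names widen_ascii_prefix and chunk_size are made up for illustration. It processes a fixed-size chunk at a time: first test whether any byte in the chunk has its high bit set (the job a movmskb-style test does in the SSE version), then widen the whole chunk to char16_t. A vectorized version would replace the two inner loops with one vector compare and one vector conversion.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical chunk width; a SIMD version would use the vector's lane count.
constexpr std::size_t chunk_size = 16;

// Appends the leading ASCII-only chunks of `in` to `out` as UTF-16 and
// returns the number of bytes consumed. The remainder (a short tail or
// the first chunk containing a non-ASCII byte) is left for a general
// UTF-8 decoder, which is not shown here.
inline std::size_t widen_ascii_prefix(std::string_view in, std::u16string& out) {
    std::size_t pos = 0;
    while (in.size() - pos >= chunk_size) {
        // OR all bytes together; the high bit survives if any byte is >= 0x80.
        unsigned char combined = 0;
        for (std::size_t i = 0; i < chunk_size; ++i) {
            combined |= static_cast<unsigned char>(in[pos + i]);
        }
        if (combined & 0x80) break;  // chunk contains a non-ASCII byte

        // Pure ASCII: every byte maps to the code unit with the same value.
        for (std::size_t i = 0; i < chunk_size; ++i) {
            out.push_back(static_cast<char16_t>(static_cast<unsigned char>(in[pos + i])));
        }
        pos += chunk_size;
    }
    return pos;
}
```

The chunk size also shows the scalability question from the list above: the wider the chunk, the more likely a single non-ASCII byte forces the whole chunk onto the slow path.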
Read on …