Posted: 2024-12-18
Last modified: 2024-12-18 @ a29e293

Integer Promotion was a Mistake

At work this week, I had to write a simple function (in C++) that was just supposed to pack eight uint8_ts into a single uint64_t. Extremely simple stuff, just a bunch of bit shifts and bitwise or-operations. But since this is C++-land, there’s footguns here too.

The function

It looked a little something like this:

uint64_t packBytes(/*...*/)
{
    auto packed = 0;

    packed |= byte0 << 0;
    packed |= byte1 << 8;
    packed |= byte2 << 16;
    packed |= byte3 << 24;
    packed |= byte4 << 32;
    packed |= byte5 << 40;
    packed |= byte6 << 48;
    packed |= byte7 << 56;

    return packed;
}

Looks pretty innocent, right? I thought so. But then my tests started failing. I was expecting something like 0x0102030405060708, but only got 0x05060708. It seemed like the top half of my bytes were just going in the bit bucket?

The problem

Well. The thing about C++ is that it has inherited a bunch of baggage from C. One piece of such baggage is integer promotion. Where basically, an integer type can be implicitly converted to a different – larger – integer type.

This, combined with my use of auto for the declaration of the return value, made packed initialize as an int (aka int32_t). Because that’s what C++ defaults to if you just initialize with a simple number.

In each of the lines following, each byte is then implicitly cast to an int, shifted by some amount, then bitwise or-ed into packed. But the last 4 bytes are shifted by more than 32, thus the bits are shifted right off the end of the promoted 32-bit value and thus the function essentially looks like this:

uint64_t packBytes(/*...*/)
{
    auto packed = 0;

    packed |= byte0 << 0;
    packed |= byte1 << 8;
    packed |= byte2 << 16;
    packed |= byte3 << 24;
    packed |= 0;
    packed |= 0;
    packed |= 0;
    packed |= 0;

    return packed;
}

No wonder the top 4 bytes weren’t there.

Finally, at the return statement, the compiler looks at the return type of the function and goes “Huh. That’s a bigger integral type than I have here. Oh well, here I go implicitly converting again!”, casts packed to a uint64_t, and calls it a day.

The fix

So how do you fix it? Simple. Make sure packed gets initialized as a 64-bit unsigned value from the start. There are three ways you could tweak the initialization line above to achieve this:

uint64_t packed = 0; – Just do away with auto.
auto packed = uint64_t{0}; – Be explicit (making auto pointless).
auto packed = 0ULL; – Use an integer suffix.

That’s it. Now, the bytes will get promoted to the proper 64-bit type, won’t get shifted off the end, and will actually end up in the packed value.

I went with option 1 because it is the most explicit. Option 2 just makes auto redundant, and option 3 is ugly. I really dislike integer suffixes in C++.

The hypocrisy

“Aha, Martin! You’re complaining about integer promotion yet your implementation uses integer promotion during the bit-shifting. Curious.”
You, probably

Okay, you got me. I didn’t even consider that initially. But yes, implicit conversion and integer promotion is happening there as well. To be honest, it’s been a bit since I was doing stuff like this in C++ and forgot about the details.

Another argument you can make is that I use auto too much in C++. Sure, it has its place when a function is returning a type whose name is long and tedious to type out, but was it really necessary with a uint64_t? I would still argue yes, since auto is 4 quick characters to type, and keeps it consistent with the rest of my variable declarations. Whereas uint64_t is twice as long, with one of the characters being a special character that I have to use Shift to get to. Ew.

The iron oxide

Basically, I’m spoiled by Rust¹, with the compiler actually caring about types properly. No implicit conversions, let is even shorter and easier, and the compiler will scream at you if you don’t take the necessary step to prevent bugs like this by being explicit with your casts.

Here’s the same function, written in a similar way in Rust (foregoing any niceties like u64::from_be_bytes):

fn pack_bytes(bytes: [u8; 8]) -> u64 {
    let mut packed = 0;

    packed |= bytes[0] << 0;
    packed |= bytes[1] << 8;
    packed |= bytes[2] << 16;
    packed |= bytes[3] << 24;
    packed |= bytes[4] << 32;
    packed |= bytes[5] << 40;
    packed |= bytes[6] << 48;
    packed |= bytes[7] << 56;

    packed
}

Let’s just attempt to compile this.

$ cargo build
   Compiling example v0.1.0 (/home/martin/dev/example)
error[E0308]: mismatched types
  --> src/lib.rs:17:5
   |
5  | pub fn pack(bytes: &[u8; 8]) -> u64 {
   |                                 --- expected `u64` because of return type
...
17 |     packed
   |     ^^^^^^ expected `u64`, found `u8`
   |
help: you can convert a `u8` to a `u64`
   |
17 |     packed.into()
   |           +++++++

For more information about this error, try `rustc --explain E0308`.
error: could not compile `example` (lib) due to 1 previous error

Oh. Looks like packed was deduced to be a u8. That’s not what we want, we want a u64. Let’s fix that.

fn pack_bytes(bytes: [u8; 8]) -> u64 {
-   let mut packed = 0;
+   let mut packed: u64 = 0;

    packed |= bytes[0] << 0;
    packed |= bytes[1] << 8;
    packed |= bytes[2] << 16;
    packed |= bytes[3] << 24;
    packed |= bytes[4] << 32;
    packed |= bytes[5] << 40;
    packed |= bytes[6] << 48;
    packed |= bytes[7] << 56;

    packed
}

And compile again:

$ cargo build
   Compiling example v0.1.0 (/home/martin/dev/example)
error[E0308]: mismatched types
 --> src/lib.rs:8:15
  |
8 |     packed |= bytes[0] << 0;
  |               ^^^^^^^^^^^^^ expected `u64`, found `u8`

error[E0277]: no implementation for `u64 |= u8`
 --> src/lib.rs:8:12
  |
8 |     packed |= bytes[0] << 0;
  |            ^^ no implementation for `u64 |= u8`
  |
  = help: the trait `BitOrAssign<u8>` is not implemented for `u64`
  = help: the following other types implement trait `BitOrAssign<Rhs>`:
            `u64` implements `BitOrAssign<&u64>`
            `u64` implements `BitOrAssign`

... (repeated 7 more times)

Oh right. Those are all u8s and I’m trying to mash them up with a u64. Silly me. Let’s fix that too:

fn pack_bytes(bytes: [u8; 8]) -> u64 {
    let mut packed: u64 = 0;

-   packed |= bytes[0] << 0;
-   packed |= bytes[1] << 8;
-   packed |= bytes[2] << 16;
-   packed |= bytes[3] << 24;
-   packed |= bytes[4] << 32;
-   packed |= bytes[5] << 40;
-   packed |= bytes[6] << 48;
-   packed |= bytes[7] << 56;
+   packed |= (bytes[0] as u64) << 0;
+   packed |= (bytes[1] as u64) << 8;
+   packed |= (bytes[2] as u64) << 16;
+   packed |= (bytes[3] as u64) << 24;
+   packed |= (bytes[4] as u64) << 32;
+   packed |= (bytes[5] as u64) << 40;
+   packed |= (bytes[6] as u64) << 48;
+   packed |= (bytes[7] as u64) << 56;

    packed
}

Compile again:

$ cargo build
   Compiling example v0.1.0 (/home/martin/dev/example)
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.07s

There we go! Easy. No footguns here.

The conclusion

I will enable the proper clang-tidy checks to prevent stuff like this in the future. Can you believe that checking implicit conversions is off by default?!

I was really hoping that the type deduction would realize that since I’m returning an auto from a function whose signature declares that it should return a uint64_t, then auto = uint64_t in this case. That, or that it would give me a compiler warning/error. Clearly my hopes were too high.

What’s even funnier is that I ended up completely removing this function in the end because it was unnecessary to do things that way. Oh well.

You didn’t really think I was going to go an entire post without mentioning Rust, did you? ↩︎