As a compiler developer, I see a lot of code that deals with data types, and many different ways of using them.
A problem I have encountered a few times, despite it being rather rare, is the use of fixed-width integer types to parameterize code on the size of types. When writing compilers or other programs that need to operate on data types directly, you often need code that behaves differently for different static types. For example, a serialization framework might need to change behaviour depending on the size of different integer types. In a compiler, you might encode the size of a type in a data structure like the Intermediate Language. One tempting way to implement such behaviour is to overload a function on fixed-width integer types, with each overload responsible for handling a specific size. However, a few subtleties in the C++ type system make this approach problematic for code maintainability and portability.
In this post, I explain the different problems that can happen when using fixed-width integer types to overload functions and provide alternative approaches for achieving the intended goal.
The naive approach
Let’s consider a simple function, foo(), with four overload definitions, one for each signed integer bit-width:
#include <cstdint>

void foo(int8_t) {}
void foo(int16_t) {}
void foo(int32_t) {}
void foo(int64_t) {}
Naively, one might think that foo() is callable with any integer type. This is, however, not the case.
Trying to compile this simple example:
int main() {
    uint32_t x = 0;
    foo(x);
}
we quickly get errors about an ambiguous call:
error: call to 'foo' is ambiguous
foo(x);
^~~
<source>:4:6: note: candidate function
void foo(int8_t) {}
^
<source>:5:6: note: candidate function
void foo(int16_t) {}
^
<source>:6:6: note: candidate function
void foo(int32_t) {}
^
<source>:7:6: note: candidate function
void foo(int64_t) {}
^
1 error generated.
The compiler even helpfully lists all the functions we’ve defined as possible overload candidates.
The call is ambiguous because implicit conversions do not impose an ordering on overload resolution. Since we are passing a uint32_t to foo(), and foo() only has definitions with signed types, the compiler will try to apply an implicit conversion. Unfortunately for us, as far as the compiler is concerned, int8_t and int64_t (and every int*_t in between) are equally appropriate targets for an implicit conversion from uint32_t. The compiler doesn’t consider the fact that int32_t and uint32_t have the same bit-width when deciding which overload to pick for the call.
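By the way, if you just need the call to compile, an explicit cast resolves the ambiguity because the call then matches one overload exactly. A minimal sketch, reusing the foo() overloads from the first example:

#include <cstdint>

void foo(int8_t) {}
void foo(int16_t) {}
void foo(int32_t) {}
void foo(int64_t) {}

int main() {
    uint32_t x = 0;
    // The cast makes one overload an exact match,
    // so the call is no longer ambiguous.
    foo(static_cast<int32_t>(x));
}

Of course, forcing a cast at every call site rather defeats the purpose of the overload set.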
Note that changing all the overloads to use uint*_t instead won’t help because that will just invert the problem.
The wrong solution
To work around the problem, one might be tempted to define overloads for both signed and unsigned types:
#include <cstdint>

void foo(int8_t) {}
void foo(int16_t) {}
void foo(int32_t) {}
void foo(int64_t) {}
void foo(uint8_t) {}
void foo(uint16_t) {}
void foo(uint32_t) {}
void foo(uint64_t) {}
However, besides possible issues with code duplication, there are still portability problems with this approach.
Let’s consider this example now:
int main() {
    unsigned long x = 0;
    foo(x);
}
At first glance, the code seems like it should work fine. And it will… sometimes.
If you compile the code using clang, everything is fine. With gcc, everything is still ok. But with MSVC, you get an error like this:
error C2668: 'foo': ambiguous call to overloaded function
No, this is not an MSVC bug -_-
Let’s try changing the unsigned long to unsigned long long:
int main() {
    unsigned long long x = 0;
    foo(x);
}
Now MSVC accepts the code, but gcc and clang say the call is ambiguous!
So, what’s going on?
The problem
If you’re using fixed-width integer types, you probably already know that C++ doesn’t guarantee what the bit-width of primitive integer types should be. For example, long is 64 bits on some systems but 32 bits on others.
A subtle implication is that multiple primitive integer types can have the same bit-width. For example, in gcc and clang on x86-64, long and long long are both 64 bits. Similarly, in MSVC (also on x86-64), int and long are both 32 bits.
Now, because fixed-width integer types are just typedefs for some primitive type, one of the two types will not have an associated fixed-width typedef. For gcc and clang, int64_t can only map to one of long or long long. Similarly, for MSVC, int32_t can only map to one of int or long.
As a result, calling foo() with whichever type is not covered by a typedef will require an implicit conversion. And, as we previously saw, implicit conversions are not prioritized, so we get ambiguity.
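You can check how the typedefs map on your own platform with std::is_same. Here is a small sketch I put together for illustration; the output of the first check depends on the compiler and target:

#include <cstdint>
#include <iostream>
#include <type_traits>

int main() {
    // Prints 1 on x86-64 gcc/clang (int64_t is long),
    // but 0 on MSVC (where int64_t is long long).
    std::cout << std::is_same<int64_t, long>::value << '\n';
    // Always prints 0: long and long long are distinct types,
    // even when they happen to have the same bit-width.
    std::cout << std::is_same<long, long long>::value << '\n';
}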
The special char problem
The type char has an additional, subtle oddity that breaks the code. For other integer types like int, signed is implied when not explicitly specified (i.e. int is the same as signed int). However, C++ requires that char, signed char, and unsigned char all be distinct types. Also, char can be either signed or unsigned; it’s up to the compiler to decide which it will be.
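These rules are easy to verify. The following static_asserts (a small sketch of mine) hold on every conforming compiler:

#include <type_traits>

// char is a distinct type from both signed char and unsigned char,
// no matter whether it behaves as signed or unsigned in practice.
static_assert(!std::is_same<char, signed char>::value, "");
static_assert(!std::is_same<char, unsigned char>::value, "");
// By contrast, int and signed int are the same type.
static_assert(std::is_same<int, signed int>::value, "");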
As a result, the exact behaviour of, for example, foo('a') will depend on the exact compiler you use. It might call foo(int8_t) or foo(uint8_t)… or something else. Using clang as an example, int8_t is a typedef for signed char and uint8_t is a typedef for unsigned char. So foo('a') will not match any of the overloads we have defined. However, 'a', which is of type char, is subject to integer promotion, which happens before implicit conversions. So, because clang also happens to define int32_t as a typedef of int, foo('a') will actually call foo(int), which is foo(int32_t).
You can see this for yourself by looking at the un-optimized assembly code clang generates for the main function in:
#include <cstdint>

void foo(int8_t) {}
void foo(int16_t) {}
void foo(int32_t) {}
void foo(int64_t) {}
void foo(uint8_t) {}
void foo(uint16_t) {}
void foo(uint32_t) {}
void foo(uint64_t) {}

int main() {
    foo('a');
}
which looks something like this:
main:
push rbp
mov rbp,rsp
mov edi,0x61
call 401130 <foo(int)> // <-- this is the call to foo('a')
xor eax,eax
pop rbp
ret
nop WORD PTR cs:[rax+rax*1+0x0]
nop DWORD PTR [rax+0x0]
The proper solution(s)
The basic solution is to define overloads for the primitive integer types and use sizeof() to get each type’s size.
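For example, a minimal sketch of that approach (the printing is just a stand-in for the real size-dependent behaviour) could look like this:

#include <iostream>

// One overload per primitive signed integer type; sizeof() recovers
// the size without involving any fixed-width typedefs.
void foo(signed char c) { std::cout << sizeof(c) << " byte(s)\n"; }
void foo(short s)       { std::cout << sizeof(s) << " byte(s)\n"; }
void foo(int i)         { std::cout << sizeof(i) << " byte(s)\n"; }
void foo(long l)        { std::cout << sizeof(l) << " byte(s)\n"; }
void foo(long long ll)  { std::cout << sizeof(ll) << " byte(s)\n"; }
// ... and likewise for char and the unsigned variants.

int main() {
    foo(42);  // calls foo(int)
    foo(42L); // calls foo(long)
}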
However, a more compact approach is to use templates and sizeof():
#include <cstddef>

template <typename T>
void foo(T t) {
    constexpr std::size_t size = sizeof(T);
    // ... behaviour based on size ...
}
If you want to be sure that only integer types are accepted, add a static_assert with std::is_integral_v<T> (or std::is_integral<T>::value if you only have C++11):
#include <cstddef>
#include <type_traits>

template <typename T>
void foo(T t) {
    static_assert(std::is_integral_v<T>, "foo() only accepts integer types");
    constexpr std::size_t size = sizeof(T);
    // ... behaviour based on size ...
}
If you need to implement different behaviour depending on the size, use if constexpr (a plain if would also work):
#include <type_traits>

template <typename T>
void foo(T t) {
    static_assert(std::is_integral_v<T>, "foo() only accepts integer types");
    if constexpr (sizeof(T) == 1) {
        // handle 8-bit integers
    } else if constexpr (sizeof(T) == 2) {
        // handle 16-bit integers
    } else if constexpr (sizeof(T) == 4) {
        // handle 32-bit integers
    } else {
        // handle 64-bit integers
    }
}
You can also use std::enable_if (yes, this is going to get ugly):
#include <type_traits>

template <typename T, typename std::enable_if<sizeof(T) == 1, bool>::type = true>
void foo(T t) { /* handle 8-bit integers */ }

template <typename T, typename std::enable_if<sizeof(T) == 2, bool>::type = true>
void foo(T t) { /* handle 16-bit integers */ }

template <typename T, typename std::enable_if<sizeof(T) == 4, bool>::type = true>
void foo(T t) { /* handle 32-bit integers */ }

template <typename T, typename std::enable_if<sizeof(T) == 8, bool>::type = true>
void foo(T t) { /* handle 64-bit integers */ }
A final note about sizeof()
An astute reader will have noted that using sizeof() can also pose portability problems. sizeof() evaluates to the number of bytes used to store a given type.
While we usually assume that a byte is 8 bits long, C++ doesn’t actually guarantee this. Specifically, the number of bits in a byte is defined as whatever number of bits is used to store a char, i.e. CHAR_BIT (see cppreference.com). Because C++ also doesn’t specify the exact number of bits in a char (only that it must be at least 8), a compiler could define char as being larger than 8 bits. In fact, every integer type could be 64 bits long, sizeof() would evaluate to 1 for all of them, and the compiler would still be spec-compliant.
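If you want to make that assumption explicit, one option (a small sketch) is a static_assert on CHAR_BIT:

#include <climits>

// Fail compilation on exotic platforms where a byte is not 8 bits,
// so the size-based dispatch above can safely assume 8-bit bytes.
static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");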
However, in practice, platforms where char (and therefore, a byte) is not 8 bits are extremely rare. So, for most practical purposes, assuming that a byte is 8 bits long is ok and sizeof() will behave as we expect :)