Learn C++ Itanium Symbol Mangling
Lesson 1: Basics
Now that you understand how this guide works and have seen that C identifiers are not mangled, we are ready to dive into C++.
Every mangled C++ symbol is prefixed with the string _Z. This signifies that it is a mangled C++ symbol.
_Z starts with an underscore followed by an uppercase letter. All identifiers of that form are reserved by the C standard and cannot be used by programs. This ensures that there are no name collisions between normal C functions and mangled C++ functions.
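To make the prefix rule concrete, here is a minimal sketch of a check for it. The helper name has_mangled_prefix is made up for this guide and is not part of any library. Passing the check only means a symbol could be a mangled C++ symbol; it does not validate the rest of the symbol.

```cpp
#include <string>

// Hypothetical helper for illustration (not part of any ABI library).
// Returns true if `symbol` starts with the "_Z" prefix that every
// Itanium-mangled C++ symbol carries. This is a necessary condition only:
// the rest of the symbol is not inspected.
bool has_mangled_prefix(const std::string& symbol) {
    // compare() looks at no more than the first two characters, so
    // strings shorter than the prefix are handled safely.
    return symbol.compare(0, 2, "_Z") == 0;
}
```

For example, has_mangled_prefix("_Z1fv") is true, while has_mangled_prefix("printf") is false.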
After the prefix, the name of the entity is stored. For now, we will only look at functions. For functions, the function type is appended to the name to get the full symbol.
void f() {}
This empty function will be mangled to _Z1fv. The 1f signifies the name (we will look at this in more detail later in this lesson) and the v signifies the function type.
We will see the v function type a lot in the rest of this guide. It stands for a function that takes no arguments.
Which of these symbols cannot possibly be a mangled C++ symbol? Answer with the name of the symbol.
_ZN3FooIA4_iE3barE
_ZN6System5Sound4beepEv
_RN3FooIA4_iE3barE
For names, there are two cases to consider for now. Either the name is in the global scope, or it is in a namespace.
For global names, we just prefix the name with its length.
void hello_world() {}
This will therefore get mangled as _Z11hello_worldv.
The length of hello_world is 11, so we concatenate 11 and hello_world. The result is appended to the previously mentioned prefix _Z, and then we add the type, which is just v here, at the end.
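The rule for global names can be sketched in a few lines of C++. This is only an illustration under the assumptions made so far in this lesson: the function is in the global scope and takes no arguments. The helper name mangle_global_nullary is made up for this guide.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper for illustration: mangles a global function that
// takes no arguments, which is the only case covered so far.
std::string mangle_global_nullary(const std::string& name) {
    assert(!name.empty());  // an identifier has at least one character
    // Prefix, then the length of the name, then the name, then the type.
    return "_Z" + std::to_string(name.size()) + name + "v";
}
```

Calling mangle_global_nullary("hello_world") returns _Z11hello_worldv, matching the symbol above.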
What is the mangling of the following identifier?
void meow() {}
Functions that are declared in a namespace are a bit more complicated. They are referred to as nested names, because they are nested in a namespace. They can also be nested in multiple namespaces; the encoding is the same.
Nested names start with an N and end with an E (the E stands for "end" and is commonly used to end sequences). Between those two letters, the hierarchy of the namespace is represented by putting one namespace name after another, with the function name last. Every name has the leading length and then the name itself, just like with global names.
namespace outer { void inner() {} }
That means that this function will be mangled as _ZN5outer5innerEv. We can decode this into the following structure:
- _Z: Prefix
- N: Start of nested name
- 5outer: Outer namespace, name prefixed by length
- 5inner: Inner function, name prefixed by length
- E: End of nested name
- v: Function type
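The part between N and E can be read back mechanically, because each name announces its own length. Here is a minimal sketch of that step. The helper name split_length_prefixed is made up for this guide, and it expects only the run of length-prefixed names (such as 5outer5inner), not the whole symbol.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper for illustration: splits a run of length-prefixed
// names, such as "5outer5inner", into the individual names.
std::vector<std::string> split_length_prefixed(const std::string& text) {
    std::vector<std::string> names;
    std::size_t pos = 0;
    while (pos < text.size() &&
           std::isdigit(static_cast<unsigned char>(text[pos]))) {
        // Read the decimal length, which may have more than one digit.
        std::size_t length = 0;
        while (pos < text.size() &&
               std::isdigit(static_cast<unsigned char>(text[pos]))) {
            length = length * 10 + static_cast<std::size_t>(text[pos] - '0');
            ++pos;
        }
        // Stop on truncated input instead of reading past the end.
        if (length > text.size() - pos) {
            break;
        }
        names.push_back(text.substr(pos, length));
        pos += length;
    }
    return names;
}
```

Given 5outer5inner, this yields the two names outer and inner, in that order.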
Nested namespaces follow the same structure.
namespace a { namespace b { namespace c { void inner() {} } } }
This function will be mangled as _ZN1a1b1c5innerEv. The concatenated names form 1a1b1c5inner, with the same characters around them as before: _Z and N in front, E and v at the end.
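The nested-name rule can be sketched the same way as the global one. This is an illustration that covers only what this lesson has introduced: plain namespaces and a function that takes no arguments. Anything beyond that is deliberately ignored. The helper name mangle_nested_nullary is made up for this guide.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper for illustration: `path` lists the enclosing
// namespaces from outermost to innermost, followed by the function name.
// Only functions that take no arguments are handled.
std::string mangle_nested_nullary(const std::vector<std::string>& path) {
    assert(path.size() >= 2);  // at least one namespace plus the function
    std::string result = "_ZN";  // prefix, then start of nested name
    for (const std::string& part : path) {
        // Every name is written as its length followed by the name itself.
        result += std::to_string(part.size()) + part;
    }
    result += "E";  // end of nested name
    result += "v";  // function type: no arguments
    return result;
}
```

With the path a, b, c, inner this returns _ZN1a1b1c5innerEv, and with outer, inner it returns _ZN5outer5innerEv.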
What is the mangling of the following identifier?
namespace cats { namespace like { void meow() {} } }
Good job! You have successfully answered all the questions and now know the basic makeup of an Itanium-mangled C++ symbol.
In the next lesson, we will use this knowledge to look at basic function types beyond v. Mangling function types is important for function overloading, but I don't want to overload you with information, so feel free to take a break and let the previous knowledge sink in.
If you want to try out more code and look at its mangling, I recommend using Compiler Explorer on godbolt.org. Under "Output", you can uncheck the box to demangle identifiers to see the mangled identifiers for any C++ code you enter on the left.
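If you would rather check symbols locally, the following sketch may help. It assumes a GCC or Clang toolchain, whose C++ runtime provides abi::__cxa_demangle in the <cxxabi.h> header; other compilers may not have it. The wrapper name demangle is made up for this guide.

```cpp
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Hypothetical wrapper for illustration. Assumes a GCC or Clang toolchain,
// where <cxxabi.h> provides abi::__cxa_demangle.
// Returns the demangled name, or an empty string if demangling fails.
std::string demangle(const std::string& symbol) {
    int status = 0;
    char* raw = abi::__cxa_demangle(symbol.c_str(), nullptr, nullptr, &status);
    if (status != 0 || raw == nullptr) {
        return "";
    }
    std::string result(raw);
    std::free(raw);  // the returned buffer is allocated with malloc
    return result;
}
```

For the symbols from this lesson, demangle("_ZN5outer5innerEv") gives outer::inner() and demangle("_Z11hello_worldv") gives hello_world().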