Learn C++ Itanium Symbol Mangling

Lesson 1: Basics

Now that you understand how this guide works and have seen that C identifiers are not mangled, we are ready to dive into C++.

Every mangled C++ symbol starts with the string _Z, which marks it as a mangled C++ symbol. _Z is an underscore followed by an uppercase letter. All identifiers of that form are reserved by the C standard and cannot be used by programs. This ensures that there are no name collisions between ordinary C functions and mangled C++ functions.
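As a minimal sketch of this rule, the check below tests whether a symbol name starts with _Z. The function name has_mangled_prefix is made up for this illustration. Passing the check only means that a name could be a mangled C++ symbol, not that the rest of it is valid.

```cpp
#include <string>

// Hypothetical helper: true if `symbol` starts with the _Z prefix.
// It does not validate anything after the prefix.
bool has_mangled_prefix(const std::string& symbol) {
    return symbol.compare(0, 2, "_Z") == 0;
}
```

A name such as _Z1fv passes this check, while an ordinary C name such as printf does not.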

After the prefix comes the name of the entity. For now, we will only look at functions. For functions, the function type is appended to the name to get the full symbol.

void f() {}
          

This empty function will be mangled to _Z1fv. The 1f signifies the name (we will look at this in more detail later in this lesson) and the v signifies the function type.

We will see the v function type a lot in the rest of this guide. It is the encoding of void and stands for a function that takes no arguments.
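To check a mangling by hand, one option is to run the symbol through a demangler. The sketch below assumes a GCC or Clang toolchain, where the header cxxabi.h provides abi::__cxa_demangle. The wrapper name demangle is made up for this illustration, and the header is not available on every compiler.

```cpp
#include <cxxabi.h>  // abi::__cxa_demangle (assumes GCC or Clang)
#include <cstdlib>   // std::free
#include <string>

// Hypothetical helper: returns the demangled form of `symbol`,
// or an empty string if the demangler rejects it.
std::string demangle(const char* symbol) {
    int status = 0;
    // With a null output buffer, the result is allocated with malloc
    // and must be released with free.
    char* result = abi::__cxa_demangle(symbol, nullptr, nullptr, &status);
    if (status != 0 || result == nullptr) {
        std::free(result);
        return "";
    }
    std::string text(result);
    std::free(result);
    return text;
}
```

Calling demangle("_Z1fv") should give back f(), the function from the example above with its empty parameter list.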

Which of these symbols cannot possibly be a mangled C++ symbol? Answer with the name of the symbol.

  • _ZN3FooIA4_iE3barE
  • _ZN6System5Sound4beepEv
  • _RN3FooIA4_iE3barE

For names, there are two cases to consider for now. Either the name is in the global scope, or it is in a namespace.

For global names, we just prefix the name with its length.

void hello_world() {}
          

This will therefore get mangled as _Z11hello_worldv. The length of hello_world is 11, so we concatenate 11 and hello_world. This entire thing is then appended to the previously mentioned prefix _Z and then we add the type, which is just v here, at the end.
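The steps above can be sketched in a few lines of C++. The helper names encode_name and mangle_global are made up for this illustration, and the sketch only covers what we have seen so far: a global function whose type is v.

```cpp
#include <string>

// Hypothetical helper: a name prefixed by its own length,
// for example "hello_world" becomes "11hello_world".
std::string encode_name(const std::string& name) {
    return std::to_string(name.size()) + name;
}

// Hypothetical helper: mangles a global function that takes no
// arguments as prefix + length-prefixed name + function type v.
std::string mangle_global(const std::string& name) {
    return "_Z" + encode_name(name) + "v";
}
```

For hello_world this produces _Z11hello_worldv, and for f it produces _Z1fv, matching the symbols shown earlier.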

What is the mangling of the following identifier?

void meow() {}
            

Functions that are declared in a namespace are a bit more complicated. Their names are referred to as nested names, because they are nested in a namespace. They can also be nested in multiple namespaces; the encoding is the same.

Nested names start with an N and end with an E (the E stands for "end" and is commonly used to end sequences). Between those two letters, the hierarchy of the namespaces is represented by putting one namespace name after another, with the function name last. Every name has the leading length and then the name itself, just like with global names.

namespace outer {
  void inner() {}
}
          

That means that this function will be mangled as _ZN5outer5innerEv. We can decode this into the following structure:

  • _Z: Prefix
  • N: Start of nested name
  • 5outer: Outer namespace, name prefixed by length
  • 5inner: Inner function, name prefixed by length
  • E: End of nested name
  • v: Function type
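The decoding above can also be done mechanically. The following is a rough sketch, not a real demangler: the function name split_nested_name is made up for this illustration, and it only understands the simple shape covered in this lesson (the _Z prefix, an N, length-prefixed names, and a closing E).

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits a simple nested symbol such as
// _ZN5outer5innerEv into its names. Returns an empty vector if the
// symbol does not have the _ZN <length><name>... E shape.
std::vector<std::string> split_nested_name(const std::string& symbol) {
    std::vector<std::string> names;
    if (symbol.compare(0, 3, "_ZN") != 0) return names;
    std::size_t pos = 3;
    while (pos < symbol.size() && symbol[pos] != 'E') {
        // Read the decimal length in front of the name.
        std::size_t length = 0;
        bool saw_digit = false;
        while (pos < symbol.size() &&
               std::isdigit(static_cast<unsigned char>(symbol[pos]))) {
            length = length * 10 + (symbol[pos] - '0');
            ++pos;
            saw_digit = true;
        }
        // Reject a missing length or a name that runs past the end.
        if (!saw_digit || length == 0 || length > symbol.size() - pos) return {};
        names.push_back(symbol.substr(pos, length));
        pos += length;
    }
    // The list of names must be closed by an E.
    if (pos >= symbol.size()) return {};
    return names;
}
```

For _ZN5outer5innerEv this yields the two names outer and inner. Whatever follows the E (here the function type v) is ignored by this sketch.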

Nested namespaces follow the same structure.

namespace a {
  namespace b {
    namespace c {
      void inner() {}
    }
  }
}
          

This function will mangle as _ZN1a1b1c5innerEv. We get all the concatenated names as 1a1b1c5inner, with the previously mentioned characters around them.
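Going the other way, the encoding of nested names can be sketched as a small function. The name mangle_nested is made up for this illustration. It assumes at least one enclosing namespace and a function whose type is v, which is all this lesson has covered.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: mangles a function with no arguments that is
// declared inside the given namespaces, listed from outermost to
// innermost. Assumes `namespaces` is not empty; global names are not
// wrapped in N ... E.
std::string mangle_nested(const std::vector<std::string>& namespaces,
                          const std::string& function) {
    std::string symbol = "_ZN";  // prefix, then start of nested name
    for (const std::string& ns : namespaces) {
        symbol += std::to_string(ns.size()) + ns;  // length, then name
    }
    symbol += std::to_string(function.size()) + function;
    symbol += "Ev";  // end of nested name, then function type
    return symbol;
}
```

With the namespaces a, b and c and the function inner, this produces _ZN1a1b1c5innerEv, the symbol shown above.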

What is the mangling of the following identifier?

namespace cats {
  namespace like {
    void meow() {}
  }
}
            

Good job! You have successfully answered all the questions and now know the basic makeup of an Itanium-mangled C++ symbol.

In the next lesson, we will use this knowledge to look at basic function types beyond v. Mangling function types is important for function overloading, but I don't want to overload you with information, so feel free to take a break and let the previous knowledge sink in.

If you want to try out more code and look at its mangling, I recommend using Compiler Explorer on godbolt.org. Under "Output", you can uncheck the box to demangle identifiers to see the mangled identifiers for any C++ code you enter on the left.