Compiling Boost on QNX: a tale of why modules are needed in C++

Recently I had to compile Boost Program Options 1.65.1 for QNX 6.6 (QNX is a UNIX-like real-time operating system). This should have worked:

b2 toolset=qcc target-os=qnx --with-program_options --link=static

but I got this error message:

In file included from ./boost/bind/bind.hpp:29:0,
from ./boost/bind.hpp:22,
from libs\program_options\src\parsers.cpp:19:
./boost/bind/arg.hpp:37:15: error: expected '>' before '(' token
./boost/bind/arg.hpp:43:83: error: '_FCbuild' cannot be used as a function

If we look at arg.hpp:37 we see this template definition:


template < int I >
struct arg
{
    BOOST_CONSTEXPR arg()
    {
    }

    template< class T > BOOST_CONSTEXPR arg( T const & /* t */, typename _arg_eq< I == is_placeholder<T>::value >::type * = 0 )
    {
    }
};

Looking at the compiler output: what is _FCbuild? It must come from somewhere… a macro, perhaps? Adding the option cxxflags="-P" to bjam to produce the preprocessed output sheds some light:

template< int _FCbuild(0.0F, 1.0F) > struct arg
{
    constexpr arg()
    {
    }

    template< class T > constexpr arg( T const & , typename _arg_eq< _FCbuild(0.0F, 1.0F) == is_placeholder<T>::value >::type * = 0 )
    {
    }
};

You can see that the template argument I got converted into "_FCbuild(0.0F, 1.0F)". So is there a macro named I somewhere?

Indeed there is. QNX's complex.h contains:

 #if _HAS_C9X_IMAGINARY_TYPE
 #define imaginary	_Imaginary
 #define _Imaginary_I	((float _Imaginary)_Complex_I)
 #define I	_Imaginary_I

 #else /* _HAS_C9X_IMAGINARY_TYPE */
 #define I	_Complex_I
 #endif /* _HAS_C9X_IMAGINARY_TYPE */

and in either branch I ends up expanding to a macro that is ultimately defined in terms of _FCbuild.

So simply adding
#undef I
after all the #includes in arg.hpp solves the problem.
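A sketch of where the workaround goes (the header's own include list is abbreviated here; only the placement of the #undef after the #includes matters):

// boost/bind/arg.hpp (abridged sketch of the patched header)
#include <boost/config.hpp>
// ... the header's remaining #includes ...

#undef I   // QNX's <complex.h>, pulled in indirectly, defines I as a macro;
           // removing it lets I be used as a template parameter name below

namespace boost {

template< int I >
struct arg
{
    // ... body as shown above ...
};

} // namespace boost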

What we see here is pollution of interfaces: the innocent code in arg.hpp gets polluted by QNX's complex.h, causing I to be replaced and producing code that makes no sense to the compiler.

This is something that Bjarne Stroustrup has been pointing out as a benefit of modules. With modules, the code in arg.hpp could have been written like this:

import config; // the header that used to #define I now sits behind this module
(...)
template < int I >
struct arg
{
    BOOST_CONSTEXPR arg()
    {
    }

    template< class T > BOOST_CONSTEXPR arg( T const & /* t */, typename _arg_eq< I == is_placeholder<T>::value >::type * = 0 )
    {
    }
};

import config; does not "leak" macros (or, in general, preprocessing state); modules make it possible to compile interface declarations in isolation. With modules a developer can use the template parameter I without fear that the code will be broken by a macro defined in an OS standard header.
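To make that concrete, here is a small sketch in today's C++20 module syntax (the Modules TS available at the time used a slightly different spelling); the file name, module name and exported constant are invented for the illustration:

// config.ixx — module interface unit
module;
// imagine an OS header like <complex.h> being included here;
// for the sketch, define the offending macro directly:
#define I 1
export module config;

export constexpr int imaginary_unit_tag = I;   // the macro is usable here...

// consumer.cpp
import config;

template< int I >      // ...but not here: importing a module does not import
struct arg { };        // macros, so I is an ordinary template parameter again

int main()
{
    arg< imaginary_unit_tag > a;   // exported names (not macros) are visible
    (void)a;
}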

On the memory layout of objects and zero-cost abstractions of C++

When I was writing a financial application that stored millions of vectors some years ago, I was intrigued by the overhead of the Visual Studio 2008 implementation of std::vector. Recently I discovered an (undocumented) compiler switch that finally showed me where that overhead was coming from: /d1reportSingleClassLayoutXXX, where XXX is a class name. If we compile this file main.cpp:

#include <vector>

std::vector<int> v;

like this:

cl /c /EHsc /nologo /W4 /MT main.cpp /d1reportSingleClassLayout?$vector@HV?$allocator@H@std@@
main.cpp
class ?$vector@HV?$allocator@H@std@@    size(48):
        +---
        | +--- (base class ?$_Vector_val@HV?$allocator@H@std@@)
        | | +--- (base class ?$_Container_base_aux_alloc_real@V?$allocator@H@std@@)
        | | | +--- (base class _Container_base_aux)
 0      | | | | _Myownedaux
        | | | +---
 8      | | | ?$allocator@V_Aux_cont@std@@ _Alaux
        | | | <alignment member> (size=7)
        | | +---
16      | | ?$allocator@H _Alval
        | | <alignment member> (size=7)
        | +---
24      | _Myfirst
32      | _Mylast
40      | _Myend
        +---

we can see that std::vector carries at least 48 bytes of overhead on top of the raw data (in this case I passed the mangled name of std::vector<int> to cl).

Fortunately, this has improved over time; in VS 2017 you get this output:

class std::vector<int,class std::allocator<int> >       size(24):
        +---
 0      | +--- (base class std::_Vector_alloc<struct std::_Vec_base_types<int,class std::allocator<int> > >)
 0      | | ?$_Compressed_pair@V?$allocator@H@std@@V?$_Vector_val@U?$_Simple_types@H@std@@@2@$00 _Mypair
        | +---
        +---

So the size was cut in half: you just need a pointer to the allocation (_Myfirst), a pointer to the end of the used part of the allocation (_Mylast) and a pointer to the end of the allocation (_Myend). The VS implementation now truly follows the zero-overhead principle: it only stores those 3 pointers = 24 bytes.

Looking at the memory layout in VS 2008, we can see that the implementation stored the allocator (plus padding) and auxiliary bookkeeping members, overhead that is unnecessary and was removed in later versions. Similarly, std::string reduced its overhead from 48 bytes in VS 2008 to 32 bytes in VS 2017.
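If you only care about the totals rather than the full layout, plain sizeof already tells most of the story. A quick check (the commented values are the ones quoted above and apply to 64-bit MSVC release builds; other compilers or iterator-debugging settings will give different numbers):

#include <iostream>
#include <string>
#include <vector>

int main()
{
    std::cout << sizeof(std::vector<int>) << '\n';  // 48 with VS 2008, 24 with VS 2017
    std::cout << sizeof(std::string) << '\n';       // 48 with VS 2008, 32 with VS 2017
}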

The compiler switch has a sister, /d1reportAllClassLayout, which outputs the memory layout of every class in the .obj being compiled. With options like these it is easy to see why it is usually suggested to declare data members by decreasing alignment. E.g.

struct MyClass {
  char a;
  int* b;
  char c;
};

produces:

class MyClass   size(24):
        +---
 0      | a
        | <alignment member> (size=7)
 8      | b
16      | c
        | <alignment member> (size=7)
        +---

but the chars need only 1-byte alignment while the pointer needs 8-byte alignment, so rearranging the members with alignment in mind (the rule of thumb is to sort by decreasing alignment; here it is enough to keep the two chars adjacent):

struct MyClass {
  char a;
  char c;
  int* b;
};

produces:

class MyClass   size(16):
        +---
 0      | a
 1      | c
        | <alignment member> (size=6)
 8      | b
        +---

This saves 8 bytes. It is also easy to see why the compiler needs to insert padding at all. If there were no padding in the last example we would have:

*--------*--------*
|acbbbbbb|bb      |
*--------*--------*

Accessing b would require fetching two words instead of one, which would not be efficient. Padding is inserted so that the memory access is aligned:

*--------*--------*
|ac      |bbbbbbbb|
*--------*--------*

The compiler will not reorder the data members by itself, as the C standard requires struct members to be laid out at increasing addresses in declaration order. C was designed for direct memory access, and that rule lets programmers predict memory layouts and, for example, read blocks of data from a device straight into a structure.
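As a sketch of what that predictability buys you (the record type and its fields are invented for illustration; a real wire format would also have to pin down endianness and packing), the expected layout can be checked at compile time and raw bytes copied straight into the structure:

#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical fixed-layout record read from a device or file.
struct RecordHeader {
    std::uint32_t magic;   // expected at bytes 0..3
    std::uint32_t length;  // expected at bytes 4..7
};

// Members are laid out at increasing offsets in declaration order,
// so these expectations can be stated (and enforced) at compile time.
static_assert(offsetof(RecordHeader, magic) == 0, "magic comes first");
static_assert(offsetof(RecordHeader, length) == 4, "length follows with no padding");
static_assert(sizeof(RecordHeader) == 8, "no trailing padding expected");

RecordHeader parse_header(const unsigned char* raw_bytes)
{
    RecordHeader h;
    std::memcpy(&h, raw_bytes, sizeof h);   // well-defined: RecordHeader is trivially copyable
    return h;
}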

But reducing memory consumption may not be the main issue at stake:
– On a 64-bit x86, a cache line is 64 bytes starting at a 64-byte-aligned address, so you may want to keep data members that are frequently accessed together within the same line
– You may also want to prevent false sharing by separating data members that are written concurrently, so that threads running on different cores do not constantly invalidate each other's cached copies (see the sketch after this list)
– In embedded systems, the offsets that instructions can encode may be very small (some 16-bit ARM Thumb load/store instructions only have a 5-bit immediate offset), so you may want to keep frequently used data members at the beginning of a structure's layout
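A minimal sketch of the false-sharing point, assuming 64-byte cache lines (the type and counter names are invented; since C++17 the constant could come from std::hardware_destructive_interference_size):

#include <atomic>
#include <cstddef>
#include <thread>

constexpr std::size_t cache_line_size = 64;   // assumed, see the note above

struct Counters {
    // Each counter is written by a different thread; giving each one its own
    // cache line prevents the writers from invalidating each other's lines.
    alignas(cache_line_size) std::atomic<long> produced{0};
    alignas(cache_line_size) std::atomic<long> consumed{0};
};

static_assert(sizeof(Counters) == 2 * cache_line_size,
              "each counter occupies its own cache line");

int main()
{
    Counters c;
    std::thread t1([&] { for (int i = 0; i < 1000000; ++i) ++c.produced; });
    std::thread t2([&] { for (int i = 0; i < 1000000; ++i) ++c.consumed; });
    t1.join();
    t2.join();
}

Note that this deliberately adds padding, the opposite of the space-saving advice above: which trade-off wins is exactly the kind of thing that has to be measured.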

So with this post I would like people to see the importance of measuring, and of not assuming zero-cost abstractions in C++. This compiler option, /d1reportSingleClassLayout, is mentioned in an MSDN blog post on how to debug ODR violations and in one of Stephan T. Lavavej's great videos about the STL.

Introducing pct, a tool to help reduce C/C++ compilation times

Until C++ modules become widely available (Microsoft released experimental support for the import statement last December: https://blogs.msdn.microsoft.com/vcblog/2015/12/03/c-modules-in-vs-2015-update-1/ ), we still need to resort to precompiled headers as one way to reduce compilation times on Windows.

Today I have released the first version of a tool that auto-generates precompiled headers (usually named stdafx.h on Windows). Auto-generating stdafx.h is not as simple as it may seem. One might think it is enough to grep for standard headers and paste all those lines into the project's stdafx.h, but that does not take into account that some of those includes may be disabled depending on macro values, for instance. The tool uses the Boost Wave preprocessor to preprocess the project's source code and generates a header to be precompiled, containing all the standard or third-party library headers referenced in the code.
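For illustration, the kind of header the tool aims to generate looks something like this (the exact contents below are made up; the real output depends on which external headers survive preprocessing with the project's macro settings):

// stdafx.h — generated header to be precompiled (illustrative contents)
#pragma once

// Standard library headers referenced by the project's sources:
#include <map>
#include <string>
#include <vector>

// Third-party headers referenced by the project's sources:
#include <boost/program_options.hpp>

With MSVC such a header is compiled once with /Yc"stdafx.h" and then consumed by the remaining translation units with /Yu"stdafx.h".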

Using the tool I have been able to reduce compilation times on one of my Visual Studio projects by a factor of six. The source code and the binaries are at:

https://github.com/g-h-c/pct