C/C++ – Alfonso Sánchez-Beato's blog

I have some interest in photogrammetry, and, more concretely, in the ways to extract 3D information from a set of photographies of an object. There is excellent open source software that is able to do this, like Meshroom, which is based on the Alice Vision framework. Once you create a 3D object with this tool, you have a set of vertices and textures that can be manipulated with specialized 3D editing software. And here again we have excellent open source software for this task, like MeshLab.

MeshLab can be extended via plugins, of which there is an extensive list. Some of them are integrated in the main MeshLab repository, but some others can be found in the MeshLab extra plugins repo. As I wanted to research what sort of measurements I could take on my 3D reconstructions, it looked like writing a MeshLab plugin was a great way to do this and be able to visualize and interact with the results easily.

I had some fun last year with finding optimal rectangles for videoconferences, so somewhat following that line of thought, I wondered how easy was to put an irregular object into an oriented bounding box (OBB) with minimal size, that is, the rectangular parallelepiped of minimal volume that encloses the object. This is called the optimal OBB in the literature. That seemed like a nice fit to try MeshLab plugins and learn how to do measurements on 3D reconstructions at the same time. Also, optimal OBBs have very important applications, such as fast rendering of 3D scenes, collision detection, and others. And optimal packaging might be an application as well indeed.

I put my hands on the task and started to dig into the literature on the topic. There is an exact algorithm to find optimal OBBs found by O’Rourke¹, but it is known to be slow (complexity is \(O(N^3)\) being \(N\) the number of vertices) and difficult to implement, so I looked at approximate algorithms. The one from Chang et al² looked fast and results were very near to O’Rourke’s solution. Also importantly, the authors published Matlab sources for it, so I chose that one for implementation in MeshLab. They called this algorithm HYBBRID because it mixes genetic algorithms with the Nelder-Mead optimization method, that can be used when derivatives for the target function cannot be calculated.

An OBB is defined by a rotation matrix \(R\), the center of the box, and the size of each dimension. The most important part to obtain an optimal OBB is finding the rotation matrix, as once we have it it is trivial to calculate the other parameters by rotating the vertices with \(R^T\) and then finding the minimum and maximum coordinates in the x, y, and z axis. Without digging too much into the details (I defer to the paper for that), the HYBBRID algorithm applies a genetic algorithm to sets of rotation matrices. With the ones that produce the smaller volumes, it applies the Nelder-Mead algorithm to find a local optimum. Finally, it applies a further refinement to the best solution by applying the 2D rotating calipers algorithm³ to each rotated axis. The genetic algorithm does a global sampling of the space of rotation matrices, while Nelder-Mead and 2D rotating calipers try to find a local minimum. The minimization function is not convex and not differentiable, although continuous, so this combination of algorithms works really well.

My C++ implementation for MeshLab has already been merged to the extra plugins repository, and can be found here (this was the original PR). I used the Matlab code as reference, with some improvements in the 2D calipers algorithm and in the calculation of average rotations. The filter adds two meshes to the object to which it is applied, one with the convex hull of the object and another one that defines the bounding box. The convex hull is not strictly necessary, but it reduces the number of vertices you need to care about, speeding up the algorithm. In my tests, the results looked amazingly accurate, and, furthermore, very fast. It took less than 15 seconds for objects of around 100K vertices on my 4 years old laptop (i7 with 8 cores), and the number of iterations of HYBBRID was usually around 6-10.

To build the plugin in the “build” directory, run these commands:

$ git clone --recurse-submodules https://github.com/cnr-isti-vclab/meshlab-extra-plugins.git
$ cd meshlab-extra-plugins
$ cmake -S . -B build
$ cmake --build build

The recursive clone creates a meshlab subfolder. I recommend to compile MeshLab from there (follow these instructions), as otherwise the plugin might crash because the ABI offered by MeshLab is not really meant to be stable.

After compilation, you will get a shared object (build/meshlab/src/distrib/plugins/libfilter_orientedbbox.so) that you need to manually add as a plugin to MeshLab. For that, you need to go to Help -> Plugin Info -> Load Plugins, as instructed in the meshlab extra plugins repository. Once you have done this, the filter should be available in Filters -> Remeshing, Simplification and Reconstruction -> Oriented Bounding Box.

For testing I used this set of images (which are under Creative Commons). I used Meshroom to get a 3D model for them, and then I edited it in MeshLab. To visualize well the results, I used the xray shader (Render -> Shaders -> xray.gdp) that is included in MeshLab and that allows us to “see through” the different object layers. You can also turn on and off different meshes for better visualization. This process is illustrated in the images below.

Using Meshroom to create a 3D model from photographies

After applying the OBB filter (convex hull hidden)

Below you can see a few more 3D models from this page on which I applied the filter, including of course the mythical tea pot model:

And this is pretty much it. I hope you will enjoy playing with the plugin if you take the time to build it. Or, if you don’t, just playing around with Meshroom and MeshLab is really fun time anyway.

Occasionally I find myself processing input data which arrives as a stream, like data from files or from a socket, but that has a known structure that can be modeled with C types. For instance, let’s say we are receiving from a socket a parcel that consists on a header of one byte, and a payload that is an integer. A naive way to handle this is the following (simplified for readability) code snippet:

int main(void)
{
    int fd;
    char *buff;
    struct sockaddr_in addr;
    int vint;
    char vchar;

    fd = socket(AF_INET, SOCK_STREAM, 0);
    buff = malloc(BUFF_SIZE);
    /* Init socket address */
    ...
    connect(fd, (struct sockaddr *) &addr, sizeof(addr));

    read(fd, buff, BUFF_SIZE);

    vchar = buff[0];
    vint  = *(int *) &buff[1];
    /* Do something with extracted data, free resources */
    ...
    return 0;
}

Here we get the raw data with a read() call, we read the first byte, then we read an integer by taking a pointer to the second read byte and casting it to a pointer to an integer. (for this example we are assuming that the integer inserted in the stream has the same size and endianness as the CPU ones).

There is a big issue with this: the cast to int *, which is undefined behavior according to the C standard ⁴. And it is because things can go wrong in at least two ways, first due to pointer aliasing rules, second due to type alignment.

Strict pointer aliasing tells the compiler that it can assume that pointers to different types point to different places in memory. This allows some optimizations, like reordering. Therefore, we could be in trouble if, say, we take &buff[1] into a char * and use it to write a byte in that location, as reordering could hit us. So just do not do that. Let’s also hope that we have a compiler that is not completely insane and does not move our reading by int pointer before the read() system call. We could also disable strict aliasing if we are using GCC with option -fno-strict-aliasing, which by the way is something that the Linux kernel does. At any rate, this is a complex subject and I will not dig into it this time.

We will concentrate in this article on how to solve the other problem, that is, how to access safely types that are not stored in memory in their natural alignment.

The C Standard-Compliant Solution

Before moving further, keep in mind that it is always possible to be strictly compliant with the standard and access safely memory without breaking language rules or using compiler or machine specific tricks. In the example, we could retrieve vint by doing

    vint  =   buff[1] + (buff[2] << 8)
            + (buff[3] << 16) + (buff[4] << 24);

(supposing stored data is little endian).

The issue here is performance: we are implicitly transforming four bytes to integers, then we have to bit-shift three of them, and finally we have to add them up. Note however that this is what we want if data and CPU have different endianness.

Doing Unaligned Memory Accesses

In all machine architectures there is a natural alignment for the different data types. This alignment is usually the size of the types, for instance in 32 bits architectures the alignment for integers is 4, for doubles it is 8, etc. If instances of these types are not stored in memory positions that are multiple of their alignment, we are talking about unaligned access. If we try to access unaligned data either of these can happen:

The hardware let’s us access it – but always at a performance penalty.
An exception is triggered by the CPU. This type of exception is called bus error ⁵.

We might be willing to accept the performance penalty ⁶, which is mitigated by CPU caches and not that noticeable in certain architectures like x86-64 , but we certainly do not want our program to crash. How possible is this? To be honest it is not something I have seen that often. Therefore, as a first analysis step, I checked how easy it was to get bus errors. To do so, I created the following C++ program, access1.cpp (I could not resist to use templates here to reduce the code size):

#include <iostream>
#include <typeinfo>
#include <cstring>

using namespace std;

template <typename T>
void print_unaligned(char *ptr)
{
    T *val = reinterpret_cast<T *>(ptr);

    cout << "Type is \"" << typeid(T).name()
         << "\" with size " << sizeof(T) << endl;
    cout << val << " *val: " << *val << endl;
}

int main(void)
{
    char *mem = new char[128];

    memset(mem, 0, 128);

    print_unaligned<int>(mem);
    print_unaligned<int>(mem + 1);
    print_unaligned<long long>(mem);
    print_unaligned<long long>(mem + 1);
    print_unaligned<long double>(mem);
    print_unaligned<long double>(mem + 1);

    delete[] mem;
    return 0;
}

The program allocates memory using new char[], which as malloc() in C is guaranteed to allocate memory with the same alignment as the strictest fundamental type. After zeroing the memory, we access mem and mem + 1 by casting to different pointer types, knowing that the second address is odd, and therefore unaligned except for char * access.

I compiled the file with g++ on my laptop, ran it, and got

$ g++ access1.cpp -o access1
$ file access1
access1: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=09d0fb19340a10941eef4c3dc4d6eb29383e717d, not stripped
$ ./access1
Type is "i" with size 4
0x16c3c20 *val: 0
Type is "i" with size 4
0x16c3c21 *val: 0
Type is "x" with size 8
0x16c3c20 *val: 0
Type is "x" with size 8
0x16c3c21 *val: 0
Type is "e" with size 16
0x16c3c20 *val: 0
Type is "e" with size 16
0x16c3c21 *val: 0

No error for x86-64. This was expected as Intel architecture is known to support unaligned access by hardware, at a performance penalty (which is apparently quite small these days, see ⁷).

The second try was with an ARM CPU, compiling for arm-32:

$ g++ access1.cpp -o access1
$ file access1
access1: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.32, BuildID[sha1]=8c3c3e7d77fddd5f95d18dbffe37d67edc716a1c, not stripped
$ ./access1
Type is "i" with size 4
0x47b008 *val: 0
Type is "i" with size 4
0x47b009 *val: 0
Type is "x" with size 8
0x47b008 *val: 0
Type is "x" with size 8
Bus error (core dumped)

Now we get what we were searching for, a legitimate bus error, in this case when accessing a long long from an unaligned address. Commenting out the offending line and letting the program run further showed the error also when accessing a long double from mem + 1.

Fixing Unaligned Memory Accesses

After proving that this could be a real problem, at least for some architectures, I tried to find a solution that would let me do unaligned memory accesses in the most generic way. I could not find anything safe that was strictly following the C standard. However, all C/C++ compilers have ways to define packed structures, and that came to the rescue.

Packed structures are intended to minimize the padding that is introduced by alignment needed by the structure members. They are used when minimizing storage is a big concern. But what is interesting for us is that its members can be unaligned due to the packing, so dereferencing them must take that into account. Therefore, if we are accessing a type in a CPU that does not support unaligned access for that type the compiler must synthesize code that handles this transparently from the point of view of the C program.

To test that this worked as expected, I wrote access2.cpp, which uses GCC attribute __packed__ to define a packed structure:

#include <iostream>
#include <typeinfo>
#include <cstring>

using namespace std;

template <typename T>
struct __attribute__((__packed__)) struct_safe
{
    T val;
};

template <typename T>
void print_unaligned(char *ptr)
{
    struct_safe<T> *safe = reinterpret_cast<struct_safe<T> *>(ptr);

    cout << "Type is \"" << typeid(T).name()
         << "\" with size " << sizeof(T) << endl;
    cout << safe << " safe->val: " << safe->val << endl;
}

int main(void)
{
    char *mem = new char[128];

    memset(mem, 0, 128);

    print_unaligned<int>(mem);
    print_unaligned<int>(mem + 1);
    print_unaligned<long long>(mem);
    print_unaligned<long long>(mem + 1);
    print_unaligned<long double>(mem);
    print_unaligned<long double>(mem + 1);

    delete[] mem;
    return 0;
}

In this case, instead of directly casting to the type, I cast to a pointer to the packed struct and access the type through it.

Compiling and running for x86-64 got the expected result: no error, all worked as before. Then I compiled and ran it in an ARM device:

$ g++ access2.cpp -o access2
$ file access2
access2: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.32, BuildID[sha1]=9a1ee8c2fcd97393a4b53fe12563676d9f2327a3, not stripped
$ ./access2
Type is "i" with size 4
0x391008 safe->val: 0
Type is "i" with size 4
0x391009 safe->val: 0
Type is "x" with size 8
0x391008 safe->val: 0
Type is "x" with size 8
0x391009 safe->val: 0
Type is "e" with size 8
0x391008 safe->val: 0
Type is "e" with size 8
0x391009 safe->val: 0

No bus errors anymore! It worked as expected. To gain some understanding of what was happening behind the curtains, I disassembled the generated ARM binaries. For both access1 and access2, the same instruction was being used when I was getting a value after casting to int: LDR, which unsurprisingly loads a 32-bit word into a register. But for the long long, I found that access1 was using LDRD, which loads double words (8 bytes) from memory, while access2 was using two LDR instructions instead.

This all made a lot of sense, as ARM states that LDR supports access to unaligned data, while LDRD does not ⁸. Indeed the later is faster, but has this restriction. It was also good to check that there was no penalty for using the packed structure for integers: GCC does a good job to discriminate when the CPU really needs to handle differently unaligned accesses.

GCC cast-align Warning

GCC has a warning that can help to identify points in the code when we might be accessing unaligned data, which is activated with -Wcast-align. It is not part of the warnings that are activated by options -Wall or -Wextra, so we will have to add it explicitly if we want it. The warning is only triggered when compiling for architectures that do not support unaligned access for all types, so you will not see it if compiling only for x86.

When triggered, you will see something like

file.c:28:23: warning: cast increases required alignment of target type [-Wcast-align]
   int *my_int_ptr = (int *) &buf[i];
                     ^

Conclusion

The moral of this post is that you need to be very careful when casting pointers to a type different to the original one ⁹. When you need to do that, think about alignment issues first, and also think on your target architectures. There are programs that we want to run on more than one CPU type and too many times we only test in our reference.

Unfortunately the C standard does not give us a standard way of doing efficient access to unaligned data, but most if not all compilers seem to provide ways to do this. If we are using GCC, __attribute__((__packed__)) can help us when we might be doing unaligned accesses. The ARM compiler has a __packed attribute for pointers ¹⁰, and I am sure other compilers provide similar machinery. I also recommend to activate -Wcast-align if using GCC, which makes easier to spot alignment issues.

Finally, a word of caution: in most cases you should not do this type of casts. Some times you can define structures and read directly data onto them, some times you can use unions. Bear in mind always the strict pointer aliasing rules, which can hit back. To summarize, think twice before using the sort of trick showed in the post, and use them only when really needed.

Category: C/C++

A MeshLab plugin to find optimal oriented bounding boxes

How to Access Safely Unaligned Data

The C Standard-Compliant Solution

Doing Unaligned Memory Accesses

Fixing Unaligned Memory Accesses

GCC cast-align Warning

Conclusion