Overview of cross-architecture portability problems
Ideally, you’d want your program to work everywhere. Unfortunately, it’s not that simple, even if you’re using high-level “portable” languages such as Python. In this blog post, I’d like to focus on some of the cross-architecture problems I’ve seen or heard about during my time in Gentoo. Please note that I don’t mean this to be a comprehensive list of problems; instead, I’m aiming for an interesting read.
What breaks programs on 32-bit systems?
Basic integer type sizes
If you ask anyone what the primary difference between 64-bit and 32-bit architectures is, they will probably answer that it’s the register size. For many people, register sizes imply differences in basic integer types, and these differences are therefore assumed to be the primary source of problems on 32-bit architectures when programs are tested on 64-bit architectures only (which is commonly the case nowadays). Actually, it’s not that simple.
Contrary to common expectations, the differences in basic integer types are minimal. Most importantly, your plain int is 32-bit on both kinds of architecture. The only basic type that’s actually different is long: it’s 32-bit on 32-bit architectures, and 64-bit on 64-bit architectures. However, people don’t use long all that often in modern programs, so that’s not very likely to cause issues.
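To check these sizes on a given system, here is a minimal sketch using Python’s standard ctypes module. The helper name basic_type_sizes is made up for illustration, and the results describe the C ABI that the running Python interpreter was built for, which is not necessarily the only ABI the machine supports.

```python
import ctypes


def basic_type_sizes():
    """Return the width in bits of a few C types, as seen by this Python build.

    Hypothetical helper for illustration; the values depend on the platform ABI.
    """
    c_types = {
        "int": ctypes.c_int,
        "long": ctypes.c_long,
        "long long": ctypes.c_longlong,
        "void *": ctypes.c_void_p,
        "size_t": ctypes.c_size_t,
    }
    # ctypes.sizeof() returns bytes, so multiply by 8 to get bits.
    return {name: ctypes.sizeof(c_type) * 8 for name, c_type in c_types.items()}


if __name__ == "__main__":
    for name, bits in basic_type_sizes().items():
        print(f"{name:>9}: {bits}-bit")
```

Run under a 32-bit and a 64-bit userland following the System V ABI, this should report the same width for int, and different widths for long, void * and size_t.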
Perhaps some people worry about integer sizes because they still foggily remember the issues from porting old 32-bit software to 64-bit architectures. As I’ve mentioned above, int remained 32-bit, but pointers became 64-bit. As a result, if you attempted to cast pointers (or related data) to int, you’d be in trouble (hence we have size_t, ssize_t, ptrdiff_t). Of course, code written for 64-bit architectures that does the equivalent thing (i.e. casts pointers to long) is ugly, but it won’t technically cause problems on 32-bit architectures.
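The pointer-to-int problem can be modelled with plain integer arithmetic. The sketch below assumes that storing a pointer in a narrower integer simply drops the high bits, which is what typically happens in practice (the C standard leaves the result implementation-defined). The function name store_in_c_integer and the sample address are hypothetical.

```python
def store_in_c_integer(address, bits):
    """Model storing a pointer value in an unsigned C integer that is `bits` wide.

    Assumption: bits that do not fit are silently dropped.
    """
    return address & ((1 << bits) - 1)


# Hypothetical address of the kind a 64-bit userland might hand out.
ADDRESS = 0x00007FFD12345678

as_32bit_int = store_in_c_integer(ADDRESS, 32)
as_64bit_long = store_in_c_integer(ADDRESS, 64)

print(hex(as_32bit_int))         # 0x12345678 (the high bits are gone)
print(as_32bit_int == ADDRESS)   # False
print(as_64bit_long == ADDRESS)  # True
```

Any address that already fits in 32 bits survives both conversions unchanged, which is why the long variant keeps working on 32-bit architectures.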
Note that I’m talking about the System V ABI here. Technically, POSIX and the C standard don’t specify exact integer sizes, and permit a lot more flexibility (the C standard especially, up to having, say, char, short, int and long all exactly 32-bit).
Address space size
Now, a more likely problem is the address space limitation. Since pointers are 32-bit on 32-bit architectures, a program can address no more than 4 GiB of memory (in reality, somewhat less than that). What’s really important here is that this limits allocated memory, even if it is never actually used.
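The arithmetic behind that limit is easy to sketch. The helpers address_space_bytes and reservations_fit below are hypothetical names, and the check is only a necessary condition: it ignores the part of the address space taken by the kernel, shared libraries, the stack and fragmentation, so the real limit is lower.

```python
GIB = 2**30


def address_space_bytes(pointer_bits):
    """Upper bound on the number of distinct byte addresses for a pointer width."""
    return 2**pointer_bits


def reservations_fit(reserved_sizes, pointer_bits):
    """Check whether the total reserved size could fit in the address space at all.

    Only a necessary condition; real processes have less room than this.
    """
    return sum(reserved_sizes) <= address_space_bytes(pointer_bits)


# Hypothetical workload: eight 1 GiB reservations, most of them never touched.
reservations = [GIB] * 8

print(address_space_bytes(32) // GIB)      # 4
print(reservations_fit(reservations, 32))  # False
print(reservations_fit(reservations, 64))  # True
```

Note that the check never looks at how much of the reserved memory is actually used, or at how much physical memory the machine has.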
This can cause curious issues. For example, let’s say that you have a program that allocates a lot of memory, but doesn’t use most of it. If you run this program on a 64-bit system with 2 GiB of total memory, it works just fine. However, if you run it on a 32-bit userland with a lot of memory, it fails. Why is that? The 64-bit system permitted the program to allocate more memory than it could ever provide, risking an out-of-memory (OOM) condition if the program actually tried to use it all. On the 32-bit architecture, these allocations simply cannot all fit into 32-bit addresses.
The following sample can trivially demonstrate this: [...]