
Macros in C

Macros have a bad rap, mostly because people are too lazy to learn how they actually work. They can be abused, but so can loops, pointers, and even variable names. Don't think variable names can be abused? Try reading some code written by mathematicians.

Macros are simple, but nobody explains them well. I only learned how they really work when I went to write my own preprocessor and dug through GCC's documentation on how its preprocessor works. The preprocessor operates on lexical tokens, not text. As such, comments are stripped, whitespace disappears, and everything that goes into or comes out of a macro must be a sequence of valid C tokens. You can't use macros to make code that is lexically invalid.

Traditionally, macro names are declared in all upper case. This is a good convention to follow in general, but it can be ignored in cases where there is no special magic happening and the macro behaves more or less how a normal function would. It can also be ignored, like any formatting convention, when doing so is the significantly lesser of two evils.

Macro Expansion

Macros exist in two forms: "object-like" and "function-like". Object-like macros are just an identifier like DEBUG or INT_MAX. An object-like macro is evaluated by replacing it in the token stream with its definition, then re-scanning the resulting tokens starting from the beginning of that replacement. This results in the recursive expansion of any macros inside an object-like macro. That's all there is to it.
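Here's a minimal sketch of that rescan. The macro names are made up for illustration; the point is only that a definition which mentions other macros gets expanded again after it is substituted.

```c
#include <assert.h>

/* Hypothetical names, chosen only for illustration. */
#define TILE_SIZE   16
#define TILES_WIDE  8
#define ROW_PIXELS  (TILE_SIZE * TILES_WIDE)

/* ROW_PIXELS -> (TILE_SIZE * TILES_WIDE) -> (16 * 8) */
static_assert(ROW_PIXELS == 128, "nested macros are expanded on rescan");

int row_pixels(void) {
    return ROW_PIXELS;
}
```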

It should be noted here that macros cannot be self-recursive. When a macro is first expanded, the preprocessor marks it as used within the context of that expansion and doesn't allow it to be expanded again. If you try, the macro's name is left in the output unexpanded. This is annoying when making certain variadic macros, but it also guarantees that expansion terminates. M4 allows recursive macros, and you can get stuck in an infinite loop pretty easily.
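One place this rule is handy is wrapping a function in a macro of the same name. The sketch below uses a hypothetical lookup function and counter; the inner lookup is not expanded a second time, so it ends up referring to the real function.

```c
#include <assert.h>

/* Hypothetical function and counter, for illustration only. */
int lookups = 0;
int lookup(int key) { return key * 2; }

/* Same name as the function. Inside its own expansion the macro is
   disabled, so the inner `lookup` is left alone and calls the function. */
#define lookup(key) (lookups++, lookup(key))

int counted_lookup(void) {
    return lookup(21);   /* --> (lookups++, lookup(21)) */
}
```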

Function-like macros have an argument list that looks like a function. Evaluation happens in three stages:

  1. The tokens passed in for each argument are scanned and fully expanded, including any level of macros within. (Arguments used as operands of # or ## are the exception: they are substituted unexpanded.)
  2. The expanded arguments are substituted into the body of the macro wherever their respective parameters appear.
  3. The entire resulting stream of tokens is scanned and expanded from the start. Function-like macros can create new function-like macro invocations when expanded:

         #define EXEC(fn, ...) EXEC_(fn, (__VA_ARGS__))
         #define EXEC_(a, b) a b

     This last expansion pass catches any new macro invocations created this way, or created through token-pasting:

         #define type_max(a) a ## _MAX
         int sz = type_max(INT);
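To make the three stages concrete, here is a small trace using the EXEC and type_max macros from the list. The add function and the TWO macro are assumptions added just so there is something to expand.

```c
#include <assert.h>
#include <limits.h>

#define EXEC(fn, ...) EXEC_(fn, (__VA_ARGS__))
#define EXEC_(a, b)   a b

#define type_max(a) a ## _MAX

#define TWO 2                           /* hypothetical helper macro */
int add(int x, int y) { return x + y; } /* hypothetical function */

/* EXEC(add, TWO, 3)
     stage 1, expand arguments:  fn = add, __VA_ARGS__ = 2, 3
     stage 2, substitute:        EXEC_(add, (2, 3))
     stage 3, rescan:            add (2, 3)                    */
int five(void) { return EXEC(add, TWO, 3); }

/* type_max(INT) pastes INT and _MAX into INT_MAX, which the rescan
   then expands using the definition from <limits.h>. */
int int_max(void) { return type_max(INT); }
```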

Special Macro Features

There are a few special tokens used to do certain things inside a macro. Macros can be variadic, similar to C functions. Named arguments are filled in order first, then the rest of the arguments passed in can be referenced with __VA_ARGS__. They still have commas in between, so you can use __VA_ARGS__ to pass all of them to a subsequent macro. You'll often see macros doing this to peel off the first few arguments from the list one at a time before passing the rest along to the next macro in a chain.
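A sketch of that peeling pattern follows. FIRST, SECOND and THIRD are hypothetical names; the dummy 0 appended in FIRST is only there so the inner macro always receives at least one variadic argument, which compilers before C23 expect.

```c
#include <assert.h>

/* FIRST appends a dummy argument so FIRST_ never sees an empty
   variadic list. */
#define FIRST(...)         FIRST_(__VA_ARGS__, 0)
#define FIRST_(first, ...) first

/* Each macro peels one argument off and hands the rest down the chain. */
#define SECOND(first, ...) FIRST(__VA_ARGS__)
#define THIRD(first, ...)  SECOND(__VA_ARGS__)

int first_demo(void)  { return FIRST(10, 20, 30); }   /* FIRST_(10, 20, 30, 0) */
int second_demo(void) { return SECOND(10, 20, 30); }  /* FIRST(20, 30)         */
int third_demo(void)  { return THIRD(10, 20, 30); }   /* SECOND(20, 30)        */
```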

The special builtin function-like macro __VA_OPT__(x) (standardized in C23 and C++20, and supported by recent GCC and Clang) expands to its argument if there is anything inside __VA_ARGS__, and to nothing if there isn't. Among other things, you can use it to add the comma between fixed arguments and __VA_ARGS__ but only if needed.
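For example, here is a sketch of a hypothetical FMT_INTO wrapper around snprintf. It assumes a compiler that already understands __VA_OPT__; the comma only appears when extra arguments are passed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* FMT_INTO is a hypothetical wrapper; buf must be a char array so that
   sizeof(buf) is its capacity. */
#define FMT_INTO(buf, fmt, ...) \
    snprintf((buf), sizeof(buf), fmt __VA_OPT__(,) __VA_ARGS__)

/* Returns 1 if both forms produce the expected text. */
int va_opt_demo(void) {
    char buf[32];

    FMT_INTO(buf, "plain");        /* snprintf((buf), sizeof(buf), "plain") */
    int ok = strcmp(buf, "plain") == 0;

    FMT_INTO(buf, "%d-%d", 1, 2);  /* snprintf((buf), sizeof(buf), "%d-%d", 1, 2) */
    return ok && strcmp(buf, "1-2") == 0;
}
```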

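The token # used to the left of a macro parameter inside a function-like macro will wrap the corresponding argument in double quotes, stringifying it. The argument is stringified as written, without being macro-expanded first.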
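Because the operand of # is not expanded, stringifying the value of another macro takes one extra level of indirection. A minimal sketch, with made-up macro names:

```c
#include <assert.h>
#include <string.h>

#define STR_RAW(x) #x          /* stringifies the argument exactly as written */
#define STR(x)     STR_RAW(x)  /* expands the argument first, then stringifies */

#define VERSION 42             /* hypothetical macro */

const char *raw(void)      { return STR_RAW(VERSION); }  /* "VERSION" */
const char *expanded(void) { return STR(VERSION); }      /* "42" */
```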

The token ## used between two tokens will paste them together into one token, eliminating any lexical whitespace between. The pasted result must itself be a valid token.

All The Stuff You Were (maybe) Never Told

The above is a quick overview of how macros technically work but it doesn't tell you how to do anything useful with them, nor inform you about common utilities or practices.

Gotchas

Generally, you should wrap arguments in parentheses because otherwise operator precedence may not be what you expect:

    #define alloc2D_bad(ptr, w, h)  malloc(sizeof(*ptr) * w * h)
    #define alloc2D_good(ptr, w, h) malloc(sizeof(*(ptr)) * (w) * (h))

    alloc2D_bad(f + 1, 1024 - 1, 2048 - 1)
    // --> malloc(sizeof(*f + 1) * 1024 - 1 * 2048 - 1)
    //     which parses as malloc((sizeof((*f) + 1) * 1024) - (1 * 2048) - 1)

    alloc2D_good(f + 1, 1024 - 1, 2048 - 1)
    // --> malloc(sizeof(*(f + 1)) * (1024 - 1) * (2048 - 1))

Macros with multiple statements inside need to be appropriately wrapped in order to not behave in unexpected ways when used with flow control statements:

    #define update_bad(a, i)  *(a) = i; (a) += 2
    #define update_good(a, i) do { *(a) = i; (a) += 2; } while(0)

    for(int i = 0; i < 10; i++) update_bad(vals, i);
    // --> for(int i = 0; i < 10; i++) { *(vals) = i; } (vals) += 2;

    for(int i = 0; i < 10; i++) update_good(vals, i);
    // --> for(int i = 0; i < 10; i++) do { *(vals) = i; (vals) += 2; } while(0);

If an argument is used more than once inside a macro body, any side effects it has will happen once per use, which the caller probably doesn't expect:

    #define MAX(a, b) ((a) > (b) ? (a) : (b))

    z = MAX(x++, y++);
    // --> z = ((x++) > (y++) ? (x++) : (y++))

You can often get around this by declaring a local temporary variable inside of a block. This relies on statement expressions and __typeof__, which are GNU extensions supported by GCC and Clang rather than standard C:

    #define MAX(a, b) ({ \
        __typeof__(a) a__ = (a); \
        __typeof__(b) b__ = (b); \
        a__ > b__ ? a__ : b__; \
    })

    z = MAX(x++, y++);
    // ({
    //     __typeof__(x++) a__ = (x++);
    //     __typeof__(y++) b__ = (y++);
    //     a__ > b__ ? a__ : b__;
    // })

Note that the __typeof__ builtin does not evaluate its argument.

Near-universal macros you will see

    #define CAT(a, b) a ## b

CAT token-pastes its arguments together. It needs to be a separate macro in order to isolate the arguments from any subsequent parentheses before they are pasted together.
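Two things are worth seeing in action here: the pasted name can be followed by an argument list to form a call, and since ## does not expand its operands, pasting the value of another macro needs a second level. A sketch, where CAT_EXPAND, WIDTH, the limit_ variables and handler_1 are all hypothetical names:

```c
#include <assert.h>

#define CAT(a, b)        a ## b
#define CAT_EXPAND(a, b) CAT(a, b)   /* arguments are expanded before CAT sees them */

#define WIDTH 8

int CAT(limit_, WIDTH)        = 1;   /* declares limit_WIDTH */
int CAT_EXPAND(limit_, WIDTH) = 2;   /* declares limit_8     */

int handler_1(int x) { return x + 1; }

/* CAT(handler_, 1) becomes handler_1, and the (x) after it makes a call. */
int call_handler_1(int x) { return CAT(handler_, 1)(x); }
```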

PP_NARG() is a macro from way back in the day that evaluates to the number of arguments (within sane limits) passed into it. You pass in __VA_ARGS__ and it tells you how many there are. You can use this to make overloaded functions.
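The usual implementation appends a descending list of numbers after the arguments and then picks out whichever number lands in a fixed slot. Here is a cut-down sketch that handles 1 to 8 arguments; the helper names PP_NARG_, PP_ARG_N and PP_RSEQ_N follow a common convention but should be treated as assumptions, and real versions typically go up to 63 or so.

```c
#include <assert.h>

/* The arguments push the reversed sequence to the right, so the value
   that ends up in slot N is the argument count. */
#define PP_NARG(...)  PP_NARG_(__VA_ARGS__, PP_RSEQ_N())
#define PP_NARG_(...) PP_ARG_N(__VA_ARGS__)
#define PP_ARG_N(_1, _2, _3, _4, _5, _6, _7, _8, N, ...) N
#define PP_RSEQ_N()   8, 7, 6, 5, 4, 3, 2, 1, 0

/* PP_NARG(a, b, c)
     --> PP_ARG_N(a, b, c, 8, 7, 6, 5, 4, 3, 2, 1, 0)
     --> 3
   Caveat: PP_NARG() with no arguments yields 1, not 0, because the
   empty argument still occupies the first slot. */
static_assert(PP_NARG(a, b, c) == 3, "three arguments");

int count_of_five(void) { return PP_NARG(1, 2, 3, 4, 5); }
```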

Useful variadic functions

C has no built-in way of determining how many arguments were passed in to a variadic function. There are historical reasons for this which are hard to change now without breaking other code, and unlike with unstable hipster languages, the C standards committee tries to not break things.

We can work around this with a macro wrapper. Add an underscore to the end of the real function name, and a "secret" first argument which will be the number of variadic arguments passed in, then wrap it in a macro with PP_NARG:

    char* str_join_(int nargs, char* joiner, ...);
    #define str_join(joiner, ...) str_join_(PP_NARG(__VA_ARGS__), joiner __VA_OPT__(,) __VA_ARGS__)

Now you can loop over the variadic arguments inside the function.
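One way the function body might look is sketched below. The two-pass measure-then-copy approach is my own assumption rather than anything prescribed above, the PP_NARG here is a reduced version limited to 8 arguments, and the caller owns the returned buffer. Since that PP_NARG reports 1 for an empty list, this sketch expects at least one string to join.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

/* Reduced PP_NARG (1 to 8 arguments), included to keep this self-contained. */
#define PP_NARG(...)  PP_NARG_(__VA_ARGS__, 8, 7, 6, 5, 4, 3, 2, 1, 0)
#define PP_NARG_(...) PP_ARG_N(__VA_ARGS__)
#define PP_ARG_N(_1, _2, _3, _4, _5, _6, _7, _8, N, ...) N

char *str_join_(int nargs, char *joiner, ...);
#define str_join(joiner, ...) \
    str_join_(PP_NARG(__VA_ARGS__), joiner __VA_OPT__(,) __VA_ARGS__)

char *str_join_(int nargs, char *joiner, ...) {
    va_list ap;
    size_t jlen = strlen(joiner);
    size_t total = 1;   /* room for the terminating NUL */

    /* First pass: measure every piece plus the joiners between them. */
    va_start(ap, joiner);
    for (int i = 0; i < nargs; i++) {
        total += strlen(va_arg(ap, char *));
        if (i + 1 < nargs) total += jlen;
    }
    va_end(ap);

    char *out = malloc(total);
    if (!out) return NULL;
    out[0] = '\0';

    /* Second pass: copy the pieces into place. */
    va_start(ap, joiner);
    for (int i = 0; i < nargs; i++) {
        strcat(out, va_arg(ap, char *));
        if (i + 1 < nargs) strcat(out, joiner);
    }
    va_end(ap);
    return out;
}
```

Calling str_join(", ", "a", "b", "c") expands to str_join_(3, ", ", "a", "b", "c"), so the function knows exactly how many times to call va_arg.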

Function overloading and default arguments

C has no built-in function overloading (thank God), but sometimes it's a useful feature to use carefully and in moderation. Macros allow you to choose between several different functions, or invocations of them, based on the number of arguments.

    #define pcalloc(...) pcalloc_N(PP_NARG(__VA_ARGS__), __VA_ARGS__)
    #define pcalloc_N(n, ...) CAT(pcalloc_, n)(__VA_ARGS__)
    #define pcalloc_1(ptr) calloc(1, sizeof(*(ptr)))
    #define pcalloc_2(ptr, cnt) calloc(1, sizeof(*(ptr)) * (cnt))
    #define pcalloc_3(ptr, xdim, ydim) calloc(1, sizeof(*(ptr)) * (xdim) * (ydim))
    #define pcalloc_4(ptr, xdim, ydim, zdim) calloc(1, sizeof(*(ptr)) * (xdim) * (ydim) * (zdim))
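Here is a self-contained sketch of that dispatch in use, with a reduced PP_NARG (up to 8 arguments) and only the first three overloads. The intermediate pcalloc_N matters: its n argument is fully expanded to a number before CAT pastes it, which would not happen if pcalloc pasted PP_NARG(...) directly.

```c
#include <assert.h>
#include <stdlib.h>

/* Reduced PP_NARG (1 to 8 arguments), included to keep this self-contained. */
#define PP_NARG(...)  PP_NARG_(__VA_ARGS__, 8, 7, 6, 5, 4, 3, 2, 1, 0)
#define PP_NARG_(...) PP_ARG_N(__VA_ARGS__)
#define PP_ARG_N(_1, _2, _3, _4, _5, _6, _7, _8, N, ...) N

#define CAT(a, b) a ## b

#define pcalloc(...)               pcalloc_N(PP_NARG(__VA_ARGS__), __VA_ARGS__)
#define pcalloc_N(n, ...)          CAT(pcalloc_, n)(__VA_ARGS__)
#define pcalloc_1(ptr)             calloc(1, sizeof(*(ptr)))
#define pcalloc_2(ptr, cnt)        calloc(1, sizeof(*(ptr)) * (cnt))
#define pcalloc_3(ptr, xdim, ydim) calloc(1, sizeof(*(ptr)) * (xdim) * (ydim))

/* Returns 1 if every overload handed back zeroed memory. */
int pcalloc_demo(void) {
    int *one  = pcalloc(one);          /* --> pcalloc_1(one)        */
    int *row  = pcalloc(row, 4);       /* --> pcalloc_2(row, 4)     */
    int *grid = pcalloc(grid, 4, 3);   /* --> pcalloc_3(grid, 4, 3) */

    int ok = one && row && grid
          && one[0] == 0 && row[3] == 0 && grid[11] == 0;

    free(one);
    free(row);
    free(grid);
    return ok;
}
```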