Macros have a bad rap, mostly because people are too lazy to learn how they actually work. They can be abused, but so can loops, pointers,
and even variable names. Don't think variable names can be abused? Try reading some code written by mathematicians.
Macros are simple, but nobody explains them well. I only learned how they really work when I went to write my own preprocessor and dug through
GCC's documentation on how its preprocessor works. The preprocessor operates on lexical tokens, not text. As such, comments are
stripped, whitespace disappears, and every input and output of a macro must be a sequence of valid C tokens. You can't use macros to make code that is
lexically invalid.
Traditionally, macro names are declared in all upper case. This is a good convention to follow in general, but it can be ignored in cases
where there is no special magic happening and the macro behaves more or less how a normal function would. It can also be ignored, like
any formatting convention, when doing so is significantly the lesser of two evils.
Macro Expansion
Macros exist in two forms: "object-like" and "function-like". Object-like macros are just an identifier like DEBUG or INT_MAX.
An object-like macro is evaluated by replacing it in the token stream with its definition, then re-scanning the resulting tokens starting from the beginning. This
results in the recursive expansion of any macros inside an object-like macro's definition. That's all there is to it.
It should be noted here that macros cannot be self-recursive. When a macro is first expanded, the preprocessor marks it as used
within the context of that expansion and doesn't allow it to be expanded again. If you try, the macro's name is left in the output
unexpanded. This is annoying when making certain variadic macros, but it also guarantees that expansion terminates. M4 allows recursive macros, and you can get
stuck in an infinite loop pretty easily.
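A minimal sketch of both rules, rescanning and the ban on self-recursion. The names WIDTH, HEIGHT, AREA, counter and demo_object_like are made up for illustration and are not from any real header:

```c
#include <assert.h>

#define WIDTH  4
#define HEIGHT 3
/* AREA is replaced by its definition, then the result is rescanned,
   so WIDTH and HEIGHT get expanded too. */
#define AREA   (WIDTH * HEIGHT)

static int counter = 10;
/* Self-referential macro: the inner `counter` is not expanded again,
   so it ends up naming the variable declared above. */
#define counter (counter + 1)

static void demo_object_like(void) {
    assert(AREA == 12);     /* (4 * 3) */
    assert(counter == 11);  /* (counter + 1), with the variable still 10 */
}
```

The second macro is only there to show that expansion stops after one level; shadowing a variable with a macro of the same name is not something I would recommend in real code.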
Function-like macros take an argument list that looks like a function call. Evaluation happens in three stages:
- The tokens passed in for each argument are scanned and fully expanded, including any level of macros within (arguments used as operands of # or ## are substituted unexpanded).
- The expanded arguments are substituted into the body of the macro wherever their respective parameters appear.
- The entire resulting stream of tokens is scanned and expanded from the start. Function-like macros
can create new function-like macro invocations when expanded:
#define EXEC(fn, ...) EXEC_(fn, (__VA_ARGS__))
#define EXEC_(a, b) a b
This last expansion pass catches any new macro invocations created this way, or created through token-pasting:
#define type_max(a) a ## _MAX
int sz = type_max(INT);
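The three stages can be traced with the EXEC and type_max macros above. This is a sketch: TWO, SQUARE, add and demo_function_like are hypothetical helpers added only so there is something to expand.

```c
#include <assert.h>
#include <limits.h>

#define EXEC(fn, ...) EXEC_(fn, (__VA_ARGS__))
#define EXEC_(a, b) a b
#define type_max(a) a ## _MAX

#define TWO 2
#define SQUARE(x) ((x) * (x))
static int add(int x, int y) { return x + y; }

static void demo_function_like(void) {
    /* Stage 1 expands the argument TWO to 2.
       Stage 2 substitutes, giving EXEC_(add, (2, 3)).
       Stage 3 rescans that into: add (2, 3) */
    assert(EXEC(add, TWO, 3) == 5);

    /* SQUARE on its own is not an invocation; SQUARE (3) only
       becomes one during the final rescan. */
    assert(EXEC(SQUARE, 3) == 9);

    /* INT ## _MAX pastes to INT_MAX, which the rescan then expands. */
    assert(type_max(INT) == INT_MAX);
}
```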
Special Macro Features
There are a few special tokens used to do certain things inside a macro. Macros can be variadic, similar to C functions. Named arguments are filled
in in order first, then the rest of the arguments passed in can be referenced with __VA_ARGS__. They still have commas in between, so you can
use __VA_ARGS__ to pass all of them to a subsequent macro. You'll often see macros doing this to peel off the first few arguments from
the list one at a time before passing the rest along to the next macro in a chain.
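Here is a small sketch of that peeling pattern. HEAD, TAIL, SECOND and THIRD are names I made up for the example; each one drops its first argument and hands __VA_ARGS__ to the next macro in the chain.

```c
#include <assert.h>

#define HEAD(first, ...)   first
#define TAIL(first, ...)   __VA_ARGS__
/* Drop one argument, then take the head of what is left. */
#define SECOND(first, ...) HEAD(__VA_ARGS__)
/* Drop one argument, then ask for the second of what is left. */
#define THIRD(first, ...)  SECOND(__VA_ARGS__)

static void demo_peel(void) {
    int rest[] = { TAIL(10, 20, 30, 40) };   /* { 20, 30, 40 } */
    assert(sizeof rest / sizeof rest[0] == 3);
    assert(HEAD(10, 20, 30, 40) == 10);
    assert(SECOND(10, 20, 30, 40) == 20);
    assert(THIRD(10, 20, 30, 40) == 30);
}
```

Note that the peeling has to happen through the parameter list. Writing HEAD(TAIL(10, 20, 30)) would not work, because TAIL(...) is collected as a single argument before it is ever expanded.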
The special builtin function-like macro __VA_OPT__(x), standardized in C23 and C++20, expands to its argument if there is anything inside __VA_ARGS__, and to
nothing if there isn't. Among other things, you can use it to add the comma between fixed arguments and __VA_ARGS__, but only if needed.
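A sketch of the optional comma, assuming a compiler that supports __VA_OPT__ (C23 mode, or a reasonably recent GCC or Clang). FORMAT and demo_va_opt are hypothetical names, not part of any library:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* The comma after fmt only appears when extra arguments were passed. */
#define FORMAT(buf, fmt, ...) \
    snprintf((buf), sizeof(buf), fmt __VA_OPT__(,) __VA_ARGS__)

static void demo_va_opt(void) {
    char buf[32];

    FORMAT(buf, "no args");          /* snprintf(buf, 32, "no args") */
    assert(strcmp(buf, "no args") == 0);

    FORMAT(buf, "%d-%d", 1, 2);      /* snprintf(buf, 32, "%d-%d", 1, 2) */
    assert(strcmp(buf, "1-2") == 0);
}
```

Without __VA_OPT__, the first call would expand with a dangling comma before the closing parenthesis and fail to compile.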
The token # used to the left of a macro parameter inside a function-like macro stringifies the corresponding argument: its tokens are wrapped in double quotes, without being expanded first.
The token ## used between two tokens pastes them together into a single token, eliminating any lexical whitespace between them.
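A short sketch of both operators. STR, XSTR, PASTE, VERSION and demo_stringify_paste are illustrative names of my own; the XSTR indirection is the usual way to get an argument expanded before it is stringified.

```c
#include <assert.h>
#include <string.h>

#define STR(x)      #x
#define XSTR(x)     STR(x)      /* expands x first, then stringifies */
#define PASTE(a, b) a ## b
#define VERSION 3

static void demo_stringify_paste(void) {
    int PASTE(item, 1) = 7;     /* declares a variable named item1 */
    assert(item1 == 7);

    assert(strcmp(STR(a + b), "a + b") == 0);
    /* # does not expand its operand, so the macro name survives. */
    assert(strcmp(STR(VERSION), "VERSION") == 0);
    assert(strcmp(XSTR(VERSION), "3") == 0);
}
```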
All The Stuff You Were (maybe) Never Told
The above is a quick overview of how macros technically work, but it doesn't tell you how to do anything useful with them, nor does it cover
common utilities or practices.
Gotchas
Generally, you should wrap arguments in parentheses because otherwise operator precedence may not be what you expect:
#define alloc2D_bad(ptr, w, h) malloc(sizeof(*ptr) * w * h)
#define alloc2D_good(ptr, w, h) malloc(sizeof(*(ptr)) * (w) * (h))
alloc2D_bad(f + 1, 1024 - 1, 2048 - 1)
// --> malloc(sizeof((*f) + 1) * 1024 - (1 * 2048) - 1)
alloc2D_good(f + 1, 1024 - 1, 2048 - 1)
// --> malloc(sizeof(*(f + 1)) * (1024 - 1) * (2048 - 1))
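The same problem is easier to see with plain arithmetic. In this sketch, SQUARE_BAD and SQUARE_GOOD are hypothetical macros, and the asserts spell out what each expansion actually computes:

```c
#include <assert.h>

#define SQUARE_BAD(x)  x * x
#define SQUARE_GOOD(x) ((x) * (x))

static void demo_precedence(void) {
    assert(SQUARE_BAD(1 + 2) == 5);    /* 1 + 2 * 1 + 2 */
    assert(SQUARE_GOOD(1 + 2) == 9);   /* ((1 + 2) * (1 + 2)) */

    /* The outer parentheses matter as well. */
    assert(18 / SQUARE_BAD(3) == 18);  /* 18 / 3 * 3 */
    assert(18 / SQUARE_GOOD(3) == 2);  /* 18 / ((3) * (3)) */
}
```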
Macros with multiple statements inside need to be appropriately wrapped in order to not behave in unexpected ways when used with flow control
statements:
#define update_bad(a, i) *(a) = i; (a) += 2
#define update_good(a, i) do { *(a) = i; (a) += 2; } while(0)
for(int i = 0; i < 10; i++) update_bad(vals, i);
// --> for(int i = 0; i < 10; i++) { *(vals) = i; } (vals) += 2;
for(int i = 0; i < 10; i++) update_good(vals, i);
// --> for(int i = 0; i < 10; i++) do { *(vals) = i; (vals) += 2; } while(0);
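To see the difference at run time, here is a sketch that runs both versions over a small buffer. The buffer, the demo function name and the parenthesized (i) are my additions; otherwise the macros follow the ones above.

```c
#include <assert.h>

#define update_bad(a, i)  *(a) = (i); (a) += 2
#define update_good(a, i) do { *(a) = (i); (a) += 2; } while (0)

static void demo_multi_statement(void) {
    int buf[20] = {0};
    int *vals = buf;

    for (int i = 0; i < 10; i++) update_bad(vals, i);
    /* Only the store was inside the loop: buf[0] was overwritten
       ten times and the pointer moved once, after the loop. */
    assert(buf[0] == 9 && vals == buf + 2);

    vals = buf;
    for (int i = 0; i < 10; i++) update_good(vals, i);
    /* Both statements ran on every iteration. */
    assert(buf[0] == 0 && buf[18] == 9 && vals == buf + 20);
}
```

The do { ... } while (0) wrapper also swallows the trailing semicolon cleanly, which is why it is preferred over bare braces in an if/else chain.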
If an argument is used more than once inside a macro body, any side effects it has may happen as many times, which the caller is unlikely to expect:
#define MAX(a, b) ((a) > (b) ? (a) : (b))
z = MAX(x++, y++);
// --> z = ((x++) > (y++) ? (x++) : (y++))
You can often get around this by declaring local temporary variables inside of a block. The version below relies on statement expressions, a GNU C extension supported by GCC and Clang:
#define MAX(a, b) ({ \
__typeof__(a) a__ = (a); \
__typeof__(b) b__ = (b); \
a__ > b__ ? a__ : b__; \
})
z = MAX(x++, y++);
// ({
// __typeof__(x++) a__ = (x++);
// __typeof__(y++) b__ = (y++);
// a__ > b__ ? a__ : b__;
// })
Note that the __typeof__ builtin does not evaluate its argument, so x++ and y++ each run exactly once.
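A side-by-side sketch of the two behaviours, assuming GCC or Clang since the safe version uses the statement-expression extension. I renamed the macros to MAX_UNSAFE and MAX_SAFE so both can live in one file:

```c
#include <assert.h>

#define MAX_UNSAFE(a, b) ((a) > (b) ? (a) : (b))
#define MAX_SAFE(a, b) ({ \
    __typeof__(a) a__ = (a); \
    __typeof__(b) b__ = (b); \
    a__ > b__ ? a__ : b__; \
})

static void demo_side_effects(void) {
    int x = 5, y = 3;
    int z = MAX_UNSAFE(x++, y++);
    /* x++ ran in the comparison and again in the result. */
    assert(z == 6 && x == 7 && y == 4);

    x = 5; y = 3;
    z = MAX_SAFE(x++, y++);
    /* Each argument was evaluated once, into a temporary. */
    assert(z == 5 && x == 6 && y == 4);
}
```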
Near-universal macros you will see
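#define CAT(a, b) a ## b token-pastes its arguments together. It needs to be a separate macro because ## pastes a parameter's argument without
expanding it first: wrapping the paste in CAT lets the calling macro expand its own parameters before handing them over, and keeps the paste isolated from any parentheses that follow.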
PP_NARG() is a macro from way back in the day that evaluates to the number of arguments (within sane limits) passed into it.
You pass in __VA_ARGS__ and it tells you how many there are. You can use this to make overloaded functions.
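For reference, here is a cut-down sketch of the usual counting trick, limited to 8 arguments. The helper names PP_NARG_, PP_ARG_N and PP_RSEQ_N and the use of __VA_OPT__ are my assumptions about one reasonable way to write it, not a canonical definition. The argument list is placed in front of a descending number sequence, and whichever number gets pushed into the N slot is the count.

```c
#include <assert.h>

/* __VA_OPT__(,) lets an empty list count as 0; the classic version
   without it reports 1 for an empty list. */
#define PP_NARG(...)  PP_NARG_(__VA_ARGS__ __VA_OPT__(,) PP_RSEQ_N())
#define PP_NARG_(...) PP_ARG_N(__VA_ARGS__)
#define PP_ARG_N(_1, _2, _3, _4, _5, _6, _7, _8, N, ...) N
#define PP_RSEQ_N()   8, 7, 6, 5, 4, 3, 2, 1, 0

static void demo_narg(void) {
    assert(PP_NARG() == 0);
    assert(PP_NARG(a) == 1);
    assert(PP_NARG(a, b, c) == 3);
    assert(PP_NARG(1, 2, 3, 4, 5, 6, 7, 8) == 8);
}
```

Passing more than 8 arguments silently gives a wrong answer with this sketch; real versions typically extend both lists to 63 or so.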
Useful variadic functions
C has no built-in way of determining how many arguments were passed in to a variadic function. There are historical reasons for this which are hard to
change now without breaking other code, and unlike unstable hipster languages, the C standards committee tries not to break things.
We can work around this with a macro wrapper. Add an underscore to the end of the real function name, and a "secret" first argument which will be the
number of variadic arguments passed in, then wrap it in a macro with PP_NARG:
char* str_join_(int nargs, char* joiner, ...);
#define str_join(joiner, ...) str_join_(PP_NARG(__VA_ARGS__), joiner __VA_OPT__(,) __VA_ARGS__)
Now you can loop over the variadic arguments inside the function.
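One possible body for str_join_, as a sketch. The two-pass measure-then-copy approach and the caller-frees convention are my assumptions, and the PP_NARG here is the same 8-argument, zero-aware sketch rather than a full implementation:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

#define PP_NARG(...)  PP_NARG_(__VA_ARGS__ __VA_OPT__(,) PP_RSEQ_N())
#define PP_NARG_(...) PP_ARG_N(__VA_ARGS__)
#define PP_ARG_N(_1, _2, _3, _4, _5, _6, _7, _8, N, ...) N
#define PP_RSEQ_N()   8, 7, 6, 5, 4, 3, 2, 1, 0

#define str_join(joiner, ...) \
    str_join_(PP_NARG(__VA_ARGS__), joiner __VA_OPT__(,) __VA_ARGS__)

/* Joins nargs strings with joiner. The caller frees the result. */
char *str_join_(int nargs, char *joiner, ...) {
    va_list ap;
    size_t total = 1;                     /* room for the terminator */

    va_start(ap, joiner);                 /* first pass: measure */
    for (int i = 0; i < nargs; i++)
        total += strlen(va_arg(ap, char *)) + (i ? strlen(joiner) : 0);
    va_end(ap);

    char *out = malloc(total);
    if (!out) return NULL;
    out[0] = '\0';

    va_start(ap, joiner);                 /* second pass: copy */
    for (int i = 0; i < nargs; i++) {
        if (i) strcat(out, joiner);
        strcat(out, va_arg(ap, char *));
    }
    va_end(ap);
    return out;
}

static void demo_str_join(void) {
    char *s = str_join(", ", "a", "b", "c");
    assert(s && strcmp(s, "a, b, c") == 0);
    free(s);

    s = str_join("-");                    /* no strings: empty result */
    assert(s && strcmp(s, "") == 0);
    free(s);
}
```

The zero-argument call only works because this PP_NARG reports 0 for an empty list; with the classic version it would report 1 and the function would read a nonexistent argument.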
Function overloading and default arguments
C has no built-in function overloading (thank God), but sometimes it's a useful feature to use carefully and in moderation. Macros allow you
to choose between several different functions, or invocations of them, based on the number of arguments.
#define pcalloc(...) pcalloc_N(PP_NARG(__VA_ARGS__), __VA_ARGS__)
#define pcalloc_N(n, ...) CAT(pcalloc_, n)(__VA_ARGS__)
#define pcalloc_1(ptr) calloc(1, sizeof(*(ptr)))
#define pcalloc_2(ptr, cnt) calloc(1, sizeof(*(ptr)) * (cnt))
#define pcalloc_3(ptr, xdim, ydim) calloc(1, sizeof(*(ptr)) * (xdim) * (ydim))
#define pcalloc_4(ptr, xdim, ydim, zdim) calloc(1, sizeof(*(ptr)) * (xdim) * (ydim) * (zdim))
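The same dispatch pattern also gives you default arguments. In this sketch, rect_area and its helpers are hypothetical, and PP_NARG is again the 8-argument sketch version; the one-argument form simply fills in the missing height.

```c
#include <assert.h>

#define CAT(a, b) a ## b
#define PP_NARG(...)  PP_NARG_(__VA_ARGS__ __VA_OPT__(,) PP_RSEQ_N())
#define PP_NARG_(...) PP_ARG_N(__VA_ARGS__)
#define PP_ARG_N(_1, _2, _3, _4, _5, _6, _7, _8, N, ...) N
#define PP_RSEQ_N()   8, 7, 6, 5, 4, 3, 2, 1, 0

static int rect_area_2(int w, int h) { return w * h; }
/* One argument: the height defaults to the width. */
static int rect_area_1(int side)     { return rect_area_2(side, side); }

/* The extra _N step makes sure the count is expanded to a number
   before CAT pastes it onto the name. */
#define rect_area(...)      rect_area_N(PP_NARG(__VA_ARGS__), __VA_ARGS__)
#define rect_area_N(n, ...) CAT(rect_area_, n)(__VA_ARGS__)

static void demo_overload(void) {
    assert(rect_area(3) == 9);       /* rect_area_1(3) */
    assert(rect_area(3, 4) == 12);   /* rect_area_2(3, 4) */
}
```

Calling rect_area with three arguments would expand to the undeclared rect_area_3, so mistakes at least show up at compile time.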