Skip to content

Latest commit

 

History

History
840 lines (704 loc) · 35.9 KB

clang_fortify_anatomy.md

File metadata and controls

840 lines (704 loc) · 35.9 KB

This document was originally written for a broad audience, and it was determined that it'd be good to hold in Bionic's docs, too. Due to the ever-changing nature of code, it tries to link to a stable tag of Bionic's libc, rather than the live code in Bionic. Same for Clang. Reader beware. :)

The Anatomy of Clang FORTIFY

Objective

The intent of this document is to run through the minutiae of how Clang FORTIFY actually works in Bionic at the time of writing. Other FORTIFY implementations that target Clang should use very similar mechanics. This document exists in part because many Clang-specific features serve multiple purposes simultaneously, so getting up-to-speed on how things function can be quite difficult.

Background

FORTIFY is a broad suite of extensions to libc aimed at catching misuses of common library functions. Textually, these extensions exist purely in libc, but all implementations of FORTIFY rely heavily on C language extensions in order to function at all.

Broadly, FORTIFY implementations try to guard against many misuses of C standard(-ish) libraries:

  • Buffer overruns in functions where pointers+sizes are passed (e.g., memcpy, poll), or where sizes exist implicitly (e.g., strcpy).
  • Arguments with incorrect values passed to libc functions (e.g., out-of-bounds bits in umask).
  • Missing arguments to functions (e.g., open() with O_CREAT, but no mode bits).

FORTIFY is traditionally enabled by passing -D_FORTIFY_SOURCE=N to your compiler. N==0 disables FORTIFY, whereas N==1, N==2, and N==3 enable increasingly strict versions of it. In general, FORTIFY doesn't require user code changes; that said, some code patterns are incompatible with stricter versions of FORTIFY checking. This is largely because FORTIFY has significant flexibility in what it considers to be an "out-of-bounds" access.

FORTIFY implementations use a mix of compiler diagnostics and runtime checks to flag and/or mitigate the impacts of the misuses mentioned above.

Further, given FORTIFY's design, the effectiveness of FORTIFY is a function of -- among other things -- the optimization level you're compiling your code at. Many FORTIFY implementations are implicitly disabled when building with -O0, since FORTIFY's design for both Clang and GCC relies on optimizations in order to provide useful run-time checks. For the purpose of this document, all analysis of FORTIFY functions and commentary on builtins assume that code is being built with some optimization level > -O0.

A note on GCC

This document talks specifically about Bionic's FORTIFY implementation targeted at Clang. While GCC also provides a set of language extensions necessary to implement FORTIFY, these tools are different from what Clang offers. This divergence is an artifact of Clang and GCC's differing architecture as compilers.

Textually, quite a bit can be shared between a FORTIFY implementation for GCC and one for Clang (e.g., see ChromeOS' Glibc patch), but this kind of sharing requires things like macros that expand to unbalanced braces depending on your compiler:

/*
 * Highly simplified; if you're interested in FORTIFY's actual implementation,
 * please see the patch linked above.
 */
#ifdef __clang__
# define FORTIFY_PRECONDITIONS
# define FORTIFY_FUNCTION_END
#else
# define FORTIFY_PRECONDITIONS {
# define FORTIFY_FUNCTION_END }
#endif

/*
 * FORTIFY_WARNING_ONLY_IF_SIZE_OF_BUF_LESS_THAN is not defined, due to its
 * complexity and irrelevance. It turns into a compile-time warning if the
 * compiler can determine `*buf` has fewer than `size` bytes available.
 */

char *getcwd(char *buf, size_t size)
FORTIFY_PRECONDITIONS
  FORTIFY_WARNING_ONLY_IF_SIZE_OF_BUF_LESS_THAN(buf, size, "`buf` is too smol.")
{
  // Actual shared function implementation goes here.
}
FORTIFY_FUNCTION_END

All talk of GCC-focused implementations and how to merge Clang and GCC implementations is out-of-scope for this doc, however.

The Life of a Clang FORTIFY Function

As referenced in the Background section, FORTIFY performs many different checks for many functions. This section intends to go through real-world examples of FORTIFY functions in Bionic, breaking down how each part of these functions work, and how the pieces fit together to provide FORTIFY-like functionality.

While FORTIFY implementations may differ between stdlibs, they broadly follow the same patterns when implementing their checks for Clang, and they try to make similar promises with respect to FORTIFY compiling to be zero-overhead in some cases, etc. Moreover, while this document specifically examines Bionic, many stdlibs will operate very similarly to Bionic in their Clang FORTIFY implementations.

In general, when reading the below, be prepared for exceptions, subtlety, and corner cases. The individual function breakdowns below try to not offer redundant information. Each one focuses on different aspects of FORTIFY.

Terminology

Because FORTIFY should be mostly transparent to developers, there are inherent naming collisions here: memcpy(x, y, z) turns into fundamentally different generated code depending on the value of _FORTIFY_SOURCE. Further, said memcpy call with _FORTIFY_SOURCE enabled needs to be able to refer to the memcpy that would have been called, had _FORTIFY_SOURCE been disabled. Hence, the following convention is followed in the subsections below for all prose (namely, multiline code blocks are exempted from this):

  • Standard library function names preceded by __builtin_ refer to the use of the function with _FORTIFY_SOURCE disabled.
  • Standard library function names without a prefix refer to the use of the function with _FORTIFY_SOURCE enabled.

This convention also applies in clang. __builtin_memcpy will always call memcpy as though _FORTIFY_SOURCE were disabled.

Breakdown of mempcpy

The FORTIFY'ed version of mempcpy is a full, featureful example of a FORTIFY'ed function from Bionic. From the user's perspective, it supports a few things:

  • Producing a compile-time error if the number of bytes to copy trivially exceeds the number of bytes available at the destination pointer.
  • If the mempcpy has the potential to write to more bytes than what is available at the destination, a run-time check is inserted to crash the program if more bytes are written than what is allowed.
  • Compiling away to be zero overhead when none of the buffer sizes can be determined at compile-time1.

The declaration in Bionic's headers for __builtin_mempcpy is:

void* mempcpy(void* __dst, const void* __src, size_t __n) __INTRODUCED_IN(23);

Which is annotated with nothing special, so it will be ignored.

The source for mempcpy in Bionic's headers for is:

__BIONIC_FORTIFY_INLINE
void* mempcpy(void* const dst __pass_object_size0, const void* src, size_t copy_amount)
        __overloadable
        __clang_error_if(__bos_unevaluated_lt(__bos0(dst), copy_amount),
                         "'mempcpy' called with size bigger than buffer") {
#if __BIONIC_FORTIFY_RUNTIME_CHECKS_ENABLED
    size_t bos_dst = __bos0(dst);
    if (!__bos_trivially_ge(bos_dst, copy_amount)) {
        return __builtin___mempcpy_chk(dst, src, copy_amount, bos_dst);
    }
#endif
    return __builtin_mempcpy(dst, src, copy_amount);
}

Expanding some of the important macros here, this function expands to roughly:

static
__inline__
__attribute__((no_stack_protector))
__attribute__((always_inline))
void* mempcpy(
        void* const dst __attribute__((pass_object_size(0))),
        const void* src,
        size_t copy_amount)
        __attribute__((overloadable))
        __attribute__((diagnose_if(
            __builtin_object_size(dst, 0) != -1 && __builtin_object_size(dst, 0) <= copy_amount),
            "'mempcpy' called with size bigger than buffer"))) {
#if __BIONIC_FORTIFY_RUNTIME_CHECKS_ENABLED
    size_t bos_dst = __builtin_object_size(dst, 0);
    if (!(__bos_trivially_ge(bos_dst, copy_amount))) {
        return __builtin___mempcpy_chk(dst, src, copy_amount, bos_dst);
    }
#endif
    return __builtin_mempcpy(dst, src, copy_amount);
}

So let's walk through this step by step, to see how FORTIFY does what it says on the tin here.

How does Clang select mempcpy?

First, it's critical to notice that mempcpy is marked overloadable. This function is a static inline __attribute__((always_inline)) overload of __builtin_mempcpy:

  • __attribute__((overloadable)) allows us to perform overloading in C.
  • __attribute__((overloadable)) mangles all calls to functions marked with __attribute__((overloadable)).
  • __attribute__((overloadable)) allows exactly one function signature with a given name to not be marked with __attribute__((overloadable)). Calls to this overload will not be mangled.

Second, one might note that this mempcpy implementation has the same C-level signature as __builtin_mempcpy. pass_object_size is a Clang attribute that is generally needed by FORTIFY, but it carries the side-effect that functions may be overloaded simply on the presence (or lack of presence) of pass_object_size attributes. Given two overloads of a function that only differ on the presence of pass_object_size attributes, the candidate with pass_object_size attributes is preferred.

Finally, the prior paragraph gets thrown out if one tries to take the address of mempcpy. It is impossible to take the address of a function with one or more parameters that are annotated with pass_object_size. Hence, &__builtin_mempcpy == &mempcpy. Further, because this is an issue of overload resolution, (&mempcpy)(x, y, z); is functionally identical to __builtin_mempcpy(x, y, z);.

All of this accomplishes the following:

  • Direct calls to mempcpy should call the FORTIFY-protected mempcpy.
  • Indirect calls to &mempcpy should call __builtin_mempcpy.

How does Clang offer compile-time diagnostics?

Once one is convinced that the FORTIFY-enabled overload of mempcpy will be selected for direct calls, Clang's diagnose_if and __builtin_object_size do all of the work from there.

Subtleties here primarily fall out of the discussion in the above section about &__builtin_mempcpy == &mempcpy:

#define _FORTIFY_SOURCE 2
#include <string.h>
void example_code() {
  char buf[4]; // ...Assume sizeof(char) == 1.
  const char input_buf[] = "Hello, World";
  mempcpy(buf, input_buf, 4); // Valid, no diagnostic issued.

  mempcpy(buf, input_buf, 5); // Emits a compile-time error since sizeof(buf) < 5.
  __builtin_mempcpy(buf, input_buf, 5); // No compile-time error.
  (&mempcpy)(buf, input_buf, 5); // No compile-time error, since __builtin_mempcpy is selected.
}

Otherwise, the rest of this subsection is dedicated to preliminary discussion about __builtin_object_size.

Clang's frontend can do one of two things with __builtin_object_size(p, n):

  • Evaluate it as a constant.
    • This can either mean declaring that the number of bytes at p is definitely impossible to know, so the default value is used, or the number of bytes at p can be known without optimizations.
  • Declare that the expression cannot form a constant, and lower it to @llvm.objectsize, which is discussed in depth later.

In the examples above, since diagnose_if is evaluated with context from the caller, Clang should be able to trivially determine that buf refers to a char array with 4 elements.

The primary consequence of the above is that diagnostics can only be emitted if no optimizations are required to detect a broken code pattern. To be specific, clang's constexpr evaluator must be able to determine the logical object that any given pointer points to in order to fold __builtin_object_size to a constant, non-default answer:

#define _FORTIFY_SOURCE 2
#include <string.h>
void example_code() {
  char buf[4]; // ...Assume sizeof(char) == 1.
  const char input_buf[] = "Hello, World";
  mempcpy(buf, input_buf, 4); // Valid, no diagnostic issued.
  mempcpy(buf, input_buf, 5); // Emits a compile-time error since sizeof(buf) < 5.
  char *buf_ptr = buf;
  mempcpy(buf_ptr, input_buf, 5); // No compile-time error; `buf_ptr`'s target can't be determined.
}

How does Clang insert run-time checks?

This section expands on the following statement: FORTIFY has zero runtime cost in instances where there is no chance of catching a bug at run-time. Otherwise, it introduces a tiny additional run-time cost to ensure that functions aren't misused.

In prior sections, the following was established:

  • overloadable and pass_object_size prompt Clang to always select this overload of mempcpy over __builtin_mempcpy for direct calls.
  • If a call to mempcpy was trivially broken, Clang would produce a compile-time error, rather than producing a binary.

Hence, the case we're interested in here is one where Clang's frontend selected a FORTIFY'ed function's implementation for a function call, but was unable to find anything seriously wrong with said function call. Since the frontend is powerless to detect bugs at this point, our focus shifts to the mechanisms that LLVM uses to support FORTIFY.

Going back to Bionic's mempcpy implementation, we have the following (ignoring diagnose_if and assuming run-time checks are enabled):

static
__inline__
__attribute__((no_stack_protector))
__attribute__((always_inline))
void* mempcpy(
        void* const dst __attribute__((pass_object_size(0))),
        const void* src,
        size_t copy_amount)
        __attribute__((overloadable)) {
    size_t bos_dst = __builtin_object_size(dst, 0);
    if (bos_dst != -1 &&
        !(__builtin_constant_p(copy_amount) && bos_dst >= copy_amount)) {
        return __builtin___mempcpy_chk(dst, src, copy_amount, bos_dst);
    }
    return __builtin_mempcpy(dst, src, copy_amount);
}

In other words, we have a static, always_inline function which:

  • If __builtin_object_size(dst, 0) cannot be determined (in which case, it returns -1), calls __builtin_mempcpy.
  • Otherwise, if copy_amount can be folded to a constant, and if __builtin_object_size(dst, 0) >= copy_amount, calls __builtin_mempcpy.
  • Otherwise, calls __builtin___mempcpy_chk.

How can this be "zero overhead"? Let's focus on the following part of the function:

    size_t bos_dst = __builtin_object_size(dst, 0);
    if (bos_dst != -1 &&
        !(__builtin_constant_p(copy_amount) && bos_dst >= copy_amount)) {

If Clang's frontend cannot determine a value for __builtin_object_size, Clang lowers it to LLVM's @llvm.objectsize intrinsic. The @llvm.objectsize invocation corresponding to __builtin_object_size(p, 0) is guaranteed to always fold to a constant value by the time LLVM emits machine code.

Hence, bos_dst is guaranteed to be a constant; if it's -1, the above branch can be eliminated entirely, since it folds to if (false && ...). Further, the RHS of the && in this branch has us call __builtin_mempcpy if copy_amount is a known value less than bos_dst (yet another constant value). Therefore, the entire condition is always knowable when LLVM is done with LLVM IR-level optimizations, so no condition is ever emitted to machine code in practice.

Why is "zero overhead" in quotes? Why is unique_ptr relevant?

__builtin_object_size and __builtin_constant_p are forced to be constants after most optimizations take place. Until LLVM replaces both of these with constants and optimizes them out, we have additional branches and function calls in our IR. This can have negative effects, such as distorting inlining costs and inhibiting optimizations that are conservative around branches in control-flow.

So FORTIFY is free in these cases in isolation of any of the code around it. Due to its implementation, it may impact the optimizations that occur on code around the literal call to the FORTIFY-hardened libc function.

unique_ptr was just the first thing that came to the author's mind for "the type should be zero cost with any level of optimization enabled, but edge-cases might make it only-mostly-free to use."

How is checking actually performed?

In cases where checking can be performed (e.g., where we call __builtin___mempcpy_chk(dst, src, copy_amount, bos_dst);), Bionic provides an implementation for __mempcpy_chk. This is:

extern "C" void* __mempcpy_chk(void* dst, const void* src, size_t count, size_t dst_len) {
  __check_count("mempcpy", "count", count);
  __check_buffer_access("mempcpy", "write into", count, dst_len);
  return mempcpy(dst, src, count);
}

This function itself boils down to a few small branches which abort the program if they fail, and a direct call to __builtin_mempcpy.

Wrapping up

In the above breakdown, it was shown how Clang and Bionic work together to:

  • represent FORTIFY-hardened overloads of functions,
  • report misuses of stdlib functions at compile-time, and
  • insert run-time checks for uses of functions that might be incorrect, but only if we have the potential of proving the incorrectness of these.

Breakdown of open

In Bionic, the FORTIFY'ed implementation of open is quite large. Much like mempcpy, the __builtin_open declaration is simple:

int open(const char* __path, int __flags, ...);

With some macros expanded, the FORTIFY-hardened header implementation is:

int __open_2(const char*, int);
int __open_real(const char*, int, ...) __asm__(open);

#define __open_modes_useful(flags) (((flags) & O_CREAT) || ((flags) & O_TMPFILE) == O_TMPFILE)

static
int open(const char* pathname, int flags, mode_t modes, ...) __overloadable
        __attribute__((diagnose_if(1, "error", "too many arguments")));

static
__inline__
__attribute__((no_stack_protector))
__attribute__((always_inline))
int open(const char* const __attribute__((pass_object_size(1))) pathname, int flags)
        __attribute__((overloadable))
        __attribute__((diagnose_if(
            __open_modes_useful(flags),
            "error",
            "'open' called with O_CREAT or O_TMPFILE, but missing mode"))) {
#if __ANDROID_API__ >= 17 && __BIONIC_FORTIFY_RUNTIME_CHECKS_ENABLED
    return __open_2(pathname, flags);
#else
    return __open_real(pathname, flags);
#endif
}
static
__inline__
__attribute__((no_stack_protector))
__attribute__((always_inline))
int open(const char* const __attribute__((pass_object_size(1))) pathname, int flags, mode_t modes)
        __attribute__((overloadable))
        __clang_warning_if(!__open_modes_useful(flags) && modes,
                           "'open' has superfluous mode bits; missing O_CREAT?") {
    return __open_real(pathname, flags, modes);
}

Which may be a lot to take in.

Before diving too deeply, please note that the remainder of these subsections assume that the programmer didn't make any egregious typos. Moreover, there's no real way that Bionic tries to prevent calls to open like open("foo", 0, "how do you convert a const char[N] to mode_t?");. The only real C-compatible solution the author can think of is "stamp out many overloads to catch sort-of-common instances of this very uncommon typo." This isn't great.

More directly, no effort is made below to recognize calls that, due to incompatible argument types, cannot go to any open implementation other than __builtin_open, since it's recognized right here. :)

Implementation breakdown

This open implementation does a few things:

  • Turns calls to open with too many arguments into a compile-time error.
  • Diagnoses calls to open with missing modes at compile-time and run-time (both cases turn into errors).
  • Emits warnings on calls to open with useless mode bits, unless the mode bits are all 0.

One common bit of code not explained below is the __open_real declaration above:

int __open_real(const char*, int, ...) __asm__(open);

This exists as a way for us to call __builtin_open without needing clang to have a pre-defined __builtin_open function.

Compile-time error on too many arguments

static
int open(const char* pathname, int flags, mode_t modes, ...) __overloadable
        __attribute__((diagnose_if(1, "error", "too many arguments")));

Which matches most calls to open that supply too many arguments, since int(const char *, int, ...) matches less strongly than int(const char *, int, mode_t, ...) for calls where the 3rd arg can be converted to mode_t without too much effort. Because of the diagnose_if attribute, all of these calls turn into compile-time errors.

Compile-time or run-time error on missing arguments

The following overload handles all two-argument calls to open.

static
__inline__
__attribute__((no_stack_protector))
__attribute__((always_inline))
int open(const char* const __attribute__((pass_object_size(1))) pathname, int flags)
        __attribute__((overloadable))
        __attribute__((diagnose_if(
            __open_modes_useful(flags),
            "error",
            "'open' called with O_CREAT or O_TMPFILE, but missing mode"))) {
#if __ANDROID_API__ >= 17 && __BIONIC_FORTIFY_RUNTIME_CHECKS_ENABLED
    return __open_2(pathname, flags);
#else
    return __open_real(pathname, flags);
#endif
}

Like mempcpy, diagnose_if handles emitting a compile-time error if the call to open is broken in a way that's visible to Clang's frontend. This essentially boils down to "open is being called with a flags value that requires mode bits to be set."

If that fails to catch a bug, we unconditionally call __open_2, which performs a run-time check:

int __open_2(const char* pathname, int flags) {
  if (needs_mode(flags)) __fortify_fatal("open: called with O_CREAT/O_TMPFILE but no mode");
  return FDTRACK_CREATE_NAME("open", __openat(AT_FDCWD, pathname, force_O_LARGEFILE(flags), 0));
}

Compile-time warning if modes are pointless

Finally, we have the following open call:

static
__inline__
__attribute__((no_stack_protector))
__attribute__((always_inline))
int open(const char* const __attribute__((pass_object_size(1))) pathname, int flags, mode_t modes)
        __attribute__((overloadable))
        __clang_warning_if(!__open_modes_useful(flags) && modes,
                           "'open' has superfluous mode bits; missing O_CREAT?") {
    return __open_real(pathname, flags, modes);
}

This simply issues a warning if Clang's frontend can determine that flags isn't necessary. Due to conventions in existing code, a modes value of 0 is not diagnosed.

What about &open?

One yet-unaddressed aspect of the above is how &open works. This is thankfully a short answer:

  • It happens that open takes a parameter of type const char*.
  • It happens that pass_object_size -- an attribute only applicable to parameters of type T* -- makes it impossible to take the address of a function.

Since clang doesn't support a "this function should never have its address taken," attribute, Bionic uses the next best thing: pass_object_size. :)

Breakdown of poll

(Preemptively: at the time of writing, Clang has no literal __builtin_poll builtin. __builtin_poll is referenced below to remain consistent with the convention established in the Terminology section.)

Bionic's poll implementation is closest to mempcpy above, though it has a few interesting aspects worth examining.

The full header implementation of poll is, with some macros expanded:

#define __bos_fd_count_trivially_safe(bos_val, fds, fd_count) \
  ((bos_val) == -1) || \
    (__builtin_constant_p(fd_count) && \
    (bos_val) >= sizeof(*fds) * (fd_count)))

static
__inline__
__attribute__((no_stack_protector))
__attribute__((always_inline))
int poll(struct pollfd* const fds __attribute__((pass_object_size(1))), nfds_t fd_count, int timeout)
    __attribute__((overloadable))
    __attriubte__((diagnose_if(
       __builtin_object_size(fds, 1) != -1 && __builtin_object_size(fds, 1) < sizeof(*fds) * fd_count,
        "error",
        "in call to 'poll', fd_count is larger than the given buffer"))) {
  size_t bos_fds = __builtin_object_size(fds, 1);
  if (!__bos_fd_count_trivially_safe(bos_fds, fds, fd_count)) {
    return __poll_chk(fds, fd_count, timeout, bos_fds);
  }
  return (&poll)(fds, fd_count, timeout);
}

To get the commonality with mempcpy and open out of the way:

  • This function is an overload with __builtin_poll.
  • The signature is the same, modulo the presence of a pass_object_size attribute. Hence, for direct calls, overload resolution will always prefer it over __builtin_poll. Taking the address of poll is forbidden, so all references to &poll actually reference __builtin_poll.
  • When fds is too small to hold fd_count pollfds, Clang will emit a compile-time error if possible using diagnose_if.
  • If this can't be observed until run-time, __poll_chk verifies this.
  • When fds is a constant according to __builtin_constant_p, this always compiles into __poll_chk for always-broken calls to poll, or __builtin_poll for always-safe calls to poll.

The critical bits to highlight here are on this line:

int poll(struct pollfd* const fds __attribute__((pass_object_size(1))), nfds_t fd_count, int timeout)

And this line:

  return (&poll)(fds, fd_count, timeout);

Starting with the simplest, we call __builtin_poll with (&poll)(...);. As referenced above, taking the address of an overloaded function where all but one overload has a pass_object_size attribute on one or more parameters always resolves to the function without any pass_object_size attributes.

The other line deserves a section. The subtlety of it is almost entirely in the use of pass_object_size(1) instead of pass_object_size(0). on the fds parameter, and the corresponding use of __builtin_object_size(fds, 1); in the body of poll.

Subtleties of __builtin_object_size(p, N)

Earlier in this document, it was said that a full description of each attribute/builtin necessary to power FORTIFY was out of scope. This is... only somewhat the case when we talk about __builtin_object_size and pass_object_size, especially when their second argument is 1.

tl;dr

__builtin_object_size(p, N) and pass_object_size(N), where (N & 1) == 1, can only be accurately determined by Clang. LLVM's @llvm.objectsize intrinsic ignores the value of N & 1, since handling (N & 1) == 1 accurately requires data that's currently entirely inaccessible to LLVM, and that is difficult to preserve through LLVM's optimization passes.

pass_object_size's "lifting" of the evaluation of __builtin_object_size(p, N) to the caller is critical, since it allows Clang full visibility into the expression passed to e.g., poll(&foo->bar, baz, qux). It's not a perfect solution, but it allows N == 1 to be fully accurate in at least some cases.

Background

Clang's implementation of __builtin_object_size aims to be compatible with GCC's, which has a decent bit of documentation. Put simply, __builtin_object_size(p, N) is intended to evaluate at compile-time how many bytes can be accessed after p in a well-defined way. Straightforward examples of this are:

char buf[8];
assert(__builtin_object_size(buf, N) == 8);
assert(__builtin_object_size(buf + 1, N) == 7);

This should hold for all values of N that are valid to pass to __builtin_object_size. The N value of __builtin_object_size is a mask of settings.

(N & 2) == ?

This is mostly for completeness sake; in Bionic's FORTIFY implementation, N is always either 0 or 1.

If there are multiple possible values of p in a call to __builtin_object_size(p, N), the second bit in N determines the behavior of the compiler. If (N & 2) == 0, __builtin_object_size should return the greatest possible size for each possible value of p. Otherwise, it should return the least possible value. For example:

char smol_buf[7];
char buf[8];
char *p = rand() ? smol_buf : buf;
assert(__builtin_object_size(p, 0) == 8);
assert(__builtin_object_size(p, 2) == 7);
(N & 1) == 0

__builtin_object_size(p, 0) is more or less as simple as the example in the Background section directly above. When Clang attempts to evaluate __builtin_object_size(p, 0); and when LLVM tries to determine the result of a corresponding @llvm.objectsize call to, they search for the storage underlying the pointer in question. If that can be determined, Clang or LLVM can provide an answer; otherwise, they cannot.

(N & 1) == 1, and the true magic of pass_object_size

__builtin_object_size(p, 1) has a less uniform implementation between LLVM and Clang. According to GCC's documentation, "If the least significant bit [of __builtin_object_size's second argument] is clear, objects are whole variables, if it is set, a closest surrounding subobject is considered the object a pointer points to."

The "closest surrounding subobject," means that (N & 1) == 1 depends on type information in order to operate in many cases. Consider the following examples:

struct Foo {
  int a;
  int b;
};

struct Foo foo;
assert(__builtin_object_size(&foo, 0) == sizeof(foo));
assert(__builtin_object_size(&foo, 1) == sizeof(foo));
assert(__builtin_object_size(&foo->a, 0) == sizeof(foo));
assert(__builtin_object_size(&foo->a, 1) == sizeof(int));

struct Foo foos[2];
assert(__builtin_object_size(&foos[0], 0) == 2 * sizeof(foo));
assert(__builtin_object_size(&foos[0], 1) == sizeof(foo));
assert(__builtin_object_size(&foos[0]->a, 0) == 2 * sizeof(foo));
assert(__builtin_object_size(&foos[0]->a, 1) == sizeof(int));

...And perhaps somewhat surprisingly:

void example(struct Foo *foo) {
  // (As a reminder, `-1` is "I don't know" when `(N & 2) == 0`.)
  assert(__builtin_object_size(foo, 0) == -1);
  assert(__builtin_object_size(foo, 1) == -1);
  assert(__builtin_object_size(foo->a, 0) == -1);
  assert(__builtin_object_size(foo->a, 1) == sizeof(int));
}

In Clang, this type-aware requirement poses problems for us: Clang's frontend knows everything we could possibly want about the types of variables, but optimizations are only performed by LLVM. LLVM has no reliable source for C or C++ data types, so calls to __builtin_object_size(p, N) that cannot be resolved by clang are lowered to the equivalent of __builtin_object_size(p, N & ~1) in LLVM IR.

Moreover, Clang's frontend is the best-equipped part of the compiler to accurately determine the answer for __builtin_object_size(p, N), given we know what p is. LLVM is the best-equipped part of the compiler to determine the value of p. This ordering issue is unfortunate.

This is where pass_object_size(N) comes in. To summarize the docs for pass_object_size, it evaluates __builtin_object_size(p, N) within the context of the caller of the function annotated with pass_object_size, and passes the value of that into the callee as an invisible parameter. All calls to __builtin_object_size(parameter, N) are substituted with references to this invisible parameter.

Putting this plainly, Clang's frontend struggles to evaluate the following:

int foo(void *p) {
  return __builtin_object_size(p, 1);
}

void bar() {
  struct { int i, j } k;
  // The frontend can't figure this interprocedural objectsize out, so it gets lowered to
  // LLVM, which determines that the answer here is sizeof(k).
  int baz = foo(&k.i);
}

However, with the magic of pass_object_size, we get one level of inlining to look through:

int foo(void *const __attribute__((pass_object_size(1))) p) {
  return __builtin_object_size(p, 1);
}

void bar() {
  struct { int i, j } k;
  // Due to pass_object_size, this is equivalent to:
  // int baz = foo(&k.i, __builtin_object_size(&k.i, 1));
  // ...and `int foo(void *)` is actually equivalent to:
  // int foo(void *const, size_t size) {
  //   return size;
  // }
  int baz = foo(&k.i);
}

So we can obtain an accurate result in this case.

What about pass_object_size(0)?

It's sort of tangential, but if you find yourself wondering about the utility of pass_object_size(0) ... it's somewhat split. pass_object_size(0) in Bionic's FORTIFY exists mostly for visual consistency, simplicity, and as a useful way to have e.g., &mempcpy == &__builtin_mempcpy.

Outside of these fringe benefits, all of the functions with pass_object_size(0) on parameters are marked with always_inline, so "lifting" the __builtin_object_size call isn't ultimately very helpful. In theory, users can always have something like:

// In some_header.h
// This function does cool and interesting things with the `__builtin_object_size` of its parameter,
// and is able to work with that as though the function were defined inline.
void out_of_line_function(void *__attribute__((pass_object_size(0))));

Though the author isn't aware of uses like this in practice, beyond a few folks on LLVM's mailing list seeming interested in trying it someday.

Wrapping up

In the (long) section above, two things were covered:

  • The use of (&poll)(...); is a convenient shorthand for calling __builtin_poll.
  • __builtin_object_size(p, N) with (N & 1) == 1 is not easy for Clang to answer accurately, since it relies on type info only available in the frontend, and it sometimes relies on optimizations only available in the middle-end. pass_object_size helps mitigate this.

Miscellaneous Notes

The above should be a roughly comprehensive view of how FORTIFY works in the real world. The main thing it fails to mention is the use of the diagnose_as_builtin attribute in Clang.

As time has moved on, Clang has increasingly gained support for emitting warnings that were previously emitted by FORTIFY machinery. diagnose_as_builtin allows us to remove the diagnose_ifs from some of the static inline overloads of stdlib functions above, so Clang may diagnose them instead.

Clang's built-in diagnostics are often better than diagnose_if diagnostics, since Clang can format its diagnostics to include e.g., information about the sizes of buffers in a suspect call to a function. diagnose_if can only have the compiler output constant strings.

Footnotes

  1. "Zero overhead" in a way similar to C++11's std::unique_ptr: this will turn into a direct call __builtin_mempcpy (or an optimized form thereof) with no other surrounding checks at runtime. However, the additional complexity may hinder optimizations that are performed before the optimizer can prove that the if (...) { ... } can be optimized out. Depending on how late this happens, the additional complexity may skew inlining costs, hide opportunities for e.g., memcpy coalescing, etc etc.