To add to @J_H's good points:
Only include what's required:
In "aho_corasick.h", <stddef.h> should suffice for the definition of size_t. You do not need to include <stdlib.h> for it.
Use static keyword for array parameters:
In C99 and above, you can do array[static size] in function parameters to specify that array must have at least size items.
A compiler has then the right to assume that argument is not nullptr and therefore it could perform some optimizations.
Moreover, this also documents the function better. As the array must have at least size items, it also means that a nullptr is not a valid value for this argument.
It also eliminates the need of having to document that the function would invoke undefined behavior if nullptrs are passed in, or if the function was testing if any parameter is nullptr, it eliminates onw or more branches.
After modifying one function prototype, this is how it looks:
// ac_automaton_t new_automaton(const char **words, size_t wc);
ac_automaton_t new_automaton(size_t wc, const char *words[static wc]);
Now for these calls:
new_automaton(10, nullptr);
static const char *const words[5] = {};
new_automaton(10, words);
char *p = nullptr;
new_automaton(102, p);
GCC 14.1 detects all 3 wrong use cases and raises diagnostic information, whilst Clang 18.1.0 and x86-64 icx 2024.0.0 only detect the first one. A win for GCC.
MSVC does not support this C99 feature (who is suprised?), but MinGW gcc 13.1.0 and MinGW clang 16.0.2 have the same output as GCC and Clang.
Note that this feature is not usable for return values, local variables, or for functions like memcpy() because arrays of void are illegal. A GCC extension, __attribute__((nonnull)) allows parameters to be marked as nonnull. This extension is also supported by Intel's compiler and Clang. A Clang extension allows _Nullable, _Nonnull and _Null_unspecified. This extension is not supported by GCC. A GCC extension, __attribute__((returns_nonnull)) can be used to specify that a function returns a nonnull value. This extension is also supported by Clang and Intel's compiler. I await an _Optional type in the next revision of the C Standard. There are already some proposals for it.
Aside: In C23, you can replace the ASCII_CHAR_COUNT macro with:
static constexpr unsigned int ASCII_CHAR_COUNT = 256;
struct ac_node *children[ASCII_CHAR_COUNT];
Potential Undefined Behavior:
ac_node_t *node = calloc(1, sizeof(ac_node_t));
node->parent = parent;
There's no check for calloc() failure. malloc() and family returns a nullptr on failure. Failing to check for it risks invoking undefined behavior by a subsequent nullptr dereference.
Change it to:
ac_node_t *node = calloc(1, sizeof *node);
if (!node) { // Or, node == nullptr or node == NULL
// Complain
TDB_code();
}
And similarly for the other calls to malloc() in the other functions.
Note that I have changed sizeof (ac_node_t) to sizeof *node. In the case that node was changed to some other type, you would not require a similar change in the calloc() call. This eases maintainence.
Portability:
From the Linux man page:
HISTORY top
<sys/queue.h> macros first appeared in 4.4BSD.
NOTES top
Some BSDs provide SIMPLEQ instead of STAILQ. They are identical,
but for historical reasons they were named differently on
different BSDs. STAILQ originated on FreeBSD, and SIMPLEQ
originated on NetBSD. For compatibility reasons, some systems
provide both sets of macros. glibc provides both STAILQ and
SIMPLEQ, which are identical except for a missing SIMPLEQ
equivalent to STAILQ_CONCAT().
It should be enough to define
_BSD_SOURCEforsys/queue.h. Compiling with-std=gnu17should not be necessarySome compile-time detection is required to use the proper macros provided by an implementation. As the man page says, some BSDs provide
SIMPLEQinstead ofSTAILQ.
To add to @J_H's "braces" point, see: Apple's SSL/TLS bug.
// Insert words
for (size_t i = 0; i < wc; ++i)
{
const char *w = words[i];
insert_word(root, w);
}
And do not comment the obvious. Though I don't see why you need the extra variable w, insert_word(root, words[i]) should suffice. And if you go to Reddit, @skeeto might see your post and teach you a thing or two about fuzz testing with AFL++ (the author of this review is not familiar with the API yet).