Skip to main content
3 of 5
added 44 characters in body
chux
  • 36.4k
  • 2
  • 43
  • 97

There is a lot to review here.


Power-of-2 read size in io_read_line()

Since it looks like code is trying to use a power-of-2 page_size, Lets fread() read a power-of-2 bytes

    // IO_REALLOC(content, len + page_size)
    // ...
    // rcount = fread(content + len, 1, page_size - 1, stream);

    IO_REALLOC(content, len + page_size + 1)
    ...
    rcount = fread(content + len, 1, page_size, stream);

OR

Just allocate in a power of 2 increment and perform, when needed, a IO_REALLOC() at the end to accommodate the append null character.

OR

Just allocate in a power of 2 increment and always perform a "right-size" IO_REALLOC() at the end to accommodate the exact size needed.

O() of io_read_line() and io_read_line()

Rather than grow allocation linearly by capacity += page_size, use a geometric growth. I like 0, 1, 3, 7, 15 ... SZIE_MAX or capacity = capacity*2 + 1.

With linear growth code is making O(length) allocations and potentiality incurring O(length*length) time.

With geometric growth, code is making O(ln(length)) allocations and potentiality incurring O(length*ln(length)) time.

Minor: reformat for clarity

Consider putting attributes on following line,

// IO_DEF char *io_read_file(FILE *stream,
//   size_t *nbytes) ATTRIB_NONNULL(1) ATTRIB_WARN_UNUSED_RESULT;

IO_DEF char *io_read_file(FILE *stream, size_t *nbytes) //
  ATTRIB_NONNULL(1) ATTRIB_WARN_UNUSED_RESULT;

Bug: Infinite loop in io_read_file()?

When fread(content + len, 1, page_size - 1, stream); returns 0 due to end-of-file, for (size_t rcount = 1; rcount; len += rcount) iterates endlessly.

Do not assume file error is false before io_read_file()

Similar problem in io_read_line().

    // ferror(stream) could be true due to prior stream error.
    if (ferror(stream)) {
        IO_FREE(content);
        return content = NULL;
    }

Fix for this and prior bug:

    size_t n = page_size - 1; // Or n = page_size (see above).
    rcount = fread(content + len, 1, n, stream);
    if (rcount < n) {
      if (!feof(stream)) {
        IO_FREE(content);
        return content = NULL;
      }
      break;
    }

*nbytes on error in io_read_line()

IMO, *nbytes should always be set, when nbytes != NULL, even when NULL is returned from the function.

Code may return non-NULL on input error in io_read_line()

Code has:

             if (feof(stream)) {
                if (!count) {
                    free(line);
                    return NULL;
                }
                /* Return what was read. */
                break;
            }

which, IMO, should always return NULL when an input error occurs, not just when count == 0.

Bug: Unsafe re-alloc()

In the following code, should tmp == NULL, line[count] may be beyond prior allocation as count == capacity is possible.

void *tmp = realloc(line, count + 1);
if (tmp) {
    line = tmp;
}
line[count] = '\0';  // not safe

Questionable file size

while ((rcount = fread(chunk, 1, CHUNK_SIZE, stream)) > 0) continues to read even when a read error occurs. Read errors are not certainly sticky. The read error flag is sticky.

Better as

IO_DEF bool io_fsize(FILE *stream, size_t *size) {
    size_t rcount = 0;
    char chunk[CHUNK_SIZE];

    // Consider a `rewind()` here
    // Depends on what OP wants should stream not be at the beginning.

    do {
      rcount = fread(chunk, 1, CHUNK_SIZE, stream);
     *size += rcount;
    } while (rcount == CHUNK_SIZE);
    return !ferror(stream);
}

Unsigned constants

Since CHUNK_SIZE and TOKEN_CHUNK_SIZE are use exclusively for sizing, uses size_t math and types.

// #define CHUNK_SIZE          INT64_C(1024 * 8)
// #define TOKEN_CHUNK_SIZE    INT64_C(1024 * 2)
#define CHUNK_SIZE          ((size_t)1024 * 8)
#define TOKEN_CHUNK_SIZE    ((size_t)1024 * 2)

Side issue: INT64_C(1024 * 2) make a constant that is at least 64-bit, but I do not think that means the 1024 * 2 is done first using 64-bit math, but with int math. Might be a problem with 1024 * 1024 * 1024 * 2 as that int overflows.

File size versus size_t

There are many systems where file size > SIZE_MAX. Consider using wider math for file size or detect overflow.

Naked Magic numbers

With ATTRIB_NONNULL(1, 2), what is 1 and 2 about?

Use a named macro rather than 1, 2. If not possible, at least comment why 1 ,2.

Uncomplicate long lines

// char **const tmp = IO_REALLOC(tokens,
//            sizeof *tokens * (capacity += chunk_size));
capacity += chunk_size;
char **const tmp = IO_REALLOC(tokens, sizeof *tokens * capacity);

Style: little reason to type the return from realloc()

(Further, I like comparing with == clearer than !=. 'I don't know half of you half ...)

        // char **const tmp = IO_REALLOC(tokens, sizeof *tokens * (capacity += chunk_size));
        // if (!tmp) {
        //    IO_FREE(tokens);
        //    return NULL;
        // }
        // tokens = tmp;

        void *tmp = IO_REALLOC(tokens, sizeof *tokens * (capacity += chunk_size));
        if (tmp == NULL) {
            IO_FREE(tokens);
            return NULL;
        }
        tokens = tmp; 

Note: if (ferror()) can mislead.


I realize now OP's post comes after a recent post. It looks like there is a fair amount of repetition of same issues.

chux
  • 36.4k
  • 2
  • 43
  • 97