C - Dynamic array handling advice

Question

I am currently learning C, as someone, who has quite a bit of experience in more high-level languages than that.

After considering some of your comments and doing a bit more testing and fidelling, I came up with a better solution (scroll down).

Advice and comments are still very welcome though.

That said, I struggle a bit with dynamic array handling in C. I got it down to a point, where I quite like, simply terminating every array with NULL and using realloc where neccessary.

Basically I am in search of advice: Is there any downside, of doing it that way (see example code below), besides the obvious (can't use NULL values as data value) ?

#include "stdlib.h"

int array_length(void* array)
{
    int i;
    for (i = 0; ((void**)array)[i]; i++);
    return i;
}

void array_add(void* array, void* item)
{
    int size = array_length(array);
    realloc(array, sizeof(item) * (size + 2));

    ((void**)array)[size] = item;
    ((void**)array)[size + 1] = NULL;
}

void* array_new(){
    void** array = malloc(sizeof(NULL));
    array[0] = NULL;
    return array;
}

#include "stdio.h"

void print_items(char** array){
    printf("%i - [", array_length(array));
    for (int i = 0; array[i]; i++){
        printf("\"%s\"", array[i]);
        if (array[i+1])
            printf(", ");
    }
    printf("]\n");
}

int main(){
    char** str_list = array_new();

    print_items(str_list);
    array_add(str_list, "Something!");
    print_items(str_list);
    array_add(str_list, "Another Thing.");
    print_items(str_list);
}

Better Version (Update!)

After reading through your answers and debugging a bit more myself, I came up with a new version, which is mainly based on the struct approach mentioned below.

Have a Look, and tell me, what I am doing wrong now. :)

#include "stdlib.h"
#include "assert.h"

typedef void* Any;

typedef struct {
    Any* items;
    int count;
} List;

List* list_new(){
    List* list = malloc(sizeof(List));
    list->items = NULL;
    list->count = 0;
    return list;
}

void list_add(List* list, Any element){
    int count = list->count + 1;
    list->items = realloc(list->items, count * sizeof(element));
    list->items[count-1] = element;
    list->count = count;
}

void list_free(List* list){
    if (list){
        if (list->items) free(list->items);
        free(list);
    }
}

void list_each(List* list, void (*callback)(List*, Any, int)){
    for (int i = 0; i < list->count; i++){
        callback(list, list->items[i], i);
    }
}

// Sample usage

#include "stdio.h"

void debug_item(List* list, Any item, int index){
    if(index > 0) printf(", ");
    printf("\"%s\"", item);
}

void debug(List* list){
    printf("%i : [", list->count);
    list_each(list, debug_item);
    printf("]\n");
}

int main(){
    List* cats = list_new();
    debug(cats);
    list_add(cats, "Baltazar");
    list_add(cats, "Scar");
    list_add(cats, "Garfield");
    debug(cats);
    list_free(cats);
}

malloc(sizeof(NULL)) will fail miserably if sizeof(NULL) < sizeof(void *). A safer idiom is T *p = malloc(sizeof *p) for some type T. Anyway, this 0-terminated array thing is very ugly, error-prone, and linear-time length querying can become a performance problem for large arrays. The usual practice when implementing a dynamic array is to maintain a base pointer and an explicit size. — The Paramagnetic Croissant
– The Paramagnetic Croissant, Commented Dec 31, 2014 at 0:24
Hey, thanks for the comment. 1. Why would malloc(sizeof(NULL))fail? Basically I do only need the space to store a integer 0 at the end of the stream, right? No matter the actual datatype. 2. I buy the performance argument, that makes sence, thanks. 3. Can you elaborate a bit more on the error phrone bit? What are the errors I could expect? Why is it so ugly? — Hr. Rabe
– Hr. Rabe, Commented Dec 31, 2014 at 0:26
(Not the original replier) I think in most implementations of vector, the capacity is the power of 2 >= the size. You may want to realloc in a similar fashion. I can see the error-prone issue, if you do for some reason need to store a NULL. In a large project, this would probably happen due to someone's carelessness. The performance, too, if it matters in your application. Seems like you could also use a stack, and keep null at the bottom, then just find it when you want to read from the bottom, if you are not wanting to store another pointer/var and will not be seeking the bottom often. — dwn
– dwn, Commented Dec 31, 2014 at 0:45
This will be deadly slow. Common vector implementations in C do a couple of things: first, keep the current allocation size, current occupied size, a pointer to the memory block, and other data in a struct. Also, they initially allocate some small amount of space like for 8 or ten items, then realloc more when needed rather than every time. A common strategy is to grow 1.5X when needed, that way performance on inserts scales well. — Lee Daniel Crocker
– Lee Daniel Crocker, Commented Dec 31, 2014 at 1:24
@Hr.Rabe "Basically I do only need the space to store a integer 0 at the end of the stream, right? No matter the actual datatype." - no, that's wrong. In C, arrays are homogenous. If you are 0-terminating an array of Ts, then you need to terminate it with (T)0. "Can you elaborate a bit more on the error phrone bit? What are the errors I could expect?" - for one, you may forget the 0-terminator. (People forget to NUL-terminate their strings all over the place, even experts.) "Why is it so ugly?" - because it breaks symmetry, and you lose part of the domain… — The Paramagnetic Croissant
– The Paramagnetic Croissant, Commented Dec 31, 2014 at 2:21

hugomg · Accepted Answer · 2014-12-31 01:33:33Z

2

Your array_length is O(N). This will lead to quadratic runtimes if you add lots of items in an array, which can be quite slow.

What you can so instead is use a struct to store the size of the array

struct dyn_array {
    size_t array_size; // how many elements fit in the array?
    size_t nelems;     // how much array space are we using now? 
    int *elems; //or void*, you know the drill...
}

One of the benefits of this approach is that you don't need to use the NULL terminator anymore. It also lets you use arrays of ints (or whatever else) instead of just arrays of pointers.

edited Dec 31, 2014 at 1:33

answered Dec 31, 2014 at 0:41

hugomg

70.4k30 gold badges168 silver badges257 bronze badges

Sign up to request clarification or add additional context in comments.

6 Comments

user3386109 Over a year ago

This approach could be expanded to keep track of the allocated size versus the currently in use size, so that it's not necessary to call realloc every single time an item is added to the array.

Hr. Rabe Over a year ago

How would the start value of dyn_array.elems look like then? As far as I understood, I can only use realloc on pointers that do actually occupy stack space (read: Have been created via malloc at some point) anyway? Otherwise at least Clang gives me a nice error of: pointer being realloc'd was not allocated... So: How would you initialize such a structed array?

Lee Daniel Crocker Over a year ago

The usual strategy is to keep allocated size and filled size, then whenever you add an item when full, realloc to 1.5X allocated size. This keeps insert performance in the N*log(N) range as you scale.

hugomg Over a year ago

@Hr.Rabe: Use malloc in array_new and realloc in array_add, just like you re doing now. @Lee: That strategy is actually O(N) amortized, so its better than you say. You ca use different multipliers if you want though (time/space tradeoff)

Hr. Rabe Over a year ago

If I get you right, that implies, that there is basically no meaningful way in C, to start an empty array of with actually no elements in memory. I would (If I understood correctly) always need at least a NULL element, only with the struct approach I would be able to fill that on the first call to àrray_add. Correct?

|

Hans Klünder · Accepted Answer · 2014-12-31 01:06:19Z

0

This is what the Croissant probably meant:

#define NULL 0 // in some include file.

This is unusual, but possible. Now assume 0 is a 4-byte int and void**array is an 8-byte pointer:

void* array_new(){
    void** array = malloc(sizeof(NULL)); // this works
    array[0] = NULL;                     // this write 8 bytes
    return array;
}

array[0]=NULLinterprets 0 as an 8-byte pointer and writes it into the allocated 4 bytes and beyond.

edited Dec 31, 2014 at 1:06

answered Dec 31, 2014 at 0:41

Hans Klünder

2,30214 silver badges9 bronze badges

7 Comments

Hr. Rabe Over a year ago

So what you are saying is, that sizeof(NULL) can actually give a size different to what NULL has been defined to mean? Can this really happen? oO

Hans Klünder Over a year ago

Maybe NULL has no meaning. Stroustrup strongly advised not to use anything that looks like "NULL" and is defined in an include file. He says if you need a 0, write 0, and if you need a 0 pointer, write 0. If he would be forced to define NULL, it would be define NULL 0 which, for me, is quite reasonable. And 0 can always be interpreted as a pointer, which may change the size. Thus in the instance you assign NULL to a variable of pointer type, it changes it's type and size.

Rufflewind Over a year ago

@Hans_Klünder: it's not really appropriate to apply Stroustrup's reasoning here because we're talking about C, not C++ and NULL is subtly different between the two languages.

ryanpattison Over a year ago

@HansKlünder That's a bit misleading. Stroustrup advised against NULL in favour of 0 in C++, and C++ 11 now has nullptr. Besides Kernighan and Ricthie used NULL.

Hans Klünder Over a year ago

Yes, K&R use NULL, and it's a pointer. stroustrup prefers 0 for C++. I think it's an unnecessary risk to make the integrity of the memory depend on sizeof(NULL) == sizeof(void*), while it's much safer to write void** array = malloc(sizeof(*array)) - this is what the Croissant suggested to do, and I agree. It's a good habit which makes sure that the memory returned by malloc is large enough to store one item of the type being used in that line.

|

Collectives™ on Stack Overflow

C - Dynamic array handling advice

Better Version (Update!)

2 Answers 2

6 Comments

7 Comments

Hot Network Questions

Collectives™ on Stack Overflow

Better Version (Update!)

2 Answers 2

6 Comments

7 Comments

Related