7

I have some code in which an array of strings is defined in a conditional statement. The array is out of scope where I need it. So I defined another pointer in the outer scope. In the conditional statement I assign the array address to the outer scope pointer. The code compiles without warnings for dangling pointers and such, but the code is not correct.

I produced a (correctly compiling) MRE below. The function does not make any sense. Essential is that:

  • The array of strings cmd_str is defined inside a condition
  • The number of strings is variable
  • The length of the strings is variable
  • The strings are either constants or null terminated strings
  • The last string only contains NULL
  • I use the resulting strings array outside the condition
int noCmd(int *opt) {

    int             local_opt;
    const char**    cmd = NULL;

        local_opt = *opt; // Make sure the compiler cannot know this value to avoid optimization.
        if (local_opt == 1) {
            const char* cmd_str[] = {"foo", "bar", NULL};
            cmd = cmd_str;
        } else {
            const char* cmd_str[] = {"flop", "blah", NULL};
            cmd = cmd_str;
        }
    // Dummy statement to force referencing cmd
    // actually I need the contents of cmd here
    return sizeof(cmd);
}

The main problem is that cmd doesn’t have any value in the return statement. This is confirmed by the assembly listing where cmd does not get a value assigned:

    Dump of assembler code for function noCmd:
    0x00007ffff7fb9e80 <+0>:     mov    $0x8,%eax
    0x00007ffff7fb9e85 <+5>:     ret
    End of assembler dump.

My reasoning is that cmd_str goes out of scope. Since it is not a valid pointer anymore, it is not allowed to be used. And the compiler optimizes all code out, because the nothing is used inside the scope and must not be used outside the scope.

The actual code is obviously much more complicated that this MRE. But the problem (conditional char** assignment) and the need for the strings outside the scope is the same.

How do I solve this?

I cannot do a memcpy from cmd_str into cmd, because I don't know the size. I might loop through each string in cmd_str, find the size of the string and then copy each string into cmd one by one, but that is very inelegant.

I cannot declare cmd_str in the outer scope, because

const char* cmd_str[]

is invalid.

Neither can I declare

const char** cmd_str

because it would be impossible to assign a number of strings to cmd_str[].

Isn't there a better way to define an array of strings inside a condition and use it outside the condition? The string to be assigned are either constants or null terminated strings. The last string gets the NULL value.

6
  • 7
    cmd doesn't have a value in the return statement. The function returns the size of a pointer, regardless of what you've been doing with it. Commented Oct 21 at 16:57
  • 5
    Don't try to understand what the compiler did and why when you're invoking UB by doing something you shouldn't. The compiler could produce almost any garbage. One way would be to define a double pointer variable in the calling function, pass a pointer to it to the function, and them update the value there, using the heap (malloc) for your string pointer array and the strings themselves. It's a bit ugly as you'jj have a triple pointer as an input argument, but I think that's one acceptable way to do it. Remember to then free the strings after use. Commented Oct 21 at 17:20
  • 6
    The optimizer never got as far as "Since it is not a valid pointer anymore it is not allowed to be used." sizeof is a compile-time constant unless you're playing games with VLAs. sizeof cmd is simply sizeof (const char**) and does not odr-use cmd. Poor choice of dummy statement. Commented Oct 21 at 20:34
  • 2
    I think the answer by John Bolinger is best, the only way to access a value outside its declared scope is to make it static - global existence, in scope access. But what is the reason for limiting its scope? Because copying the array contents would also get you the data (malloc or to an externally declared empty struct passed in), is there a reason that wouldn't work? Commented Oct 21 at 21:27
  • 1
    I think OP is confused by what sizeof actually does. It does not return the size of the array cmd_str points to (or might at some point have pointed to). It just returns the size of the variable cmd_str in bytes, but that variable is a pointer and not an array, and the compiler can compute this size without ever dereferencing the pointer. Commented yesterday

6 Answers 6

17

I have some code in which an array of strings is defined in a conditional statement. The array is out of scope where I need it. So I defined another pointer in the outer scope. In the conditional statement I assign the array address to the outer scope pointer. The code compiles without warnings for dangling pointers and such, but the code is not correct.

Code such as you describe is not incorrect, but it becomes incorrect (in the sense of having undefined behavior) as soon as you dereference that outer pointer, or use its value in any other way before assigning a new one. This is because the lifetime of the array declared inside the conditional ends as soon as execution of the innermost containing block does, where "block" in this case means any containing statement, not just one delimited by braces. In your case, it is whichever clause of the conditional is executed.

Essential is that:

  • The array of strings (cmd_str) is defined inside a condition

[...]

  • I use the resulting strings array outside the condition

If that's truly essential then you are out of luck. If you define an array anywhere inside a conditional statement, then its lifetime is over by the time execution of the conditional terminates. Outside the conditional, there is no longer any array to access. Assigning a pointer to point to it while it is still alive doesn't change that.

How do I solve this.

One way would be to dynamically allocate memory inside the conditional, once you know how much you need. You then assign or copy whatever value you need into the dynamically allocated space, and set the pointer to point to it. The dynamically allocated object lives until it is explicitly deallocated (so don't forget to deallocate it when you no longer need it).

Example:

    const char **cmd;

    if (local_opt == 1) {
        const char* cmd_str[] = {"foo", "bar", NULL};
        cmd = malloc(sizeof cmd_str);
        // ... abort if null ...

        memcpy(cmd, cmd_str, sizeof cmd_str);
    } else {
        // ...
    }

There are other ways to work the details.

Note well that the usage of string literals in the above is not problematic, because these represent objects having static storage duration. Their lifetime is the entire execution of the program. It's unclear what you mean, however, when you say the array elements may include (pointers to) null-terminated strings. The string literals satisfy that description, but if you mean anything else by that then there are lifetime considerations surrounding those, too.

If you don't need to be able to modify the array once produced, then another alternative that would be suitable at least for the example given would be to declare the inner arrays static, so that their lifetimes are not bounded by the execution of the conditional:

    const char * const *cmd;

    if (local_opt == 1) {
        static const char * const cmd_str[] = {"foo", "bar", NULL};
        cmd = cmd_str;
    } else {
        // ...
    }

The additional const qualification is not essential, but it helps protect against the array being modified. This is important because any changes will persist indefinitely, through the end of the program's execution, across any number of calls to the function containing that code.

Sign up to request clarification or add additional context in comments.

9 Comments

With (pointers) to null terminated strings I mean that it is either "foo" or a variable containing foo\NULL. Not more complicated than that. Maybe I did not express that properly. I might try static as you suggest and that is elegant.
It would be to your advantage to edit the question to provide an example of the second of those, because I'm still not sure I'm grasping all the details. The lifetime of the array of pointers is one thing, but the lifetimes of the things its elements point to is another, separate one that cannot safely be ignored.
This code is very invalid. So you think that allocating 3 bytes of memory will be sufficient? const char* cmd_str[] = {"foo", "bar", NULL}; size_t array_size = (sizeof cmd_str) / (sizeof *cmd_str);
Thanks for pointing out that error. I think. Fixed.
I accepted this answer because it provides two solutions. Declaring static seems to be the solution. And it helped with finding the size for memcpy. Using the additional const is likely not correct as cmd_str[] has to be composed differently during each function call.
If you are modifying a static variable then the function will no longer be re-entrant. I don't know if that is a concern or not in your application.
If the extra const is not workable for you then addressing your issue by declaring the arrays static is at best a bad idea. If you don't want to deal with dynamic memory allocation and you need the arrays to be modifiable then Ben Voigt's suggestion is probably your best bet.
I was about to comment that changes to static objects do not persist between invocations of a program (UB notwithstanding), and then I realised: s/through the end/through to the end/ or s/through the end of/throughout/ ("through X" includes both sides of X, so "through the end" includes after the end and thus implies that it persists between invocations).
"Through" is a word with many definitions. The appropriate ones for use in conjunction with lifetime are those applying to periods of time. In particular, this is Merriam-Webster definition 4c: "to and including".
12

By far the easiest solution is to simply move the initialized definitions above the conditional.

    const char* cmd_str_opt1[] = {"foo", "bar", nullptr};
    const char* cmd_str_notopt[] = {"flop", "blah", nullptr};
    cmd = (local_opt == 1)? cmd_str_opt1: cmd_str_notopt;
    cmd_size = (local_opt == 1)? (sizeof cmd_str_opt1): (sizeof cmd_str_notopt);

It wastes memory for handful of extra pointers, and only until the end of the next enclosing scope (such as your function). The memory for the actual string contents (literals) is used for the entire duration of the program anyway.

If some of the strings can only be produced inside the conditional, you can overwrite elements selectively.

5 Comments

I agree that this is easy and should work, and that at least for the example, the extra memory use is insignificant (+1). It does not satisfy the OP's criteria, however, which explicitly call for the arrays to be defined inside the conditional. That requirement is unrealistic, to be sure, but this answer would be improved by addressing that matter directly.
As you point out in your answer, that particular requirement is untenable. I did address the possibility that some strings aren't known until the condition is checked.
Maybe I should add something here. The decision of HOW the array of strings are to be composed has to be inside the conditional. However, to DEFINE the string literals outside the conditional would be acceptable. The current code contains the definition of the array strings inside the conditional and I tried to find a solution for using those outside the conditional. During this whole discussion it dawned to me that this might not be the best approach to keep using in C. I inherited the code and expanded it, not realizing blindly keeping the same track might not be the best.
You will also have to define the array itself outside the conditional... however you can decide on its contents (using assignment to array elements) inside the conditional.
And if those NULLs you are putting in the arrays serve as sentinels, it's probably ok to define an array that is longer than you (@JohannesLinkels) actually need. That could be a way around defining multiple arrays.
7

You can dynamically allocate space for cmd and then use memcpy to copy the contents of cmd_str.

const char *cmd_str[] = {"foo", "bar", NULL};
cmd = malloc(sizeof cmd_str);
memcpy(cmd, cmd_str, sizeof cmd_str);

Just make sure to free(cmd) when you're done.

9 Comments

I considered that. But isn't sizeof(cmd_str) the size of the pointer?
No, because cmd_str is an array, and this one of the exceptions to array-to-pointer conversion.
sizeof for dynamic allocated space, not works same as for sizeof for static allocated space. What i'm saying is your sizeof will return just size of datatype (char*), instead of actual size you are looking for
cmd_str is an array, so sizeof cmd_str is the size of the array in bytes, which in this case is 3 * sizeof(char *).
@dbush yes, 3 * sizeof(char *), not just 3, you have error.
No, it's not an error. sizeof cmd_str and 3 * sizeof(char *) are the same value because cmd_str is an array.
@dbush I'M not saying sizeof cmd_str is not array, i saying that you passing 3 * sizeof(char *) to the memcpy function instead of just 3.
And 3 * sizeof(char *) is exactly what should be passed.
The third parameter to memcpy is the number of bytes, not the number of array elements. It doesn't know what kind of memory is being pointed to.
6

As written, the code in the question is neither incorrect nor does it have undefined behavior if opt is a valid pointer, yet it would invoke undefined behavior if the value of cmd (as opposed to just its size) was used after leaving the scope where the array it points to is defined.

Since this is the intent and given the additional constraints for the arrays, the easy solution to this problem is to make the arrays static, preferably static and const.

Here is a modified example:

int noCmd(int *opt) {
    const char * const *cmd = NULL;

    switch (*opt) {
      case 0: {
        static const char * const cmd_str[] = { "foo", "bar", NULL };
        cmd = cmd_str;
        break;
      }
#ifdef FOOBAR_ID
      case FOOBAR_ID: {
        static const char * const cmd_str[] = { "foobar", NULL };
        cmd = cmd_str;
        break;
      }
#endif
      // [...] this can be extended at will.
      // cmd_str stays local to the #ifdef block for readability
    }

    // example use: return the number of strings in the array
    int i = -1;
    if (cmd) {
        for (i = 0; cmd[i]; i++)
            continue;
    }
    return i;
}

3 Comments

Yes, I will convert them to static. Not const as the cmd_str has to be composed differently during each subsequent call of the function.
Be careful, if they're static then the initialization will be skipped during subsequent calls (you can still use assignment to make changes if it's not const)
This strategy is only useful with static constant arrays (in the .rodata section of your program), and you pick which one cmd points to. If you eventually need a modified copy, and the number of elements is small, it makes more sense to look at other answers which assign pointers to elements of a different array. Point is, you can think in terms of what read-only constants your program should have just sitting in static storage (the string literals themselves for sure, and maybe arrays of pointers to copy), and what should get constructed by assignments to non-const arrays
2

Assuming *opt is restricted to a continguous (or at least a dense subset of an) interval of integers, you really have a ragged array of string literals, which you can index by *opt. The below assumes the interval [1,n], where n as an integer than fits in an int . Note that another interval can be sent to this one with simple arithmetic.

int noCmd(int *opt) {

    const char **cmd = NULL,
               **cmd_strs[] = {
                   {"flop", "blah", NULL},  /* cmd_str[] for the else clause */
                   {"foo", "bar", NULL},
                       /* ... more cmd_str[]s */
                   NULL,   /* for rows which can't be a value of *opt */
                       /* ... more cmd_str[]s */
               };

    /*  Compute the maximum index from the table, because a separate constant 
     *  will always be out of sync with the list of cmd_strs.
     */
    maxOpt = sizeof(cmd_strs) / sizeof(cmd_strs[0]);

    if (1 <= *opt <= maxOpt) {
        cmd = cmd_strs[*opt];
    } else {
        cmd = cmd_strs[0];
    }
    /*  more readable for some readers:
     *      cmd = cmd_strs[(1 <= *opt <= maxOpt) ? *opt : 0];
     */
}

Comments

2

Another possibility is to exploit the fact that when arrays are struct members, they are naturally "value-copied" like all other members of structs. That makes (together with compound literals) the following compact code possible:

struct ArrayHolder { const char *mArr[3]; };

struct ArrayHolder noCmd(int opt) 
{
  return opt == 1 ? (struct ArrayHolder) { { "foo",  "bar",  NULL } }
                  : (struct ArrayHolder) { { "flop", "blah", NULL } };
  }
}

The struct containing the array is returned by value which seems extravagant until you remember that memcpy is also a hidden, crude "return by value".

Note that as with all other solutions which do not copy the strings but only their addresses, this only works for strings with static life time (which string literals are).

If the strings are are non-static but have a maximum length you can define an array with actual char arrays as members; this will make the data mutable, and copying that will be a deep copy. Interestingly, the initialization looks almost the same, even though the strings are copied, except that NULL does not work:

#define MAX_STR_LEN 20

struct MatrixHolder
{
  char mArr[3][MAX_STR_LEN+1];
};

struct MatrixHolder noCmdMatrix(int opt)
{
  // imagine the data source to be something else than static strings...
  return opt == 1 ? (struct MatrixHolder) { { "mfoo",  "mbar",                 "" } }
                  : (struct MatrixHolder) { { "mflop", "12345678901234567890", "" } };
}

Of course, returning an entire matrix by value is even more extravagant; depending on the matrix size and frequency of calls it may well be prohibitive. But in an initialization routine that's just called once it would hardly matter and makes the code very clean. (Of course everyone is yearning for C++'s movable containers here.)

Usage of both could look like this (note how the caller is free to write into the character matrix at the end of main()):


int main()
{
  {
    struct ArrayHolder ah = noCmd(0);
    printf("%s %s\n", ah.mArr[0], ah.mArr[1]);
    ah = noCmd(1);
    printf("%s %s\n", ah.mArr[0], ah.mArr[1]);
  }
  
  {
    struct MatrixHolder mh = noCmdMatrix(0);
    printf("%s %s\n", mh.mArr[0], mh.mArr[1]);
    mh = noCmdMatrix(1);
    printf("%s %s\n", mh.mArr[0], mh.mArr[1]);
    mh.mArr[0][0] = 'X';
    printf("%s\n", mh.mArr[0]);
  }
}

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.