Revisions to Serialize shell variable in bash or zsh

replaced http://unix.stackexchange.com/ with https://unix.stackexchange.com/

edited Apr 13, 2017 at 12:36

Note that the funky sed expression is to only match the first occurrence of either 'typeset' or 'declare' and add -g as a first argument. It is necessary to only match the first occurrence because, as Stéphane Chazelas rightly pointed out in comments, otherwise it will also match cases where the serialized string contains literal newlines followed by the word declare or typeset.

In addition to correcting my initial parsing faux pas, Stéphane also suggested a less brittle way to hack this: use a wrapper function to redefine the actions taken when sourcing the data back in. This not only sidesteps the issues with parsing the strings but could also be a useful hook for adding further functionality. It assumes you are not playing any other games with the declare or typeset commands, but it would be easier to implement in a situation where you were including this functionality as part of another function of your own, or where you were not in control of the data being written and whether or not it had the -g flag added. Something similar could also be done with aliases; see Gilles's answer for an implementation.

rewrote to remove broken solutions instead of just warn about them and explain the whole thing more lucidly

edited Jun 16, 2014 at 13:03

Caleb

Yes, in both bash and zsh you can serialize the contents of a variable in a way that is easy to retrieve using the declare builtin and the -p argument. The output format is such that you can simply source the output to get your stuff back.

 # You have $VAR already with your stuff
 declare -p VAR > serialized_VAR.sh

Either later in your script or in another script altogether, you can get your stuff back like this:

# Load up the serialized data back into $VAR 
source serialized_VAR.sh

How portable this is beyond bash and zsh I don't know. It works in sh for me in testing but I don't know what the POSIX standard is on this.

You could generalize this a little bit and make a pair of functions to save stuff to a folder of serialized data and read it back on demand. Obviously you would need to be aware that the stuff in the serialized directory will get executed as shell code, so securing it is paramount to your script's security!

# Did you see the warning? Don't do this if you can't trust the files
# ./serialized_* as they will be executed as part of your script!

serialize() {
    declare -p "$1" | sed -E 's/^(typeset|declare)/& -g/' > "./serialized_$1.sh"
}
 
deserialize() {
    source "./serialized_$1.sh"
}

Note the dirty hack with sed is to add the -g flag to the declare statement. Without this, declare would create a local variable when run again from inside the other function. Matching 'typeset' as well as 'declare' is for portability between bash and zsh; you could clean that up if you were only running this with one or the other.

Stéphane Chazelas suggested a cleaner way to hack this in comments that avoids a potential problem case if the variable data you are serializing contains literal newlines. This assumes you are not playing any other games with the declare or typeset commands. Implementing this would look something like this:

serialize() {
    declare -p "$1" > "./serialized_$1.sh"
}

deserialize() {
    declare() { builtin declare -g "$@"; }
    typeset() { builtin typeset -g "$@"; }
    source "./serialized_$1.sh"
    unset -f declare typeset
}

FOO=(an array or something)
BAR=$(uptime)

serialize FOO
serialize BAR

unset FOO BAR
# <snip> later....

deserialize FOO
deserialize BAR

printf 'FOO: %s\nBAR: %s\n' "${FOO[*]}" "$BAR"

If keeping the values in a single file is a must, you could do something like so:

touch ./serializedvars.sh

serialize() {
    sed -E -i "/^($1=|(typeset|declare)[^=]*$1)/d" ./serializedvars.sh
    declare -p "$1" | sed -E 's/^(typeset|declare)/& -g/' >> ./serializedvars.sh
}

deserialize() {
     source <(sed -nE "/^($1=|(typeset|declare)[^=]*$1)/p" ./serializedvars.sh)
}

Warning: That implementation will break rather badly for variables that contain literal newlines. In order to get around that problem, the easiest thing to do is to use per-variable files for the serialized data per the example above. The other way would be to always append the data without bothering to delete any previous instances of a variable name from the data file. Only the last instance will "stick" anyway, but this could easily get to be unwieldy as your data file grows.
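The append-only alternative mentioned above might look roughly like the following. This is only a sketch, assuming bash 4.2 or later (for declare -g); the names STORE, serialize_append and deserialize_all are made up for illustration and are not part of the original answer.

```bash
#!/usr/bin/env bash
# Hypothetical data file; a real script would use a fixed, protected path.
STORE=$(mktemp)

serialize_append() {
    # Append the current declaration; older entries stay in the file.
    typeset -p "$1" >> "$STORE"
}

deserialize_all() {
    # Wrap declare/typeset so sourced declarations land in the global scope.
    declare() { builtin declare -g "$@"; }
    typeset() { builtin typeset -g "$@"; }
    source "$STORE"
    unset -f declare typeset
}

# Save two successive values, both containing a literal newline.
NOTE=$'draft\nwith a newline'
serialize_append NOTE
NOTE=$'final\nwith a newline'
serialize_append NOTE

unset NOTE
deserialize_all
printf '%s\n' "$NOTE"   # the later entry is the one that sticks
rm -f "$STORE"
```

Because nothing is ever parsed or deleted, newlines in the data are harmless here, at the cost of a file that keeps growing.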

Again, some of the complexity of the expressions here is there to accommodate both bash and zsh output formats from declare -p. One or the other would be notably simpler.

Warning: With any of these solutions, you need to be aware that you are trusting the integrity of the data files to be safe as they will get executed as shell code in your script. Securing them is paramount to your script's security!
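One way to act on this warning is to refuse to source a data file that other users could have modified. The check below is a minimal sketch, assuming bash and a POSIX find; is_trusted_file is a hypothetical helper, and it does not inspect parent directories or symlinks, so treat it as a starting point rather than a complete safeguard.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only for a regular file that is owned by
# the current user and is not writable by group or others.
is_trusted_file() {
    local file=$1
    [[ -f $file && -O $file ]] || return 1
    [[ -z $(find "$file" -prune \( -perm -020 -o -perm -002 \) -print) ]]
}

datafile=$(mktemp)          # throwaway file for the demonstration
GREETING=hello
typeset -p GREETING > "$datafile"
unset GREETING

chmod 600 "$datafile"
# Source at top level so the restored variable is not local to a function.
if is_trusted_file "$datafile"; then
    source "$datafile"
fi
printf 'GREETING=%s\n' "${GREETING-<not loaded>}"
```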

Simple inline implementation for serializing one or more variables

Yes, in both bash and zsh you can serialize the contents of a variable in a way that is easy to retrieve using the typeset builtin and the -p argument. The output format is such that you can simply source the output to get your stuff back.

 # You have variable(s) $FOO and $BAR already with your stuff
 typeset -p FOO BAR > ./serialized_data.sh

You can get your stuff back like this either later in your script or in another script altogether:

# Load up the serialized data back into the current shell
source serialized_data.sh
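As a quick sanity check, the round trip above can be exercised end to end. This is a sketch assuming bash; the temporary directory and the sample values are invented for the demonstration.

```bash
#!/usr/bin/env bash
# Work in a throwaway directory (hypothetical location, for illustration).
workdir=$(mktemp -d)

FOO=(an "array with spaces" here)
BAR='a plain string'

# Write both declarations to one file.
typeset -p FOO BAR > "$workdir/serialized_data.sh"

# Forget the variables, then restore them by sourcing the file.
unset FOO BAR
source "$workdir/serialized_data.sh"

printf 'FOO has %d elements; the second is: %s\n' "${#FOO[@]}" "${FOO[1]}"
printf 'BAR is: %s\n' "$BAR"
rm -rf "$workdir"
```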

This will work for bash, zsh and ksh, including passing data between different shells. Bash treats typeset as a synonym for its declare builtin, while zsh and ksh implement typeset directly, so we use typeset here for ksh compatibility.

More complex generalized implementation using functions

The above implementation is really simple, but if you call it frequently you might want to give yourself a utility function to make it easier. Additionally if you ever try to include the above inside custom functions you will run into issues with variable scoping. This version should eliminate those issues.

Note that in all of these, to maintain bash/zsh cross-compatibility, we handle both typeset and declare so the code should work in either shell. This adds some bulk and mess that could be eliminated if you were only doing this for one shell or the other.

The main problem with using functions for this (or including the code in other functions) is that the typeset function generates code that, when sourced back into a script from inside a function, defaults to creating a local variable rather than a global one.
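The scoping problem is easy to reproduce. The following is a small sketch, assuming bash; the variable COLOR, the function load_naively and the temporary directory are made up for the demonstration.

```bash
#!/usr/bin/env bash
workdir=$(mktemp -d)        # throwaway directory for the demonstration

COLOR=blue
typeset -p COLOR > "$workdir/serialized_COLOR.sh"
unset COLOR

load_naively() {
    # The sourced "declare -- COLOR=..." line runs inside this function,
    # so COLOR becomes local to it.
    source "$workdir/serialized_COLOR.sh"
    inside=$COLOR           # copy to a global so we can inspect it later
}

load_naively
outside=${COLOR-<not set>}

printf 'inside the function: %s\n' "$inside"
printf 'after it returns:    %s\n' "$outside"
rm -rf "$workdir"
```

The value is visible while the function runs but is gone once it returns, which is why the declaration needs the -g flag one way or another.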

This can be fixed with one of several hacks. My initial attempt to fix this was to parse the output of the serialize process through sed to add the -g flag, so the created code defines a global variable when sourced back in.

serialize() {
    typeset -p "$1" | sed -E '0,/^(typeset|declare)/{s/ / -g /}' > "./serialized_$1.sh"
}
deserialize() {
    source "./serialized_$1.sh"
}

Note that the funky sed expression is to only match the first occurrence of either 'typeset' or 'declare' and add -g as a first argument. It is necessary to only match the first occurrence because, as Stéphane Chazelas rightly pointed out in comments, otherwise it will also match cases where the serialized string contains literal newlines followed by the word declare or typeset.
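To see why only the first occurrence may be touched, compare a substitution applied to every line with one restricted to the first line. This is a sketch assuming bash 4.2 or later; it uses the address 1s, which is a simpler variation rather than the exact expression above, and the names tricky and restore are hypothetical.

```bash
#!/usr/bin/env bash
# A value whose second line happens to start with the word "declare".
tricky=$'first line\ndeclare second line'

# Naive: every line starting with declare/typeset gets -g, data included.
naive=$(declare -p tricky | sed -E 's/^(typeset|declare)/& -g/')

# Restricted to line 1: only the real declaration is modified.
first_only=$(declare -p tricky | sed -E '1s/^(typeset|declare)/& -g/')

restore() {
    # Evaluate a serialized declaration and print the resulting value.
    unset tricky
    eval "$1"
    printf '%s' "$tricky"
}

naive_value=$(restore "$naive")
fixed_value=$(restore "$first_only")

if [[ $naive_value == "$tricky" ]]; then echo "every line:  data intact"; else echo "every line:  data was altered"; fi
if [[ $fixed_value == "$tricky" ]]; then echo "line 1 only: data intact"; else echo "line 1 only: data was altered"; fi
```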

In addition to correcting my initial parsing faux pas, Stéphane also suggested a less brittle way to hack this: use a wrapper function to redefine the actions taken when sourcing the data back in. This not only sidesteps the issues with parsing the strings but could also be a useful hook for adding further functionality. It assumes you are not playing any other games with the declare or typeset commands, but it would be easier to implement in a situation where you were including this functionality as part of another function of your own, or where you were not in control of the data being written and whether or not it had the -g flag added. Something similar could also be done with aliases; see Gilles's answer for an implementation.

To make the result even more useful, we can iterate over multiple variables passed to our functions by assuming that each word in the argument array is a variable name. The result becomes something like this:

serialize() {
    for var in "$@"; do
        typeset -p "$var" > "./serialized_$var.sh"
    done
}

deserialize() {
    declare() { builtin declare -g "$@"; }
    typeset() { builtin typeset -g "$@"; }
    for var in "$@"; do
        source "./serialized_$var.sh"
    done
    unset -f declare typeset
}

# Load some test data into variables
FOO=(an array or something)
BAR=$(uptime)

# Save it out to our serialized data files
serialize FOO BAR

# For testing purposes, unset the variables so we know if it worked
unset FOO BAR

# Load the data back in from our data files
deserialize FOO BAR

printf 'FOO: %s\nBAR: %s\n' "${FOO[*]}" "$BAR"

implement alternative hack that doesn't break on newlines, per Stéphane Chazelas's suggestion again

edited Jun 16, 2014 at 11:24

Caleb

# Did you see the warning? Don't do this if you can't trust the files
# ./serialized_* as they will be executed as part of your script!

serialize() {
    declare -p "$1" | sed 's/^typeset/& -g/;s/^declare/& -g/' > "./serialized_$1.sh"
}

deserialize() {
    source "./serialized_$1.sh"
}

touch ./serializedvars.sh

serialize() {
    sed -i "/^$1/d;/^typeset[^=]*$1/d;/^declare[^=]*$1/d" ./serializedvars.sh
    declare -p "$1" | sed 's/^typeset/& -g/;s/declare/& -g/' >> ./serializedvars.sh
}

deserialize() {
     source <(sed -n "/^$1=/p;/^typeset[^=]*$1/p;/^declare[^=]*$1/p" ./serializedvars.sh)
}

# Did you see the warning? Don't do this if you can't trust the files
# ./serialized_* as they will be executed as part of your script!

serialize() {
    declare -p "$1" | sed -E 's/^(typeset|declare)/& -g/' > "./serialized_$1.sh"
}

deserialize() {
    source "./serialized_$1.sh"
}

touch ./serializedvars.sh

serialize() {
    sed -E -i "/^($1=|(typeset|declare)[^=]*$1)/d" ./serializedvars.sh
    declare -p "$1" | sed -E 's/^(typeset|declare)/& -g/' >> ./serializedvars.sh
}

deserialize() {
     source <(sed -nE "/^($1=|(typeset|declare)[^=]*$1)/p" ./serializedvars.sh)
}

implement alternative hack that doesn't break on newlines, per Stéphane Chazelas's suggestion again

edited Jun 16, 2014 at 11:19

Caleb

removed invocations of the split+glob operator which doesn't make sense here. Moved GNU's -i option before arguments so it still works with POSIXLY_CORRECT on.

edited Jun 16, 2014 at 11:10

Stéphane Chazelas

use more portable sed syntax, thanks to tip from Stéphane Chazelas

edited Jun 16, 2014 at 11:00

Caleb

switch to normal sed regex syntax for seds that don't support -E

edited Jun 16, 2014 at 10:48

Caleb

added 20 characters in body

edited Jun 14, 2014 at 16:32

Caleb

added 20 characters in body

edited Jun 14, 2014 at 16:18

Caleb

added 1159 characters in body

edited Jun 14, 2014 at 15:53

Caleb

answered Jun 14, 2014 at 15:19

Caleb
