17

I need to create a multidimensional array in bash, but I have read that bash has no such thing as multidimensional arrays.

Here is what my data could look like and the structure I need. Note that this is not a bash array:

DATACOL = [
    "1"=>("Santiago","Barcelona","AWG","6792992","Male"),
    "2"=>("Santi","Texas","BGG","6792992","Male"),
    "3"=>("Tiago","Rio","GHA","6792992","Female") 
]

How can I do something similar with a simple script? I'm a complete newbie to bash.

7
  • 1
    Might be best to create a json object. Commented Apr 2, 2023 at 13:00
  • do you mean something like that cameronnokes.com/blog/working-with-json-in-bash-using-jq ? Commented Apr 2, 2023 at 13:02
  • 3
    bash doesn't support multi-dimensional arrays. use perl. or awk. Commented Apr 2, 2023 at 13:18
  • 5
    This seems like X/Y problem. What is the big idea, @Santiago? Commented Apr 2, 2023 at 13:42
  • 3
    You can use an associative array (declare -A DATACOL) to realize any multidimensional "array" with bash. Commented Apr 2, 2023 at 16:01

5 Answers

35

You don't. If you find yourself needing something like a multi-dimensional array, that is a very strong indication that you should be using an actual programming language instead of a shell. Shells are not really programming languages, and although they can be (ab)used as one, that should only ever be for simple things. Unfortunately, many people seem to think that the shell is the right tool for all situations and that leads to a lot of wasted effort trying to do something the shell either cannot do, or can do very badly, instead of using a tool designed for the job.

That said, you can hack something together using namerefs:

A variable can be assigned the nameref attribute using the -n option to the declare or local builtin commands (see Bash Builtin Commands) to create a nameref, or a reference to another variable. This allows variables to be manipulated indirectly. Whenever the nameref variable is referenced, assigned to, unset, or has its attributes modified (other than using or changing the nameref attribute itself), the operation is actually performed on the variable specified by the nameref variable’s value. A nameref is commonly used within shell functions to refer to a variable whose name is passed as an argument to the function. For instance, if a variable name is passed to a shell function as its first argument, running

declare -n ref=$1

inside the function creates a nameref variable ref whose value is the variable name passed as the first argument. References and assignments to ref, and changes to its attributes, are treated as references, assignments, and attribute modifications to the variable whose name was passed as $1.

For example, like this:

#!/bin/bash

data1=("Santiago" "Barcelona" "AWG" "6792992" "Male")
data2=("Santi" "Texas" "BGG" "6792992" "Male")
data3=("Tiago" "Rio" "GHA" "6792992" "Female")

datacol=("data1" "data2" "data3")

for arrayName in "${datacol[@]}"; do
  declare -n array="$arrayName"
  echo "The second element of the array '$arrayName' is: ${array[1]}"
done

Which produces:

$ foo.sh
The second element of the array 'data1' is: Barcelona
The second element of the array 'data2' is: Texas
The second element of the array 'data3' is: Rio
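
The same nameref idea can be wrapped in a function so that a row index and a column index look up one cell. This is only a sketch: `get_cell` is a hypothetical helper name, and it assumes bash 4.3 or later, where namerefs are available.

```bash
#!/bin/bash
# Sketch only: get_cell is a hypothetical helper, not part of the script above.
# Requires bash 4.3+ for namerefs.

data1=("Santiago" "Barcelona" "AWG" "6792992" "Male")
data2=("Santi" "Texas" "BGG" "6792992" "Male")
data3=("Tiago" "Rio" "GHA" "6792992" "Female")

datacol=("data1" "data2" "data3")

# get_cell ROW COL: print element COL of the array whose name is stored in datacol[ROW]
get_cell() {
  local -n _row="${datacol[$1]}"   # nameref to the inner array for this row
  printf '%s\n' "${_row[$2]}"
}

get_cell 0 1   # row 0, column 1
get_cell 2 4   # row 2, column 4
```

The two calls should print Barcelona and Female. The helper does no bounds checking, which is part of why this approach stays fragile.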

It's just really complicated, fragile, and not worth the effort. Use a real scripting language instead.

5
  • 2
    Another example of this on Stack Overflow, also using Bash4.3 namerefs: How do format and traverse an array that contains arrays ,and each array contains an array? But yes, agreed, a different language would be the right tool for a job that involves this. Perl can do this nicely, and is still good at many of the things that Bash is good at, like conveniently running other programs and capturing their stdout, and text manipulation. Commented Apr 3, 2023 at 23:21
  • namerefs and array merge sounds good Commented Apr 4, 2023 at 21:53
  • At the end will i need another script language Commented Apr 4, 2023 at 21:56
  • 1
    it is the right tool, i need to loop over nested arrays in an ffmpeg shell script. Most ffmpeg language wrappers dont support writing a detailed filter_complex expression Commented Dec 3, 2023 at 4:45
  • @PirateApp I suggest you ask a new question then, whatever it is you are trying to do will almost certainly be easier in another language. The shell just isn't that good at complex data structures. Commented Dec 3, 2023 at 12:12
8

You can't in Bash, not without tricks. Ksh93 does have native multidimensional arrays, though.

One common trick is to use an associative array (declare -A arr), and use keys like 1,2, with multiple indexes separated by a literal comma. Though iterating over a single "row" or "column" is not that simple. This is also how AWK implements multidimensional arrays, see e.g. the GNU AWK manual.
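
In Bash, the comma-key trick might look like the sketch below. The names `table`, `cols` and `row_values` and the "row,col" key layout are assumptions made for illustration; the point is that one "row" has to be collected by building each key by hand.

```bash
#!/bin/bash
# Sketch only: the "row,col" key layout and all variable names are assumptions.
declare -A table=()
cols=2

table[0,0]="Santiago"; table[0,1]="Barcelona"
table[1,0]="Santi";    table[1,1]="Texas"
table[2,0]="Tiago";    table[2,1]="Rio"

# Walk one "row" by building each key from the two indexes.
row=1
row_values=()
for (( col = 0; col < cols; col++ )); do
  row_values+=( "${table[$row,$col]}" )
done
printf '%s\n' "${row_values[*]}"
```

The number of columns has to be tracked separately here, because the associative array itself knows nothing about rows or columns.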

In ksh:

arr=((a b c) (d e f))       # 2 rows of 3 elements
arr[2]=(g h i)              # one more row
arr[0][2]=x                 # change a value
typeset -p arr
echo ---
for i in ${!arr[@]}; do     # ${!arr[@]} gives the indexes
   for j in ${!arr[i][@]}; do
      echo -n "${arr[i][j]} ";
   done;
   echo;
done

prints

typeset -a arr=((a b x) (d e f) (g h i) )
---
a b x 
d e f 
g h i 

But really, this is one of those cases where you should probably consider switching to e.g. Python (or Perl, or...) instead, unless your use-case is quite special. The shell languages make it easy to start external programs, but handling data structures is much harder.

5

I find "abusing" the array a bit achieves what you want:

#!/bin/bash

AllServers=(
    "vmkm13, 172.16.39.71"
    "vmkm14, 172.16.39.72"
    "vmkm15, 172.16.39.84"
    "vmkw51, 172.16.39.73"
    "vmkw52, 172.16.39.74"
    "vmkw53, 172.16.39.75"
    "vmkw54, 172.16.39.76"
    "vmkw55, 172.16.39.77"
    "vmkw56, 172.16.39.78"
    "vmkw57, 172.16.39.79"
    "vmkw58, 172.16.39.80"
    "vmkw59, 172.16.39.81"
    "vmkw60, 172.16.39.82"
    "vmkw61, 172.16.39.83"
    "vmkw62, 172.16.39.85"
    "vmkw63, 172.16.39.86"
    "vmkw64, 172.16.39.87"

)

for Servers in "${AllServers[@]}"; do
    # Split on ", " so the IP does not keep the space that follows the comma
    Servername=$(echo "$Servers" | awk -F', ' '{ print $1 }')
    ServerIP=$(echo "$Servers" | awk -F', ' '{ print $2 }')

done

This is by no means the proper way to do it. But enclosing each entry in double quotes and separating its fields with commas allows you to use something like AWK to extract each column into a variable. As I said, it is a very hacky method. But it works.
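
The same split can probably be done without starting AWK for every entry, using the shell's own `read` builtin. This is a sketch with a shortened list; the variable names `entry`, `name` and `ip` are assumptions.

```bash
#!/bin/bash
# Sketch only: a shortened list; the names entry, name and ip are assumptions.
AllServers=(
    "vmkm13, 172.16.39.71"
    "vmkm14, 172.16.39.72"
)

for entry in "${AllServers[@]}"; do
    # IFS=', ' splits on commas and spaces, so the space after the comma is dropped
    IFS=', ' read -r name ip <<< "$entry"
    echo "name=$name ip=$ip"
done
```

Setting `IFS` only for the `read` command leaves the shell's normal word splitting untouched for the rest of the script. It does assume that no field contains a comma or a space.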

3

There are of course situations in which a shell script is simpler/more practical than a Python or Perl ... script, such as when a script serves as "glue" between an input source and subsequent processing.

OP didn't mention what they wanted to do with the data, so the following is a summary of a few of the possibilities for dealing with key-value data in a shell script. I've also assumed that the fields in the input are comma-separated (as would be the case with an en_US-locale *.csv for example).

Possibility #1: linear processing with an array per line from an input file

# generic shell script
rownum=0
while IFS=',' read -r loca locb ccode ncode gend; do 
    # do something with fields 'loca',... etc per row
    process_row "$rownum" "$loca" "$locb" "$ccode" "$ncode" "$gend" || break
    rownum=$(( rownum + 1 ))
done < sourcefile.csv
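
To try the loop above, something has to stand in for `process_row` and for the input file. The sketch below uses a hypothetical `process_row` that just prints a few fields, and a here-document in place of sourcefile.csv; both are assumptions made so the example is self-contained.

```bash
#!/bin/bash
# Sketch only: process_row is a hypothetical stand-in for the function used above.
process_row() {
    # $1 = row number, $2..$6 = the five fields of that row
    printf 'row %s: loca=%s locb=%s gend=%s\n' "$1" "$2" "$3" "$6"
}

rownum=0
while IFS=',' read -r loca locb ccode ncode gend; do
    process_row "$rownum" "$loca" "$locb" "$ccode" "$ncode" "$gend" || break
    rownum=$(( rownum + 1 ))
done <<'EOF'
Santiago,Barcelona,AWG,6792992,Male
Santi,Texas,BGG,6792992,Male
Tiago,Rio,GHA,6792992,Female
EOF
```

Because the loop reads from a redirection rather than a pipe, it runs in the current shell, so `rownum` still holds the row count after `done`.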

Possibility #2: linear processing using the positional element list, with e.g. input from a function/program

# generic shell script
oifs="$IFS"; newline="
"; IFS="$newline"
for var in $(your_input_source_command_that_emits_csv); do
    IFS=','; set -f; set -- $var; set +f; IFS="$oifs"
    process_row "$@" || break
    IFS="$newline"
done
IFS="$oifs"

Possibility #3: load data into a multidimensional array for subsequent non-linear processing

# can be done with ksh, zsh or even, uh, bash.
declare -A arry=()   # use 'local -A' instead when inside a function
rownum=0             # <= first dimension is rownum
while IFS= read -r lin; do
    for key in 'loca' 'locb' 'ccode' 'ncode' 'gend'; do
        # take the first remaining field, then strip it from the line
        arry["$rownum.$key"]="${lin%%,*}"; lin="${lin#*,}"
    done
    rownum=$(( rownum + 1 ))
done < fsource # or < <(function_or_program)

Here, the "dimension separator" is a dot, but any other non-digit char would be ok too. One could use any character (including $'\n',$'\a' etc) that doesn't appear in any dimension name.
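
Reading values back out of such an array might look like the sketch below. It fills two rows by hand so that it runs on its own; the sample rows and the names `fields`, `nrows` and `line` are assumptions.

```bash
#!/bin/bash
# Sketch only: the sample rows and the names fields, nrows and line are assumptions.
declare -A arry=()
fields=(loca locb ccode ncode gend)

# Populate two rows by hand, using the "rownum.key" layout described above.
arry[0.loca]="Santiago"; arry[0.locb]="Barcelona"; arry[0.ccode]="AWG"
arry[0.ncode]="6792992"; arry[0.gend]="Male"
arry[1.loca]="Tiago";    arry[1.locb]="Rio";       arry[1.ccode]="GHA"
arry[1.ncode]="6792992"; arry[1.gend]="Female"

# Random access to a single cell:
echo "${arry[1.locb]}"

# The row count is the number of entries divided by the number of fields.
nrows=$(( ${#arry[@]} / ${#fields[@]} ))

# Rebuild each row in field order.
for (( r = 0; r < nrows; r++ )); do
    line=""
    for key in "${fields[@]}"; do
        line+="${arry[$r.$key]},"
    done
    echo "${line%,}"
done
```

The field order has to be kept in a separate list, since an associative array does not preserve insertion order.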

0
2

It's hacky, but you can serialize an array with declare -p (typeset -p also works). Using that strategy, you can make an array of serialized arrays in bash, and then eval them. Notice that we have to use quoting if we care about empty strings as values:

# multidimensional array serialized with `declare -p`
# red, yellow, green things
declare -a color_table=(
  "$(inner_row=(stop caution go);          declare -p inner_row)"
  "$(inner_row=(rose tulip clover);        declare -p inner_row)"
  "$(inner_row=(strawberry banana grape);  declare -p inner_row)"
)

And here's how you can use that multidimensional array:

# using eval, we deserialize the inner_row variable as we loop
echo "=== table ==="
printf '%s\n' "${color_table[@]}"
echo "=== rows ==="
for row in "${color_table[@]}"; do
  eval "$row"   # quote the expansion so the serialized row is passed to eval unchanged
  echo "red thing: ${inner_row[0]}"
  echo "yellow thing: ${inner_row[1]}"
  echo "green thing: ${inner_row[2]}"
done

And the output is:

=== table ===
declare -a inner_row=([0]="stop" [1]="caution" [2]="go")
declare -a inner_row=([0]="rose" [1]="tulip" [2]="clover")
declare -a inner_row=([0]="strawberry" [1]="banana" [2]="grape")
=== rows ===
red thing: stop
yellow thing: caution
green thing: go
red thing: rose
yellow thing: tulip
green thing: clover
red thing: strawberry
yellow thing: banana
green thing: grape

For Zsh users: remember that arrays start at 1, so this script will work, but use indexes 1-3 instead of 0-2 in this example.
