2

I have a file that has let's say, 5 columns(obtained by redirecting the results of a Sybase select query). Each column is separated by a tab. I need to filter out the columns that have all NULLs. Any column among the five can be null.

For example, if the columns in the file look like the below:

1000    NULL    NULL    2       NULL
7       NULL    1000    2       NULL
7       NULL    1000    2       NULL
7       NULL    1000    2       NULL
7       NULL    1000    2       NULL
7       NULL    1000    2       NULL
7       NULL    1000    2       NULL
7       NULL    1000    2       NULL
7       NULL    1000    2       NULL
7       NULL    1000    2       NULL
7       NULL    1000    2       NULL
7       NULL    1000    2       NULL
7       NULL    1000    2       NULL
7       NULL    1000    2       NULL
7       NULL    1000    2       NULL
7       NULL    1000    2       NULL
7       NULL    1000    2       NULL
7       NULL    1000    2       NULL
7       NULL    1000    2       NULL
7       NULL    1000    2       NULL
7       NULL    1000    2       NULL

The output must be(after removing columns 2 and 5), preferably in the same file:

1000    NULL    2    
7       1000    2    
7       1000    2    
7       1000    2    
7       1000    2    
7       1000    2    
7       1000    2    
7       1000    2    
7       1000    2    
7       1000    2    
7       1000    2    
7       1000    2    
7       1000    2    
7       1000    2    
7       1000    2    
7       1000    2    
7       1000    2    
7       1000    2    
7       1000    2    
7       1000    2    
7       1000    2 

I reached:

for i in {1..5}  
do
echo $i
dlr="$"$i
str="{print $dlr}"
echo $str
awk '$str' <input_file> | while read value
    do
        echo $value

        if [ "$value" == "NULL" ]
        then
                echo "inside"
                cut $i 
        fi
    done
done

value right now, shows all the rows!!! Also, I am not sure how to cut off the column. I am a shell script beginner and unable to proceed further.

Can you please advise how to do this?

2
  • 2
    Would it not be easier to do this in the database query? Commented May 24, 2019 at 14:23
  • It doesn't feel like a particularly easy thing to do in SQL Commented May 24, 2019 at 20:16

4 Answers 4

3

By means of the GNU utility datamash, you first invert the rows then delete all null rows and invert the rows once again:

$ datamash transpose | sed -Ee '/^(NULL\t)*NULL$/d' | datamash transpose 
1

I don't really like this solution but it seems to work:

#!/bin/bash

input=~/tmp/input
skip=()

c=$(awk '{print NF;exit}' "$input")

for ((i=1;i<=c;i++)); do
    col=$(awk -v c="$i" '{print $c}' "$input" | sort -u)
    if [[ $col == NULL ]]; then
        skip+=( "$i" )
    fi
done
( IFS=,; awk -v S="${skip[*]}" '{ split(S,s,","); for (i=1;i<=length(s);i++) { $s[i]="" } print}' "$input" )
  1. input (your input file)
  2. skip (array that will be filled later)
  3. c (will be set to the number of columns in your file. This assumes the entire file has a uniform number of columns)
  4. We now will loop through each column in the file, sort and unique it, and check if the column only contains NULL. If so we will add that column number to the skip array
  5. We now set the IFS to a comma (the reason I'm using a subshell) and pass the skip array as a comma separated value to an awk variable. This is so we can use split to convert that variable back into an awk array.
  6. awk will then loop through each number in the s array and set that column to empty and print what remains.
1

You can't do it without processing the file twice. The following is first using awk to construct the final cut command:

killnulls(){
    cut -f "$(awk -F'\t' '
        { for(i=1;i<=NF;i++) a[i] += $i!="NULL" }
        END { for(i=1;i in a;i++) if(a[i]) printf j++?","i:i }
    ' "$@")" "$@"
}

$ killnulls file
1000    NULL    2
7       1000    2
7       1000    2
...
0

Command:

awk '/NULL/{gsub(/NULL/,"",$0);print $0}' filename| sed -r "s/\s+/ /g"| awk 'NR==1{$3=$2;$2="NULL";}1'

output

awk '/NULL/{gsub(/NULL/,"",$0);print $0}' i.txt| sed -r "s/\s+/ /g"| awk 'NR==1{$3=$2;$2="NULL";}1'
1000 NULL 2
7 1000 2 
7 1000 2 
7 1000 2 
7 1000 2 
7 1000 2 
7 1000 2 
7 1000 2 
7 1000 2 
7 1000 2 
7 1000 2 
7 1000 2 
7 1000 2 
7 1000 2 
7 1000 2 
7 1000 2 
7 1000 2 
7 1000 2 
7 1000 2 
7 1000 2 
7 1000 2

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.