added 13 characters in body
Source Link
FelixJN
  • 14.1k
  • 2
  • 36
  • 55

Via awk:

  • Split the first three fields on ; and test, position by position, whether the set of values (one per field) meets all three conditions.

  • If so, remember the position of that set (in array sel).

  • In the second step, loop over all fields and keep only the values at the remembered positions.

  • Print the line only if at least one matching set was found.

    BEGIN {FS=OFS="\t"}
    {
    #select IDs of value sets to be kept
      n=split($1,a,";")
      split($2,b,";")
      split($3,c,";")
      nsel=0 ; delete sel
    #loop by index: the order of "for (i in a)" is unspecified
      for ( i=1 ; i<=n ; i++ ) {
          if (a[i]+0<-0.5 && b[i]+0>1 && c[i]+0>2) {
              sel[++nsel]=i
          }
      }
    #if any: run through all fields and reselect
      if (nsel) {
          for (i=1 ; i<=NF ; i++) {
              split($i,a,";")
              $i=a[sel[1]]
              for (j=2 ; j<=nsel ; j++) {
                  $i=$i";"a[sel[j]]
              }
          }
      }
    }
    #print only if any matching set was found
    nsel
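
To see what the steps do, here is a small self-contained run. The sample rows and the wrapper name keep_sets are made up for illustration and do not come from the question; the awk logic follows the steps listed above, written with an index loop so that positions are visited in order. Treat it as a sketch, not as the only way to do it.

```sh
# keep_sets is a hypothetical wrapper; it reads the files given as arguments (or stdin).
keep_sets() {
    awk '
    BEGIN { FS = OFS = "\t" }
    {
        # remember the positions whose three values all pass
        n = split($1, a, ";"); split($2, b, ";"); split($3, c, ";")
        nsel = 0; delete sel
        for (i = 1; i <= n; i++)
            if (a[i]+0 < -0.5 && b[i]+0 > 1 && c[i]+0 > 2) sel[++nsel] = i
        # rebuild every field from the remembered positions
        if (nsel)
            for (i = 1; i <= NF; i++) {
                split($i, a, ";")
                $i = a[sel[1]]
                for (j = 2; j <= nsel; j++) $i = $i ";" a[sel[j]]
            }
    }
    nsel   # print only if a set was kept
    ' "$@"
}

# made-up input: two rows of four tab-separated fields
printf '%s\t%s\t%s\t%s\n' \
    '-0.6;-0.2;-0.9' '1.5;3;2' '2.5;4;3' 'x;y;z' \
    '-0.1;-0.2'      '5;6'     '7;8'     'p;q' | keep_sets
# prints one line (fields separated by tabs): -0.6;-0.9  1.5;2  2.5;3  x;z
```

In the first row, positions 1 and 3 pass all three tests, so every field is reduced to its first and third value. The second row has no passing position and is dropped.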
    

deleted 17 characters in body

Via awk:

  • Split the first three fields on ; and test, position by position, whether the set of values (one per field) meets all three conditions.

  • If so, remember the position of that set (in array sel).

  • In the second step, loop over all fields and keep only the values at the remembered positions.

  • Print the line only if at least one matching set was found.

    BEGIN {FS=OFS="\t"}
    {
    #select IDs of value sets to be kept
      n=split($1,a,";")
      split($2,b,";")
      split($3,c,";")
      nsel=0
    #loop by index: the order of "for (i in a)" is unspecified
      for ( i=1 ; i<=n ; i++ ) {
          if (a[i]+0<-0.5 && b[i]+0>1 && c[i]+0>2) {
              sel[++nsel]=i
          }
      }
    #if any: run through all fields and reselect
      if (nsel) {
          for (i=1 ; i<=NF ; i++) {
              split($i,a,";")
              $i=a[sel[1]]
              for (j=2 ; j<=nsel ; j++) {
                  $i=$i";"a[sel[j]]
              }
          }
      }
    }
    #print only if any matching set was found
    nsel
    

deleted 1728 characters in body

EDIT: following the comment that a whole set of values should be removed as soon as one of its members fails its requirement.

The i-th values of fields 1, 2 and 3 are stored as element i of arrays a, b, and c, respectively. Each triple is then tested, and its values are appended to the respective fields only if all three conditions hold. Since $1 is emptied first, it doubles as an indicator: the line is printed only if $1 was filled with a value again, and a non-empty $1 shows that a semicolon is needed before the next value.

BEGIN {FS=OFS="\t"}
{
n=split($1,a,";") ; $1=""
split($2,b,";")
split($3,c,";")
# loop by index: the order of "for (i in a)" is unspecified
for ( i=1 ; i<=n ; i++ ) {
    if (a[i]+0<-0.5 && b[i]+0>1 && c[i]+0>2) {
        if ($1) { $1=$1";"a[i] ; $2=$2";"b[i] ; $3=$3";"c[i] }
        else { $1=a[i] ; $2=b[i] ; $3=c[i] }
    }
}
}
$1

Old code with misunderstood requirements: (deletes individual values only, does not treat triples as paired data)

With awk you could split the fields into arrays (called a), select the desired values (into an array b), and redefine the field (join the elements of b with ;-separators).

I used a function that gets the field, operator (as string), and comparison value, then returns the new field content. As operators cannot be taken from a variable, I had to use a second function that selects the comparison operation based on the operator string (Ref at SO). This keeps you flexible for all fields and possible comparisons. I used val1+0 to ensure that contents like N/A are suppressed by forcing the variable to be numerical:

$ cat script.awk
function operation(val1,operator,val2) {
    if (operator == "==" ) return val1+0 == val2
    if (operator == "<=" ) return val1+0 <= val2
    if (operator == "<" )  return val1+0 < val2
    if (operator == ">=" ) return val1+0 >= val2
    if (operator == ">" )  return val1+0 > val2
    if (operator == "!=" ) return val1+0 != val2
}

function valselect(fieldin,operator,value){
    n=split(fieldin,a,";") ; delete b ; blength=0
    for (i=1 ; i<=n ; i++) {
       if (operation(a[i],operator,value)) { b[++blength]=a[i] }
    }
    fieldout=b[1]
    for (i=2 ; i<=blength ; i++) { fieldout=fieldout";"b[i] }
    return fieldout
}

BEGIN {FS=OFS="\t"}
{
$1=valselect($1,"<",-0.5)
$2=valselect($2,">",1)
$3=valselect($3,">",2)
print $0}

Then run

awk -f script.awk file

For an even more flexible approach, you may as well define the fields, operators and values in arrays and loop over them in the execution block.
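
A minimal sketch of that array-driven variant, under stated assumptions: the wrapper name select_values and the rule arrays field, op, val (with the counter nrules) are names invented here, not part of the original script. Like the old code, it filters each field's values individually and does not treat them as paired data.

```sh
# select_values is a hypothetical wrapper name; the rule arrays are illustrative.
select_values() {
    awk '
    function operation(val1, operator, val2) {
        if (operator == "==") return (val1+0 == val2)
        if (operator == "<=") return (val1+0 <= val2)
        if (operator == "<")  return (val1+0 <  val2)
        if (operator == ">=") return (val1+0 >= val2)
        if (operator == ">")  return (val1+0 >  val2)
        if (operator == "!=") return (val1+0 != val2)
        return 0
    }
    # extra parameters after "value" act as local variables
    function valselect(fieldin, operator, value,    a, n, i, out, sep) {
        n = split(fieldin, a, ";")
        out = ""; sep = ""
        for (i = 1; i <= n; i++)
            if (operation(a[i], operator, value)) { out = out sep a[i]; sep = ";" }
        return out
    }
    BEGIN {
        FS = OFS = "\t"
        # one rule per entry: field number, operator, threshold
        nrules = 3
        field[1] = 1; op[1] = "<"; val[1] = -0.5
        field[2] = 2; op[2] = ">"; val[2] = 1
        field[3] = 3; op[3] = ">"; val[3] = 2
    }
    {
        for (r = 1; r <= nrules; r++)
            $(field[r]) = valselect($(field[r]), op[r], val[r])
        print
    }
    ' "$@"
}
```

It would be called as select_values file. Adding a condition then means adding one more field/op/val entry and raising nrules, without touching the main block.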

Via awk:

  • Split the first three fields on ; and test, position by position, whether the set of values (one per field) meets all three conditions.

  • If so, remember the position of that set (in array sel).

  • In the second step, loop over all fields and keep only the values at the remembered positions.

  • Print the line only if at least one matching set was found.

    BEGIN {FS=OFS="\t"}
    {
    #select IDs of value sets to be kept
      n=split($1,a,";")
      split($2,b,";")
      split($3,c,";")
      nsel=0
    #loop by index: the order of "for (i in a)" is unspecified
      for ( i=1 ; i<=n ; i++ ) {
          if (a[i]+0<-0.5 && b[i]+0>1 && c[i]+0>2) {
              sel[++nsel]=i
          }
      }
    #if any: run through all fields and reselect
      if (nsel) {
          for (i=1 ; i<=NF ; i++) {
              split($i,a,";")
              $i=a[sel[1]]
              for (j=2 ; j<=nsel ; j++) {
                  el=sel[j]
                  $i=$i";"a[el]
              }
          }
      }
    }
    #print only if any matching set was found
    nsel
    

deleted 3 characters in body
deleted 55 characters in body
added 106 characters in body
added case "no triple selected"
added 796 characters in body
added 150 characters in body