Revisions to Delete consecutive lines in CSV with duplicate values in one field, but keep the last line

replaced http://unix.stackexchange.com/ with https://unix.stackexchange.com/

edited Apr 13, 2017 at 12:36

Here's another awk approach (thanks @Glenn):

 tac file | awk -F, '!seen[$1]++' | tac

The -F, sets the field delimiter to a comma. In awk, the default action when an expression evaluates to true is to print the current line. !seen[$1] is true when the first field has not yet been stored in the array seen. Because seen[$1]++ also increments that entry, the whole expression !seen[$1]++ is true only the first time a given value is seen. The result is that only the first of the duplicates is printed.
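To make the logic of the one-liner explicit, here is a longer form written out as a sketch. The function name dedupe_first and the sample rows are made up for illustration; the awk program is intended to behave like '!seen[$1]++'.

```shell
# Hypothetical helper: keep only the first line for each value of field 1.
# Long-form equivalent of: awk -F, '!seen[$1]++'
dedupe_first() {
    awk -F, '{
        if (seen[$1] == 0) {   # an unset array entry compares equal to 0
            print              # first time this key appears: print the line
        }
        seen[$1]++             # remember the key for later lines
    }'
}

# Sample rows (illustrative only): key in field 1, payload in field 2.
first_kept=$(printf 'a,1\na,2\nb,3\n' | dedupe_first)
printf '%s\n' "$first_kept"
```

With this sample input, the a,2 row is dropped because the key a was already counted.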

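Since the awk filter keeps the first and not the last of the duplicates, the two tac calls are an ugly hack: the first reverses the input so that the last line of each group is seen first, and the second restores the original order. Note that seen remembers every value in the input, so a value that reappears later in the file, not only in a consecutive run, is also treated as a duplicate.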
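As a quick check of the whole pipeline, the sketch below wraps it in a function and feeds it a few rows. The name keep_last and the sample data are assumptions for illustration, and tac is assumed to be available (it ships with GNU coreutils).

```shell
# Hypothetical wrapper around the pipeline; reads CSV on stdin.
# Assumes GNU tac is installed.
keep_last() {
    tac | awk -F, '!seen[$1]++' | tac
}

# Sample rows (illustrative only): each key appears in one consecutive run.
last_kept=$(printf 'a,1\na,2\nb,3\nb,4\nc,5\n' | keep_last)
printf '%s\n' "$last_kept"
```

For this input the surviving rows are the last one of each run, in their original order.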

added 317 characters in body

edited Feb 25, 2016 at 9:18

terdon ♦

252.2k
69
480
718

Here's another awk approach (thanks @Glenn):

 tac file | awk -F, '!seen[$1]++' | tac

The -F, sets the field delimiter to a comma. In awk, the default action when an expression evaluates to true is to print the current line. !seen[$1] is true when the first field has not yet been stored in the array seen. Because seen[$1]++ also increments that entry, the whole expression !seen[$1]++ is true only the first time a given value is seen. The result is that only the first of the duplicates is printed.

Since the awk filter keeps the first and not the last of the duplicates, the two tac calls are an ugly hack: the first reverses the input so that the last line of each group is seen first, and the second restores the original order. Note that seen remembers every value in the input, so a value that reappears later in the file, not only in a consecutive run, is also treated as a duplicate.
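The array-based filter and the earlier variable-based one only differ when a key shows up in more than one run. The comparison below is a sketch on made-up data; the variable names sample, by_seen and by_last are hypothetical, and tac is assumed to be GNU tac.

```shell
# Illustrative input where the key "a" appears in two separate runs.
sample=$(printf 'a,1\na,2\nb,3\na,4\n')

# Array-based filter: remembers every key seen so far.
by_seen=$(printf '%s\n' "$sample" | tac | awk -F, '!seen[$1]++' | tac)

# Variable-based filter: compares each key only with the previous line.
by_last=$(printf '%s\n' "$sample" | tac | awk -F, '{if($1!=last){print;}last=$1}' | tac)

printf 'seen:\n%s\nlast:\n%s\n' "$by_seen" "$by_last"
```

On this input the seen version keeps one row per key for the whole file (b,3 and a,4), while the last version keeps the final row of each consecutive run (a,2, b,3 and a,4). Which one is wanted depends on whether equal keys are always adjacent in the data.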

added 243 characters in body

edited Feb 24, 2016 at 19:37

terdon ♦

Here's another awk approach:

awk -F, '{if($1!=last){print;}last=$1}' file

The -F, sets the field delimiter to a comma. For each line, awk prints it only if its first field differs from the current value of last, and then sets last to that first field. Only the first line of each consecutive run is therefore printed, and the original order is unchanged.
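A small run on made-up data shows which line of each run this filter keeps. The function name keep_first_of_run and the sample rows are assumptions for illustration.

```shell
# Hypothetical wrapper around the filter; reads CSV on stdin.
keep_first_of_run() {
    awk -F, '{if($1!=last){print;}last=$1}'
}

# Sample rows (illustrative only): two runs of two lines each.
run_first=$(printf 'a,1\na,2\nb,3\nb,4\n' | keep_first_of_run)
printf '%s\n' "$run_first"
```

The rows that survive are a,1 and b,3, that is, the first line of each run rather than the last.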

Here's another awk approach:

 tac file | awk -F, '{if($1!=last){print;}last=$1}' | tac

The -F, sets the field delimiter to a comma. For each line, awk prints it only if its first field differs from the current value of last, and then sets last to that first field, so only the first line of each consecutive run is printed. The awk step itself does not reorder lines.

Since the script above will keep the first and not the last of each run of duplicates, the two tac calls are an ugly hack to reverse the order and make it keep the last. Since there are two, the final order will be unchanged.
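If the double tac is unwelcome, one possible alternative (not part of the answer, only a sketch) is to buffer the previous line and print it whenever the key changes. The function name keep_last_of_run and the variable prev are hypothetical.

```shell
# Hypothetical single-pass alternative: keep the last line of each
# consecutive run without reversing the input.
keep_last_of_run() {
    awk -F, '
        NR > 1 && $1 != last { print prev }  # key changed: previous line ended a run
        { prev = $0; last = $1 }             # remember the current line and its key
        END { if (NR) print prev }           # the final line always ends the last run
    '
}

# Sample rows (illustrative only): the key "a" appears in two separate runs.
single_pass=$(printf 'a,1\na,2\nb,3\na,4\n' | keep_last_of_run)
printf '%s\n' "$single_pass"
```

This reads the input once and holds only one line in memory, at the cost of a slightly longer awk program.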

answered Feb 24, 2016 at 19:08

terdon ♦
