Bash command or script to remove lines from a CSV with duplicate values in a column

I have combined a lot of CSV files into one file. The result contains duplicates, but the entire line is not duplicated; instead, there is one column whose value I want to use as the criterion for detecting duplicates. If a value appears more than once in that column, I want to delete the extra rows so that every value in that column is unique.

Does anyone know the best way to accomplish this in Bash, sed or awk?
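
For a simple CSV (no quoted fields containing embedded commas), awk can do this in a single pass. Here is a minimal sketch; the key being in the second column and the file names combined.csv/deduped.csv are assumptions for illustration. It keeps the first row seen for each key value and discards later repeats:

    # Print a row only the first time its column-2 value is seen;
    # seen[$2]++ returns 0 (false) on the first occurrence, so
    # !seen[$2]++ is true exactly once per key.
    awk -F',' '!seen[$2]++' combined.csv > deduped.csv

    # Variant that always passes the header line through (NR == 1):
    awk -F',' 'NR == 1 || !seen[$2]++' combined.csv > deduped.csv

Note that if the CSV has quoted fields that may contain commas, a naive -F',' split will miscount columns, and a CSV-aware tool would be a safer choice.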