Timeline for Bash: Nested while loop to detect duplicates and number the duplicates
Current License: CC BY-SA 4.0
11 events
| when | what | | by | license | comment |
|---|---|---|---|---|---|
| Nov 5, 2020 at 12:35 | vote | accept | Jerry | | |
| Nov 5, 2020 at 11:01 | comment | added | terdon♦ | Do you have two files or three? You only use two files in your script (headers.txt and uniqueheaders.txt) but you also seem to have a file that has both headers and sequences. Is that file headers.txt or is it a third file? And what do you mean that "both files contain unique header names"? Isn't the whole point that one of the files has duplicate header names? | |
| Nov 5, 2020 at 10:58 | answer | added | AdminBee | timeline score: 1 | |
| Nov 5, 2020 at 10:47 | comment | added | AdminBee | Thanks for the clarification. Do I understand correctly that there is no "blank line" in between gene sequences, or anything else that would identify a header line (apart from gene sequences being all uppercase ;) ) | |
| Nov 5, 2020 at 10:44 | history | edited | Jerry | CC BY-SA 4.0 | deleted 8 characters in body |
| Nov 5, 2020 at 10:19 | history | edited | AdminBee | CC BY-SA 4.0 | Formatting and tags |
| Nov 5, 2020 at 10:15 | comment | added | Jerry | I edited the post, that is basically all I wanted... @terdon | |
| Nov 5, 2020 at 10:14 | history | edited | Jerry | CC BY-SA 4.0 | added 541 characters in body |
| Nov 5, 2020 at 9:54 | comment | added | terdon♦ | Can you show us a few lines of both files and the output you are expecting? Doing this in the shell is incredibly inefficient. You probably just need a simple awk one-liner that will run in seconds (your loop will take several minutes for larger files). | |
| Nov 5, 2020 at 9:27 | comment | added | choroba | What do you think the resulting sed command is? | |
| Nov 5, 2020 at 9:22 | history | asked | Jerry | CC BY-SA 4.0 | |