19 events
when what by license comment
Jun 19, 2014 at 1:13 comment added Aquarius Power awk one is VERY fast at time()!
May 20, 2014 at 22:43 comment added Serge Stroobandt The awk trick is also well explained here.
May 18, 2014 at 1:24 comment added cjm @Sadi, if you link to the original question/answer and then explain why it doesn't solve your problem, it won't be closed as a duplicate. (It might get closed for some other reason, but not that.)
May 17, 2014 at 20:07 comment added Sadi Ooops, I didn't notice the spaces at the end of some lines; thank you! I just didn't want to be warned of a duplicate question.
May 17, 2014 at 17:07 comment added cjm @Sadi, that really should have been asked as a question, not a comment. But some of the lines in that file end in a space, and some don't. These commands consider the entire line significant, including the whitespace at the end.
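The trailing-whitespace point above can be reproduced with a short sketch. The sample input and the function names `dedupe_exact` and `dedupe_trimmed` are made up for illustration and are not from the original answer. Stripping trailing blanks before the lookup is one possible workaround, assuming trailing spaces and tabs carry no meaning in your data.

```shell
# Exact match: the whole line is the key, trailing whitespace included.
dedupe_exact() { awk '!seen[$0]++'; }

# Assumed workaround: strip trailing spaces/tabs first, then deduplicate.
dedupe_trimmed() { awk '{ sub(/[ \t]+$/, "") } !seen[$0]++'; }

# The second line ends in a space, so the exact version keeps it as a
# separate entry; `sed -n l` marks each line end with `$` to make that visible.
printf 'alpha\nalpha \nalpha\n' | dedupe_exact | sed -n l

# After trimming, all three input lines collapse into one.
printf 'alpha\nalpha \nalpha\n' | dedupe_trimmed
```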
May 17, 2014 at 8:44 comment added Sadi I merged several lists of special characters from different websites into a single plain text file (dl.dropboxusercontent.com/u/39189022/SpecialCharacters.txt) and tried to remove the duplicates to make a proper list, using the different commands I could find, including this one. But the result was always unsatisfactory and still contained many duplicates (dl.dropboxusercontent.com/u/39189022/…). This makes me doubt the reliability of these commands, which I use from time to time on some important data :) I wonder why this is so?
Jul 29, 2013 at 19:05 review Suggested edits
Jul 29, 2013 at 19:07
Jan 18, 2012 at 9:55 comment added Christoph Wurm @cjm: I think I get it - mostly: It's an ordinary user-defined array and every input line is added as an index, though without a corresponding value. But what does ++ do?
Jan 17, 2012 at 19:14 comment added cjm @Legate, it's the name of an array in which we're recording every line we've seen. You could change it to '!LarryWall[$0]++' for all awk cares, but "seen" helps people understand the program better.
Jan 17, 2012 at 14:58 comment added Christoph Wurm What is seen? I've searched the User's Guide and can't find anything.
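On the question of what `++` does: it is awk's post-increment operator, so the expression yields the old value of `seen[$0]` and only then adds 1 to the stored entry. An entry that has never been set reads as 0, so the first occurrence of a line gives 0 and every later occurrence gives a positive count. A long-hand sketch of the same logic (the function name `dedupe_verbose` and the variable `count` are illustrative only):

```shell
# Spelled-out equivalent of: awk '!seen[$0]++'
dedupe_verbose() {
  awk '{
    count = seen[$0]++   # post-increment: count gets the OLD value, then the entry grows by 1
    if (count == 0)      # old value 0 means this exact line has not appeared before
      print
  }'
}

# prints b, a, c (each on its own line): later repeats of b and a are dropped
printf 'b\na\nb\nc\na\n' | dedupe_verbose
```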
Apr 28, 2011 at 12:52 vote accept Lazer
Apr 25, 2011 at 15:34 history edited cjm CC BY-SA 3.0
use Gordon Davisson's improved awk script
Apr 25, 2011 at 6:29 comment added Gordon Davisson The awk version can be made even shorter by leaving out the if, print, parentheses, and braces: awk '!seen[$0]++'
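The shortening works because an awk rule that has a pattern but no action prints the current line by default. A quick check that the two forms quoted in these comments agree, using a made-up sample input (`sample`, `long_form` and `short_form` are names chosen for this sketch):

```shell
# Hypothetical sample data with repeated lines.
sample() { printf 'x\ny\nx\nz\ny\n'; }

# Explicit if/print form versus the bare-pattern form.
long_form=$(sample | awk '{ if (!seen[$0]++) print }')
short_form=$(sample | awk '!seen[$0]++')

# Both keep the first occurrence of each line, in input order.
[ "$long_form" = "$short_form" ] && echo "same output"
```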
Apr 25, 2011 at 3:32 comment added cjm @fred, unless your file is truly huge, either version takes longer to type in than it does to run.
Apr 25, 2011 at 0:48 comment added Peter.O General info: I timed both perl and awk; awk was faster by 25+%. Test data: 450000 unique lines (doubled up) to make 900000 lines (4 bytes per line).
Apr 24, 2011 at 22:46 history edited cjm CC BY-SA 3.0
use camh's improved awk script
Apr 24, 2011 at 22:32 comment added camh A slightly shorter awk script is { if (!seen[$0]++) print }
Apr 24, 2011 at 21:17 history edited cjm CC BY-SA 3.0
added awk version
Apr 24, 2011 at 20:57 history answered cjm CC BY-SA 3.0