12 events
when what by license comment
Dec 28, 2020 at 16:34 comment added Sridhar Sarnobat This is also useful if you're not operating on a "file" as such but a long-running stream, and you need to get output immediately. I'm writing a tool to migrate data between data centers which will run for a long time and I'd like to see a sample of the data but not all of it.
Apr 12, 2019 at 15:17 comment added Bruno Bronosky If you need an exact number, you can always… Run this with a % greater than your need. Count the result. Remove lines matching count mod difference.
Apr 15, 2018 at 18:42 comment added Polymerase This is the best answer: the lines are picked randomly while respecting the chronological order of the original file, in case this is a requirement. In addition, awk is more resource-friendly than shuf.
Dec 6, 2016 at 18:35 history edited Txangel CC BY-SA 3.0
added 50 characters in body
Dec 6, 2016 at 18:32 comment added Txangel @G-Man The question seems to talk about getting 10k lines from a million as an example. None of the other answers worked for me (because of the size of the files and hardware limitations), so I propose this as a reasonable compromise. It won't get you exactly 10k lines out of a million, but it might be close enough for most practical purposes. I've clarified it a bit more following your advice. Thanks.
Dec 6, 2016 at 18:26 history edited Txangel CC BY-SA 3.0
added 95 characters in body
Dec 5, 2016 at 21:48 comment added G-Man Says 'Reinstate Monica' P.S.  Simplistic approaches using $RANDOM won’t work correctly for files larger than 32767 lines.  The statement “Using $RANDOM doesn’t reach the entire file” is a bit broad.
Dec 5, 2016 at 21:47 comment added G-Man Says 'Reinstate Monica' If a user wants approximately 1% of the non-blank lines, this is a pretty good answer. But if the user wants an exact number of lines (e.g., 1000 out of a 1000000-line file), this fails. As the answer you got it from says, it yields only a statistical estimate. And do you understand the answer well enough to see that it is ignoring blank lines? This might be a good idea, in practice, but undocumented features are, in general, not a good idea.
S Dec 5, 2016 at 21:07 history suggested phk CC BY-SA 3.0
formatting
Dec 5, 2016 at 20:32 review Suggested edits
S Dec 5, 2016 at 21:07
Dec 5, 2016 at 20:24 review First posts
Dec 5, 2016 at 20:32
Dec 5, 2016 at 20:23 history answered Txangel CC BY-SA 3.0
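The comments above describe two ideas that may be easier to follow in code: keeping each non-blank line with a fixed probability (which preserves input order but yields only an approximate count), and oversampling then trimming to reach an exact count. The following is a minimal Python sketch of both, not the answer's own command. The function names `sample_lines` and `trim_to_exact` are hypothetical, "blank" is assumed to mean an empty line, and dropping the surplus at random is only one possible reading of the "remove lines" step in the comments.

```python
import random
from typing import Iterable, Iterator, List


def sample_lines(lines: Iterable[str], fraction: float,
                 rng: random.Random) -> Iterator[str]:
    """Yield each non-blank line with probability `fraction`, in input order.

    Works on any iterable, including a long-running stream, because each
    line is decided independently and emitted immediately.
    """
    for line in lines:
        if line.rstrip("\n") == "":
            continue  # assumption: empty lines are never sampled
        if rng.random() < fraction:
            yield line


def trim_to_exact(sampled: List[str], wanted: int,
                  rng: random.Random) -> List[str]:
    """Drop random surplus lines so exactly `wanted` remain, order preserved."""
    if len(sampled) < wanted:
        raise ValueError("too few lines sampled; retry with a larger fraction")
    # Pick which positions to keep, then restore the original ordering.
    keep = sorted(rng.sample(range(len(sampled)), wanted))
    return [sampled[i] for i in keep]


if __name__ == "__main__":
    rng = random.Random(42)
    lines = [f"line {i}\n" for i in range(100_000)]
    # Oversample at 2% when roughly 1% (1000 lines) is needed.
    approx = list(sample_lines(lines, 0.02, rng))
    exact = trim_to_exact(approx, 1000, rng)
    print(len(approx), len(exact))
```

Note that `trim_to_exact` needs the whole oversample in memory, so it suits the file case rather than an unbounded stream; for a stream, only the approximate `sample_lines` step applies.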