Timeline for How to manage huge amount of files in shell?
Current License: CC BY-SA 3.0
19 events
| when | what | action | by | license | comment |
|---|---|---|---|---|---|
| Jun 14, 2013 at 21:15 | history | suggested | Kevin Bowen | CC BY-SA 3.0 | corrected spelling and minor grammar improvements |
| Jun 14, 2013 at 20:56 | review | Suggested edits | | | Jun 14, 2013 at 21:15 |
| Jun 14, 2013 at 19:26 | history | edited | user26112 | | edited tags |
| Apr 10, 2011 at 18:53 | comment | added | alex | | Why not just use a full-blown database then? Modern databases let you store arbitrary file content in a table column, and with an index on the modification-date column you can search it instantly. *(see the SQLite sketch after the table)* |
| Apr 10, 2011 at 15:58 | comment | added | user2362 | | @alex: of course not. The question is about managing a huge amount of files, and it really means that scalability is a serious problem which I don't know how to address correctly; it's not just that many familiar shell commands fail, so I am using Python, as the added example code shows. I think the dir restriction can be circumvented by mounting a new fs to the system every time the space on one machine is used up; any experience with that? |
| Apr 10, 2011 at 14:27 | comment | added | alex | | @hhh: with this approach you may only have up to about 256 directories on any given level, so for two levels of indirection you'll have up to 256*256 = 65536 directories in total (more if Unicode file names are employed). Is that already too large for your filesystem? |
| Apr 10, 2011 at 4:06 | comment | added | user2362 | | @alex: that is a good idea, but what do you do when you end up with too large a number of dirs? |
| Apr 10, 2011 at 4:03 | history | edited | user2362 | CC BY-SA 3.0 | hopefully getting more quality answers; current answers are unrelated and even wrong |
| Mar 25, 2011 at 5:15 | answer | added | polemon | | timeline score: 4 |
| Mar 25, 2011 at 5:07 | comment | added | alex | | Do you have millions of files in a single directory? If so, you might consider splitting them into one- or two-level-deep subdirs based on the first chars of the file name, e.g. a/b/abcdef.jpg *(see the sharding sketch after the table)* |
| Mar 25, 2011 at 3:57 | answer | added | asoundmove | | timeline score: 1 |
| Mar 25, 2011 at 2:31 | history | tweeted | twitter.com/#!/StackUnix/status/51108944495656960 | | |
| Mar 25, 2011 at 1:26 | comment | added | xenoterracide | | @hhh as root, run `tune2fs -l /dev/<ext partition>` and check whether the filesystem features include `dir_index`; honestly, though, I doubt it helps much in this case. I don't know if things like Nepomuk or whatever GNOME has have CLI clients... but almost every solution is going to need a pregenerated index, which will take a while to build *(see the dir_index sketch after the table)* |
| Mar 25, 2011 at 1:07 | comment | added | user2362 | | @xenoterracide: ext3; not sure either whether it matters. I think the solution I illustrated fixes the future search problem, but it does not help at all with the current photos: it is very time-consuming to search them. |
| Mar 25, 2011 at 1:01 | comment | added | xenoterracide | | @hhh what filesystem are you using? Or doesn't that matter yet... ext does have some performance-enhancing features which may not be on by default, though even those probably won't cope at the scale you're talking about. DBs are optimized for these things and have various indexing solutions to deal with them; for example, a btree index is not just a simple array... |
| Mar 25, 2011 at 0:57 | comment | added | user2362 | | @xenoterracide: but even dbs must implement fast searching with something like arrays, so a db sounds like overkill. The source for the picture-taking thing is here: github.com/fsphil/fswebcam. Perhaps I could mod it a bit so that, at the time it saves a picture, it appends a line with the inode number & unix time stamp to a file. Searching that file of lines, rather than the pictures themselves, would be much faster. Or even more easily: each time a picture is saved to disk, I append a line with its time stamp to a file *(see the index-file sketch after the table)*. A roundabout solution, but it won't solve the problem with the current pictures, so the question stays relevant. |
| Mar 25, 2011 at 0:43 | comment | added | xenoterracide | | @hhh for datasets on this scale a properly indexed db is probably the only option |
| Mar 25, 2011 at 0:18 | history | edited | user2362 | CC BY-SA 2.5 | added 91 characters in body |
| Mar 25, 2011 at 0:11 | history | asked | user2362 | CC BY-SA 2.5 | |
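
As a concrete reading of alex's sharding suggestion (Mar 25, 2011 at 5:07), here is a minimal shell sketch that moves a flat directory of files into two-level subdirs keyed on the first two characters of the file name. `SRC`, `DST`, and the `*.jpg` pattern are assumptions for illustration, not anything given in the thread.

```sh
#!/bin/sh
# Sketch: shard a flat directory into two-level subdirs based on the
# first two characters of each file name (abcdef.jpg -> a/b/abcdef.jpg).
# SRC, DST, and the *.jpg pattern are hypothetical; file names containing
# newlines are not handled.
SRC=/data/photos
DST=/data/photos-sharded

find "$SRC" -maxdepth 1 -type f -name '*.jpg' | while IFS= read -r path; do
    name=${path##*/}                        # strip leading directories
    c1=$(printf '%s' "$name" | cut -c1)     # first character
    c2=$(printf '%s' "$name" | cut -c2)     # second character
    mkdir -p "$DST/$c1/$c2"
    mv -- "$path" "$DST/$c1/$c2/$name"
done
```

After sharding, a lookup only has to list one small directory (e.g. `ls /data/photos-sharded/a/b/`), which touches at most a few thousand entries instead of millions.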
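
xenoterracide's `dir_index` check (Mar 25, 2011 at 1:26), spelled out. `/dev/sda1` is a placeholder for the actual ext partition, and the enable step is an assumption about what one would do next rather than something stated in the thread.

```sh
# Check whether the ext filesystem has hashed directory indexes enabled
# (run as root; /dev/sda1 is a placeholder for the real partition):
tune2fs -l /dev/sda1 | grep -i 'filesystem features'

# If dir_index is absent it can be turned on; existing directories then
# need e2fsck -D (on an unmounted filesystem) to build the hash indexes:
tune2fs -O dir_index /dev/sda1
e2fsck -fD /dev/sda1
```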
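
hhh's own append-on-save idea (Mar 25, 2011 at 0:57) could look like the sketch below; `save_photo`, the index path, and the epoch bounds are hypothetical names and values for illustration, and the `stat -c` flags are GNU coreutils syntax.

```sh
# Sketch of the append-on-save index: one line per photo,
# "<mtime-epoch> <inode> <path>", in a single flat file.
INDEX=/data/photos.index      # hypothetical location

save_photo() {                # hypothetical wrapper called after each capture
    stat -c '%Y %i %n' "$1" >> "$INDEX"   # GNU stat: mtime, inode, name
}

# Search by time range: scan one text file instead of millions of inodes.
# Epoch bounds are arbitrary example values; paths containing spaces
# would need a different field separator.
awk -v from=1301011200 -v to=1301097600 \
    '$1 >= from && $1 <= to { print $3 }' "$INDEX"
```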
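
And one way to read alex's database suggestion (Apr 10, 2011 at 18:53), driving SQLite from the shell; `photos.db` and its schema are assumptions, and paths containing `|` or newlines would break the `.import`.

```sh
# Sketch: keep file metadata in SQLite with an index on mtime.
# photos.db and the schema are assumptions for illustration.
sqlite3 photos.db 'CREATE TABLE IF NOT EXISTS photos (mtime INTEGER, path TEXT);
                   CREATE INDEX IF NOT EXISTS photos_mtime ON photos(mtime);'

# Bulk-load existing files, one "<epoch>|<path>" line each (GNU stat):
find /data/photos -type f -exec stat -c '%Y|%n' {} + |
    sqlite3 -cmd '.separator |' photos.db '.import /dev/stdin photos'

# Range query on the indexed column (epoch bounds are example values):
sqlite3 photos.db \
    'SELECT path FROM photos WHERE mtime BETWEEN 1301011200 AND 1301097600;'
```

The point of the index is that the range query no longer scans the whole table, let alone the filesystem; a btree index, as xenoterracide notes, is exactly what makes this fast.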