Timeline for How to manage huge amount of files in shell?
Current License: CC BY-SA 3.0
19 events
| when | what | action | by | license | comment |
|---|---|---|---|---|---|
| Jun 14, 2013 at 21:15 | history | suggested | Kevin Bowen | CC BY-SA 3.0 | corrected spelling and minor grammar improvements |
| Jun 14, 2013 at 20:56 | review | Suggested edits | | | Jun 14, 2013 at 21:15 |
| Jun 14, 2013 at 19:26 | history | edited | user26112 | | edited tags |
| Apr 10, 2011 at 18:53 | comment | added | alex | | Why not just use a full-blown database then? Modern databases let you store arbitrary file content in a table column, and with an index on the modification-date column you can search it instantly. *(see the SQLite sketch after the table)* |
| Apr 10, 2011 at 15:58 | comment | added | user2362 | | @alex: of course not. The question is about managing a huge amount of files, and it really means that scalability is a serious problem which I don't know how to address correctly; it's not just that many familiar shell commands fail, so I am using Python, as the added example code shows. I think the dir restriction can be circumvented by mounting a new fs to the system every time the space on one machine is used up; any experience with that? |
| Apr 10, 2011 at 14:27 | comment | added | alex | | @hhh: with this approach you may only have up to about 256 directories on any given level, so for two levels of indirection you'll have up to 256*256 = 65536 directories in total (more if Unicode file names are employed). Is that already too large for your filesystem? |
| Apr 10, 2011 at 4:06 | comment | added | user2362 | | @alex: that is a good idea, but what do you do when you end up with too large a number of dirs? |
| Apr 10, 2011 at 4:03 | history | edited | user2362 | CC BY-SA 3.0 | hopefully getting more quality answers; current answers are unrelated and even wrong |
| Mar 25, 2011 at 5:15 | answer | added | polemon | | timeline score: 4 |
| Mar 25, 2011 at 5:07 | comment | added | alex | | Do you have millions of files in a single directory? If so, you might consider splitting them into one- or two-level-deep subdirs based on the first chars of the file name, e.g. a/b/abcdef.jpg *(see the sharding sketch after the table)* |
| Mar 25, 2011 at 3:57 | answer | added | asoundmove | | timeline score: 1 |
| Mar 25, 2011 at 2:31 | history | tweeted | twitter.com/#!/StackUnix/status/51108944495656960 | | |
| Mar 25, 2011 at 1:26 | comment | added | xenoterracide | | @hhh as root, run `tune2fs -l /dev/<ext partition>` and check whether the filesystem features include `dir_index`; honestly, though, I doubt it helps much in this case. I don't know if things like Nepomuk or whatever GNOME has have CLI clients... but almost every solution is going to need a pregenerated index, which will take a while to build *(see the dir_index sketch after the table)* |
| Mar 25, 2011 at 1:07 | comment | added | user2362 | | @xenoterracide: ext3; not sure either whether it matters. I think the solution I illustrated fixes the future search problem, but it does not help at all with the current photos: it is very time-consuming to search them. |
| Mar 25, 2011 at 1:01 | comment | added | xenoterracide | | @hhh what filesystem are you using? Or doesn't that matter yet... ext does have some performance-enhancing features which may not be on by default, though even those probably won't cope at the scale you're talking about. DBs are optimized for these things and have various indexing solutions to deal with them; for example, a btree index is not just a simple array... |
| Mar 25, 2011 at 0:57 | comment | added | user2362 | | @xenoterracide: but even dbs must implement fast searching with something like arrays, so a db sounds like overkill. The source for the picture-taking thing is here: github.com/fsphil/fswebcam. Perhaps I could mod it a bit so that, at the time it saves a picture, it appends a line with the inode number & unix time stamp to a file. Searching that file of lines, rather than the pictures themselves, would be much faster. Or even more easily: each time a picture is saved to disk, I append a line with its time stamp to a file *(see the index-file sketch after the table)*. A roundabout solution, but it won't solve the problem with the current pictures, so the question stays relevant. |
| Mar 25, 2011 at 0:43 | comment | added | xenoterracide | | @hhh for datasets on this scale a properly indexed db is probably the only option |
| Mar 25, 2011 at 0:18 | history | edited | user2362 | CC BY-SA 2.5 | added 91 characters in body |
| Mar 25, 2011 at 0:11 | history | asked | user2362 | CC BY-SA 2.5 | |
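
As a concrete reading of alex's sharding suggestion (Mar 25, 2011 at 5:07), here is a minimal shell sketch that moves a flat directory of files into two-level subdirs keyed on the first two characters of the file name. `SRC`, `DST`, and the `*.jpg` pattern are assumptions for illustration, not anything given in the thread.

```sh
#!/bin/sh
# Sketch: shard a flat directory into two-level subdirs based on the
# first two characters of each file name (abcdef.jpg -> a/b/abcdef.jpg).
# SRC, DST, and the *.jpg pattern are hypothetical; file names containing
# newlines are not handled.
SRC=/data/photos
DST=/data/photos-sharded

find "$SRC" -maxdepth 1 -type f -name '*.jpg' | while IFS= read -r path; do
    name=${path##*/}                        # strip leading directories
    c1=$(printf '%s' "$name" | cut -c1)     # first character
    c2=$(printf '%s' "$name" | cut -c2)     # second character
    mkdir -p "$DST/$c1/$c2"
    mv -- "$path" "$DST/$c1/$c2/$name"
done
```

After sharding, a lookup only has to list one small directory (e.g. `ls /data/photos-sharded/a/b/`), which touches at most a few thousand entries instead of millions.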
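
xenoterracide's `dir_index` check (Mar 25, 2011 at 1:26), spelled out. `/dev/sda1` is a placeholder for the actual ext partition, and the enable step is an assumption about what one would do next rather than something stated in the thread.

```sh
# Check whether the ext filesystem has hashed directory indexes enabled
# (run as root; /dev/sda1 is a placeholder for the real partition):
tune2fs -l /dev/sda1 | grep -i 'filesystem features'

# If dir_index is absent it can be turned on; existing directories then
# need e2fsck -D (on an unmounted filesystem) to build the hash indexes:
tune2fs -O dir_index /dev/sda1
e2fsck -fD /dev/sda1
```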
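
hhh's own append-on-save idea (Mar 25, 2011 at 0:57) could look like the sketch below; `save_photo`, the index path, and the epoch bounds are hypothetical names and values for illustration, and the `stat -c` flags are GNU coreutils syntax.

```sh
# Sketch of the append-on-save index: one line per photo,
# "<mtime-epoch> <inode> <path>", in a single flat file.
INDEX=/data/photos.index      # hypothetical location

save_photo() {                # hypothetical wrapper called after each capture
    stat -c '%Y %i %n' "$1" >> "$INDEX"   # GNU stat: mtime, inode, name
}

# Search by time range: scan one text file instead of millions of inodes.
# Epoch bounds are arbitrary example values; paths containing spaces
# would need a different field separator.
awk -v from=1301011200 -v to=1301097600 \
    '$1 >= from && $1 <= to { print $3 }' "$INDEX"
```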
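
And one way to read alex's database suggestion (Apr 10, 2011 at 18:53), driving SQLite from the shell; `photos.db` and its schema are assumptions, and paths containing `|` or newlines would break the `.import`.

```sh
# Sketch: keep file metadata in SQLite with an index on mtime.
# photos.db and the schema are assumptions for illustration.
sqlite3 photos.db 'CREATE TABLE IF NOT EXISTS photos (mtime INTEGER, path TEXT);
                   CREATE INDEX IF NOT EXISTS photos_mtime ON photos(mtime);'

# Bulk-load existing files, one "<epoch>|<path>" line each (GNU stat):
find /data/photos -type f -exec stat -c '%Y|%n' {} + |
    sqlite3 -cmd '.separator |' photos.db '.import /dev/stdin photos'

# Range query on the indexed column (epoch bounds are example values):
sqlite3 photos.db \
    'SELECT path FROM photos WHERE mtime BETWEEN 1301011200 AND 1301097600;'
```

The point of the index is that the range query no longer scans the whole table, let alone the filesystem; a btree index, as xenoterracide notes, is exactly what makes this fast.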