I am in a situation where multiple threads read the same huge file, each with its own file pointer. The file has at least 1 million lines, and each line's length varies from 500 to 1500 characters. There are no "write" operations on the file. Each thread starts reading the file from a different line. Which is the most efficient approach: Python's linecache, plain readline(), or some other method?
-
Do you have to use threading? It sounds like disk I/O will be the bottleneck, so reading from different parts of the file will cause more seek time and slow things down rather than speed them up. – Thomas, May 7, 2010 at 8:48
-
Yes, I have to use threads. I am working on performance testing, so I don't want to take hits on either file reading or memory management. – KALYAN, May 7, 2010 at 9:13
1 Answer
Have a look at the mmap module: http://docs.python.org/library/mmap.html
It will allow you to use the file as an array, while the OS handles the actual reading and buffering.
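For illustration, here is a minimal sketch of that approach, assuming Python 3; the file path, the thread count, and the per-line processing are all placeholders. Each thread creates its own read-only mapping of the same file (the OS page cache is shared between them, so the data is only read from disk once), and a quick preliminary pass finds line-aligned byte offsets for each thread to start from:

```python
import mmap
import threading

FILE_PATH = "huge_file.txt"  # placeholder path
NUM_THREADS = 4              # placeholder thread count

def worker(start_offset, end_offset):
    # Each thread opens its own file handle and read-only mapping;
    # the underlying pages are shared by the OS, not duplicated.
    with open(FILE_PATH, "rb") as f, \
         mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        mm.seek(start_offset)
        while mm.tell() < end_offset:
            line = mm.readline()  # mmap objects support readline()
            if not line:
                break
            # ... process line here ...

def start_offsets():
    # One pass over the mapping to find byte offsets where each
    # thread should start, snapped to the next line boundary.
    with open(FILE_PATH, "rb") as f, \
         mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        size = mm.size()
        offsets = [0]
        for i in range(1, NUM_THREADS):
            mm.seek(size * i // NUM_THREADS)
            mm.readline()          # skip forward to a full line boundary
            offsets.append(mm.tell())
        offsets.append(size)
        return offsets

offsets = start_offsets()
threads = [threading.Thread(target=worker, args=(offsets[i], offsets[i + 1]))
           for i in range(NUM_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Note that linecache is a poor fit here: it caches all of a file's lines in memory (it is intended for small source files), which for a million lines of 500-1500 characters means on the order of a gigabyte. With mmap, the threads lean on the shared OS page cache instead, and nothing is duplicated per thread.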