
I am in a situation where multiple threads are reading the same huge file, with multiple file pointers into the same file. The file has at least 1 million lines, and each line's length varies from 500 to 1500 characters. There are no "write" operations on the file. Each thread starts reading the file from a different line. Which is the most efficient approach: Python's linecache, plain readline(), or something else entirely?

  • Do you have to use threading? It sounds like disk I/O will be the bottleneck, so reading from different parts of the file will cause more seek times and slow things down, rather than speed them up. Commented May 7, 2010 at 8:48
  • Yes, I have to use threads. I am working on performance testing, so I don't want to take a hit on either file reading or memory management. Commented May 7, 2010 at 9:13

1 Answer


Have a look at the mmap module: http://docs.python.org/library/mmap.html

It will allow you to use the file as an array, while the OS handles the actual reading and buffering.
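A minimal sketch of that approach: each thread opens its own read-only `mmap` view of the file and seeks to its starting line, while the OS shares a single copy of the data in the page cache. The file name, line contents, and starting offsets here are illustrative, not from the original question.

```python
import mmap
import os
import tempfile
import threading

# Build a small sample file standing in for the "huge" one.
path = os.path.join(tempfile.gettempdir(), "mmap_demo.txt")
with open(path, "w") as f:
    for i in range(1000):
        f.write(f"line {i:04d} " + "x" * 50 + "\n")

results = {}

def reader(name, start_line):
    # Each thread maps the file independently; the pages themselves
    # are shared by the OS, so memory is not duplicated per thread.
    with open(path, "rb") as f, \
         mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # Advance to the starting line by scanning for newlines.
        pos = 0
        for _ in range(start_line):
            pos = mm.find(b"\n", pos) + 1
        mm.seek(pos)
        results[name] = mm.readline().decode()

threads = [
    threading.Thread(target=reader, args=(f"t{i}", i * 250))
    for i in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Note that finding line N still requires scanning for newlines, since lines are variable-length; if threads repeatedly jump to known lines, it may pay to build an index of line-start offsets once up front.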
