I am in a situation where multiple threads read the same huge file, each with its own file pointer. The file has at least 1 million lines, and each line's length varies from 500 to 1500 characters. There are no "write" operations on the file. Each thread starts reading the file from a different line. Which is the most efficient approach: Python's linecache, plain readline(), or some other method?
-
Do you have to use threading? It sounds like disk I/O will be the bottleneck, so reading from different parts of the file will cause more seek time and slow things down rather than speed them up. – Thomas, May 7, 2010 at 8:48
-
Yes, I have to use threads. I am working on performance testing, so I don't want to take hits on either file reading or memory management. – KALYAN, May 7, 2010 at 9:13
1 Answer
Have a look at the mmap module: http://docs.python.org/library/mmap.html
It will allow you to use the file as an array, while the OS handles the actual reading and buffering.
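For illustration, here is a minimal sketch of that approach, assuming Python 3; the file path, the thread count, and the per-line processing are all placeholders. Each thread creates its own read-only mapping of the same file (the OS page cache is shared between them, so the data is only read from disk once), and a quick preliminary pass finds line-aligned byte offsets for each thread to start from:

```python
import mmap
import threading

FILE_PATH = "huge_file.txt"  # placeholder path
NUM_THREADS = 4              # placeholder thread count

def worker(start_offset, end_offset):
    # Each thread opens its own file handle and read-only mapping;
    # the underlying pages are shared by the OS, not duplicated.
    with open(FILE_PATH, "rb") as f, \
         mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        mm.seek(start_offset)
        while mm.tell() < end_offset:
            line = mm.readline()  # mmap objects support readline()
            if not line:
                break
            # ... process line here ...

def start_offsets():
    # One pass over the mapping to find byte offsets where each
    # thread should start, snapped to the next line boundary.
    with open(FILE_PATH, "rb") as f, \
         mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        size = mm.size()
        offsets = [0]
        for i in range(1, NUM_THREADS):
            mm.seek(size * i // NUM_THREADS)
            mm.readline()          # skip forward to a full line boundary
            offsets.append(mm.tell())
        offsets.append(size)
        return offsets

offsets = start_offsets()
threads = [threading.Thread(target=worker, args=(offsets[i], offsets[i + 1]))
           for i in range(NUM_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Note that linecache is a poor fit here: it caches all of a file's lines in memory (it is intended for small source files), which for a million lines of 500-1500 characters means on the order of a gigabyte. With mmap, the threads lean on the shared OS page cache instead, and nothing is duplicated per thread.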