
I have a file, wiki.txt, that is 50 MB in size.

  1. I need to perform several operations on the file, so I thought the best approach in terms of performance is to load the file into memory. Is that correct?

  2. This is the code I wrote:

    import java.io.File;
    import java.io.FileInputStream;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;
    File file = new File("wiki.txt");
    FileInputStream fileInputStream = new FileInputStream(file);
    FileChannel fileChannel = fileInputStream.getChannel();
    MappedByteBuffer mapByteBuffer = fileChannel.map(FileChannel.MapMode.READ_ONLY, 0, file.length());
    System.out.println((char)mapByteBuffer.get());

I get an error on this call: mapByteBuffer.get(). I tried several variants of get(), but all of them throw an error, and e.getMessage() didn't even return a message; it just returned null.

Another important thing to note: my text file contains English words, and the operation I need to perform is checking whether a given expression exists in this text file.

Thank you.

  • Please post the error message you get. Also, it would help to see a few sample lines from your text file; we could then suggest an efficient way of reading and storing it (the code you show betrays a severe lack of understanding here). Commented Dec 14, 2011 at 7:59

4 Answers


I would suggest using a memory-mapped file (in Java, a MappedByteBuffer obtained from FileChannel.map) to read the file directly from disk instead of loading it into memory.

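RandomAccessFile file = new RandomAccessFile("wiki.txt", "r");
FileChannel channel = file.getChannel();
// The file is opened with "r", so the mapping must be READ_ONLY (READ_WRITE throws NonWritableChannelException).
// Note: 1024*50 maps only the first 50 KB; pass channel.size() to map the whole file.
MappedByteBuffer buf = channel.map(FileChannel.MapMode.READ_ONLY, 0, 1024*50);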

And then you can read the buffer as usual.
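The snippet above maps only the first 50 KB. A minimal sketch, assuming the file is smaller than 2 GB (one MappedByteBuffer cannot exceed Integer.MAX_VALUE bytes) and that the file and the search strings are both UTF-8, maps the whole file once, read-only, and answers each query with a simple byte-by-byte scan. MappedSearch, mapWhole and contains are hypothetical names, not part of any library.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;

public class MappedSearch {
    private final MappedByteBuffer buf;

    // Maps the whole file read-only. The mapping stays valid after the
    // file (and its channel) is closed.
    private static MappedByteBuffer mapWhole(String path) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            FileChannel channel = file.getChannel();
            return channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        }
    }

    MappedSearch(String path) throws IOException {
        this.buf = mapWhole(path);
    }

    // Naive scan over the mapped bytes; assumes the file is UTF-8 encoded.
    boolean contains(String expression) {
        byte[] pattern = expression.getBytes(StandardCharsets.UTF_8);
        for (int start = 0; start + pattern.length <= buf.limit(); start++) {
            int k = 0;
            while (k < pattern.length && buf.get(start + k) == pattern[k]) {
                k++;
            }
            if (k == pattern.length) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        MappedSearch wiki = new MappedSearch("wiki.txt");
        System.out.println(wiki.contains("memory mapped"));
        System.out.println(wiki.contains("string search"));
    }
}
```

Because the file is mapped once, repeated queries reuse the same buffer instead of loading the file again in 50 KB pieces.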


1 Comment

But here you're suggesting that I read 50 KB at a time and search within that small buffer? And for the second, third, ... searches, will I have to load 50 KB chunks again and again until I reach the end of the file (50 MB)?

My answer to point (1):

It depends on what you want to do with the file. If your processing doesn't involve rewinding (looking back at what was already read), it's best to read the file as a stream and process it in a single pass instead of loading it all into memory.
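A minimal sketch of the single-pass approach, assuming every expression fits within one line of the file (a match spanning a line break would be missed): all pending expressions are checked against each line as it is read, so the file is streamed only once. SinglePassSearch and findAll are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SinglePassSearch {
    // Reads the input once, line by line, and returns the expressions found.
    // Stops early once every expression has been found.
    static Set<String> findAll(Reader source, List<String> expressions) throws IOException {
        Set<String> found = new HashSet<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null && found.size() < expressions.size()) {
                for (String expr : expressions) {
                    if (!found.contains(expr) && line.contains(expr)) {
                        found.add(expr);
                    }
                }
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        Reader text = new StringReader("the quick brown fox\njumps over the lazy dog\n");
        System.out.println(findAll(text, List.of("lazy dog", "cat"))); // prints [lazy dog]
    }
}
```

For the real file, pass something like new FileReader("wiki.txt") as the source. This only helps when the expressions are known up front; expressions that arrive later need another pass.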

Even if you need random access across the file, consider block-based file operations, because loading the whole file into memory may not scale as the file grows. Use RandomAccessFile for this (its getChannel() method, for NIO access, is available from Java 1.4). For random access, the operating system usually handles file buffer caching well enough that you don't have to manage it yourself.
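A minimal sketch of block-based searching with RandomAccessFile, assuming a non-empty byte pattern and a caller-chosen block size (the 64 KB in main is an arbitrary assumption). Consecutive blocks overlap by pattern.length - 1 bytes, so a match that straddles a block boundary is still found. BlockSearch and indexOf are hypothetical names.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

public class BlockSearch {
    // Returns the byte offset of the first occurrence of pattern, or -1.
    static long indexOf(String path, byte[] pattern, int blockSize) throws IOException {
        if (pattern.length == 0) {
            return 0;
        }
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            // Each read covers one block plus enough extra bytes to finish a
            // match that starts near the end of the block.
            byte[] block = new byte[blockSize + pattern.length - 1];
            long length = file.length();
            long offset = 0;
            while (offset < length) {
                int n = (int) Math.min(block.length, length - offset);
                file.seek(offset);
                file.readFully(block, 0, n);
                for (int i = 0; i + pattern.length <= n; i++) {
                    int k = 0;
                    while (k < pattern.length && block[i + k] == pattern[k]) {
                        k++;
                    }
                    if (k == pattern.length) {
                        return offset + i;
                    }
                }
                offset += blockSize; // advance one block; the overlap is re-read
            }
            return -1;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] pattern = "search term".getBytes(StandardCharsets.UTF_8);
        System.out.println(indexOf("wiki.txt", pattern, 64 * 1024));
    }
}
```

Only one block is held in memory at a time, so memory use stays constant as the file grows.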

2 Comments

I get an expression (a string that can contain a few words) and I need to report whether the expression appears in the text file. I have to repeat this operation several times for different expressions on the same text file.
Now your requirement sounds like string searching and matching to me. The work doesn't have to be repeated from scratch if you do some preprocessing. I would suggest looking at string-searching algorithms such as Knuth-Morris-Pratt (en.wikipedia.org/wiki/…)
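A minimal Java sketch of Knuth-Morris-Pratt for one query, assuming the text to search is already available as a String (for example, the file loaded into memory). KMP preprocesses the pattern into a failure table, so a query runs in time linear in the text plus pattern length and never backs up in the text. Kmp, failureTable and indexOf are hypothetical names.

```java
public class Kmp {
    // fail[i] is the length of the longest proper prefix of pattern[0..i]
    // that is also a suffix of pattern[0..i].
    static int[] failureTable(String pattern) {
        int[] fail = new int[pattern.length()];
        int k = 0;
        for (int i = 1; i < pattern.length(); i++) {
            while (k > 0 && pattern.charAt(i) != pattern.charAt(k)) {
                k = fail[k - 1];
            }
            if (pattern.charAt(i) == pattern.charAt(k)) {
                k++;
            }
            fail[i] = k;
        }
        return fail;
    }

    // Returns the index of the first occurrence of pattern in text, or -1.
    static int indexOf(String text, String pattern) {
        if (pattern.isEmpty()) {
            return 0;
        }
        int[] fail = failureTable(pattern);
        int k = 0; // number of pattern characters currently matched
        for (int i = 0; i < text.length(); i++) {
            while (k > 0 && text.charAt(i) != pattern.charAt(k)) {
                k = fail[k - 1]; // fall back without re-reading the text
            }
            if (text.charAt(i) == pattern.charAt(k)) {
                k++;
            }
            if (k == pattern.length()) {
                return i - k + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOf("abababca", "ababca")); // prints 2
    }
}
```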

It is important to read the whole error, not just the message. Often the real information is in the exception's name, not in the text associated with it.

You will get an error if the file is empty, as there is no first byte to read.
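To illustrate both points, a small sketch that uses a zero-length ByteBuffer as a stand-in for the mapping of an empty file (EmptyBufferDemo and firstChar are hypothetical names). Calling get() on an empty buffer throws java.nio.BufferUnderflowException, which carries no message, so e.getMessage() is null while printing the exception itself shows its class name. Checking hasRemaining() first avoids the exception.

```java
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;

public class EmptyBufferDemo {
    // Returns the first byte as a char, or null when there is nothing to read.
    static Character firstChar(ByteBuffer buf) {
        if (!buf.hasRemaining()) {
            return null;
        }
        return (char) buf.get();
    }

    public static void main(String[] args) {
        // An empty buffer stands in for the mapping of an empty file.
        ByteBuffer empty = ByteBuffer.allocate(0);
        try {
            empty.get();
        } catch (BufferUnderflowException e) {
            System.out.println(e.getMessage()); // prints null
            System.out.println(e);              // prints java.nio.BufferUnderflowException
        }
        System.out.println(firstChar(empty));   // prints null
    }
}
```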

Note: the approach you are using assumes 7-bit ASCII characters. If you want to allow ISO-8859-1 characters, you can use (char) (byteBuffer.get() & 0xFF).
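A short sketch of why the mask matters: Java bytes are signed, so casting a byte of 0x80 or above straight to char sign-extends it, while masking with 0xFF yields the ISO-8859-1 code point. Latin1Decode and latin1 are hypothetical names.

```java
import java.nio.ByteBuffer;

public class Latin1Decode {
    // Masks off sign extension so bytes 0x80-0xFF map to U+0080-U+00FF (ISO-8859-1).
    static char latin1(byte b) {
        return (char) (b & 0xFF);
    }

    public static void main(String[] args) {
        // 0xE9 is e-acute in ISO-8859-1; as a signed Java byte it is -23.
        ByteBuffer buf = ByteBuffer.wrap(new byte[] {(byte) 0xE9});
        char unmasked = (char) buf.get(0);  // sign-extended to 0xFFE9
        char masked = latin1(buf.get(0));   // 0x00E9
        System.out.println((int) unmasked); // prints 65513
        System.out.println((int) masked);   // prints 233
    }
}
```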

However, if you have plain text, you may find that using strings is simpler and not much slower. For example, you can read a 50 MB file as text in less than a second. I would only use a memory-mapped file if that is far too slow.
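A minimal sketch of the string-based approach for repeated queries, assuming the file is UTF-8 (swap in StandardCharsets.ISO_8859_1 if it is not) and that the heap can hold the whole file as a String plus a temporary byte array while loading. The file is read once in the constructor; every later query is an in-memory String.contains call. InMemorySearch is a hypothetical name.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

public class InMemorySearch {
    private final String text;

    // Loads the whole file once; assumes UTF-8 encoding.
    InMemorySearch(Path path) throws IOException {
        this.text = new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
    }

    // Each query scans the in-memory String; the file is not read again.
    boolean contains(String expression) {
        return text.contains(expression);
    }

    public static void main(String[] args) throws IOException {
        InMemorySearch wiki = new InMemorySearch(Paths.get("wiki.txt"));
        for (String expr : List.of("memory mapped", "string search")) {
            System.out.println(expr + ": " + wiki.contains(expr));
        }
    }
}
```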



I would suggest using BufferedReader. It is much faster and requires relatively fewer resources. First, count the number of lines:

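import java.io.*;

static int countLines(String fileName) throws IOException
{
    try (InputStream is = new BufferedInputStream(new FileInputStream(fileName)))
    {
        byte[] chars = new byte[1024];
        int count = 0;
        int numberOfChars;
        byte last = '\n'; // an empty file has no lines
        while ((numberOfChars = is.read(chars)) != -1)
        {
            for (int i = 0; i < numberOfChars; ++i)
            {
                if (chars[i] == '\n')
                {
                    ++count;
                }
            }
            last = chars[numberOfChars - 1];
        }
        if (last != '\n')
        {
            ++count; // count a final line that has no trailing newline
        }
        return count; // number of lines
    }
}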

Then read the lines:

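// endLine is the number of lines counted in the previous step
try (BufferedReader in = new BufferedReader(new FileReader(fileName)))
{
    for (int i = 0; i < endLine; i++)
    {
        String oneLine = in.readLine();
        // search oneLine for the expression here
    }
}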

You can then search each of these strings for whatever you need.

2 Comments

But I need to search for several different expressions. Is it really better to go through the whole file again every time?
That's not a good idea: if my string is at the end of the 50 MB file, this takes a long time.
