Since all the numbers are on a single line, the BufferedReader approach neither works nor scales well: the complete line, and therefore the complete file, would be read into memory. The streaming approach (e.g. the one from @whbogado) is indeed the way to go.
List<Integer> numbers = new ArrayList<>(); // storing everything is still a problem, see below
StreamTokenizer tokenizer = new StreamTokenizer(new FileReader("bigfile.txt"));
tokenizer.parseNumbers(); // the default behaviour anyway
while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
    if (tokenizer.ttype == StreamTokenizer.TT_NUMBER) {
        numbers.add((int) Math.round(tokenizer.nval));
    }
}
Since you write that you are getting a heap space error as well, I assume that streaming is no longer the problem. Unfortunately you are storing all values in a List, and I think that is the problem now. You say in a comment that you do not know the actual count of numbers, so you should avoid storing them in a list and instead process them in a streaming fashion too.
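If you only ever need an aggregate of the values (that is an assumption on my part - e.g. their sum, minimum or maximum), you can compute it on the fly and never hold more than the current token in memory. A minimal sketch:

import java.io.FileReader;
import java.io.IOException;
import java.io.StreamTokenizer;

public class StreamingSum {
    public static void main(String[] args) throws IOException {
        long sum = 0;   // running aggregate instead of a List
        long count = 0;
        try (FileReader reader = new FileReader("bigfile.txt")) {
            StreamTokenizer tokenizer = new StreamTokenizer(reader);
            while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
                if (tokenizer.ttype == StreamTokenizer.TT_NUMBER) {
                    sum += (long) tokenizer.nval; // consume each value immediately
                    count++;
                }
            }
        }
        System.out.println("read " + count + " numbers, sum=" + sum);
    }
}

This keeps the memory footprint constant no matter how large the file gets.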
For all who are interested, here is my little test code (Java 8) that generates a test file with USED_INT_VALUES random integers; I limited it to 5 000 000 for now. As you can see when running it, the used memory increases steadily while reading through the file. The only place that holds that much memory is the numbers list.
Be aware that initializing an ArrayList with an initial capacity only allocates the backing array of references; it does not allocate the memory needed by the stored objects themselves, in your case the Integers (see also the int[] sketch after the test code).
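For illustration, the boxing happens on each add, not when the list is created:

List<Integer> numbers = new ArrayList<>(USED_INT_VALUES); // reserves only an Object[] of references
numbers.add(42); // the Integer object itself is allocated (boxed) here, one per value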
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.StreamTokenizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.logging.Level;
import java.util.logging.Logger;

public class TestBigFiles {

    public static void main(String[] args) throws IOException {
        heapStatistics("program start");
        final int USED_INT_VALUES = 5000000;
        File tempFile = File.createTempFile("testdata_big_50m", ".txt");
        System.out.println("using file " + tempFile.getAbsolutePath());
        tempFile.deleteOnExit();

        // generate the test file: USED_INT_VALUES random integers on a single line
        Random rand = new Random();
        try (FileWriter writer = new FileWriter(tempFile)) {
            rand.ints(USED_INT_VALUES).forEach(i -> {
                try {
                    writer.write(i + " ");
                } catch (IOException ex) {
                    Logger.getLogger(TestBigFiles.class.getName()).log(Level.SEVERE, null, ex);
                }
            });
        }
        heapStatistics("large file generated - size=" + tempFile.length() + "Bytes");

        List<Integer> numbers = new ArrayList<>(USED_INT_VALUES);
        heapStatistics("large array allocated (to avoid array copy)");

        // stream through the file and collect the numbers
        int c = 0;
        try (FileReader fileReader = new FileReader(tempFile)) {
            StreamTokenizer tokenizer = new StreamTokenizer(fileReader);
            while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
                if (tokenizer.ttype == StreamTokenizer.TT_NUMBER) {
                    numbers.add((int) tokenizer.nval);
                    c++;
                    if (c % 100000 == 0) {
                        heapStatistics("within loop count " + c);
                    }
                }
            }
        }
        heapStatistics("large file parsed - number list size is " + numbers.size());
    }

    private static void heapStatistics(String message) {
        final int MEGABYTE = 1024 * 1024;
        // clean up unused objects first so the numbers are comparable
        System.gc();
        Runtime runtime = Runtime.getRuntime();
        System.out.println("##### " + message + " #####");
        System.out.println("Used Memory:" + (runtime.totalMemory() - runtime.freeMemory()) / MEGABYTE + "MB"
                + " Free Memory:" + runtime.freeMemory() / MEGABYTE + "MB"
                + " Total Memory:" + runtime.totalMemory() / MEGABYTE + "MB"
                + " Max Memory:" + runtime.maxMemory() / MEGABYTE + "MB");
    }
}
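If you do need to keep all values in memory and at least an upper bound on their count is known (an assumption; the MAX_NUMBERS constant below is made up for illustration), a primitive int[] is far cheaper than a List<Integer>: 4 bytes per value instead of roughly 20 for a boxed Integer plus its reference. A sketch, reusing the tempFile from the test above:

int[] numbers = new int[MAX_NUMBERS]; // hypothetical upper bound on the count
int count = 0;
try (FileReader reader = new FileReader(tempFile)) {
    StreamTokenizer tokenizer = new StreamTokenizer(reader);
    while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
        if (tokenizer.ttype == StreamTokenizer.TT_NUMBER) {
            numbers[count++] = (int) tokenizer.nval; // no boxing, no Integer objects
        }
    }
}

For the 5 000 000 values above that is about 20 MB for the array instead of on the order of 100 MB for the boxed list. To reproduce the heap problem deliberately, you can also run the test with a small maximum heap (the exact value is a guess, adjust to taste), e.g. java -Xmx64m TestBigFiles - the numbers list alone needs on the order of 100 MB, so such a limit should trigger the OutOfMemoryError while the pure streaming part would still run fine.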