24

I have a big file; it's expected to be around 12 GB. I want to load it all into memory on a beefy 64-bit machine with 16 GB of RAM, but I don't think Java supports byte arrays that big:

File f = new File(file);
long size = f.length();
byte data[] = new byte[size]; // <- does not compile, not even on a 64-bit JVM

Is it possible with Java?

The compile error from the Eclipse compiler is:

Type mismatch: cannot convert from long to int

javac gives:

possible loss of precision
found   : long
required: int
         byte data[] = new byte[size];
  • Just curious: Why do you need to keep that much data in memory at the same time? Wouldn't it be possible to split that into chunks? Commented May 18, 2009 at 15:39
  • +1 to bruno's comment. The only way that having the entire file in memory will be a benefit is if you need to make random accesses into different points of the file, and in this case you'd almost certainly be better off parsing it into a more computable representation. Commented May 18, 2009 at 15:58
  • I am going to try to use a prefix tree (trie) to keep the data; this may shrink it enough to fit into 2 GB of memory. Commented May 19, 2009 at 3:08
  • Possible duplicate of converting 'int' to 'long' or accessing too long array with 'long'. Commented Feb 8, 2013 at 15:54
  • Wow, very frustrating. Java must solve this in the next 5 years. Commented Feb 5, 2017 at 11:14

11 Answers

22

Java array indices are of type int (4 bytes or 32 bits), so I'm afraid you're limited to 2^31 − 1 or 2147483647 slots in your array. I'd read the data into another data structure, like a 2D array.
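
For files that do fit under the limit, the fix for the snippet in the question is an explicit narrowing cast after a range check. A minimal sketch (the file name is hypothetical):

import java.io.File;

public class SingleArrayLimit {
    public static void main(String[] args) {
        File f = new File("data.bin"); // hypothetical path
        long size = f.length();
        // Most JVMs refuse arrays a few elements short of Integer.MAX_VALUE,
        // so Integer.MAX_VALUE - 8 is a safer practical ceiling.
        if (size > Integer.MAX_VALUE - 8) {
            throw new IllegalArgumentException("Too large for one byte[]: " + size);
        }
        byte[] data = new byte[(int) size]; // compiles: array lengths are ints
    }
}

Anything larger has to go into a chunked structure like the ones described below.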


2 Comments

@OmryYadan, The real limit will actually be less than 2147483647.
You mean Integer.MAX_VALUE - 8? github.com/omry/banana/blob/…
15
package com.deans.rtl.util;

import java.io.FileInputStream;
import java.io.IOException;

/**
 * 
 * @author [email protected]
 *
 * Written to work with byte arrays requiring address space larger than 32 bits. 
 * 
 */

public class ByteArray64 {

    private static final long CHUNK_SIZE = 1024*1024*1024; // 1 GiB

    long size;
    byte [][] data;

    public ByteArray64( long size ) {
        this.size = size;
        if( size == 0 ) {
            data = null;
        } else {
            int chunks = (int)(size/CHUNK_SIZE);
            int remainder = (int)(size - ((long)chunks)*CHUNK_SIZE);
            data = new byte[chunks+(remainder==0?0:1)][];
            for( int idx=chunks; --idx>=0; ) {
                data[idx] = new byte[(int)CHUNK_SIZE];
            }
            if( remainder != 0 ) {
                data[chunks] = new byte[remainder];
            }
        }
    }
    public byte get( long index ) {
        if( index<0 || index>=size ) {
            throw new IndexOutOfBoundsException("Error attempting to access data element "+index+".  Array is "+size+" elements long.");
        }
        int chunk = (int)(index/CHUNK_SIZE);
        int offset = (int)(index - (((long)chunk)*CHUNK_SIZE));
        return data[chunk][offset];
    }
    public void set( long index, byte b ) {
        if( index<0 || index>=size ) {
            throw new IndexOutOfBoundsException("Error attempting to access data element "+index+".  Array is "+size+" elements long.");
        }
        int chunk = (int)(index/CHUNK_SIZE);
        int offset = (int)(index - (((long)chunk)*CHUNK_SIZE));
        data[chunk][offset] = b;
    }
    /**
     * Simulates a single read which fills the entire array via several smaller reads.
     * 
     * @param fileInputStream
     * @throws IOException
     */
    public void read( FileInputStream fileInputStream ) throws IOException {
        if( size == 0 ) {
            return;
        }
        for( int idx=0; idx<data.length; idx++ ) {
            // InputStream.read may return fewer bytes than requested,
            // so keep reading until the current chunk is full
            int offset = 0;
            while( offset < data[idx].length ) {
                int read = fileInputStream.read( data[idx], offset, data[idx].length-offset );
                if( read < 0 ) {
                    throw new IOException("short read");
                }
                offset += read;
            }
        }
    }
    public long size() {
        return size;
    }
}
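
A usage sketch for the class above; the demo class and file path are hypothetical:

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

public class ByteArray64Demo {
    public static void main(String[] args) throws IOException {
        File f = new File("/data/big.bin"); // hypothetical path
        ByteArray64 data = new ByteArray64(f.length());
        try (FileInputStream in = new FileInputStream(f)) {
            data.read(in); // fills every chunk from the stream
        }
        byte last = data.get(data.size() - 1); // long indexes work past the int limit
        data.set(0, (byte) 42);
    }
}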

2 Comments

A good idea to implement your own ByteArray for solving this case. If it wasn't for your answer I probably wouldn't have thought of doing so.
Anybody care to add an update(byte[] b, int start, int size) method? :)
7

If necessary, you can load the data into an array of arrays, which will give you a maximum of Integer.MAX_VALUE squared bytes, more than even the beefiest machine could hold in memory.

2 Comments

That would be my next step. Since I intend to do a binary search on the data, it will uglify the code, but I'm afraid there is no choice.
You could make a class that manages an array of arrays but provides an abstraction similar to a regular array, e.g, with get and set that take a long index.
4

You might consider using FileChannel and MappedByteBuffer to memory-map the file:

FileChannel fCh = new RandomAccessFile(file, "rw").getChannel();
long size = fCh.size();
MappedByteBuffer map = fCh.map(FileChannel.MapMode.READ_WRITE, 0, size);

Edit:

Ok, I'm an idiot: it looks like ByteBuffer only takes a 32-bit index as well, which is odd since the size parameter to FileChannel.map is a long... But if you decide to break the file up into multiple 2 GB chunks for loading, I'd still recommend memory-mapped IO, as there can be pretty large performance benefits. You're basically moving all IO responsibility to the OS kernel.
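
A sketch of that chunked approach, wrapping several mappings behind a single long index; the class name, window size, and read-only mode are my own choices, not from the answer:

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Maps a large file as a series of windows, each small enough for ByteBuffer's int index.
public class MappedChunks {
    private static final long WINDOW = 1L << 30; // 1 GiB per mapping, well under the 2 GB cap
    private final MappedByteBuffer[] maps;

    public MappedChunks(String path) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r");
             FileChannel ch = raf.getChannel()) {
            long size = ch.size();
            int n = (int) ((size + WINDOW - 1) / WINDOW);
            maps = new MappedByteBuffer[n];
            for (int i = 0; i < n; i++) {
                long start = i * WINDOW;
                // mappings stay valid after the channel is closed
                maps[i] = ch.map(FileChannel.MapMode.READ_ONLY, start,
                                 Math.min(WINDOW, size - start));
            }
        }
    }

    public byte get(long index) {
        return maps[(int) (index / WINDOW)].get((int) (index % WINDOW));
    }
}

Each window stays under ByteBuffer's int indexing limit, and the OS pages the data in on demand.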

3 Comments

I also hit the same limitation of ByteBuffer, which I think should be able to deal with long offsets and indexes, at least at the interface level; concrete implementations could check ranges explicitly. Unfortunately it is not possible to map more than 2 GB of a file into memory.
Upvote as this is the right way to go, even if you have to partition the data into 2 GB chunks; wrap the chunks in a class which indexes with a long if you like.
MappedByteBuffer is also capped at 2 GB, practically useless. See nyeggen.com/post/… for a solution which calls internal JNI methods to work around this.
2

I suggest you define some "block" objects, each of which holds (say) 1 GB in an array, then make an array of those.

Comments

2

No, arrays are indexed by ints (except some versions of JavaCard that use shorts). You will need to slice it up into smaller arrays, probably wrapping them in a type that gives you get(long), set(long, byte), etc. With sections of data that large, you might want to map the file using java.nio.

Comments

2

Don't limit yourself with Integer.MAX_VALUE.

Although this question was asked many years ago, I wanted to contribute a simple example using only Java SE, without any external libraries.

At first glance it looks theoretically impossible, but it is practically possible.

A new look: if an array is an object holding elements, what about an object that is an array of arrays?

Here's the example:

import java.lang.reflect.Array;
import java.util.ArrayList;
import java.util.List;

/**
*
* @author Anosa
*/
public class BigArray<T> {

    private final static int ARRAY_LENGTH = 1000000;

    public final long length;
    private final List<T[]> arrays;

    public BigArray(long length, Class<T> clazz)
    {
        this.length = length;
        arrays = new ArrayList<>();
        setupInnerArrays(clazz);
    }

    @SuppressWarnings("unchecked")
    private void setupInnerArrays(Class<T> clazz)
    {
        long numberOfArrays = length / ARRAY_LENGTH;
        long remainder = length % ARRAY_LENGTH;
        /*
            we can use Java 8 lambdas and streams:
            LongStream.range(0, numberOfArrays).
                            forEach(i ->
                            {
                                arrays.add((T[]) Array.newInstance(clazz, ARRAY_LENGTH));
                            });
         */
        for (int i = 0; i < numberOfArrays; i++)
        {
            arrays.add((T[]) Array.newInstance(clazz, ARRAY_LENGTH));
        }
        if (remainder > 0)
        {
            // the remainder is guaranteed to be less than ARRAY_LENGTH (an int),
            // so no worries about the cast (:
            arrays.add((T[]) Array.newInstance(clazz, (int) remainder));
        }
    }

    public void put(T value, long index)
    {
        if (index >= length || index < 0)
        {
            throw new IndexOutOfBoundsException("out of the range of the array, your index must be in the range [0, " + length + ")");
        }
        int indexOfArray = (int) (index / ARRAY_LENGTH);
        // modulo keeps the arithmetic in int range; the original subtraction could overflow int
        int indexInArray = (int) (index % ARRAY_LENGTH);
        arrays.get(indexOfArray)[indexInArray] = value;
    }

    public T get(long index)
    {
        if (index >= length || index < 0)
        {
            throw new IndexOutOfBoundsException("out of the range of the array, your index must be in the range [0, " + length + ")");
        }
        int indexOfArray = (int) (index / ARRAY_LENGTH);
        int indexInArray = (int) (index % ARRAY_LENGTH);
        return arrays.get(indexOfArray)[indexInArray];
    }
}

And here's the test:

public static void main(String[] args)
{
    long length = 60085147514L;
    BigArray<String> array = new BigArray<>(length, String.class);
    array.put("peace be upon you", 1);
    array.put("yes it works", 1755);
    String text = array.get(1755);
    System.out.println(text + "  i am a string coming from an array ");
}

This code is limited only by Long.MAX_VALUE and the Java heap, but you can grow the heap as much as you want (I ran it with 3800 MB).

I hope this is useful and provides a simple answer.

2 Comments

since then I wrote Banana : github.com/omry/banana , a lib that lets you do that among other things.
@OmryYadan Good work, I had a look at some of the examples. Good job, bro (:
1

Java arrays use integers for their indices. As a result, the maximum array size is Integer.MAX_VALUE.

(Unfortunately, I can't find any proof from Sun themselves about this, but there are plenty of discussions on their forums about it already.)

I think the best thing you could do in the meantime would be to make a 2D array, i.e.:

byte[][] data;
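
A brief sketch of how such a 2D array could be sized and addressed with a long index; the chunk size and helper names are my own choices:

public class Chunked2D {
    static final int CHUNK = 1 << 30; // 1 GiB per row, safely under the int limit

    static byte[][] allocate(long size) {
        int rows = (int) ((size + CHUNK - 1) / CHUNK);
        byte[][] data = new byte[rows][];
        for (int i = 0; i < rows; i++) {
            // the last row may be shorter than a full chunk
            data[i] = new byte[(int) Math.min(CHUNK, size - (long) i * CHUNK)];
        }
        return data;
    }

    static byte get(byte[][] data, long index) {
        return data[(int) (index / CHUNK)][(int) (index % CHUNK)];
    }
}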

Comments

1

Java doesn't presently support direct arrays with more than 2^32 elements.

Hopefully we'll see this feature in Java in the future.

1 Comment

No, the limit is 2^31 − 1 elements. And your second line does not cite any references.
1

As others have said, all Java arrays of all types are indexed by int, and so can be of max size 2^31 − 1, or 2147483647 elements (~2 billion). This is specified by the Java Language Specification, so switching to another operating system or Java Virtual Machine won't help.

If you wanted to write a class to overcome this, as suggested above, you could use an array of arrays (for a lot of flexibility) or change types (a long is 8 bytes, so a long[] can hold 8 times as many bytes as a byte[] of the same length).
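
A sketch of the change-of-type idea: packing eight bytes into each element of a long[]; the class and helper names are mine, not from the answer:

public class PackedBytes {
    // A long[] of length n addresses 8 * n byte positions, raising the
    // ceiling to roughly 8 * (2^31 - 1) bytes, about 16 GiB of byte data.
    static byte getByte(long[] words, long index) {
        long word = words[(int) (index >>> 3)];  // index / 8 picks the word
        int shift = (int) (index & 7) << 3;      // byte position within the word
        return (byte) (word >>> shift);
    }

    static void setByte(long[] words, long index, byte value) {
        int w = (int) (index >>> 3);
        int shift = (int) (index & 7) << 3;
        // clear the target byte, then OR in the new value
        words[w] = (words[w] & ~(0xFFL << shift)) | ((value & 0xFFL) << shift);
    }
}

That ceiling is comfortably more than the 12 GB file in the question, at the cost of shift-and-mask arithmetic on every access.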

Comments

1

I think the idea of memory-mapping the file (using the CPU's virtual memory hardware) is the right approach, except that MappedByteBuffer has the same 2 GB limitation as native arrays. This guy claims to have solved the problem with a pretty simple alternative to MappedByteBuffer:

http://nyeggen.com/post/2014-05-18-memory-mapping-%3E2gb-of-data-in-java/

https://gist.github.com/bnyeggen/c679a5ea6a68503ed19f#file-mmapper-java

Unfortunately the JVM crashes when you read beyond 500 MB.

1 Comment

While in this specific example my use case was to read a file, this is not the only use case for large arrays.
