10
\$\begingroup\$

This code is part of a growing "primitive" tools collection here on github

Use Case

Consider:

IntArray ia = new IntArray();
ia.set(300, 5);
System.out.printf("Values at 10, 300, and 3000 are: %d %d and %d\n",
        ia.get(10), ia.get(300), ia.get(3000));

This gives:

Values at 10, 300, and 3000 are: 0 5 and 0

Primary concerns are:

  1. correctness for all use cases
  2. efficiency in memory
  3. performance

Additionally, I am interested in use-cases or usability factors that are of concern, or should be added.

Background

Arrays are a very convenient system for storing data based on a simple index. Their constraints though (in Java), are that:

  1. They are limited in size (to 231 - 1 members)
  2. once initialized, they cannot be expanded, only replaced. Replacing an array requires having both the old, and new array in memory at once, and all the data needs to be copied from the old to new location.
  3. they require consecutive spans of memory to be stored in.

Many of these constraints can be avoided or reduced if you store array data in a 2-dimensional structure. Each 'row' is a portion of the overall span. If each row has the same, fixed size, then the translation from the logical linear system, the matrix system, is:

  • row -> index / rowsize
  • col -> index % rowsize

Similarly, the translation from a row/column position to the logical index, is:

  • index -> row * rowsize + col

If you choose a fixed row size that is also a power of 2, then you can accomplish the same with bitwise operations. A row size of 256 elements, is implemented as:

  • row -> index >>> 8
  • col -> index & 255

similarly:

  • index -> (row << 8) | col

The following is a collection of static methods designed to help manage a system where you create a matrix to represent a logical single-dimensional array as a matrix instead. In addition, an implementation of a primitive int array that allows some structured and dynamic access to the data in the array.

ArrayOps

Tools for managing the structure of the 2D matrices described above.

package net.tuis.primutils;

/**
 * Common dynamic array tools used by various classes in this package.
 * @author rolf
 *
 */
class ArrayOps {


    /**
     * KVSHIFT, KVEXTENT, and KVMASK are used together when creating an
     * efficient 2D matrix to store values in based on indexes. Treat the matrix
     * as a collection of rows and columns. As your data grows, you add rows. To
     * store data at any location:
     * 
     * <pre>
     * int row = index &gt;&gt;&gt; KVSHIFT;
     * if (row &gt;= matrix.length) {
     *     matrix = Arrays.copyOf(matrix, extendSize(row);
     * }
     * if (matrix[row] == null) {
     *     matrix[row] = new SomeType[KVEXTENT];
     * }
     * matrix[row][index &amp; KVMASK] = value;
     * </pre>
     * 
     * Using this system allows data to grow in an efficient way, and with
     * limited memory overhead and garbage collection.
     * <p>
     * This IntKeyIndex, because of the way that it creates incrementing Index
     * values for key associations, allows you to use an efficient linear
     * storage system like the above to map arbitrary key values to linear
     * storage.
     */
    public static final int KVSHIFT = 8;
    /**
     * @see #KVSHIFT
     */
    public static final int KVEXTENT = 1 << KVSHIFT;
    /**
     * @see #KVSHIFT
     */
    public static final int KVMASK = KVEXTENT - 1;
    /**
     * Simple alias for Integer.MAX_VALUE;
     */
    public static final int MAX_SIZE = Integer.MAX_VALUE;

    /**
     * Return the number of rows required to contain a dataset of the given
     * size.
     * 
     * @param size
     *            the size of the data to contain.
     * @return the number of rows required to contain that size.
     */
    public static final int getRowsFor(final int size) {
        if (size <= 0) {
            return 0;
        }
        // for a size of that, the last item will be at index (size - 1).
        // what row would that last index be in?
        // we need 1 more than that.
        return 1 + ((size - 1) >> KVSHIFT);
    }

    /**
     * Identify which row an index would appear in a value matrix as described
     * by {@link #KVSHIFT}
     * 
     * @param index
     *            the index to get the row for.
     * @return the row in which that index would appear.
     */
    public static final int getMatrixRow(int index) {
        return index >>> KVSHIFT;
    }

    /**
     * Identify which column an index would appear in a value matrix as
     * described by {@link #KVSHIFT}
     * 
     * @param index
     *            the index to get the column for.
     * @return the column in which that index would appear.
     */
    public static final int getMatrixColumn(int index) {
        return index & KVMASK;
    }

    /**
     * A simple helper method that returns a new size to use for a growing
     * array. It checks for overflow conditions, and expands the size by
     * approximately 25% each time.
     * 
     * @param from
     *            the current size
     * @return the recommended new size, with a hard limit at @link
     *         {@link #MAX_SIZE}
     */
    public static final int extendSize(final int from) {
        if (from <= 0) {
            // some small number.
            return 8;
        }
        int ns = from + (from >>> 2) + 1;
        if (ns < from || ns > MAX_SIZE) {
            // overflow conditions.
            ns = MAX_SIZE;
        }
        if (ns == from) {
            // unable to extend
            throw new IllegalStateException("Unable to have more than " + MAX_SIZE + " values in the Map");
        }
        return ns;
    }

}

IntOps

Some small helper functions used for some common operations on Int primitives, are here. I expect this to grow over time.

package net.tuis.primutils;

import java.util.function.IntPredicate;
import java.util.function.IntUnaryOperator;

/**
 * Simple tools and resources for int manipulations. 
 * 
 * @author rolf
 *
 */
public class IntOps {

    /**
     * INCREMENT is a function that returns the input value, plus one.
     */
    public static final IntUnaryOperator INCREMENT = (i) -> i + 1;
    /**
     * DECREMENT is a function that returns the input value, minus one.
     */
    public static final IntUnaryOperator DECREMENT = (i) -> i - 1;

    /**
     * ISZERO is a simple zero-check.
     */
    public static final IntPredicate ISZERO = (i) -> i == 0;


    private IntOps() {
        // inaccessible constructor.
    }

}

IntArray

This is the primary use case of the code above.

package net.tuis.primutils;

import static net.tuis.primutils.ArrayOps.KVEXTENT;
import static net.tuis.primutils.ArrayOps.KVMASK;
import static net.tuis.primutils.ArrayOps.extendSize;
import static net.tuis.primutils.ArrayOps.getMatrixColumn;
import static net.tuis.primutils.ArrayOps.getMatrixRow;
import static net.tuis.primutils.ArrayOps.getRowsFor;

import java.util.Arrays;
import java.util.function.IntUnaryOperator;
import java.util.stream.IntStream;
import java.util.stream.Stream;

/**
 * A dynamic primitive int array that expands as needed.
 * 
 * @author rolf
 *
 */
public class IntArray {

    private int[][] data;
    private int hwm = -1; // high water mark

    /**
     * Create a dynamic array of int values with preinitialized capacity.
     * @param capacity the initial capacity to budget for.
     */
    public IntArray(int capacity) {
        data = buildIntMatrix(getRowsFor(capacity));
    }

    /**
     * Create a dynamic array of int values with default capacity.
     */
    public IntArray() {
        this(1);
    }

    /**
     * Stream all values in this array to the high-water-mark.
     * @return an IntStream which accesses, in order, all values.
     */
    public IntStream stream() {
        return IntStream.rangeClosed(0, hwm).map(index -> getValue(index));
    }

    /**
     * Identify the high-water-mark for this array, which is the largest index accessed
     * @return the high water mark.
     */
    public int getHighWaterMark() {
        return hwm;
    }

    private void accessed(int index) {
        if (index > hwm) {
            hwm = index;
        }
    }

    /**
     * Set the value at a particular index, to a particular value.
     * @param index the index to set
     * @param value the value to set at that index.
     * @return the previous value at that index. Note that all values are initialized to 0.
     */
    public int set(int index, int value) {
        accessed(index);
        return setValue(index, value);
    }

    /**
     * Get the value at a particular index.
     * @param index the index to get the value of.
     * @return the value at that index.
     */
    public int get(int index) {
        accessed(index);
        return getValue(index);
    }

    /**
     * Retrieve the value at the given index, and then apply the supplied operation to that value.
     * @param index the index of the value to operate on.
     * @param op the operation to perform.
     * @return the value before the operation was performed.
     */
    public int postApply(int index, IntUnaryOperator op) {
        accessed(index);
        final int row = getMatrixRow(index);
        final int col = getMatrixColumn(index);
        ensureRow(row);
        int[] line = data[row];
        final int rv = line[col];
        line[col] = op.applyAsInt(line[col]);
        return rv;
    }

    /**
     * Apply the supplied operation to the value at the given index, and then return the result.
     * @param index the index of the value to operate on.
     * @param op the operation to perform.
     * @return the value after the operation was performed.
     */
    public int preApply(int index, IntUnaryOperator op) {
        accessed(index);
        final int row = getMatrixRow(index);
        final int col = getMatrixColumn(index);
        ensureRow(row);
        int[] line = data[row];
        line[col] = op.applyAsInt(line[col]);
        return line[col];
    }

    /**
     * Increment the value at the given index, and return the value as it was before the increment
     * @param index the index of the value to increment.
     * @return the previous value.
     */
    public int postIncrement(int index) {
        return postApply(index, IntOps.INCREMENT);
    }

    /**
     * Increment the value at the given index, and return the value as it is after the increment
     * @param index the index of the value to increment.
     * @return the incremented value.
     */
    public int preIncrement(int index) {
        return preApply(index, IntOps.INCREMENT);
    }

    /**
     * Decrement the value at the given index, and return the value as it was before the decrement
     * @param index the index of the value to decrement.
     * @return the previous value.
     */
    public int postDecrement(int index) {
        return postApply(index, IntOps.DECREMENT);
    }

    /**
     * Decrement the value at the given index, and return the value as it is after the decrement
     * @param index the index of the value to decrement.
     * @return the decremented value.
     */
    public int preDecrement(int index) {
        return preApply(index, IntOps.DECREMENT);
    }

    private final int setValue(final int index, final int value) {
        if (index < 0) {
            throw new ArrayIndexOutOfBoundsException("No index " + index);
        }
        final int row = getMatrixRow(index);
        final int col = getMatrixColumn(index);
        ensureRow(row);
        final int old = data[row][col];
        data[row][col] = value;
        return old;
    }

    private final int getValue(final int index) {
        if (index < 0) {
            throw new ArrayIndexOutOfBoundsException("No index " + index);
        }
        final int r = getMatrixRow(index);
        if (r >= data.length) {
            return 0;
        }
        final int[] row = data[r];
        return row == null ? 0 : row[index & KVMASK];
    }

    private final void ensureRow(final int row) {
        if (row >= data.length) {
            data = Arrays.copyOf(data, extendSize(row));
        }
        if (data[row] == null) {
            data[row] = buildIntRow();
        }
    }

    private static final int[][] buildIntMatrix(int rows) {
        return new int[Math.max(1, rows)][];
    }

    private static final int[] buildIntRow() {
        return new int[KVEXTENT];
    }

    @Override
    public String toString() {
        return String.format("IntArray(hwm: %d, alloc: %d)", hwm, Stream.of(data).mapToInt(row -> row == null ? 0 : row.length).sum());
    }

    @Override
    public int hashCode() {
        // because of the convenient row lengths, we can do:
        int hash = 0;
        for (final int[] row : data) {
            if (row == null) {
                continue;
            }
            int rh = 0;
            for (final int v : row) {
                Integer.rotateLeft(rh, 13);
                rh ^= v;
            }
            hash ^= rh;
        }
        return hash;
    }

    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof IntArray)) {
            return false;
        }
        if (obj == this) {
            return true;
        }
        final IntArray them = (IntArray)obj;
        if (them.hwm != this.hwm) {
            return false;
        }
        final int limit = getMatrixRow(hwm);
        for (int r = 0; r <= limit; r++) {
            final int[] a = r < this.data.length ? this.data[r] : null;
            final int[] b = r < them.data.length ? them.data[r] : null;
            if (a == null && b == null) {
                continue;
            }
            if (a == null && !IntStream.of(b).allMatch(IntOps.ISZERO)) {
                return false;
            }
            if (b == null && !IntStream.of(a).allMatch(IntOps.ISZERO)) {
                return false;
            }
            if (!Arrays.equals(a, b)) {
                return false;
            }
        }
        return true;
    }

}

Unit tests for IntArray

May help understand the usage, and run the code.

package net.tuis.primutils;

import static org.junit.Assert.*;

import java.util.IntSummaryStatistics;

import org.junit.Test;

@SuppressWarnings("javadoc")
public class TestIntArray {

    @Test
    public void testIntArrayInt() {
        IntArray ia = new IntArray(100);
        assertEquals(-1, ia.getHighWaterMark());
    }

    @Test
    public void testIntArray() {
        IntArray ia = new IntArray();
        assertEquals(-1, ia.getHighWaterMark());
    }

    @Test
    public void testStream() {
        IntArray ia = new IntArray();
        ia.get(10000);
        IntSummaryStatistics stats = ia.stream().summaryStatistics();
        assertEquals(10001, stats.getCount());
        assertEquals(0, stats.getMax());
        assertEquals(0, stats.getMin());
    }

    @Test
    public void testGetHighWaterMark() {
        IntArray ia = new IntArray(100);
        assertEquals(-1, ia.getHighWaterMark());
        for (int i = 0; i < 10; i++) {
            ia.set(i, i);
            assertEquals(i, ia.getHighWaterMark());
        }
        for (int i = 5000; i >= 4000; i--) {
            ia.set(i, i);
            assertEquals(5000, ia.getHighWaterMark());
        }
    }

    @Test
    public void testSet() {
        IntArray ia = new IntArray(100);
        assertEquals(0, ia.set(0, 1));
        assertEquals(1, ia.set(0, 2));
        assertEquals(0, ia.set(10000, 1));
        assertEquals(1, ia.set(10000, 2));
    }

    @Test
    public void testGet() {
        IntArray ia = new IntArray(100);
        assertEquals(0, ia.get(0));
        assertEquals(0, ia.getHighWaterMark());

        assertEquals(0, ia.get(10000));
        assertEquals(10000, ia.getHighWaterMark());
        assertEquals(0, ia.set(10000, -1));
        assertEquals(-1, ia.get(10000));
    }

    @Test
    public void testPostApply() {
        IntArray ia = new IntArray(100);
        assertEquals(0, ia.postApply(0, i -> i + 10));
        assertEquals(10, ia.get(0));
        assertEquals(10, ia.postApply(0, i -> i + 10));
        assertEquals(20, ia.get(0));
    }

    @Test
    public void testPreApply() {
        IntArray ia = new IntArray(100);
        assertEquals(10, ia.preApply(0, i -> i + 10));
        assertEquals(10, ia.get(0));
        assertEquals(20, ia.preApply(0, i -> i + 10));
        assertEquals(20, ia.get(0));
    }

    @Test
    public void testPostIncrement() {
        IntArray ia = new IntArray(100);
        assertEquals(0, ia.postIncrement(0));
        assertEquals(1, ia.get(0));
        assertEquals(1, ia.postIncrement(0));
        assertEquals(2, ia.get(0));
    }

    @Test
    public void testPreIncrement() {
        IntArray ia = new IntArray(100);
        assertEquals(1, ia.preIncrement(0));
        assertEquals(1, ia.get(0));
        assertEquals(2, ia.preIncrement(0));
        assertEquals(2, ia.get(0));
    }

    @Test
    public void testPostDecrement() {
        IntArray ia = new IntArray(100);
        assertEquals(0, ia.postDecrement(0));
        assertEquals(-1, ia.get(0));
        assertEquals(-1, ia.postDecrement(0));
        assertEquals(-2, ia.get(0));
    }

    @Test
    public void testPreDecrement() {
        IntArray ia = new IntArray(100);
        assertEquals(-1, ia.preDecrement(0));
        assertEquals(-1, ia.get(0));
        assertEquals(-2, ia.preDecrement(0));
        assertEquals(-2, ia.get(0));
    }

}
\$\endgroup\$
3
  • 3
    \$\begingroup\$ Do you have any primary concerns about this? What would you like reviewers to focus on? API? Usability? Performance? Readability? Maintainability? Everything? \$\endgroup\$ Commented Mar 1, 2015 at 23:31
  • \$\begingroup\$ I think "Use Case" usually describe a scenario where something is useful, and it explains the purpose. That section above is more like a "Usage Example". So I would rephrase that, and also add some real example scenarios where this library will be useful (reduce overall complexity in some way). Disclamer: I'm not a native speaker so I could be all wrong. \$\endgroup\$ Commented Mar 2, 2015 at 9:29
  • \$\begingroup\$ @janos - the initial use case is presented here: Mapping arbitrary Int values to a linear index space \$\endgroup\$ Commented Mar 2, 2015 at 11:52

2 Answers 2

7
\$\begingroup\$

First of all, I think it's a great idea to make an array-like data structure that can grow dynamically with much less expensive array-copying as the simple traditional method: allocate new space -> copy from old space -> discard old space.

On top of the clever dynamic resizing, the IntArray does something more: it allows arbitrary accesses of elements at any index, quietly expanding the storage as necessary. Too quietly. Let me explain this in more detail.

I don't know of an array-like data structure in any language where I can set the n-th element without ensuring the structure has the capacity. Of course, just because something doesn't exist doesn't mean it's a bad idea.

Java does a "good" (??) job at making developers complacent about memory management. But allocating memory remains to be special (at least I hope so). You know to pay attention when you use the new keyword when creating an array, or when using methods that allocate memory.

Now, I wouldn't expect from a IntArray.setValue function to allocate memory. I find this behavior misleading, and outright dangerous. If I suspect memory issues in my program, then every single call to .setValue becomes a suspect, instead of a fewer set of the special places where I consciously allocate memory.

In the worst case, I might have a bug that calls .setValue with a high value by mistake. For a nasty bug, the value will be high enough to allocate far more memory than really needed, but not enough to be noticeable, at first. As the program grows, and I start reusing the component that uses IntArray.setValue with a bogus parameter, one day I realize I have mysterious memory issues, making the program inexplicably heavy.

.getValue is dangerous too. As it returns 0 in case I ask for a nonsensically high value by mistake, such forgiveness may cover up bugs that will end up biting me much later, and badly.

Conclusions

Allocating memory is a serious business. A .setValue method which can lead to accidental allocation of large amounts of memory is misleading and outright dangerous.

The boundary checks in the conventional array-like structures serve as a good early warning system. I will know immediately if I try to get/set values out of bound, and catch bugs earlier, when they are easier both to debug and to correct.

I suggest to cut out this unusual behavior from IntArray. Stick to the implementation of the clever logic that dynamic growth is inexpensive compared to a regular array. Make it usable exactly the same way as a regular array-like data structure. It will be a great structure like that.

If you need a structure where setValue and getValue behave this way, then create that as a separate class, and implement in terms of an IntArray (compose, if possible), and give it a name that somehow implies this unusual behavior. To make things even more clear, I would rename getValue and setValue something better. (Naming is tough, as always. Some ideas that might be not nearly good enough: getValueOrDefault and setValueAndEnsureStorage.)

\$\endgroup\$
1
  • \$\begingroup\$ Your points about rolfl's usage of setValue is excellent. This really explains why LibGDX chose to have one add method and one set method, where the set method throws an exception if the array is not big enough. \$\endgroup\$ Commented Mar 2, 2015 at 13:47
7
\$\begingroup\$

Naming

Okay this one is so easy that I just have to point it out.

private int hwm = -1; // high water mark

What have we said about shortening variable names like that only to add a comment about what the abbreviation means afterward?

private int highWaterMark = -1;

Right. That is what we have said.

But what is a high water mark really? I don't see your class having anything to do with measuring the height of water in a river.

Suggestion: Either name it highestAccessedIndex or add javadoc to it, explaining what you mean by "high water mark".

Copy

IntArray a = new IntArray();
Random random = new Random();
for (int i = 0; i < 1000; i++) {
    a.set(random.nextInt(1000), random.nextInt(1000));
}
a.set(420, 42);
a.set(41, 73);
a.set(41, 73);
// How do I copy this thing!?

Please provide a copy-constructor!

To be or not to be equal ?

Consider this code:

IntArray a = new IntArray();
IntArray b = new IntArray();

a.set(42, 10);
b.set(42, 10);
System.out.println(a.equals(b)); // Prints true
System.out.println(a.get(4000)); // Prints 0
System.out.println(a.equals(b)); // Prints false

It is a bit weird that calling a get method modifies the result of a.equals(b).

This is definitely not something I would expect. Additionally, the JavaDoc for get does not inform about this.

I think that the "high water mark" should not be considered in equals.

Luckily, you are not breaking the hashCode and equals contract here. Your hashCode method does not consider hwm. Did the hwm accidentally sneak into the equals method?

Usability

This is probably the worst news.

Your class seem to be really useful. However... as you are using java.util.streams there are certain platforms where this class is unusable so far. Such as GWT and Android

GWT support for Java 8 will be added in GWT 2.8, but Java 8 and Android does not go well together. Although there is a retrolambda gradle plugin, that does not support the Java 8 Stream API.

If you want to support more than Java 8, that is up to you.

How do I...?

The class is called IntArray (although it is resizable), but there are some features I'm not sure how to use that are available on other arrays / list classes..

  • Add a value at the end of the array, such as array.add(42)
  • Check if the array contains a value, array.contains(42)
  • Remove a value from the array, array.remove(42)
  • Add all values from an int[], array.addAll(new int[]{ 1, 2, 3, 4 })
  • Check the size of the array, array.size()

Also, what is the index position of your array good for? There is no way to check which indexes has a value and which does not have a value. The concept reminds me of SparseIntArray in Android which is similar to Map<Integer, Integer>, but with the limitation that your class does not provide a way to remove values, and does not allow negative key indexes.

Speed comparison

Using a particular benchmarking library, I found that your class performs well when compared to IntArray from LibGDX

Although your two classes provides quite different ways of doing things, the IntArray from LibGDX provides all the add, contains, remove methods, etc. but does not behave well when using sparse indexes.

public class SpeedTest {

    private static final int SIZE = 1000000;

    @Test
    public void test() {
        UBench bench = new UBench("Speed");
        bench.addIntTask("Rolfl", () -> rolfl(), i -> i == SIZE);
        bench.addIntTask("Libgdx", () -> gdx(), i -> i == SIZE);
        bench.report("Result", bench.press(1000));
    }

    private int gdx() {
        IntArray array = new IntArray();
        for (int i = 0; i < SIZE; i++) {
            array.add(i);
        }
        return SIZE;
    }

    private int rolfl() {
        net.tuis.primutils.IntArray array = new net.tuis.primutils.IntArray();
        for (int i = 0; i < SIZE; i++) {
            array.set(i, i);
        }
        return SIZE;
    }

}

Results:

Task Speed -> Rolfl: (Unit: MILLISECONDS)
  Count    :     1000      Average  :   3.4577
  Fastest  :   2.9710      Slowest  :  19.9286
  95Pctile :   4.6342      99Pctile :   5.6237
  TimeBlock : 4.153 3.642 3.421 3.742 3.237 3.238 3.262 3.279 3.308 3.297
  Histogram :   992     7     1

Task Speed -> Libgdx: (Unit: MILLISECONDS)
  Count    :     1000      Average  :   5.0694
  Fastest  :   4.0500      Slowest  :  14.9409
  95Pctile :   7.6653      99Pctile :   8.9765
  TimeBlock : 6.373 5.877 4.951 5.576 4.662 4.634 4.642 4.632 4.648 4.699
  Histogram :   968    32
\$\endgroup\$

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.