List vs Python array vs NumPy array
To my surprise, my solution using Reinderien's suggestion to use a Python array was fastest in my benchmark in 64-bit Python (and not bad in 32-bit Python). Here I look into that.
Why was I surprised? Because I had always considered array to be rather pointless, like a "NumPy without operations". Sure, it provides compact storage of data, but I have plenty of memory, so I'm not very interested in that. More interested in speed. And whenever you do something with the array's elements, there's overhead from always converting between a Python int object (or whatever type you use in the array) and the array's fixed-size element data. Contrast that with NumPy, where you do operations like arr += 1 or arr1 += arr2 and NumPy speedily operates on all the array elements. And if you treat NumPy arrays like lists and work on them element-wise yourself, it's sloooow. I thought Python arrays were similarly slow at that, and they are, but a lot less so:
                          |   a[0]     a[0] += 1
--------------------------+---------------------
a = [0]                   |   27 ns     67 ns
a = array('q', [0])       |   35 ns    124 ns
a = np.zeros(1, np.int64) |  132 ns    504 ns
Accessing an element or incrementing it is by far the fastest with a list, and by faaar the slowest with a NumPy array.
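For reference, here's roughly how such a micro-benchmark can be run with timeit (a sketch; my actual measurement setup may have differed slightly):

from array import array
from timeit import timeit
import numpy as np

for name, a in [('list', [0]),
                ('array', array('q', [0])),
                ('numpy', np.zeros(1, np.int64))]:
    # number=10**6 runs, so total seconds * 1000 = nanoseconds per run
    read = timeit('a[0]', number=10**6, globals={'a': a}) * 1000
    incr = timeit('a[0] += 1', number=10**6, globals={'a': a}) * 1000
    print(f'{name:6}  a[0]: {read:4.0f} ns   a[0] += 1: {incr:4.0f} ns')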
Let's add a (bad) NumPy version to the mix, where I use a NumPy array as if it were a list or a Python array:
from itertools import accumulate
import numpy as np

def bad_numpy(n, queries):
    nums = np.zeros(n + 1, np.int64)
    for a, b, k in queries:
        nums[a - 1] += k   # element-wise updates, each paying NumPy's per-element overhead
        nums[b] -= k
    return max(accumulate(nums))   # Python-level accumulate instead of NumPy's cumsum
Times with my worst-case benchmark:
python_list     565 ms   576 ms   577 ms
python_array    503 ms   514 ms   517 ms
numpy_array    2094 ms  2124 ms  2171 ms
So the bad NumPy usage is far slower, as expected.
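The list and Python array versions follow the same pattern, just with a different container for nums, roughly like this (a sketch; the exact solutions may differ in details):

from array import array
from itertools import accumulate

def python_list(n, queries):
    nums = [0] * (n + 1)
    for a, b, k in queries:
        nums[a - 1] += k
        nums[b] -= k
    return max(accumulate(nums))

def python_array(n, queries):
    nums = array('q', [0]) * (n + 1)   # 'q' = signed 64-bit integers (long long)
    for a, b, k in queries:
        nums[a - 1] += k
        nums[b] -= k
    return max(accumulate(nums))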
The solution has three steps: initialization of the list/array, the loop processing the queries, and accumulating/maxing. Let's measure them separately to see how much time each version spends on each step.
Initialization
I took out everything after the nums = ... line and measured again:
python_list      52 ms    52 ms    55 ms
python_array     30 ms    31 ms    32 ms
numpy_array       0 ms     0 ms     0 ms
The list is slowest and NumPy is unbelievably fast. Actually 0.016 ms, for an array of ten million int64s, which would be 5000 GB/s. I think it must be cheating somehow, presumably by getting already-zeroed memory from the OS (calloc) that is only really paged in once it's written to. Anyway, we see that the array solutions get a head start in the benchmark due to faster initialization.
The list [0] * (n + 1) gets initialized like this, copying the 0 again and again and incrementing its reference count again and again:
for (i = 0; i < n; i++) {
    items[i] = elem;
    Py_INCREF(elem);
}
The Python array repeats faster, using memcpy to repeatedly double the elements (1 copy => 2 copies, 4 copies, 8 copies, 16 copies, etc.):
Py_ssize_t done = oldbytes;
memcpy(np->ob_item, a->ob_item, oldbytes);
while (done < newbytes) {
    Py_ssize_t ncopy = (done <= newbytes-done) ? done : newbytes-done;
    memcpy(np->ob_item+done, np->ob_item, ncopy);
    done += ncopy;
}
After seeing this, I'm actually surprised the Python array isn't much faster than the list.
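The initialization cost can be checked in isolation with something along these lines (a sketch using timeit; the ten million elements match the benchmark size mentioned above):

from array import array
from timeit import timeit
import numpy as np

n = 10**7
reps = 10
# average seconds per initialization for each container
print(timeit(lambda: [0] * (n + 1), number=reps) / reps)              # list
print(timeit(lambda: array('q', [0]) * (n + 1), number=reps) / reps)  # Python array
print(timeit(lambda: np.zeros(n + 1, np.int64), number=reps) / reps)  # NumPy array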
Processing the queries
Times for the loop processing the queries:
python_list     122 ms   125 ms   121 ms
python_array     96 ms    99 ms    95 ms
numpy_array     303 ms   307 ms   305 ms
What? But earlier we saw that the Python array is slower at processing elements! Well, but that was for a[0], i.e., always accessing/incrementing the same element. But with the worst-case data, it's random access, and the array solutions are apparently better with that. If I change the indexes from randint(1, n) to randint(1, 100), the picture looks different:
python_list      35 ms    43 ms    47 ms
python_array     77 ms    72 ms    72 ms
numpy_array     217 ms   225 ms   211 ms
Not quite sure yet why, as all three containers use 80 MB of contiguous memory, so that should be equally cache-friendly. I think it's about the int objects that get created by += k and -= k: they stay alive in the list but not in the arrays.
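For concreteness, the two kinds of test data compared above would be generated along these lines; only the index ranges are as described, the number of queries and the range of k are placeholders (a sketch):

from random import randint

def make_queries(m, hi):
    # random (a, b, k) queries with 1 <= a <= b <= hi
    queries = []
    for _ in range(m):
        a, b = sorted((randint(1, hi), randint(1, hi)))
        queries.append((a, b, randint(1, 10**9)))
    return queries

n = 10**7
worst_queries = make_queries(200_000, hi=n)    # indexes spread over the whole array
small_queries = make_queries(200_000, hi=100)  # only the first ~100 slots get touched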
Anyway, with the worst-case data, the Python array increases its lead, and the NumPy array falls from first place (after initialization) to last. Total times for initialization and query-processing:
python_list     174 ms   177 ms   176 ms
python_array    126 ms   130 ms   127 ms
numpy_array     303 ms   307 ms   305 ms
Accumulate and max
Times for max(accumulate(nums)):
python_list     391 ms   399 ms   401 ms
python_array    377 ms   384 ms   390 ms
numpy_array    1791 ms  1817 ms  1866 ms
So this part actually takes the longest, for all three versions. Of course in reality, in NumPy I'd use nums.cumsum().max(), which takes about 50 ms here.
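Swapping that in for the last step of bad_numpy already helps a lot (a sketch; the function name is mine):

import numpy as np

def less_bad_numpy(n, queries):
    nums = np.zeros(n + 1, np.int64)
    for a, b, k in queries:       # still a slow element-wise Python loop
        nums[a - 1] += k
        nums[b] -= k
    return nums.cumsum().max()    # vectorized accumulate + max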
Summary, moral of the story
Why is the Python array faster than the Python list in the benchmark?
- Initialization: Because the array's initialization is less work.
- Processing the queries: I think because the list keeps a lot of int objects alive and that's costly somehow.
- Accumulate/max: I think because iterating the list involves accessing all the different int objects in random order, i.e., randomly accessing memory, which is not that cache-friendly.
What I take away from all this is that misusing NumPy arrays as lists is indeed a bad idea, but that Python arrays are not equally bad: they can not only use less memory but also be faster than lists. While the conversion between objects and array entries does take extra time, other effects can more than make up for that lost time. That said, keep in mind that the array version was slower in my 32-bit Python benchmark, and slower at query processing in 64-bit Python when I changed the test data to use smaller/fewer indexes. So it really depends on the problem. But using an array can be faster than using a list.