List vs Python array vs NumPy array
To my surprise, my solution using Reinderien's suggestion to use a Python array was fastest in my benchmark in 64-bit Python (and not bad in 32-bit Python). Here I look into that.
Why was I surprised? Because I had always considered array to be rather pointless, like a "NumPy without operations". Sure, it provides compact storage of data, but I have plenty of memory, so I'm not very interested in that. More interested in speed. And whenever you do something with the array's elements, there's overhead from always converting between a Python int object (or whatever type you use in the array) and the array's fixed-size element data. Contrast that with NumPy, where you do operations like arr += 1 or arr1 += arr2 and NumPy speedily operates on all the array elements. And if you treat NumPy arrays like lists and work on them element-wise yourself, it's sloooow. I thought Python arrays were similarly slow at that, and they are, but a lot less so:
                          |   a[0]     a[0] += 1
--------------------------+---------------------
a = [0]                   |   27 ns     67 ns
a = array('q', [0])       |   35 ns    124 ns
a = np.zeros(1, np.int64) |  132 ns    504 ns
Accessing an element or incrementing it is by far the fastest with a list, and by faaar the slowest with a NumPy array.
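For reference, here's roughly how such a micro-benchmark can be run with timeit (a sketch; my actual measurement setup may have differed slightly):

from array import array
from timeit import timeit
import numpy as np

for name, a in [('list', [0]),
                ('array', array('q', [0])),
                ('numpy', np.zeros(1, np.int64))]:
    # number=10**6 runs, so total seconds * 1000 = nanoseconds per run
    read = timeit('a[0]', number=10**6, globals={'a': a}) * 1000
    incr = timeit('a[0] += 1', number=10**6, globals={'a': a}) * 1000
    print(f'{name:6}  a[0]: {read:4.0f} ns   a[0] += 1: {incr:4.0f} ns')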
Let's add a (bad) NumPy version to the mix, where I use a NumPy array as if it were a list or a Python array:
from itertools import accumulate
import numpy as np

def bad_numpy(n, queries):
    nums = np.zeros(n + 1, np.int64)
    for a, b, k in queries:
        nums[a - 1] += k   # element-wise updates, each paying NumPy's per-element overhead
        nums[b] -= k
    return max(accumulate(nums))   # Python-level accumulate instead of NumPy's cumsum
Times with my worst-case benchmark:
python_list     565 ms   576 ms   577 ms
python_array    503 ms   514 ms   517 ms
numpy_array    2094 ms  2124 ms  2171 ms
So the bad NumPy usage is far slower, as expected.
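The list and Python array versions follow the same pattern, just with a different container for nums, roughly like this (a sketch; the exact solutions may differ in details):

from array import array
from itertools import accumulate

def python_list(n, queries):
    nums = [0] * (n + 1)
    for a, b, k in queries:
        nums[a - 1] += k
        nums[b] -= k
    return max(accumulate(nums))

def python_array(n, queries):
    nums = array('q', [0]) * (n + 1)   # 'q' = signed 64-bit integers (long long)
    for a, b, k in queries:
        nums[a - 1] += k
        nums[b] -= k
    return max(accumulate(nums))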
The solution has three steps: initialization of the list/array, the loop processing the queries, and accumulating/maxing. Let's measure them separately to see how much time each version spends on each step.
Initialization
I took out everything after the nums = ... line and measured again:
python_list      52 ms    52 ms    55 ms
python_array     30 ms    31 ms    32 ms
numpy_array       0 ms     0 ms     0 ms
The list is slowest and NumPy is unbelievably fast. Actually 0.016 ms, for an array of ten million int64s, which would be 5000 GB/s. I think it must be cheating somehow, presumably by getting already-zeroed memory from the OS (calloc) that is only really paged in once it's written to. Anyway, we see that the array solutions get a head start in the benchmark due to faster initialization.
The list [0] * (n + 1) gets initialized like this, copying the 0 again and again and incrementing its reference count again and again:
for (i = 0; i < n; i++) {
    items[i] = elem;
    Py_INCREF(elem);
}
The Python array repeats faster, using memcpy to repeatedly double the elements (1 copy => 2 copies, 4 copies, 8 copies, 16 copies, etc.):
Py_ssize_t done = oldbytes;
memcpy(np->ob_item, a->ob_item, oldbytes);
while (done < newbytes) {
    Py_ssize_t ncopy = (done <= newbytes-done) ? done : newbytes-done;
    memcpy(np->ob_item+done, np->ob_item, ncopy);
    done += ncopy;
}
After seeing this, I'm actually surprised the Python array isn't much faster than the list.
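The initialization cost can be checked in isolation with something along these lines (a sketch using timeit; the ten million elements match the benchmark size mentioned above):

from array import array
from timeit import timeit
import numpy as np

n = 10**7
reps = 10
# average seconds per initialization for each container
print(timeit(lambda: [0] * (n + 1), number=reps) / reps)              # list
print(timeit(lambda: array('q', [0]) * (n + 1), number=reps) / reps)  # Python array
print(timeit(lambda: np.zeros(n + 1, np.int64), number=reps) / reps)  # NumPy array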
Processing the queries
Times for the loop processing the queries:
python_list     122 ms   125 ms   121 ms
python_array     96 ms    99 ms    95 ms
numpy_array     303 ms   307 ms   305 ms
What? But earlier we saw that the Python array is slower at processing elements! Well, but that was for a[0], i.e., always accessing/incrementing the same element. But with the worst-case data, it's random access, and the array solutions are apparently better with that. If I change the indexes from randint(1, n) to randint(1, 100), the picture looks different:
python_list      35 ms    43 ms    47 ms
python_array     77 ms    72 ms    72 ms
numpy_array     217 ms   225 ms   211 ms
Not quite sure yet why, as all three containers use 80 MB of contiguous memory, so that should be equally cache-friendly. I think it's about the int objects that get created by += k and -= k: they stay alive in the list but not in the arrays.
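For concreteness, the two kinds of test data compared above would be generated along these lines; only the index ranges are as described, the number of queries and the range of k are placeholders (a sketch):

from random import randint

def make_queries(m, hi):
    # random (a, b, k) queries with 1 <= a <= b <= hi
    queries = []
    for _ in range(m):
        a, b = sorted((randint(1, hi), randint(1, hi)))
        queries.append((a, b, randint(1, 10**9)))
    return queries

n = 10**7
worst_queries = make_queries(200_000, hi=n)    # indexes spread over the whole array
small_queries = make_queries(200_000, hi=100)  # only the first ~100 slots get touched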
Anyway, with the worst-case data, the Python array increases its lead, and the NumPy array falls from first place (after initialization) to last. Total times for initialization and query-processing:
python_list     174 ms   177 ms   176 ms
python_array    126 ms   130 ms   127 ms
numpy_array     303 ms   307 ms   305 ms
Accumulate and max
Times for max(accumulate(nums)):
python_list     391 ms   399 ms   401 ms
python_array    377 ms   384 ms   390 ms
numpy_array    1791 ms  1817 ms  1866 ms
So this part actually takes the longest, for all three versions. Of course in reality, in NumPy I'd use nums.cumsum().max(), which takes about 50 ms here.
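Swapping that in for the last step of bad_numpy already helps a lot (a sketch; the function name is mine):

import numpy as np

def less_bad_numpy(n, queries):
    nums = np.zeros(n + 1, np.int64)
    for a, b, k in queries:       # still a slow element-wise Python loop
        nums[a - 1] += k
        nums[b] -= k
    return nums.cumsum().max()    # vectorized accumulate + max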
Summary, moral of the story
Why is the Python array faster than the Python list in the benchmark?
- Initialization: Because the array's initialization is less work.
- Processing the queries: I think because the list keeps a lot of int objects alive and that's costly somehow.
- Accumulate/max: I think because iterating the list involves accessing all the different int objects in random order, i.e., randomly accessing memory, which is not that cache-friendly.
What I take away from all this is that misusing NumPy arrays as lists is indeed a bad idea, but that Python arrays are not equally bad: they can not only use less memory but also be faster than lists. While the conversion between objects and array entries does take extra time, other effects can more than make up for that lost time. That said, keep in mind that the array version was slower in my 32-bit Python benchmark, and slower at query processing in 64-bit Python when I changed the test data to use smaller/fewer indexes. So it really depends on the problem. But using an array can be faster than using a list.