5
\$\begingroup\$

Problem statement

There are two sorted arrays nums1 and nums2 of size m and n respectively. Find the median of the two sorted arrays. The overall run time complexity should be \$\mathcal{O}(\log (m+n))\$.

Implementation

 def median(A, B):
    m, n = len(A), len(B)
    if m > n:
        A, B, m, n = B, A, n, m
    if n == 0:
        raise ValueError

    imin, imax, half_len = 0, m, (m + n + 1) / 2
    while imin <= imax:
        i = (imin + imax) / 2
        j = half_len - i
        if j > 0 and i < m and B[j-1] > A[i]:
            # i is too small, must increase it
            imin = i + 1
        elif i > 0 and j < n and A[i-1] > B[j]:
            # i is too big, must decrease it
            imax = i - 1
        else:
            # i is perfect
            if i == 0: max_of_left = B[j-1]
            elif j == 0: max_of_left = A[i-1]
            else: max_of_left = max(A[i-1], B[j-1])

            if (m + n) % 2 == 1:
                return max_of_left

            if i == m: min_of_right = B[j]
            elif j == n: min_of_right = A[i]
            else: min_of_right = min(A[i], B[j])

            return (max_of_left + min_of_right) / 2.0

Concern

I suspect that it is safe to change this line of code

j > 0 and i < m and B[j-1] > A[i]

to

i < m and B[j-1] > A[i]

and also it is safe to change this line of code

i > 0 and j < n and A[i-1] > B[j]

to

i > 0 and A[i-1] > B[j]

I think remove the condition check of j is safe since we already making sure size of A is no bigger than size of B.

\$\endgroup\$
10
  • 2
    \$\begingroup\$ Please note that on Code Review, answers may address any aspect of the code, not necessarily your specific concerns. \$\endgroup\$ Commented Apr 10, 2016 at 1:21
  • 2
    \$\begingroup\$ This question was cross-posted from Stack Overflow. Please declare such cross-posts in the future. \$\endgroup\$ Commented Apr 10, 2016 at 1:31
  • 1
    \$\begingroup\$ What is the median of two arrays? \$\endgroup\$ Commented Apr 10, 2016 at 4:28
  • 2
    \$\begingroup\$ Sorry still not clear. I know what is the median of an array. What is the the median of two arrays? \$\endgroup\$ Commented Apr 10, 2016 at 6:40
  • 1
    \$\begingroup\$ @LinMa IIRC median is a value which is not less than the half items of the collection and not greater than the half of items. For arrays 1,2,2,2,3 and '2,7' there exist no such element which is bigger than half and smaller than half of all items, anyway the median exists and it is 2. Am I wrong? \$\endgroup\$ Commented Apr 21, 2016 at 7:31

2 Answers 2

3
+50
\$\begingroup\$

First lines can be written as (it's just a personal taste):

A, B = sorted([A, B], key=len, reverse=True)

Also the (m + n + 1) / 2 is possibly a bug, use (m + n + 1) // 2 instead (in python / is floating point division and // is integer division)

I am still looking for further improvements

To be honest I still can't get why it's O(log(n+m)) :-\

Edit:

I can't tell by sure whether the changes will break or not, but I've tested it with 10^6 random inputs multiples times and it seems that the changes are safe:

#!/usr/bin/env python

import random

def median(A, B):
    m, n = len(A), len(B)
    if m > n:
        A, B, m, n = B, A, n, m
    if n == 0:
        raise ValueError

    imin, imax, half_len = 0, m, (m + n + 1) / 2
    while imin <= imax:
        i = (imin + imax) / 2
        j = half_len - i
        if i < m and B[j-1] > A[i]:
            # i is too small, must increase it
            imin = i + 1
        elif i > 0 and A[i-1] > B[j]:
            # i is too big, must decrease it
            imax = i - 1
        else:
            # i is perfect
            if i == 0: max_of_left = B[j-1]
            elif j == 0: max_of_left = A[i-1]
            else: max_of_left = max(A[i-1], B[j-1])

            if (m + n) % 2 == 1:
                return max_of_left

            if i == m: min_of_right = B[j]
            elif j == n: min_of_right = A[i]
            else: min_of_right = min(A[i], B[j])

            return (max_of_left + min_of_right) / 2.0

def random_seq(max_len):
    l = random.randint(0, max_len)
    return list(sorted(random.choice(range(max_len)) for i in range(l)))

def median_slow(A, B):
    C = A + B
    C.sort()
    if not C: raise ValueError()
    m1 = (len(C) - 1) / 2
    m2 = len(C) / 2
    return (C[m1] + C[m2]) / 2.0

for i in range(1000000):
    A = random_seq(10)
    B = random_seq(10)
    if len(A) + len(B) > 0:
        me = median(A, B)
        ma = median_slow(A, B)
        assert me == ma

print "Validated!"

Last Edit:

m + n is larger than one so m + n + 1 is larger than two so j = half_length - 1 will always be positive. also n > m so always n + m <= 2*n so half_length = (n + m + 1) / 2 is always n or smaller but if you subtract one (j = half_length - 1) then it will be always less than n. So these checks are really redundant

\$\endgroup\$
7
  • 1
    \$\begingroup\$ Thanks AmirHossein, vote up for your advice. Do you have any advice on my original concerns? \$\endgroup\$ Commented Apr 21, 2016 at 22:58
  • 2
    \$\begingroup\$ I still can't get why it's O(log(n+m)): maybe you missed sorted (or its implications) from the title/problem statement. (A bit difficult to tell without you stating a tighter upper bound - or argue the O(log(n+m)) overly optimistic. But, then, you suggest sorting where merging would produce a combined sorted array.) \$\endgroup\$ Commented Apr 22, 2016 at 4:18
  • 1
    \$\begingroup\$ @greybeard, yes and vote up, original array is sorted, good catch. I will update question. And appreciate if you could comment on my original question. :) \$\endgroup\$ Commented Apr 22, 2016 at 16:50
  • 1
    \$\begingroup\$ Hi AmirHossein, my original two arrays are both sorted (asc). Do you have any advice on my original concerns? Thanks. \$\endgroup\$ Commented Apr 22, 2016 at 16:51
  • 1
    \$\begingroup\$ @greybeard there are algorithms for finding median within this bound but this particular one is moving by one on every iteration and I don't think that there is such a magical heuristic for every distribution of variables that can guarantee this bound. specially if the data comes from a skewed distribution it's likely that the heuristic just don't work. \$\endgroup\$ Commented Apr 22, 2016 at 19:55
3
\$\begingroup\$

Sorry for being late, but had no time to study your code until today. The results are the same which AmirHossein presented before.

After you conditionally swap the input arrays:

    if m > n:
        A, B, m, n = B, A, n, m

you have m <= n, so

    half_len = (m + n + 1) // 2

(after the fix by AmirHossein) is not less than m. That makes

    j = half_len - i

greater than zero if i < m, so j > 0 and i < m is redundant.

Of course half_len is also is not greater than n, so similary j < n if i > 0.

As for the complexity, the code bisects the A array in each iteration, so the complexity is actually \$\mathcal{O}(\log m)\$ where m is the length of A after the conditional swap, or \$\mathcal{O}(\log (\min(m,n)))\$ in terms of input values of m,n. And this can be bounded by \$\mathcal{O}(\log (m+n))\$.

\$\endgroup\$
0

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.