Algorithm: Count the minimum number of distinct sorted subsequence for each update query

Question

This question is asked to me in an interview:
Distinct sorted subsquence containing adjacent values is defined as either its length is one or it only contains adjacent numbers when sorted. Each element can belong to only 1 such subsequence. I have Q queries, each updating a single value in A and I have to answer for each query, how many parts would be in the partition of A into distinct sorted subsequences if the number of parts was minimized.

For example, the number of parts for A = [1,2,3,4,3,5] can be minimized by partitioning it in the following two ways, both of which contain only two parts:

1) [1,2,3] && [4,3,5] ==> answer 2 (4,3,5 sorted is 3,4,5 all adjacent values)

2) [1,2,3,4,5] && [3] ==> answer 2

Approach I tried: Hashing and forming sets but all test cases were not cleared because of Timeout.

Problem Statment PDF : PDF

Constraints:
1<= N,Q <= 3*10^5
1< A(i)< 10^9

I hope I never have to answer this question in an interview! — Brien Foss
– Brien Foss, Commented Jan 27, 2018 at 5:45
@Brien haha its too much time taking I know. But can't figure out if segment tree is applicable in some sense. — Optimus Prime
– Optimus Prime, Commented Jan 27, 2018 at 5:47
I made some edits to your question, trying to clarify the English. Please let me know if I misunderstood anything. — גלעד ברקן
– גלעד ברקן, Commented Jan 27, 2018 at 13:55
I recommend you do not try to require people to click an external link. — Kenny Ostrom
– Kenny Ostrom, Commented Jan 27, 2018 at 15:48

xskxzr · Accepted Answer · 2018-01-27 17:42:25Z

Preprocessing

First you can preprocess A before all queries and generate a table (say times_of) such that when given a number n, one can efficiently obtain the number of times n appears in A through expression like times_of[n]. In the following example assuming A is of type int[N], we use an std::map to implement the table. Its construction costs O(NlogN) time.

auto preprocess(int *begin, int *end)
{
    std::map<int, std::size_t> times_of;
    while (begin != end)
    {
        ++times_of[*begin];
        ++begin;
    }
    return times_of;
}

Let min and max be the minimum and maximum elements of A respectively. Then the following lemma applies:

The minimum number of distinct sorted subsequences is equal to max{0, times_of[min] - times_of[min-1]} + ... + max{0, times_of[max] - times_of[max-1]}.

A rigorous proof is a bit technical, so I omit it from this answer. Roughly speaking, consider numbers from small to large. If n appears more than n-1, it has to bring extra times_of[n]-times_of[n-1] subsequences.

With this lemma, we can compute initially the minimum number of distinct sorted subsequences result in O(N) time (by iterating through times_of, not by iterating from min to max). The following is a sample code:

std::size_t result = 0;
auto prev = std::make_pair(min - 1, static_cast<std::size_t>(0));
for (auto &cur : times_of)
{
    // times_of[cur.first-1] == 0
    if (cur.first != prev.first + 1) result += cur.second;
    // times_of[cur.first-1] == times_of[prev.first]
    else if (cur.second > prev.second) result += cur.second - prev.second;

    prev = cur;
}

Queries

To deal with a query A[u] = v, we first update times_of[A[u]] and times_of[v] which costs O(logN) time. Then according to the lemma, we need only to recompute constant (i.e. 4) related terms to update result. Each recomputation costs O(logN) time (to find the previous or next element in times_of), so a query takes O(logN) time in total.

can you explain your lemma a bit as : times_of[min-1] will be zero always. any code to compute it?
@OptimusPrime Yes, I use times_of[min-1] instead of 0 for consistent of the formula.
can you check my implementation : HERE its calling abort. may be using too much resources
I have even not used map instead i used unordered_map for lowering down the complexity further. can we optimize it further

Kenny Ostrom · Accepted Answer · 2018-01-27 16:11:57Z

Keep a list of clusters on the first pass. Each has a collection of values, with a minimum and maximum value. These clusters could very well be stored in a segment tree (making it easy to merge in case they ever touch).

Loop through your N numbers, and for each number, either add it to an existing cluster (possibly triggering a merge), or create a new cluster. This may be easier if your clusters store min-1 and max+1.

Now you are done with the initial input of N numbers, and you have several clusters, all of which are likely to be of a reasonable size for radix sort.

However, you do not want to finish the radix sort. Generate the list of counts, then use this to count adjacent clusters. Loop through this, and every time the count decreases, you have found (difference) many of your final distinct sorted subsequences. (Using max+1 pays off again, because the guaranteed zero at the end means you don't have to add a special case after the loop.)

I may have misunderstood the queries. They are changing one of the inputs? I think this would still work with minor modification.

Collectives™ on Stack Overflow

Algorithm: Count the minimum number of distinct sorted subsequence for each update query

2 Answers 2

Preprocessing

Queries

12 Comments

1 Comment

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Preprocessing

Queries

12 Comments

1 Comment

Linked

Related