Skip to main content
memchr FTW! (since strchr has to worry about encountering a terminating NUL)
Source Link
J_H
  • 42.3k
  • 3
  • 38
  • 157

But strchr()memchr() has definitely been benchmarked to within an inch of its life. It certainly won't run slower than your loop. It might exploit SIMD vector processing if that's available on the target system which runs the queries. So use such standard building blocks where feasible.

But strchr() has definitely been benchmarked to within an inch of its life. It certainly won't run slower than your loop. It might exploit SIMD vector processing if that's available on the target system which runs the queries. So use such standard building blocks where feasible.

But memchr() has definitely been benchmarked to within an inch of its life. It certainly won't run slower than your loop. It might exploit SIMD vector processing if that's available on the target system which runs the queries. So use such standard building blocks where feasible.

added 619 characters in body
Source Link
J_H
  • 42.3k
  • 3
  • 38
  • 157

use libc

You wrote some simple loops, and they're very nice. An -O3 optimizing compiler likely manipulates them, perhaps by unrolling.

But strchr() has definitely been benchmarked to within an inch of its life. It certainly won't run slower than your loop. It might exploit SIMD vector processing if that's available on the target system which runs the queries. So use such standard building blocks where feasible.

BTW, I do hope you specified -march, or otherwise told the compiler it is allowed to use non-portable features of a particular chipset.

use libc

You wrote some simple loops, and they're very nice. An -O3 optimizing compiler likely manipulates them, perhaps by unrolling.

But strchr() has definitely been benchmarked to within an inch of its life. It certainly won't run slower than your loop. It might exploit SIMD vector processing if that's available on the target system which runs the queries. So use such standard building blocks where feasible.

BTW, I do hope you specified -march, or otherwise told the compiler it is allowed to use non-portable features of a particular chipset.

Source Link
J_H
  • 42.3k
  • 3
  • 38
  • 157

index type

You seem to choose (signed) long or int apparently at random when writing a loop. Prefer (unsigned) size_t i for array indexes.

names

I'm not super fond of uppercase identifiers like Q, but I suppose you were trying to match the problem statement, so OK, fine.

In code like this it is conventional for s to point at a string, and for c or ch to be a character within a string. So char s was slightly jarring, but again I suppose you were going for consistency with the problem statement.

dynamic allocation

    char ..., str[N];

The problem gave us a bound of a million on N. Prefer a static allocation here, to give the optimizer a little more to work with.


it scored rather poorly

I assume the score mostly depends on elapsed running time.

You mentioned there are some compiler warnings to be cleaned up, which might possibly cost demerits.

algorithm

set membership

We repeatedly test whether str[i] is within a set of length two:

        if (str[i] == s1 || str[i] == s2) {

IDK, maybe allocate a vector of length 256, clear it to zeros, and then set the s1 and s2 elements to 1. So we do a single dereference rather than a pair of comparisons. Bench it. Maybe it's faster?

pre-processing

We issue Q queries (many queries) against a megabyte string. So taking a moment to pre-process the string could be helpful, as Boyer & Moore teach us, when they tackle a different kind of problem. The idea here is to anticipate possible queries and memoize the results, just in case the result becomes relevant when we see the queries.

Also, we assume chief cost of these extremely simple queries will be dragging that giant megabyte string through the memory hierarchy, so the more we can do with each retrieved character, the better.

type 1

print a single number which is the position of the k-th occurrence of symbol s

(Note that 20 bits is enough to represent an index of a million.)

  • Define a tuple struct, using 8 bits for s and 24 bits for position.
  • Scan the N characters of the input string, outputting N tuples.
  • qsort() the tuples, costing O(N log N) time.

At query time create a (s, 0) tuple, and use binary search to identify where the s entries start. Advance the index by k to immediately read off the answer. Total query cost is O(log N) time, a significant savings over the O(N) time of your proposed solution.

type 2

print the symbol s1 or s2, depending on which occurs first to the right of position k

Exploit that same sorted datastructure. Do a pair of binary searches to find the beginning of the s1 entries and of the s2 entries.

Simple approach: linearly scan through the entries, looking for an index that exceeds k. Do this again. Compare the indexes you found, and return the appropriate character.

Fancy approach: Let succ(s1) be the successor of that character, so e.g. succ('X') == 'Y'. Do a pair of binary searches for the successors of s1 and s2. Now you know their exact ranges, and hence are able to do binary searches for indexes exceeding k.

Both approaches have time complexity that improves on the O(N) time of your proposed solution.

type 3

print the number of occurrences of symbol s between positions k1 and k2

Again we can adopt a simple or fancy way of exploiting our sorted datastructure. Assume we choose fancy.

Do a pair of binary searches to identify where s entries and succ(s) entries begin. Do a pair of binary searches to identify where the k1 and k2 indexes appear within those entries. Use subtraction to report the number of occurrences.

Again the time complexity significantly improves on the O(N) time of your proposed solution.