Skip to main content
2 of 2
replaced http://codereview.stackexchange.com/ with https://codereview.stackexchange.com/

Mysterious special case

For any inputString with length 1, you print two lines of output: 0 followed by 1. The incorrect 0 comes from the special case

if (len(inputString) == 1):
  print frequency

which as far as I can tell should not be there at all.

Style

Consistent indentation is important, especially in Python, where whitespace matters. Use 4 spaces, as recommended in PEP 8.

inputString should be input_string to be consistent with the naming style and Python guidelines.

It is customary to omit parentheses after while, if, and for.

You should split your code according to responsibilities. I recommend the following outline:

def string_root(s):
    …
    return n

def lines(terminator='*'):
    s = raw_input()
    while s != terminator:
        yield s
        s = raw_input()

if __name__ == '__main__':
    for line in lines():
        print string_root(line)

Advantages include:

  • The main loop read like English. It makes it obvious that each line is treated as an independent problem, unaffected by all other lines.
  • The two calls to raw_input() are close to each other. (Unfortunately, there's no do-while loop in Python.)
  • The string_root() function is independently testable.

Algorithm

You've implemented a brute-force solution. The complexity is \$O(L^2)\$, there \$L\$ is the length of inputString. (You can tell because for current_symbol in inputString is \$O(L)\$, and scanned_string.startswith(matched_characters) is also \$O(L)\$, and there are no break statements anywhere to serve as shortcuts.)

To bring it down to \$O(L)\$, you would need a smarter string-matching algorithm. The Z Algorithm is useful here: it detects string self-similarities in \$O(L)\$ time.

>>> compute_z('abc' * 4)
[12, 0, 0, 9, 0, 0, 6, 0, 0, 3, 0, 0]

The result above means that 'abcabcabcabc' has

  • 12 leading characters in common with itself
  • no leading characters in common with itself after dropping the initial 'a'
  • no leading characters in common with itself after dropping the initial 'ab'
  • 9 leading characters in common with itself after dropping the initial 'abc'

This implementation works in \$O(L)\$ time.

def string_root(s):
    z = compute_z(s)
    z_iter = enumerate(z)
    _, s_len = next(z_iter)
    for block_len, remain_len in z_iter:
        if remain_len == s_len - block_len and remain_len % block_len == 0:
            return s_len // block_len
    return 1

Documentation and testing

A doctest would be useful here.

def string_root(s):
    """Returns the maximum number of repeating blocks into which string s
    can be split.
    
    >>> string_root('abcabcabcabc')
    4
    >>> string_root('abcdefgh012')
    1       
    >>> string_root('aaaaaaaaaa')
    10
    >>> string_root('abcabcabcabcd')
    1
    >>> string_root('abcabcdabcabcd')
    2
    >>> string_root('abcabcabcabx')
    1
    """
    z = compute_z(s)
    …

Then you can test the code using

import findsr
import doctest
doctest.testmod(findsr)
200_success
  • 145.6k
  • 22
  • 191
  • 481