For practice, I decided to write a run-length decoder. For example:
basic_run_length_decode("1A2B3C")
=> 'ABBCCC'
basic_run_length_decode("5A10Z10J")
=> 'AAAAAZZZZZZZZZZJJJJJJJJJJ'
It ended up getting long, but that's mostly because I wanted try incorporating the partition-by function from Clojure. It seemed like it would work nicely here, and it did. It required me to write that function though since Python doesn't seem to have anything equivalent. I also "stole" grouper from the "recipe" section of the itertools docs. Clojure has both of these functions as part of its standard library, so that's what I thought of while writing this.
Basically, it works by cutting the string into groups of digits and non-digits, pairing them into [digits, word] pairs, then parsing the digits and doing some string multiplication.
What I'd like commented on mainly:
partition_by. I tried to see if there was a sane way of using something liketakewhile+drop_while, but both of those functions consume an extra element after the predicate flips, which caused problems. I opted to just do manual iteration, but it's quite verbose.Anything else that I may be doing the hard way.
Note, it actually allows for multiple characters per number.
basic_run_length_decode("10AB10CD")
=> 'ABABABABABABABABABABCDCDCDCDCDCDCDCDCDCD'
I thought about raising an error, but decided to just allow it. This is apparently a type of possible encoding, and it cost me nothing to allow it.
from itertools import zip_longest
from typing import Callable, Any, Iterable, TypeVar, Generator, List
T = TypeVar("T")
# Emulating https://clojuredocs.org/clojure.core/partition-by
def _partition_by(f: Callable[[T], Any], iterable: Iterable[T]) -> Generator[List[T], None, None]:
"""Splits off a new chunk everytime f returns a new value.
list(partition_by(lambda n: n < 5, [1, 2, 3, 6, 3, 2, 1])) => [[1, 2, 3], [6], [3, 2, 1]]
"""
last_value = object() # Dummy object that will never be equal to anything else
chunk = []
for x in iterable:
returned = f(x)
if returned == last_value:
chunk.append(x)
else:
if chunk:
yield chunk
chunk = [x]
last_value = returned
if chunk:
yield chunk
# Recipe from https://docs.python.org/3.8/library/itertools.html#itertools.takewhile
def _grouper(iterable, n, fillvalue=None):
"""Collect data into fixed-length chunks or blocks"""
args = [iter(iterable)] * n
return zip_longest(*args, fillvalue=fillvalue)
def _invalid_encoded_error(message: str) -> ValueError:
return ValueError(f"Invalid encoded string. {message}")
def basic_run_length_decode(encoded_string: str) -> str:
"""Decodes a string in the form of <number><chars><number><chars>. . .
basic_run_length_decode("1A2B3C") => 'ABBCCC'"""
if encoded_string and not encoded_string[0].isdigit():
raise _invalid_encoded_error("First character must be a number.")
chunks = _partition_by(lambda c: c.isdigit(), encoded_string)
acc = []
for raw_n, raw_word in _grouper(chunks, 2):
if not raw_word:
raise _invalid_encoded_error("Trailing numbers without character at end of encoded string.")
n = int("".join(raw_n)) # Should never throw as long as the very first char is a digit?
word = "".join(raw_word)
acc.append(n * word)
return "".join(acc)