Minor, but here:
''.join(_ for _ in splitted_doc if _ is not None)
_ should be reserved for when you need to bind a object to a name, but don't need the variable. Here is a good resource on the topic. You are in fact using _ though as both the final result and in the check, so I'd give it a proper name.
If none of the valid strings are falsey (empty), you can also use filter here with None as the function:
''.join(filter(None, splitted_doc))
This will remove all falsey elements; not just Nones. That shouldn't affect anything though since the "".join will effectively get rid of any empty strings anyway.
Also, not directly related to the code, but "splitted" doesn't sound idiomatic. I'd call it just split_doc.
I'd be careful about repeated concatenation of strings using +. Each concatenation requires making a copy of the accumulated string. Performance likely won't be a concern here, but I would consider changing this so it instead appends each piece to a list, then joins the list in the end to create the string all at once. This is Python's equivalent of the StringBuilder that some other languages have.
You can see from some quick benchmarking in Python 3.8 that it can make a fair difference:
from timeit import timeit
def repeated_concat(n):
acc = ""
for _ in range(n):
acc = acc + "a"
return acc
def repeated_concat_pe(n):
acc = ""
for _ in range(n):
acc += "b"
return acc
def join_concat_li(n):
return "".join(["c" for _ in range(n)])
def join_concat_ge(n):
return "".join("d" for _ in range(n))
N = int(3e5)
TRIALS = int(1e3)
print(timeit(lambda: repeated_concat(N), number=TRIALS))
print(timeit(lambda: repeated_concat_pe(N), number=TRIALS))
print(timeit(lambda: join_concat_li(N), number=TRIALS))
print(timeit(lambda: join_concat_ge(N), number=TRIALS))
40.2907047
40.597058800000006
13.207587099999998
19.380548700000006
Also, apparently list comprehensions can perform better than a generator expression when joining? I'm going to have to test that more.