Skip to main content
added 965 characters in body
Source Link
Carcigenicate
  • 16.6k
  • 3
  • 37
  • 82

Minor, but here:

''.join(_ for _ in splitted_doc if _ is not None)

_ should be reserved for when you need to bind a object to a name, but don't need the variable. Here is a good resource on the topic. You are in fact using _ though as both the final result and in the check, so I'd give it a proper name.

If none of the valid strings are falsey (empty), you can also use filter here with None as the function:

''.join(filter(None, splitted_doc))

This will remove all falsey elements; not just Nones. That shouldn't affect anything though since the "".join will effectively get rid of any empty strings anyway.


Also, not directly related to the code, but "splitted" doesn't sound idiomatic. I'd call it just split_doc.


I'd be careful about repeated concatenation of strings using +. Each concatenation requires making a copy of the accumulated string. Performance likely won't be a concern here, but I would consider changing this so it instead appends each piece to a list, then joins the list in the end to create the string all at once. This is Python's equivalent of the StringBuilder that some other languages have.

You can see from some quick benchmarking in Python 3.8 that it can make a fair difference:

from timeit import timeit

def repeated_concat(n):
    acc = ""
    for _ in range(n):
        acc = acc + "a"
    return acc

def repeated_concat_pe(n):
    acc = ""
    for _ in range(n):
        acc += "b"
    return acc

def join_concat_li(n):
    return "".join(["c" for _ in range(n)])

def join_concat_ge(n):
    return "".join("d" for _ in range(n))

N = int(3e5)
TRIALS = int(1e3)

print(timeit(lambda: repeated_concat(N), number=TRIALS))
print(timeit(lambda: repeated_concat_pe(N), number=TRIALS))
print(timeit(lambda: join_concat_li(N), number=TRIALS))
print(timeit(lambda: join_concat_ge(N), number=TRIALS))

40.2907047
40.597058800000006
13.207587099999998
19.380548700000006

Also, apparently list comprehensions can perform better than a generator expression when joining? I'm going to have to test that more.

Minor, but here:

''.join(_ for _ in splitted_doc if _ is not None)

_ should be reserved for when you need to bind a object to a name, but don't need the variable. Here is a good resource on the topic. You are in fact using _ though as both the final result and in the check, so I'd give it a proper name.

If none of the valid strings are falsey (empty), you can also use filter here with None as the function:

''.join(filter(None, splitted_doc))

This will remove all falsey elements; not just Nones. That shouldn't affect anything though since the "".join will effectively get rid of any empty strings anyway.


Also, not directly related to the code, but "splitted" doesn't sound idiomatic. I'd call it just split_doc.


I'd be careful about repeated concatenation of strings using +. Each concatenation requires making a copy of the accumulated string. Performance likely won't be a concern here, but I would consider changing this so it instead appends each piece to a list, then joins the list in the end to create the string all at once. This is Python's equivalent of the StringBuilder that some other languages have.

Minor, but here:

''.join(_ for _ in splitted_doc if _ is not None)

_ should be reserved for when you need to bind a object to a name, but don't need the variable. Here is a good resource on the topic. You are in fact using _ though as both the final result and in the check, so I'd give it a proper name.

If none of the valid strings are falsey (empty), you can also use filter here with None as the function:

''.join(filter(None, splitted_doc))

This will remove all falsey elements; not just Nones. That shouldn't affect anything though since the "".join will effectively get rid of any empty strings anyway.


Also, not directly related to the code, but "splitted" doesn't sound idiomatic. I'd call it just split_doc.


I'd be careful about repeated concatenation of strings using +. Each concatenation requires making a copy of the accumulated string. Performance likely won't be a concern here, but I would consider changing this so it instead appends each piece to a list, then joins the list in the end to create the string all at once. This is Python's equivalent of the StringBuilder that some other languages have.

You can see from some quick benchmarking in Python 3.8 that it can make a fair difference:

from timeit import timeit

def repeated_concat(n):
    acc = ""
    for _ in range(n):
        acc = acc + "a"
    return acc

def repeated_concat_pe(n):
    acc = ""
    for _ in range(n):
        acc += "b"
    return acc

def join_concat_li(n):
    return "".join(["c" for _ in range(n)])

def join_concat_ge(n):
    return "".join("d" for _ in range(n))

N = int(3e5)
TRIALS = int(1e3)

print(timeit(lambda: repeated_concat(N), number=TRIALS))
print(timeit(lambda: repeated_concat_pe(N), number=TRIALS))
print(timeit(lambda: join_concat_li(N), number=TRIALS))
print(timeit(lambda: join_concat_ge(N), number=TRIALS))

40.2907047
40.597058800000006
13.207587099999998
19.380548700000006

Also, apparently list comprehensions can perform better than a generator expression when joining? I'm going to have to test that more.

added 83 characters in body
Source Link
Carcigenicate
  • 16.6k
  • 3
  • 37
  • 82

Minor, but here:

''.join(_ for _ in splitted_doc if _ is not None)

_ should be reserved for when you need to bind a object to a name, but don't need the variable. Here is a good resource on the topic. You are in fact using _ though as both the final result and in the check, so I'd give it a proper name.

If none of the valid strings are falsey (empty), you can also use filter here with None as the function:

''.join(filter(None, splitted_doc))

This will remove all falsey elements; not just Nones. That shouldn't affect anything though since the "".join will effectively get rid of any empty strings anyway.


Also, not directly related to the code, but "splitted" doesn't sound idiomatic. I'd call it just split_doc.


I'd be careful about repeated concatenation of strings using +. Each concatenation requires making a copy of the accumulated string. Performance likely won't be a concern here, but I would consider changing this so it instead appends each piece to a list, then joins the list in the end to create the string all at once. This is Python's equivalent of the StringBuilder that some other languages have.

Minor, but here:

''.join(_ for _ in splitted_doc if _ is not None)

_ should be reserved for when you need to bind a object to a name, but don't need the variable. Here is a good resource on the topic. You are in fact using _ though as both the final result and in the check, so I'd give it a proper name.

If none of the valid strings are falsey (empty), you can also use filter here with None as the function:

''.join(filter(None, splitted_doc))

This will remove all falsey elements; not just Nones. That shouldn't affect anything though since the "".join will effectively get rid of any empty strings anyway.


Also, not directly related to the code, but "splitted" doesn't sound idiomatic. I'd call it just split_doc.


I'd be careful about repeated concatenation of strings using +. Each concatenation requires making a copy of the accumulated string. Performance likely won't be a concern here, but I would consider changing this so it instead appends each piece to a list, then joins the list in the end to create the string all at once.

Minor, but here:

''.join(_ for _ in splitted_doc if _ is not None)

_ should be reserved for when you need to bind a object to a name, but don't need the variable. Here is a good resource on the topic. You are in fact using _ though as both the final result and in the check, so I'd give it a proper name.

If none of the valid strings are falsey (empty), you can also use filter here with None as the function:

''.join(filter(None, splitted_doc))

This will remove all falsey elements; not just Nones. That shouldn't affect anything though since the "".join will effectively get rid of any empty strings anyway.


Also, not directly related to the code, but "splitted" doesn't sound idiomatic. I'd call it just split_doc.


I'd be careful about repeated concatenation of strings using +. Each concatenation requires making a copy of the accumulated string. Performance likely won't be a concern here, but I would consider changing this so it instead appends each piece to a list, then joins the list in the end to create the string all at once. This is Python's equivalent of the StringBuilder that some other languages have.

Source Link
Carcigenicate
  • 16.6k
  • 3
  • 37
  • 82

Minor, but here:

''.join(_ for _ in splitted_doc if _ is not None)

_ should be reserved for when you need to bind a object to a name, but don't need the variable. Here is a good resource on the topic. You are in fact using _ though as both the final result and in the check, so I'd give it a proper name.

If none of the valid strings are falsey (empty), you can also use filter here with None as the function:

''.join(filter(None, splitted_doc))

This will remove all falsey elements; not just Nones. That shouldn't affect anything though since the "".join will effectively get rid of any empty strings anyway.


Also, not directly related to the code, but "splitted" doesn't sound idiomatic. I'd call it just split_doc.


I'd be careful about repeated concatenation of strings using +. Each concatenation requires making a copy of the accumulated string. Performance likely won't be a concern here, but I would consider changing this so it instead appends each piece to a list, then joins the list in the end to create the string all at once.