First change I'd be tempted to make: extract the iteration logic into its own generator:
import fnmatch
import os

def markdown_files(config):
    # Yield the path of every Markdown file under the configured sources directory.
    sources_dir = config['sources_dir']
    for root, dirnames, filenames in os.walk(sources_dir):
        for filename in fnmatch.filter(filenames, '*.md'):
            yield os.path.join(root, filename)
def main():
    config = get_user_config(os.getcwd())
    for path in markdown_files(config):
        html_block = transform_html_into_markdown(path)
        # ...
For one, this eliminates a level of nesting, which is always a good thing.
Also, you can add things like exclude paths to your config at a later date without having to change your processing code. Note that this might require you to rethink passing sources_dir into get_html_file_path, as any filtering logic might end up duplicated...
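To illustrate, here's a minimal sketch of how that could look, assuming a hypothetical exclude_dirs key in your config (the key name is mine, not something in your code):

def markdown_files(config):
    sources_dir = config['sources_dir']
    excluded = set(config.get('exclude_dirs', []))  # hypothetical config key
    for root, dirnames, filenames in os.walk(sources_dir):
        # prune excluded directories in place so os.walk never descends into them
        dirnames[:] = [d for d in dirnames if d not in excluded]
        for filename in fnmatch.filter(filenames, '*.md'):
            yield os.path.join(root, filename)

Assigning to dirnames[:] in place is what makes os.walk skip those directories, since it walks top-down by default.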
Second change: wrap the entire processing routine in a single function:
def process_file(config, path):
    html_block = transform_html_into_markdown(path)
    # ...

def main():
    config = get_user_config(os.getcwd())
    for md_path in markdown_files(config):
        process_file(config, md_path)
That way, you can parallelize it later if you really need to:
import functools  # we can't pass a lambda into Pool.map, since it has to be picklable
import multiprocessing

pool = multiprocessing.Pool()
pool.map(
    functools.partial(process_file, config),  # bind config positionally; each path fills the second argument
    markdown_files(config)
)
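One thing to keep in mind if you go this route: on platforms that use the spawn start method (Windows, and macOS on Python 3.8+), the pool setup needs to sit under an if __name__ == '__main__': guard, and process_file has to be defined at module level so it can be pickled.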