First change I'd be tempted to make: extract the iteration logic into its own generator:
import fnmatch
import os

def markdown_files(config):
    # Yield the path of every Markdown file under the configured sources directory.
    sources_dir = config['sources_dir']
    for root, dirnames, filenames in os.walk(sources_dir):
        for filename in fnmatch.filter(filenames, '*.md'):
            yield os.path.join(root, filename)
def main():
    config = get_user_config(os.getcwd())
    for path in markdown_files(config):
        html_block = transform_html_into_markdown(path)
        # ...
For one, this eliminates a level of nesting, which is always a good thing.
Also, you can add things like exclude paths to your config at a later date without having to change your processing code. Note that this might require you to rethink passing sources_dir into get_html_file_path, as any filtering logic might end up duplicated...
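To illustrate, here's a minimal sketch of how that could look, assuming a hypothetical exclude_dirs key in your config (the key name is mine, not something in your code):

def markdown_files(config):
    sources_dir = config['sources_dir']
    excluded = set(config.get('exclude_dirs', []))  # hypothetical config key
    for root, dirnames, filenames in os.walk(sources_dir):
        # prune excluded directories in place so os.walk never descends into them
        dirnames[:] = [d for d in dirnames if d not in excluded]
        for filename in fnmatch.filter(filenames, '*.md'):
            yield os.path.join(root, filename)

Assigning to dirnames[:] in place is what makes os.walk skip those directories, since it walks top-down by default.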
Second change: wrap the entire processing routine in a single function:
def process_file(config, path):
    html_block = transform_html_into_markdown(path)
    # ...

def main():
    config = get_user_config(os.getcwd())
    for md_path in markdown_files(config):
        process_file(config, md_path)
That way, you can parallelize it later if you really need to:
import functools  # we can't pass a lambda into Pool.map, since it has to be picklable
import multiprocessing

pool = multiprocessing.Pool()
pool.map(
    functools.partial(process_file, config),  # bind config positionally; each path fills the second argument
    markdown_files(config)
)
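One thing to keep in mind if you go this route: on platforms that use the spawn start method (Windows, and macOS on Python 3.8+), the pool setup needs to sit under an if __name__ == '__main__': guard, and process_file has to be defined at module level so it can be pickled.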