Eric

First change: extract the iteration logic into its own generator:

import fnmatch
import os

def markdown_files(config):
    # Yield the path of every *.md file found under sources_dir
    sources_dir = config['sources_dir']
    for root, dirnames, filenames in os.walk(sources_dir):
        for filename in fnmatch.filter(filenames, '*.md'):
            yield os.path.join(root, filename)

def main():
    config = get_user_config(os.getcwd())
    for path in markdown_files(config):    
        html_block = transform_html_into_markdown(path)

        # ...

For one, this eliminates a level of nesting in main, which makes the processing code easier to read.

It also means you can add things like exclude paths to your config at a later date without having to change your processing code. Note that this might require you to rethink passing sources_dir into get_html_file_path, as any filtering logic might end up duplicated.
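As a rough sketch of what that could look like, here is the same generator with exclude support. The 'exclude' config key and its format (a list of glob patterns matched against paths relative to sources_dir) are assumptions for illustration, not something your config already has:

```python
import fnmatch
import os


def markdown_files(config):
    sources_dir = config['sources_dir']
    # 'exclude' is a hypothetical config key: a list of glob patterns
    # matched against each path relative to sources_dir.
    exclude = config.get('exclude', [])
    for root, dirnames, filenames in os.walk(sources_dir):
        for filename in fnmatch.filter(filenames, '*.md'):
            path = os.path.join(root, filename)
            rel = os.path.relpath(path, sources_dir)
            if any(fnmatch.fnmatch(rel, pattern) for pattern in exclude):
                continue  # skip anything the user asked to exclude
            yield path
```

The loop in main stays exactly as it was; only the generator knows about the filtering.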


Second change: wrap the entire processing routine in a single function:

def process_file(config, path):  
    html_block = transform_html_into_markdown(path)

    # ...

    
def main():
    config = get_user_config(os.getcwd())
    for md_path in markdown_files(config):
        process_file(config, md_path)

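That way you can parallelize it later, if you really need to:

import functools  # a lambda can't be pickled, so we can't pass one to pool.map
import multiprocessing

def main():
    config = get_user_config(os.getcwd())
    # Bind config positionally: config=config would clash with the path
    # that pool.map passes as the first positional argument.
    worker = functools.partial(process_file, config)
    with multiprocessing.Pool() as pool:
        pool.map(worker, markdown_files(config))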
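For reference, a self-contained sketch of this pattern that can be run as a script. The body of process_file here is a made-up stand-in, process_all is a hypothetical helper name, and the with-block form of Pool assumes Python 3.3 or later. Note that config is bound as the first positional argument of the partial, so each path supplied by the pool lands in the path parameter:

```python
import functools
import multiprocessing


def process_file(config, path):
    # Stand-in for the real processing routine (hypothetical body).
    return "{}:{}".format(config["sources_dir"], path)


def process_all(config, paths):
    # Bind config as the first positional argument; the pool then
    # supplies each path as the second one.
    worker = functools.partial(process_file, config)
    with multiprocessing.Pool() as pool:
        return pool.map(worker, paths)


if __name__ == "__main__":
    # The guard matters on platforms that spawn worker processes by
    # re-importing the main module.
    print(process_all({"sources_dir": "docs"}, ["a.md", "b.md"]))
```

process_file has to be a module-level function for this to work, since the pool pickles the partial to send it to the workers.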
