Skip to main content
added 405 characters in body
Source Link
J_H
  • 42.3k
  • 3
  • 38
  • 157

helper defaults

def create_lag_and_delta_features(df_input, features_to_lag, temp_features_for_delta,
        lag_period=1):

That last parameter is great, and self explanatory. There's no question about what kinds of inputs it accepts, and what a sensible input value would be.

Consider making two other params similarly self explanatory:

_default_features_to_lag = ("precipitation", "temp_max", "temp_min", "wind", "weather_encoded")

def create_lag_and_delta_features(
    df_input,
    features_to_lag=_default_features_to_lag,
    temp_features_for_delta=("temp_max", "temp_min"),
    lag_period=1,
):
    ...

This simplifies the call site, and makes reading the body of the helper a little easier since we can see expected param values right next to it.

As a fine point, we might adopt a _private name for the _create_lag_and_delta_features function, showing that it's not part of our Public API.

As an even finer point, I used an immutable sequence of feature names where a list would have been more natural. Why? To avoid the "mutable default arg" gotcha that lies in wait for the poor unsuspecting engineer. And to ensure it would lint cleanly.

overly eager helper

Thank you for adding that helper -- the code is much better off for it. But "and" is in the name, which suggests it might be doing too much, suggests it is not just doing one thing.

The current call site is nice, but this might be slightly nicer.

df_with_lags = _create_delta_feat(
                   _create_lag_feat(df_for_lagged_processing))

Notice that a pair of unit tests could now exercise both bits of functionality independently from one another.

train / test split

Imagine there was a weather model that was fairly accurate, but its poorest performance happened during cold months. We would want to know about its weaknesses before deploying into production.

Of the four years of available examples, you held out 20% to assess accuracy. Weather behavior is of course very seasonal. There's a couple of weeks with no assessment. Prefer a 75 - 25 split, so we're evaluating against a full year's worth of data.

helper defaults

def create_lag_and_delta_features(df_input, features_to_lag, temp_features_for_delta,
        lag_period=1):

That last parameter is great, and self explanatory. There's no question about what kinds of inputs it accepts, and what a sensible input value would be.

Consider making two other params similarly self explanatory:

_default_features_to_lag = ("precipitation", "temp_max", "temp_min", "wind", "weather_encoded")

def create_lag_and_delta_features(
    df_input,
    features_to_lag=_default_features_to_lag,
    temp_features_for_delta=("temp_max", "temp_min"),
    lag_period=1,
):
    ...

This simplifies the call site, and makes reading the body of the helper a little easier since we can see expected param values right next to it.

As a fine point, we might adopt a _private name for the _create_lag_and_delta_features function, showing that it's not part of our Public API.

As an even finer point, I used an immutable sequence of feature names where a list would have been more natural. Why? To avoid the "mutable default arg" gotcha that lies in wait for the poor unsuspecting engineer. And to ensure it would lint cleanly.

overly eager helper

Thank you for adding that helper -- the code is much better off for it. But "and" is in the name, which suggests it might be doing too much, suggests it is not just doing one thing.

The current call site is nice, but this might be slightly nicer.

df_with_lags = _create_delta_feat(
                   _create_lag_feat(df_for_lagged_processing))

Notice that a pair of unit tests could now exercise both bits of functionality independently from one another.

helper defaults

def create_lag_and_delta_features(df_input, features_to_lag, temp_features_for_delta,
        lag_period=1):

That last parameter is great, and self explanatory. There's no question about what kinds of inputs it accepts, and what a sensible input value would be.

Consider making two other params similarly self explanatory:

_default_features_to_lag = ("precipitation", "temp_max", "temp_min", "wind", "weather_encoded")

def create_lag_and_delta_features(
    df_input,
    features_to_lag=_default_features_to_lag,
    temp_features_for_delta=("temp_max", "temp_min"),
    lag_period=1,
):
    ...

This simplifies the call site, and makes reading the body of the helper a little easier since we can see expected param values right next to it.

As a fine point, we might adopt a _private name for the _create_lag_and_delta_features function, showing that it's not part of our Public API.

As an even finer point, I used an immutable sequence of feature names where a list would have been more natural. Why? To avoid the "mutable default arg" gotcha that lies in wait for the poor unsuspecting engineer. And to ensure it would lint cleanly.

overly eager helper

Thank you for adding that helper -- the code is much better off for it. But "and" is in the name, which suggests it might be doing too much, suggests it is not just doing one thing.

The current call site is nice, but this might be slightly nicer.

df_with_lags = _create_delta_feat(
                   _create_lag_feat(df_for_lagged_processing))

Notice that a pair of unit tests could now exercise both bits of functionality independently from one another.

train / test split

Imagine there was a weather model that was fairly accurate, but its poorest performance happened during cold months. We would want to know about its weaknesses before deploying into production.

Of the four years of available examples, you held out 20% to assess accuracy. Weather behavior is of course very seasonal. There's a couple of weeks with no assessment. Prefer a 75 - 25 split, so we're evaluating against a full year's worth of data.

Source Link
J_H
  • 42.3k
  • 3
  • 38
  • 157

helper defaults

def create_lag_and_delta_features(df_input, features_to_lag, temp_features_for_delta,
        lag_period=1):

That last parameter is great, and self explanatory. There's no question about what kinds of inputs it accepts, and what a sensible input value would be.

Consider making two other params similarly self explanatory:

_default_features_to_lag = ("precipitation", "temp_max", "temp_min", "wind", "weather_encoded")

def create_lag_and_delta_features(
    df_input,
    features_to_lag=_default_features_to_lag,
    temp_features_for_delta=("temp_max", "temp_min"),
    lag_period=1,
):
    ...

This simplifies the call site, and makes reading the body of the helper a little easier since we can see expected param values right next to it.

As a fine point, we might adopt a _private name for the _create_lag_and_delta_features function, showing that it's not part of our Public API.

As an even finer point, I used an immutable sequence of feature names where a list would have been more natural. Why? To avoid the "mutable default arg" gotcha that lies in wait for the poor unsuspecting engineer. And to ensure it would lint cleanly.

overly eager helper

Thank you for adding that helper -- the code is much better off for it. But "and" is in the name, which suggests it might be doing too much, suggests it is not just doing one thing.

The current call site is nice, but this might be slightly nicer.

df_with_lags = _create_delta_feat(
                   _create_lag_feat(df_for_lagged_processing))

Notice that a pair of unit tests could now exercise both bits of functionality independently from one another.