What is a good method/practice I can employ to keep identical code snippets in two places in sync? Also, help documenting functionals

Question

If I could get some input on the design of this, I would be grateful as well.

Note that I'm programming in Python.

There's a function F that takes lots of data, runs some analysis on it (taking maybe a minute or more) to compute some crucial results theta, and then returns a function g(x). That g can do a lot of things efficiently using only theta, the fruits of analyzing the data.

Now one might wish to save this g function, so I designed the following functionality:

theta = g(mode="get theta") # gets the `theta` value that `g` was using
# perhaps store `theta` as a pickle file, or a json file, whatever
# read `theta` from its storage
restored_g = restore_g(theta) # restore_g is a function that 
                              # takes a `theta` and gives you 
                              # a `g` that runs based off that 
                              # `theta`

If you want a concrete example to think about, think interpolation. F gets a bunch of data points and, after processing, spits out an interpolation function g. You can't save a function, though, so you save the theta that g was using, and then you can restore the interpolator later on with a restore_g function using that saved theta.

The thing is, though, that the code for F and restore_g will look like this

def F(data):
  theta = do_tons_of_processing(data)
  def g(args):
    return do_stuff(args, theta)
  return g

def restore_g(theta):
  def g(args):
    return do_stuff(args, theta)
  return g

The problem here is that

def g(args):
  return do_stuff(args, theta)

appears twice, exactly the same, seemingly by necessity. I can't think of a way around editing that snippet of code in both places whenever I want to make a change to g, like what arguments it takes, the description of what it does, etc. How can I best address this?

Two more related questions I have are: what is the best practice for describing the functions?

Normally, one would do something like

def f(x):
  """concise description

  longer description

  inputs
  ------
  x : int
    what the input means

  returns
  -------
  y : float
    what the return value is

  maybe some examples
  """
  return 0.2*x

But my F and restore_g themselves return a function g, whose inputs and outputs should also be described. So where should this description happen? And how can it be maximally synced between F and restore_g with minimal redundancy?

Finally, what is the "best" (or at least, a good) practice for going about g having multiple orthogonal purposes? Sometimes, it might take an x and a y array as arguments to spit out something. Sometimes it could just take an x value to spit out something. And sometimes, it'll take "get theta" so it knows to spit out theta. Is it considered inappropriate to just overload the x argument so that if it's fed "get theta" (or some other keyword), then g will follow the "get theta" functionality? Is it better to create a whole other argument called mode or some such that can be set to "get theta" or what have you?

@Mark Personally, no. But further down the line, someone might want to access it, so unless it might give significant gains otherwise, theta should remain accessible based on principle.
– chausies, Commented Mar 20, 2019 at 10:36

Karl Bielefeldt · Accepted Answer · 2019-03-20 03:11:00Z

9

Why not call restore_g from F?

def F(data):
  theta = do_tons_of_processing(data)
  return restore_g(theta)

def restore_g(theta):
  def g(args):
    return do_stuff(args, theta)
  return g
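
A minimal runnable sketch of this idea, using the interpolation example from the question. The bodies of do_tons_of_processing and do_stuff below are hypothetical stand-ins (sorting the points and doing piecewise-linear interpolation), and exposing theta as a function attribute is an assumption of this sketch, not part of the original code:

```python
from bisect import bisect_right


def do_tons_of_processing(data):
    # Hypothetical stand-in for the expensive analysis: sort the (x, y) points.
    xs, ys = zip(*sorted(data))
    return {"xs": list(xs), "ys": list(ys)}


def do_stuff(x, theta):
    # Hypothetical stand-in: piecewise-linear interpolation, clamped at the ends.
    xs, ys = theta["xs"], theta["ys"]
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x)  # xs[i - 1] <= x < xs[i]
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def restore_g(theta):
    """Build g from an already computed theta."""
    def g(x):
        """Interpolate at x using the fitted theta."""
        return do_stuff(x, theta)
    g.theta = theta  # assumption: expose theta as an attribute of g
    return g


def F(data):
    """Analyze data, then delegate to restore_g so g is defined only once."""
    return restore_g(do_tons_of_processing(data))


g = F([(0, 0), (2, 4), (1, 1)])
restored_g = restore_g(g.theta)  # same behavior, no reprocessing of the data
```

Because g is defined only inside restore_g, its signature and docstring live in one place, and the docstring of F can simply point to restore_g.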


That's brilliant, no idea why I didn't think of that >.<. But question, would this be relatively fine design: in the description for F, go over the inputs, and mention the output is a function g, and one should check out the description of restore_g for more details on how g operates? Or what? Upon thinking a bit about it, I still think that, overall, in terms of design, OOP is the more correct way to go, in how it naturally allows me to give descriptors where and how they should be.

– chausies, Commented Mar 20, 2019 at 7:16

Joe · Accepted Answer · 2019-03-20 04:03:06Z

What is a good method/practice I can employ to keep identical code snippets in two places in sync?

To never have identical code snippets in any more than a single place, ever. Don’t repeat yourself.

There is no good reason to repeat yourself regardless of what paradigm you are using. In your particular case of theta and g, you are correct that OOP makes sense, because theta is necessary data that naturally wants to be an instance variable. The key benefit of this is that it solves your problem by eliminating the repeated code snippet.

But you don’t necessarily need to use OOP either; theta can be thought of as the canonical form of the data, in that you always need and operate on theta. So, as long as you have a function that takes theta as input and returns g, you don’t need any duplicated code. You just do different processing to produce the theta you want before calling restore_g.
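
As a rough sketch of what "different processing to produce theta" could look like: one path computes theta from data, another reads it back from storage, and both end in the same restore_g. The fit_theta helper, the straight-line model, and the choice of JSON are all assumptions made for illustration here, not details from the question:

```python
import json


def restore_g(theta):
    """The single place where g is defined."""
    def g(x):
        # Hypothetical do_stuff: evaluate a fitted straight line at x.
        return theta["slope"] * x + theta["intercept"]
    return g


def fit_theta(data):
    # Hypothetical stand-in for the expensive analysis: least-squares line fit.
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in data)
    den = sum((x - mean_x) ** 2 for x, _ in data)
    slope = num / den
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


# Path 1: compute theta from data.
theta = fit_theta([(0, 1), (1, 3), (2, 5)])
g = restore_g(theta)

# Path 2: theta comes back from storage (a string here; a file works the same way).
saved = json.dumps(theta)
restored_g = restore_g(json.loads(saved))
```

Either way, nothing about g itself is written twice; only the way theta is obtained differs.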

Your reflexive assumption any time you duplicate code should be that you are doing something wrong: you are creating a place where errors will be introduced later, precisely because keeping things in sync like this is a tedious, unnecessary pain.

chausies · Accepted Answer · 2019-03-20 02:30:17Z

Actually, after meditating upon it a bit more (and getting over my personal aversion to it), I realize now that OOP is the obvious solution.

The entire thing should be designed in the OOP paradigm, not the functional programming paradigm.

Something like this:

class F(object):
  """description of what `F` does

  attributes
  ----------
  theta
  """

  def __init__(self, data=None, theta=None):
    """If data provided, computes theta. Else uses provided theta."""
    if theta is None:
      self.theta = compute_stuff(data)
    else:
      self.theta = theta

  def get_g(self):
    """here's how to use `g`"""
    def g(args):
      return do_stuff(args, self.theta)
    return g

If I'm not mistaken, that takes care of all my problems. Optionally, I could also add a get_theta method as well.
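
One possible variation on the class above, offered only as a sketch: making the instance itself callable removes the need for get_g, and a classmethod separates "build from data" from "build from a saved theta". The class name Model, the from_data constructor, and the bodies of compute_stuff and do_stuff are hypothetical and not part of the code above:

```python
def compute_stuff(data):
    # Hypothetical stand-in for the expensive analysis: the mean of the data.
    return {"mean": sum(data) / len(data)}


def do_stuff(x, theta):
    # Hypothetical stand-in: center x around the fitted mean.
    return x - theta["mean"]


class Model:
    """Callable object that plays the role of g.

    attributes
    ----------
    theta : dict
      fitted parameters; save this to restore the model later
    """

    def __init__(self, theta):
        self.theta = theta

    @classmethod
    def from_data(cls, data):
        """Run the expensive analysis once and wrap the result."""
        return cls(compute_stuff(data))

    def __call__(self, x):
        """Evaluate the model at x (this is where g would be documented)."""
        return do_stuff(x, self.theta)


model = Model.from_data([1, 2, 3])
restored = Model(model.theta)  # no "get theta" mode needed; theta is an attribute
```

With this layout, each separate purpose of g can become its own method with its own docstring, rather than a special value passed through x or a mode argument.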
