If I could get some input on the design of this, I would be grateful as well.
Note that I'm programming in Python.
There's a function F that takes lots of data, runs some analysis on it (taking maybe a minute or more) to compute some crucial results theta, and then returns a function g(x). g offers a lot of functionality that it can carry out efficiently solely by accessing theta, the fruits of analyzing the data.
Now one might wish to save this g function, so I designed the following functionality:
theta = g(mode="get theta")  # gets the `theta` value that `g` was using
# perhaps store `theta` as a pickle file, or a json file, whatever
# read `theta` from its storage
restored_g = restore_g(theta)  # restore_g is a function that takes a `theta`
                               # and gives you a `g` that runs based off that `theta`
If you want a concrete example to think about, think interpolation. F gets a bunch of data points and, after processing, spits out an interpolation function g. You can't save a function, though, so you save the theta that g was using, and then you can theoretically restore the interpolator later on with a restore_g function using that saved theta.
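To make the interpolation example concrete, here is a minimal sketch of the round trip I have in mind. Everything specific in it is an assumption for illustration: the linear interpolation inside `do_stuff`, the dictionary layout of `theta`, and the use of JSON for storage are stand-ins, and the real processing would be far more expensive than sorting.

```python
import bisect
import json


def do_stuff(x, theta):
    """Linearly interpolate at x using the sorted knots stored in theta."""
    xs, ys = theta["xs"], theta["ys"]
    # Clamp to the end points outside the data range.
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect.bisect_right(xs, x)  # xs[i - 1] <= x < xs[i]
    t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
    return ys[i - 1] + t * (ys[i] - ys[i - 1])


def F(data):
    # Stand-in for the expensive analysis: here it only sorts the points.
    points = sorted(data)
    theta = {"xs": [p[0] for p in points], "ys": [p[1] for p in points]}

    def g(x=None, mode="evaluate"):
        if mode == "get theta":
            return theta
        return do_stuff(x, theta)

    return g


def restore_g(theta):
    def g(x=None, mode="evaluate"):
        if mode == "get theta":
            return theta
        return do_stuff(x, theta)

    return g


g = F([(2.0, 20.0), (0.0, 0.0), (1.0, 10.0)])
saved = json.dumps(g(mode="get theta"))    # theta is plain data, so it serializes
restored_g = restore_g(json.loads(saved))  # rebuild g without redoing the analysis
```

The restored function should behave exactly like the original, since both only ever look at `theta`.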
The thing is, though, that the code for F and restore_g will look like this:
def F(data):
    theta = do_tons_of_processing(data)
    def g(args):
        return do_stuff(args, theta)
    return g

def restore_g(theta):
    def g(args):
        return do_stuff(args, theta)
    return g
The problem here is that
def g(args):
    return do_stuff(args, theta)
appears twice, exactly the same, seemingly by necessity. I can't think of a way around editing that snippet of code in both places whenever I want to make a change to g, like what arguments it takes, the description of what it does, etc. How can I best address this?
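One direction I have been toying with, though I am not sure it is the idiomatic one, is to let F do only the processing and then hand theta to restore_g, so that the definition of g lives in a single place. A minimal sketch, where the bodies of `do_tons_of_processing` and `do_stuff` are made-up placeholders:

```python
def do_tons_of_processing(data):
    # Placeholder for the expensive analysis; here theta is just the mean.
    return sum(data) / len(data)


def do_stuff(args, theta):
    # Placeholder for the cheap work that only needs theta.
    return args - theta


def restore_g(theta):
    # The only place where g is defined.
    def g(args):
        return do_stuff(args, theta)
    return g


def F(data):
    theta = do_tons_of_processing(data)
    return restore_g(theta)  # reuse the single definition of g
```

With this layout, a change to the signature or behavior of g would only need to be made inside restore_g.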
I have two more related questions. First, what is the best practice for describing the functions?
Normally, one would do something like:
def f(x):
    """concise description

    longer description

    inputs
    ------
    x : int
        what the input means

    returns
    -------
    y : float
        what the return value is

    maybe some examples
    """
    return 0.2*x
But my F and restore_g themselves return a function g, whose inputs and outputs should also be described. So where should this description happen? And how can it be maximally synced between F and restore_g with minimal redundancy?
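For what it is worth, the best I have come up with so far is to put the description of g in the docstring of the inner function itself and have the outer functions merely point to it. This is only a sketch under the assumption that g is defined in one place; the model `theta * x` and the mean-based processing are placeholders.

```python
def restore_g(theta):
    """Build the evaluator `g` from a previously computed `theta`.

    Returns
    -------
    g : callable
        See the docstring of the returned function, e.g. via `help(g)`.
    """
    def g(x):
        """Evaluate the fitted model at `x`.

        Parameters
        ----------
        x : float
            Point at which to evaluate.

        Returns
        -------
        y : float
            Value of the model at `x`.
        """
        return theta * x
    return g


def F(data):
    """Analyze `data` and return the evaluator `g` (see `restore_g`)."""
    theta = sum(data) / len(data)  # placeholder for the expensive analysis
    return restore_g(theta)
```

Calling `help(g)` on a function returned by either F or restore_g then shows the same description, because both return the same inner function definition.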
Finally, what is the "best" (or at least, a good) practice for going about g having multiple orthogonal purposes? Sometimes, it might take an x and a y array as arguments to spit out something. Sometimes it could just take an x value to spit out something. And sometimes, it'll take "get theta" so it knows to spit out theta. Is it considered inappropriate to just overload the x argument so that if it's fed "get theta" (or some other keyword), then g will follow the "get theta" functionality? Is it better to create a whole other argument called mode or some such that can be set to "get theta" or what have you?
theta should remain accessible on principle.
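An alternative to overloading x or adding a mode argument that I have considered is to make g an instance of a small callable class, so that theta is an ordinary attribute and each purpose gets its own method. The names `G`, `from_data`, and `residuals` below are hypothetical, as is the toy model; I am not sure whether this is considered better practice than a closure.

```python
class G:
    """Callable wrapper that keeps `theta` as a plain attribute."""

    def __init__(self, theta):
        # Plays the role of restore_g: build the object from a stored theta.
        self.theta = theta

    @classmethod
    def from_data(cls, data):
        # Plays the role of F; the mean is a placeholder for the real analysis.
        return cls(theta=sum(data) / len(data))

    def __call__(self, x):
        """Single-argument use: evaluate at `x`."""
        return self.theta * x

    def residuals(self, xs, ys):
        """Two-array use: observed `ys` minus the predictions at `xs`."""
        return [y - self(x) for x, y in zip(xs, ys)]


g = G.from_data([1.0, 3.0])
theta = g.theta          # no "get theta" keyword needed
restored_g = G(theta)    # restoring is just calling the constructor
```

Here the different purposes no longer share one argument list, and theta stays accessible without a special keyword.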