
Let's say I am doing a larger data analysis in a Jupyter/IPython notebook, with lots of time-consuming computations already done. Then, for some reason, I have to shut down the local Jupyter server, but I would like to return to the analysis later without having to go through all the time-consuming computations again.


What I would love to do is pickle or store the whole Jupyter session (all pandas dataframes, np.arrays, variables, ...) so I can safely shut down the server, knowing I can return to my session in exactly the same state as before.

Is it even technically possible? Is there a built-in functionality I overlooked?


EDIT: based on this answer, there is a %store magic which should be a "lightweight pickle". However, you have to store the variables manually, like so:

# inside an ipython/nb session
foo = "A dummy string"
%store foo
# ... close the session, restart the kernel ...
%store -r foo  # -r restores/refreshes the variable from the store
print(foo)  # "A dummy string"

which is fairly close to what I want, but having to do it manually, and being unable to distinguish between different sessions, makes it less useful.
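The same manual approach can be sketched with plain pickle, using one file per session to keep sessions apart (a minimal stdlib sketch; the helper names and file name are made up, not a built-in Jupyter feature):

```python
import os
import pickle
import tempfile

def save_session(path, variables):
    """Pickle a dict of {name: value} to path."""
    with open(path, "wb") as f:
        pickle.dump(variables, f)

def load_session(path):
    """Return the dict of {name: value} previously saved at path."""
    with open(path, "rb") as f:
        return pickle.load(f)

# One file per session distinguishes sessions, unlike %store.
path = os.path.join(tempfile.gettempdir(), "analysis_2016_06.pkl")
foo = "A dummy string"
save_session(path, {"foo": foo})

restored = load_session(path)
print(restored["foo"])  # A dummy string
```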

    Any progress on this? I only noticed there is a workspace in Spyder IDE that can save variables into *.mat. But not sure if this could be ported into Jupyter Notebook. Commented Jun 6, 2016 at 2:15
  • Have you considered pypi.python.org/pypi/dill ? "dill also provides the capability to: - save and load python interpreter sessions" That's python though, not sure what else is involved with ipython or a kernel Commented Apr 13, 2018 at 16:01

5 Answers


I think Dill (pip install dill) answers your question well.

Use dill.dump_session to save a Notebook session:

import dill
dill.dump_session('notebook_env.db')

Use dill.load_session to restore a Notebook session:

import dill
dill.load_session('notebook_env.db')

(source)
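A typical way to wire this into a notebook is a checkpoint restored at the top and saved at the bottom (a sketch, assuming dill is installed; the file name is arbitrary):

```python
import os
import dill

CHECKPOINT = "notebook_env.db"

# First cell: restore everything from the last run, if a checkpoint exists.
if os.path.exists(CHECKPOINT):
    dill.load_session(CHECKPOINT)

# ... expensive computations ...
result = sum(x * x for x in range(10**6))

# Last cell, before shutting the server down: snapshot the whole session.
dill.dump_session(CHECKPOINT)
```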


6 Comments

Fails when there are generators (which kind of makes sense when I think about it), but it seems that this is as close as we can hope for!
Worked great for me. A couple of things to keep in mind: first, if you have pyodbc connection objects hanging around, you'll need to close them and then set them all to None; otherwise you get a "TypeError: can't pickle pyodbc.Connection objects" error. Second, the notebook state does not include graphs that were generated by your code, so you'll need to rerun those cells to bring them back.
But it doesn't work when I use the saved file on another machine.
Installed dill. Do I run import dill; dill.dump_session('notebook_env.db') from the command line?
No, you'll need to do it while running the Jupyter notebook. Both dump_session and load_session should be called from the notebook: load_session can go at the start of the notebook, and dump_session at the very end.

(I'd rather comment than offer this as an actual answer, but I need more reputation to comment.)

You can store most data-like variables in a systematic way. What I usually do is store all dataframes, arrays, etc. in a pandas.HDFStore. At the beginning of the notebook, declare

backup = pd.HDFStore('backup.h5')

and then store any new variables as you produce them

backup['var1'] = var1

At the end, probably a good idea to do

backup.close()

before turning off the server. The next time you want to continue with the notebook:

backup = pd.HDFStore('backup.h5')
var1 = backup['var1']

Truth be told, I'd prefer built-in functionality in the IPython notebook, too. You can't save everything this way (e.g. objects, connections), and it's hard to keep the notebook organized with so much boilerplate code.

2 Comments

This is a very interesting workaround, but I can literally feel the pain associated with maintaining such a system. Thanks for the tip though :)
This is a good workaround. Just to put it out there, this solution will probably require installing the tables module to be able to create the backup file.

This question is related to: How to cache in IPython Notebook?

To save the results of individual cells, the caching magic (the %%cache cell magic, provided by the ipycache extension) comes in handy:

%%cache longcalc.pkl var1 var2 var3
var1 = longcalculation()
....

When rerunning the notebook, the contents of this cell are loaded from the cache.

This is not exactly answering your question, but it might be enough when the results of all the lengthy calculations are recovered quickly. This, in combination with hitting the run-all button at the top of the notebook, is a workable solution for me.

The cache magic cannot save the state of a whole notebook yet, and to my knowledge there is no other system yet for resuming a "notebook". That would require saving all the history of the Python kernel; after loading the notebook and connecting to a kernel, this information would have to be loaded.
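When the cache magic isn't available, the same per-result caching can be approximated in plain Python with a pickle-backed helper (a stdlib sketch; the helper name and file path are made up):

```python
import os
import pickle
import tempfile

def cached(path, compute):
    """Return the pickled result at path if present; otherwise run
    compute(), pickle its result, and return it."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    result = compute()
    with open(path, "wb") as f:
        pickle.dump(result, f)
    return result

path = os.path.join(tempfile.gettempdir(), "longcalc.pkl")
if os.path.exists(path):
    os.remove(path)  # start fresh for this demonstration

var1 = cached(path, lambda: sum(x * x for x in range(10**6)))  # computed and stored
var2 = cached(path, lambda: None)  # loaded from disk; the lambda never runs
```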

Comments


Edit

Since first posting this answer, I made the pypi package https://pypi.org/project/jupyter-save-load-vars. It can be installed with

pip install jupyter-save-load-vars

Since dill.dump_session fails on any object that cannot be pickled, and it apparently cannot be configured to simply ignore such objects, I wrote a pair of static functions savevars(file) and loadvars() that live in this ne1 library, for use with our neuromorphic-engineering Jupyter notebook class chip exercises.

This approach is based on the useful post Get locals from calling namespace in Python

Example of use:

from jupyter_save_load_vars import savevars, loadvars
a = 1
b = [2, 3]
c = 'string'
o = (i for i in [])  # a generator, which cannot be pickled

savevars('testvars')
del a, b, c
loadvars('testvars')
print(a)
print(b)
print(c)

Output:

[INFO]: 2023-10-02 08:28:32,878 - NE1 - saved to testvars.dill variables [ a b c ] (File "/home/ne1/CoACH-labs/ne1.py", line 132, in savevars)
[WARNING]: 2023-10-02 08:28:32,879 - NE1 - could not pickle: ['o'] (File "/home/ne1/CoACH-labs/ne1.py", line 134, in savevars)
[INFO]: 2023-10-02 08:28:32,881 - NE1 - from testvars.dill loaded variables ['a', 'b', 'c'] (File "/home/ne1/CoACH-labs/ne1.py", line 158, in loadvars)
1
[2, 3]
string

For completeness, to use the code below I also include the custom Logger that formats the logging messages. If you don't want this logging output, replace log.XXX( with print(.

The savevars(filename) function:

import logging

_DILL = '.dill'  # suffix appended when the filename has none

def savevars(filename):
    """
    saves all local variables to a file with dill
    
    :param filename: the name of the file. The suffix .dill is added if there is not a suffix already.
    """
    if filename is None:
        log.error('you must supply a filename')
        return
    from pathlib import Path
    p = Path(filename)
    if p.suffix == '':  # if suffix is missing add .dill
        p = p.parent / (p.name + _DILL)

    import inspect, dill
    locals = None
    frame = inspect.currentframe().f_back
    try:
        locals = frame.f_locals
    finally:
        del frame
    if locals is None: return

    data = {}
    could_not_pickle = []

    from types import ModuleType
    s = f'saved to {p} variables [ '
    for k, v in locals.items():
        # skip private names, IPython bookkeeping (In/Out), callables, modules, and loggers
        if k.startswith('_') or k == 'tmp' or k == 'In' or k == 'Out' or hasattr(v, '__call__') \
                or isinstance(v, ModuleType) or isinstance(v, logging.Logger):
            continue
        try:
            if not dill.pickles(v):
                could_not_pickle.append(k)
                continue
        except:
            could_not_pickle.append(k)
            continue
        s = s + k + ' '
        data[k] = v
    s = s + ']'
    try:
        with open(p, 'wb') as f:
            try:
                dill.dump(data, f)
                log.info(f'{s}')
                if len(could_not_pickle) > 0:
                    log.warning(f'could not pickle: {could_not_pickle}')
            except TypeError as e:
                log.error(f'\n Error: {e}')
    except Exception as e:
        log.error(f'could not save data to {p}: {e}')

The loadvars() function

def loadvars(filename):
    """ Loads variables from file into the current workspace
    
    :param filename: the dill file to load from, e.g. lab1. The suffix .dill is added automatically unless there is already a suffix.

    This function loads the variables found in filename into the parent workspace.
    """
    import dill
    from pathlib import Path
    p = Path(filename)
    if p.suffix == '':  # if suffix is missing add .dill
        p = p.parent / (p.name + _DILL)
    if not p.exists():  # note the call: the bare p.exists attribute is always truthy
        log.error(f'{p} does not exist')
        return
    try:
        with open(p, 'rb') as f:
            data = dill.load(f)
            log.info(f'from {p} loaded variables {list(data.keys())}')
            import inspect
            try:
                frame = inspect.currentframe().f_back  # get the caller's frame (the jupyter workspace frame)
                locals = frame.f_locals  # get its local variable dict
                for k in data:
                    try:
                        locals[k] = data[k]  # set a value in it
                    except Exception as e:
                        log.error(f'could not set variable {k}: {e}')
            finally:
                del frame
    except Exception as e:
        log.error(f'could not load; got {e}')

The custom Logger with formatting and code-line hyperlinks that work in PyCharm:

import logging
# general logger. Produces nice output format with live hyperlinks for pycharm users
# to use it, just call log=get_logger() at the top of your python file
# all these loggers share the same logger name 'NE1'

_LOGGING_LEVEL = logging.DEBUG # usually INFO is good

class CustomFormatter(logging.Formatter):
    """Logging Formatter to add colors and count warning / errors"""
    # see https://stackoverflow.com/questions/384076/how-can-i-color-python-logging-output/7995762#7995762

    # \x1b[ (ESC[) is the CSI introductory sequence for ANSI https://en.wikipedia.org/wiki/ANSI_escape_code
    # The control sequence CSI n m, named Select Graphic Rendition (SGR), sets display attributes.
    grey = "\x1b[2;37m" # 2 faint, 37 gray
    yellow = "\x1b[33;21m"
    cyan = "\x1b[0;36m" # 0 normal 36 cyan
    green = "\x1b[31;21m" # note: same SGR code as red below; unused
    red = "\x1b[31;21m"
    bold_red = "\x1b[31;1m"
    light_blue = "\x1b[1;36m"
    blue = "\x1b[1;34m"
    reset = "\x1b[0m"
    # File "{file}", line {max(line, 1)}'.replace("\\", "/")
    format = '[%(levelname)s]: %(asctime)s - %(name)s - %(message)s (File "%(pathname)s", line %(lineno)d, in %(funcName)s)'

    FORMATS = {
        logging.DEBUG: grey + format + reset,
        logging.INFO: cyan + format + reset,
        logging.WARNING: red + format + reset,
        logging.ERROR: bold_red + format + reset,
        logging.CRITICAL: bold_red + format + reset
    }

    def format(self, record):
        log_fmt = self.FORMATS.get(record.levelno)
        formatter = logging.Formatter(log_fmt)
        return formatter.format(record).replace("\\", "/") #replace \ with / for pycharm links


def get_logger():
    """ Use get_logger to define a logger with useful color output and info and warning turned on according to the global LOGGING_LEVEL.

    :returns: the logger.
    """
    # logging.basicConfig(stream=sys.stdout, level=logging.INFO)
    logger = logging.getLogger('NE1') # tobi changed so all have same name so we can uniformly affect all of them
    logger.setLevel(_LOGGING_LEVEL)
    # create console handler if this logger does not have handler yet
    if len(logger.handlers)==0:
        ch = logging.StreamHandler()
        ch.setFormatter(CustomFormatter())
        logger.addHandler(ch)
    return logger

log=get_logger()

2 Comments

Note that this loadvars() silently overwrites any existing variables. For an updated version see code.ini.uzh.ch/CoACH/CoACH-labs/-/blob/master/ne1.py
Ignore that first version; instead see github.com/tobidelbruck/jupyter-save-load-vars/tree/main, which is now a pypi package: pypi.org/project/jupyter-save-load-vars

Came here looking for a solution as well. I work with dataframes and lists, so that is basically what I wanted to save. I ended up using pickle to save a subset of the global dictionary:

import pickle as pkl

def save_context():
    # keep only globals that are DataFrames or lists (and not IPython bookkeeping)
    keys = [v for v in globals().keys()
            if not v.startswith('_') and v != 'In' and
            (str(type(globals()[v])) == "<class 'pandas.core.frame.DataFrame'>"
             or str(type(globals()[v])) == "<class 'list'>")]

    context = dict()
    for key in keys:
        context[key] = globals()[key]

    with open('data/context.pkl', 'wb') as file:
        pkl.dump(context, file)


def restore_context():
    with open('data/context.pkl', 'rb') as file:
        context = pkl.load(file)

    for key in context.keys():
        globals()[key] = context[key]
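The same idea can be written with isinstance checks instead of comparing type strings, and with the namespace passed in explicitly (a sketch restricted to lists so it stays stdlib-only; in a real notebook you would also test for pandas.DataFrame, and the file path here is illustrative):

```python
import os
import pickle
import tempfile

CONTEXT_PATH = os.path.join(tempfile.gettempdir(), "context.pkl")

def save_context(namespace):
    """Pickle every public list in the given namespace dict."""
    context = {k: v for k, v in namespace.items()
               if not k.startswith('_') and k != 'In' and isinstance(v, list)}
    with open(CONTEXT_PATH, 'wb') as f:
        pickle.dump(context, f)

def restore_context(namespace):
    """Load the pickled variables back into the namespace dict."""
    with open(CONTEXT_PATH, 'rb') as f:
        namespace.update(pickle.load(f))

my_rows = [1, 2, 3]
save_context(globals())
del my_rows            # simulate a fresh kernel
restore_context(globals())
print(my_rows)         # [1, 2, 3]
```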

Comments
