Python: How to RECURSIVELY remove None values from a NESTED data structure (lists and dictionaries)?

Question

Here is some nested data, that includes lists, tuples, and dictionaries:

data1 = ( 501, (None, 999), None, (None), 504 )
data2 = { 1:601, 2:None, None:603, 'four':'sixty' }
data3 = OrderedDict( [(None, 401), (12, 402), (13, None), (14, data2)] )
data = [ [None, 22, tuple([None]), (None,None), None], ( (None, 202), {None:301, 32:302, 33:data1}, data3 ) ]

Goal: Remove any keys or values (from "data") that are None. If a list, tuple, or dictionary contains a value that is itself a list, tuple, or dictionary, then RECURSE, to remove NESTED Nones.

Desired output:

[[22, (), ()], ((202,), {32: 302, 33: (501, (999,), 504)}, OrderedDict([(12, 402), (14, {'four': 'sixty', 1: 601})]))]

Or more readably, here is formatted output:

StripNones(data)= list:
. [22, (), ()]
. tuple:
. . (202,)
. . {32: 302, 33: (501, (999,), 504)}
. . OrderedDict([(12, 402), (14, {'four': 'sixty', 1: 601})])

I will propose a possible answer, as I have not found an existing solution to this. I appreciate any alternatives, or pointers to pre-existing solutions.

EDIT I forgot to mention that this has to work in Python 2.7. I can't use Python 3 at this time.

Though it IS worth posting Python 3 solutions, for others. So please indicate which python you are answering for.

What types do you need to handle? There's no fully general solution. And what if something like (None, None) appears as a key?
– user2357112, Commented Dec 13, 2013 at 3:57
(1) I have now posted my answer, which shows my thinking. (2) I decided not to mess with keys, except when a key is directly None. IMHO it is better to sanitize keys at the time they are entered into the dictionary, rather than later. So I would do key = StripNones(key) BEFORE putting it into the dictionary, rather than trying to fix it afterwards.
– ToolmakerSteve, Commented Dec 13, 2013 at 4:02
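
A rough sketch of the "sanitize the key before inserting it" idea from the comment above. The helper name strip_none_key is hypothetical (it is not from any answer here), and it only looks inside tuple keys, on the assumption that lists and dicts cannot be keys anyway because they are not hashable. Should run on Python 2.7 and 3:

```python
def strip_none_key(key):
    """Hypothetical helper: drop None members from a (possibly nested) tuple key."""
    if isinstance(key, tuple):
        return tuple(strip_none_key(k) for k in key if k is not None)
    return key

table = {}
raw_key = (None, 'a', (None, 1))
# Clean the key first, then insert; nothing has to be repaired afterwards.
table[strip_none_key(raw_key)] = 'value'
```

Note that two different raw keys can collapse to the same cleaned key, so whether this is appropriate depends on the data.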

mgilson · Accepted Answer · 2013-12-13 04:43:16Z

If you can assume that the __init__ methods of the various subclasses have the same signature as the typical base class:

def remove_none(obj):
  if isinstance(obj, (list, tuple, set)):
    return type(obj)(remove_none(x) for x in obj if x is not None)
  elif isinstance(obj, dict):
    return type(obj)((remove_none(k), remove_none(v))
      for k, v in obj.items() if k is not None and v is not None)
  else:
    return obj

from collections import OrderedDict
data1 = ( 501, (None, 999), None, (None), 504 )
data2 = { 1:601, 2:None, None:603, 'four':'sixty' }
data3 = OrderedDict( [(None, 401), (12, 402), (13, None), (14, data2)] )
data = [ [None, 22, tuple([None]), (None,None), None], ( (None, 202), {None:301, 32:302, 33:data1}, data3 ) ]
print remove_none(data)

Note that this won't work with a defaultdict, for example, since defaultdict takes an additional argument to __init__. Making it work with defaultdict would require another special-case elif (before the one for regular dicts).
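
For what it's worth, here is one way that extra defaultdict branch might look. This is only a sketch: it assumes the subclass keeps defaultdict's (default_factory, iterable) constructor signature, and it should run on both Python 2.7 and 3:

```python
from collections import defaultdict

def remove_none(obj):
    if isinstance(obj, (list, tuple, set)):
        return type(obj)(remove_none(x) for x in obj if x is not None)
    elif isinstance(obj, defaultdict):
        # Must come before the plain dict branch: defaultdict is a dict subclass,
        # but its constructor takes default_factory as the first argument.
        return type(obj)(obj.default_factory,
                         ((remove_none(k), remove_none(v))
                          for k, v in obj.items()
                          if k is not None and v is not None))
    elif isinstance(obj, dict):
        return type(obj)((remove_none(k), remove_none(v))
                         for k, v in obj.items()
                         if k is not None and v is not None)
    else:
        return obj

dd = defaultdict(list, {1: [None, 2], None: 3, 4: None})
cleaned = remove_none(dd)
```

The result is still a defaultdict, with the same default_factory as the original.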

Also note that I've actually constructed new objects. I haven't modified the old ones. It would be possible to modify the old objects if you didn't need to support modifying immutable objects like tuple.
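
A minimal sketch of what the in-place variant could look like, assuming you only need to handle lists and dicts. The name remove_none_inplace is made up for illustration. Tuples are immutable, so this version leaves them (and any Nones inside them) untouched:

```python
def remove_none_inplace(obj):
    """Hypothetical helper: strip None from lists and dicts, modifying them in place."""
    if isinstance(obj, list):
        # Slice assignment keeps the same list object.
        obj[:] = [x for x in obj if x is not None]
        for x in obj:
            remove_none_inplace(x)
    elif isinstance(obj, dict):
        # Collect the keys first; a dict can't change size while being iterated.
        doomed = [k for k, v in obj.items() if k is None or v is None]
        for k in doomed:
            del obj[k]
        for v in obj.values():
            remove_none_inplace(v)
    return obj

inner = {'a': None, 'b': [None, 1], None: 5}
data = [None, inner, (None, 2)]
result = remove_none_inplace(data)
```

Here result is the very same list object as data, and inner has been changed too, which is exactly the aliasing you have to be careful about.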

Finally getting to your answer. type(obj)(..) -- very nice. isinstance with multiple types, for those that can use same comprehension -- good to know.
I've accepted this answer as most to my tastes. Future readers also look at @perreal's answer, and mgilson & my comments on that. Some use of logic from perreal's answer could be useful to cover other cases.
(And, future reader, if you wish to MODIFY an existing large data structure, rather than create a new copy, some of the logic in my original attempt, far below, may be useful. Specifically, gathering keys to remove, and items to update.)

Mahmoud Hashemi · Answer · 2016-06-15 00:50:30Z

If you want a full-featured, yet succinct approach to handling real-world nested data structures like these, and even handle cycles, I recommend looking at the remap utility from the boltons utility package.

After pip install boltons or copying iterutils.py into your project, just do:

from collections import OrderedDict
from boltons.iterutils import remap

data1 = ( 501, (None, 999), None, (None), 504 )
data2 = { 1:601, 2:None, None:603, 'four':'sixty' }
data3 = OrderedDict( [(None, 401), (12, 402), (13, None), (14, data2)] )
data = [ [None, 22, tuple([None]), (None,None), None], ( (None, 202), {None:301, 32:302, 33:data1}, data3 ) ]

drop_none = lambda path, key, value: key is not None and value is not None

cleaned = remap(data, visit=drop_none)

print(cleaned)

# got:
[[22, (), ()], ((202,), {32: 302, 33: (501, (999,), 504)}, OrderedDict([(12, 402), (14, {'four': 'sixty', 1: 601})]))]

This page has many more examples, including ones working with much larger objects (from Github's API).

It's pure-Python, so it works everywhere, and is fully tested in Python 2.7 and 3.3+. Best of all, I wrote it for exactly cases like this, so if you find a case it doesn't handle, you can bug me to fix it right here.

thefourtheye · Answer · 2013-12-13 04:39:29Z

def stripNone(data):
    if isinstance(data, dict):
        return {k:stripNone(v) for k, v in data.items() if k is not None and v is not None}
    elif isinstance(data, list):
        return [stripNone(item) for item in data if item is not None]
    elif isinstance(data, tuple):
        return tuple(stripNone(item) for item in data if item is not None)
    elif isinstance(data, set):
        return {stripNone(item) for item in data if item is not None}
    else:
        return data

Sample Runs:

print stripNone(data1)
print stripNone(data2)
print stripNone(data3)
print stripNone(data)

(501, (999,), 504)
{'four': 'sixty', 1: 601}
{12: 402, 14: {'four': 'sixty', 1: 601}}
[[22, (), ()], ((202,), {32: 302, 33: (501, (999,), 504)}, {12: 402, 14: {'four': 'sixty', 1: 601}})]

edited Dec 13, 2013 at 4:39

answered Dec 13, 2013 at 3:53

8 Comments

user2357112 Over a year ago

This removes zeros, empty strings, and other falsy non-Nones.

inspectorG4dget Over a year ago

Change elif data to elif data is not None

thefourtheye Over a year ago

@user2357112 Fixed it :) Please check now.

mgilson Over a year ago

@inspectorG4dget -- How about changing elif data: just to else. If data is None it doesn't really matter. OP's function will just return None anyway. After all, it has to return something. Maybe more to the point, what do we expect stripNone(None) to return? (or maybe it should raise an Exception?)

user2357112 Over a year ago

If the data is not None check fails, you'll fall off the end of the function and return None... which is data, so you might as well use an else.
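
To illustrate the point made in these comments, here is a small sketch with two cut-down variants (lists only; the names strip_elif and strip_else are made up). Both keep falsy values such as 0 and '', and both give None for a None input, the first one only because it falls off the end of the function:

```python
def strip_elif(data):
    if isinstance(data, list):
        return [strip_elif(x) for x in data if x is not None]
    elif data is not None:
        return data
    # No branch matched: Python returns None implicitly here.

def strip_else(data):
    if isinstance(data, list):
        return [strip_else(x) for x in data if x is not None]
    else:
        return data

sample = [0, '', None, [None, False], {}]
```

So the plain else is equivalent and a little easier to read.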

perreal · Answer · 2013-12-13 04:46:44Z

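def purify(o):
    if hasattr(o, 'items'):
        # Dict-like: start from an empty object of the same type.
        oo = type(o)()
        for k in o:
            # Compare to None by identity, not equality.
            if k is not None and o[k] is not None:
                oo[k] = purify(o[k])
    elif hasattr(o, '__iter__'):
        # Other iterables (Python 2 strings have no __iter__, so they fall through).
        oo = []
        for it in o:
            if it is not None:
                oo.append(purify(it))
    else:
        return o
    return type(o)(oo)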

print purify(data)

Gives:

[[22, (), ()], ((202,), {32: 302, 33: (501, (999,), 504)}, OrderedDict([(12, 402), (14, {'four': 'sixty', 1: 601})]))]

edited Dec 13, 2013 at 4:46

answered Dec 13, 2013 at 4:21

6 Comments

mgilson Over a year ago

seems like your __iter__ clause could be streamlined a bit: oo = (purify(it) for it in o if it is not None) (but I didn't test it, so no guarantees).
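
For reference, a sketch of how that streamlining might look when applied to both branches. This is not perreal's code. The explicit str check is an addition on the assumption that the code may also run on Python 3, where strings do have __iter__ and would otherwise recurse forever:

```python
from collections import OrderedDict

def purify(o):
    if isinstance(o, str):
        # Python 3 strings are iterable; never look inside them.
        return o
    if hasattr(o, 'items'):
        oo = ((k, purify(v)) for k, v in o.items()
              if k is not None and v is not None)
    elif hasattr(o, '__iter__'):
        oo = (purify(it) for it in o if it is not None)
    else:
        return o
    # Assumes type(o) can be built from an iterable (of pairs, for dict-likes).
    return type(o)(oo)

data1 = (501, (None, 999), None, (None), 504)
data2 = {1: 601, 2: None, None: 603, 'four': 'sixty'}
data3 = OrderedDict([(None, 401), (12, 402), (13, None), (14, data2)])
data = [[None, 22, tuple([None]), (None, None), None],
        ((None, 202), {None: 301, 32: 302, 33: data1}, data3)]
cleaned = purify(data)
```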

mgilson Over a year ago

Also, for it in iter(o) has a completely useless call to iter in it :P

ToolmakerSteve Over a year ago

@mgilson I like how perreal uses hasattr, rather than relying on explicitly naming types. Any disadvantages to that, versus what you did? Overall, I prefer how compact your (mgilson's) solution is, but maybe that can be done here as well?

mgilson Over a year ago

And, while this works with defaultdict and mine doesn't, you don't preserve the defaultdict's default_factory attribute, so you don't really preserve the original...

mgilson Over a year ago

@ToolmakerSteve -- There are things that I like about it and things I don't. It will work with more objects, but there is no guarantee that objects with a .items attribute are anything like a dict. So it might not work with some objects that some of the other answers would work with. As another commenter said, this is a more or less impossible problem to solve for completely arbitrary objects. You need to draw a line and say that you'll only accept objects which conform to some API.
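
One possible way to "draw the line" is to use the abstract base classes from the standard library instead of hasattr or concrete types. This is a sketch only, under the assumption that every accepted type can be rebuilt from an iterable (that is not true for everything registered as a Sequence, e.g. range on Python 3). The name strip_none is made up, and keys are not recursed:

```python
from collections import OrderedDict
try:
    from collections.abc import Mapping, Sequence, Set  # Python 3.3+
except ImportError:
    from collections import Mapping, Sequence, Set      # Python 2.7

try:
    string_types = (basestring,)   # Python 2
except NameError:
    string_types = (str, bytes)    # Python 3

def strip_none(obj):
    if isinstance(obj, string_types):
        return obj                 # strings are Sequences, but leave them alone
    if isinstance(obj, Mapping):
        return type(obj)((k, strip_none(v)) for k, v in obj.items()
                         if k is not None and v is not None)
    if isinstance(obj, (Sequence, Set)):
        return type(obj)(strip_none(x) for x in obj if x is not None)
    return obj

data1 = (501, (None, 999), None, (None), 504)
data2 = {1: 601, 2: None, None: 603, 'four': 'sixty'}
data3 = OrderedDict([(None, 401), (12, 402), (13, None), (14, data2)])
data = [[None, 22, tuple([None]), (None, None), None],
        ((None, 202), {None: 301, 32: 302, 33: data1}, data3)]
cleaned = strip_none(data)
```

Anything that is not a Mapping, Sequence, or Set is returned unchanged, so an object that merely happens to have an .items attribute is never treated as a dict.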

ToolmakerSteve · Answer · 2013-12-13 05:07:46Z

This is my original attempt, before posting the question. Keeping it here, as it may help explain the goal.

It also has some code that would be useful if one wants to MODIFY an existing LARGE collection, rather than duplicating the data into a NEW collection. (The other answers create new collections.)

# ---------- StripNones.py Python 2.7 ----------

import collections, copy

# Recursively remove None, from list/tuple elements, and dict key/values.
# NOTE: Changes type of iterable to list, except for strings and tuples.
# NOTE: We don't RECURSE KEYS.
# When "beImmutable=False", may modify "data".
# Result may have different collection types; similar to "filter()".
def StripNones(data, beImmutable=True):
    t = type(data)
    if issubclass(t, dict):
        return _StripNones_FromDict(data, beImmutable)

    elif issubclass(t, collections.Iterable):
        if issubclass(t, basestring):
            # Don't need to search a string for None.
            return data

        # NOTE: Changes type of iterable to list.
        data = [StripNones(x, beImmutable) for x in data if x is not None]
        if issubclass(t, tuple):
            return tuple(data)

    return data

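# Modifies dictionary d in place, removing items whose keys are in keysToRemove.
# (Parameter is named "d" to avoid shadowing the built-in "dict".)
def RemoveKeys(d, keysToRemove):
    for key in keysToRemove:
        # pop with a default ignores keys that are already absent.
        d.pop(key, None)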

# Recursively remove None, from dict key/values.
# NOTE: We DON'T RECURSE KEYS.
# When "beImmutable=False", may modify "data".
def _StripNones_FromDict(data, beImmutable):
    keysToRemove = []
    newItems = []
    for item in data.iteritems():
        key = item[0]
        if None in item:
            # Either key or value is None.
            keysToRemove.append( key )
        else:
            # The value might change when stripped.
            oldValue = item[1]
            newValue = StripNones(oldValue, beImmutable)
            if newValue is not oldValue:
                newItems.append( (key, newValue) )

    somethingChanged = (len(keysToRemove) > 0) or (len(newItems) > 0)
    if beImmutable and somethingChanged:
        # Avoid modifying the original.
        data = copy.copy(data)

    if len(keysToRemove) > 0:
        # if not beImmutable, MODIFYING ORIGINAL "data".
        RemoveKeys(data, keysToRemove)

    if len(newItems) > 0:
        # if not beImmutable, MODIFYING ORIGINAL "data".
        data.update( newItems )

    return data



# ---------- TESTING ----------
# When run this file as a script (instead of importing it):
if (__name__ == "__main__"):
    from collections import OrderedDict

    maxWidth = 100
    indentStr = '. '

    def NewLineAndIndent(indent):
        return '\n' + indentStr*indent
    #print NewLineAndIndent(3)

    # Returns list of strings.
    def HeaderAndItems(value, indent=0):
        if isinstance(value, basestring):
            L = repr(value)
        else:
            if isinstance(value, dict):
                L = [ repr(key) + ': ' + Repr(value[key], indent+1) for key in value ]
            else:
                L = [ Repr(x, indent+1) for x in value ]
            header = type(value).__name__ + ':'
            L.insert(0, header)
        #print L
        return L

    def Repr(value, indent=0):
        result = repr(value)
        if (len(result) > maxWidth) and \
          isinstance(value, collections.Iterable) and \
          not isinstance(value, basestring):
            L = HeaderAndItems(value, indent)
            return NewLineAndIndent(indent + 1).join(L)

        return result

    #print Repr( [11, [221, 222], {'331':331, '332': {'3331':3331} }, 44] )

    def printV(name, value):
        print( str(name) + "= " + Repr(value) )

    print '\n\n\n'
    data1 = ( 501, (None, 999), None, (None), 504 )
    data2 = { 1:601, 2:None, None:603, 'four':'sixty' }
    data3 = OrderedDict( [(None, 401), (12, 402), (13, None), (14, data2)] )
    data = [ [None, 22, tuple([None]), (None,None), None], ( (None, 202), {None:301, 32:302, 33:data1}, data3 ) ]
    printV( 'ORIGINAL data', data )
    printV( 'StripNones(data)', StripNones(data) )
    print '----- beImmutable = True -----'
    #printV( 'data', data )
    printV( 'data2', data2 )
    #printV( 'data3', data3 )
    print '----- beImmutable = False -----'
    StripNones(data, False)
    #printV( 'data', data )
    printV( 'data2', data2 )
    #printV( 'data3', data3 )
    print

Output:

ORIGINAL data= list:
. [None, 22, (None,), (None, None), None]
. tuple:
. . (None, 202)
. . {32: 302, 33: (501, (None, 999), None, None, 504), None: 301}
. . OrderedDict:
. . . None: 401
. . . 12: 402
. . . 13: None
. . . 14: {'four': 'sixty', 1: 601, 2: None, None: 603}
StripNones(data)= list:
. [22, (), ()]
. tuple:
. . (202,)
. . {32: 302, 33: (501, (999,), 504)}
. . OrderedDict([(12, 402), (14, {'four': 'sixty', 1: 601})])
----- beImmutable = True -----
data2= {'four': 'sixty', 1: 601, 2: None, None: 603}
----- beImmutable = False -----
data2= {'four': 'sixty', 1: 601}

Key points:

if issubclass(t, basestring): avoids searching inside of strings, as that makes no sense, AFAIK.
if issubclass(t, tuple): converts the result back to a tuple.
For dictionaries, copy.copy(data) is used, to return an object of the same type as the original dictionary.
LIMITATION: Does not attempt to preserve collection/iterator type for types other than: list, tuple, dict (& its subclasses).
Default usage copies data structures, if a change is needed. Passing in False for beImmutable can give higher performance when there is a LOT of data, but will alter the original data, including nested pieces of the data -- which might be referenced by variables elsewhere in your code.

Wow, this is way longer than what the others wrote. Using issubclass instead of isinstance seems a bit strange. Looking at this, there are a few places that look like they might not work. I'll see if I can construct a failing test case.
If you want to remove a key from a dict, just use del d[k]. pop is for when you want the value.
@user2357112 -- Sometimes I'll use pop(k, None) even if I don't want the value so that I don't need to handle the KeyError myself...
@user2357112 Like mgilson said: the reason I used pop is to swallow any incorrect keys. Whether that is a good thing or a bad thing depends on the situation. Indeed, in THIS situation, only valid keys are passed in, so I could have used del. Maybe that would be more pythonic, to allow KeyError to propagate to caller, in case of a coding mistake.
@ToolmakerSteve: That's because your test case doesn't trigger the newItems update. In most uses of Counter, you won't have values containing None, but it's a use case the class was designed to allow. Try StripNones(collections.Counter({1: (None, None)})).
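
A small sketch of why the Counter case is likely to go wrong, as far as I can tell: the dict branch calls data.update(newItems) with a list of (key, value) pairs. A plain dict treats those as pairs, but Counter.update counts the elements of an iterable instead:

```python
from collections import Counter

new_items = [(1, ())]          # what newItems would hold for {1: (None, None)}

d = {1: (None, None)}
d.update(new_items)            # dict: pairs replace the value stored under key 1

c = Counter({1: (None, None)})
c.update(new_items)            # Counter: the tuple (1, ()) is counted as one element
```

After this, d[1] is the stripped tuple, while c still has the original value under key 1 plus a new key (1, ()) with a count of 1.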
