Eric Duminil

As you mentioned, Pandas or at least NumPy would do just fine. They're fast and the syntax is clean and straightforward for this example.

With NumPy

You just need to define a mask as a boolean array:

import numpy as np
mask = np.array([1, 0, 0, 1, 1, 1, 1, 1, 0, 1], dtype=bool)  # np.bool was removed in NumPy 1.24

Then apply the mask or its inverse (~mask):

val = np.array([45, 12, 36, 48, 48, 59, 5, 4, 32, 7])
val[mask]
# array([45, 48, 48, 59,  5,  4,  7])
val[~mask]
# array([12, 36, 32])

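mask really needs to be a boolean array. With an integer array, NumPy treats the values as positions instead of flags (it picks val[1] and val[0] over and over), so you'd get an incorrect result: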

val = np.array([45, 12, 36, 48, 48, 59, 5, 4, 32, 7])
mask = np.array([1, 0, 0, 1, 1, 1, 1, 1, 0, 1])
val[mask]
# array([12, 45, 45, 12, 12, 12, 12, 12, 45, 12])
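If the flags arrive as 0/1 integers, one way to avoid this pitfall is to convert them explicitly before indexing. This is only a minimal sketch reusing the values from above; the name flags is just for illustration:

```python
import numpy as np

val = np.array([45, 12, 36, 48, 48, 59, 5, 4, 32, 7])
flags = np.array([1, 0, 0, 1, 1, 1, 1, 1, 0, 1])  # integer array, not boolean

# Integer arrays are interpreted as positions, so convert
# the 0/1 flags to a real boolean mask before indexing.
mask = flags.astype(bool)

selected = val[mask]   # elements where the flag is 1
rejected = val[~mask]  # elements where the flag is 0

print(selected.tolist())  # [45, 48, 48, 59, 5, 4, 7]
print(rejected.tolist())  # [12, 36, 32]
```

astype(bool) maps 0 to False and any non-zero value to True, so it only makes sense if the integers really are flags.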

With Pandas

You're working with dicts of arrays? That's basically what pandas.DataFrames are for!

import pandas as pd
import numpy as np
d = {
  'prof': [1,0,0,1,1,1,1,1,0,1],
  'val': [45,12,36,48,48,59,5,4,32,7],
  'test': [1, 2, 3, 4, 5, 6, 7, 8, 9,10]
}
key = 'prof'

Define your mask first, as with NumPy. Note that d.pop(key) also removes the key from d, so the mask column doesn't end up in the dataframe:

mask = np.array(d.pop(key), dtype=bool)

Define your dataframe:

df = pd.DataFrame(d)

Mask it and export it as a dict of lists:

df[mask].to_dict('list')
# {'val': [45, 48, 48, 59, 5, 4, 7], 'test': [1, 4, 5, 6, 7, 8, 10]}

df[~mask].to_dict('list')
# {'val': [12, 36, 32], 'test': [2, 3, 9]}
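If you need this more than once, the steps above could be wrapped in a small function. This is a sketch, not the only way to do it: split_by_flag is a hypothetical helper name (nothing like it ships with pandas), and it works on a shallow copy so that the caller's dict keeps its key:

```python
import numpy as np
import pandas as pd


def split_by_flag(data, key):
    """Split a dict of equal-length lists into (kept, dropped) dicts.

    Hypothetical helper: `key` names the 0/1 column used as the mask.
    """
    columns = dict(data)  # shallow copy, so the caller's dict is not modified
    mask = np.array(columns.pop(key), dtype=bool)
    df = pd.DataFrame(columns)
    return df[mask].to_dict('list'), df[~mask].to_dict('list')


d = {
    'prof': [1, 0, 0, 1, 1, 1, 1, 1, 0, 1],
    'val': [45, 12, 36, 48, 48, 59, 5, 4, 32, 7],
    'test': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
}

kept, dropped = split_by_flag(d, 'prof')
print(kept)     # rows where prof == 1
print(dropped)  # rows where prof == 0
```

Returning both halves as a tuple keeps the mask logic in one place, and the original d still contains 'prof' afterwards.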

Done! The huge advantage is that anyone with some experience of numpy or pandas will understand the code right away.
