edited Apr 20, 2018 at 13:05

As you mentioned, Pandas or at least NumPy would do just fine. They're fast and the syntax is clean and straightforward for this example.

With NumPy

You just need to define a mask as a boolean array:

import numpy as np
mask = np.array([1, 0, 0, 1, 1, 1, 1, 1, 0, 1], dtype=bool)  # the builtin bool works on every NumPy version

Then apply the mask or its inverse:

val = np.array([45, 12, 36, 48, 48, 59, 5, 4, 32, 7])
val[mask]
# array([45, 48, 48, 59,  5,  4,  7])
val[~mask]
# array([12, 36, 32])

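mask really needs to be a boolean array. With an integer array, NumPy treats the values as indices (every 1 selects val[1], every 0 selects val[0]), so you'd get an incorrect result: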

val = np.array([45, 12, 36, 48, 48, 59, 5, 4, 32, 7])
mask = np.array([1, 0, 0, 1, 1, 1, 1, 1, 0, 1])
val[mask]
# array([12, 45, 45, 12, 12, 12, 12, 12, 45, 12])
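
If the mask already exists as an integer array, converting it with astype(bool) is one way to get the intended behaviour. The following is only a small sketch; the names int_mask, bool_mask, as_indices, selected and rejected are made up for illustration:

```python
import numpy as np

val = np.array([45, 12, 36, 48, 48, 59, 5, 4, 32, 7])
int_mask = np.array([1, 0, 0, 1, 1, 1, 1, 1, 0, 1])

# Integer arrays are read as positions: each 1 picks val[1], each 0 picks val[0].
as_indices = val[int_mask]

# Converting to bool first turns the same data into a real mask.
bool_mask = int_mask.astype(bool)
selected = val[bool_mask]   # elements where the mask is True
rejected = val[~bool_mask]  # elements where the mask is False
```

Here as_indices has as many elements as the mask, whereas selected and rejected together contain each element of val exactly once.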

With Pandas

You're working with dicts of arrays? That's basically what pandas.DataFrames are for!

import pandas as pd
import numpy as np
d = {
  'prof': [1,0,0,1,1,1,1,1,0,1],
  'val': [45,12,36,48,48,59,5,4,32,7],
  'test': [1, 2, 3, 4, 5, 6, 7, 8, 9,10]
}
key = 'prof'

Define your mask first, as with numpy:

mask = np.array(d.pop(key), dtype=bool)

Define your dataframe:

df = pd.DataFrame(d)

Mask it and export it as a dict of lists:

df[mask].to_dict('list')
# {'test': [1, 4, 5, 6, 7, 8, 10], 'val': [45, 48, 48, 59, 5, 4, 7]}

df[~mask].to_dict('list')
# {'test': [2, 3, 9], 'val': [12, 36, 32]}
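
If pandas isn't available, the same split can be sketched with NumPy alone. split_by_mask is a hypothetical helper written for this example, not part of either library, and it assumes every list in the dict has the same length:

```python
import numpy as np

def split_by_mask(data, key):
    """Split a dict of equal-length lists into (selected, rejected) dicts.

    Hypothetical helper: data[key] holds the 0/1 flags used as the mask.
    """
    columns = dict(data)  # shallow copy, so the caller's dict is left untouched
    mask = np.array(columns.pop(key), dtype=bool)
    selected = {name: np.array(values)[mask].tolist()
                for name, values in columns.items()}
    rejected = {name: np.array(values)[~mask].tolist()
                for name, values in columns.items()}
    return selected, rejected

d = {
    'prof': [1, 0, 0, 1, 1, 1, 1, 1, 0, 1],
    'val': [45, 12, 36, 48, 48, 59, 5, 4, 32, 7],
    'test': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
}
selected, rejected = split_by_mask(d, 'prof')
```

Unlike the d.pop(key) call above, this version works on a copy, so d still contains 'prof' afterwards.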

Done! The huge advantage is that anyone with some experience of numpy or pandas will understand the code right away.

answered Apr 20, 2018 at 12:44

Eric Duminil
