Reduce size of array based on multiple column criteria in python

Question

I need to reduce the size of an array based on criteria found in another array: I need to look at the relationships between the two and change the values based on that information. Here is a simplified version of my problem.

I have an array (or dataframe) with my data:

import numpy as np

data = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])  # shape (4, 2)

I have another file, of different size, that holds information about the values in the data array:

a = np.array([[1, 1, 2], [2, 3, 4], [3, 5, 6], [4, 7, 8]])  # shape (4, 3)

The information in a tells me how I can reduce the size of data. For example, a[0] tells me that data[0][0:2] == a[0][1:], so I can replace the pair data[0][0:2] with the single value a[0][0] (effectively reducing the size of data).
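A quick way to check this relationship for every row at once is a vectorized comparison. This sketch assumes, as in the a[0] / data[0] example, that row i of a describes row i of data; the name row_matches is only illustrative.

```python
import numpy as np

data = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
a = np.array([[1, 1, 2], [2, 3, 4], [3, 5, 6], [4, 7, 8]])

# Compare data[i] with a[i][1:] element-wise, then require both columns to match.
# Assumes row i of a corresponds to row i of data.
row_matches = (data == a[:, 1:]).all(axis=1)
print(row_matches)  # every entry is True for this example
```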

To clarify, each row of a holds three pieces of information; a[0] holds 1, 1, 2. I want to scan through data and, whenever a[i][1:] equals a pair of values in data (such as data[i][0:2]), replace that pair with the single value a[i][0]. Is that any clearer?
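One way to express this scan without relying on rows being aligned is a plain Python dictionary keyed on the pairs in a[:, 1:]. This is a minimal sketch, not the approach used in the answer below; it assumes every row of data has exactly one matching pair in a, and the names lookup and codes are hypothetical.

```python
import numpy as np

data = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
a = np.array([[1, 1, 2], [2, 3, 4], [3, 5, 6], [4, 7, 8]])

# Map each (value, value) pair in a[:, 1:] to its replacement code a[:, 0]
lookup = {tuple(pair): code for code, *pair in a.tolist()}

# Replace every row of data by the code of the matching pair in a.
# Raises KeyError if a row of data has no entry in a.
codes = np.array([lookup[tuple(row)] for row in data.tolist()])
print(codes)  # [1 2 3 4]
```

Building the dictionary is linear in the size of a, and each lookup is a constant-time hash probe, so this also works when a and data are long and not in the same order.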

My final array should look like this:

new_format = np.array([[1, 2], [3, 4]])  # shape (2, 2)
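A pure-NumPy sketch that produces this (2, 2) array by broadcasting every row of data against every pair in a[:, 1:]. It assumes each row of data matches some row of a; eq and codes are illustrative names.

```python
import numpy as np

data = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
a = np.array([[1, 1, 2], [2, 3, 4], [3, 5, 6], [4, 7, 8]])

# eq[i, j] is True when data row i equals the pair a[j, 1:]
eq = (data[:, None, :] == a[None, :, 1:]).all(axis=2)

# argmax would silently return 0 for a row with no match, so check first
if not eq.any(axis=1).all():
    raise ValueError("some rows of data have no match in a")

codes = a[eq.argmax(axis=1), 0]   # replacement code for each data row
new_format = codes.reshape(2, 2)  # same layout as the desired output
```

The intermediate boolean array has len(data) * len(a) * 2 entries, so for large inputs the dictionary lookup above is lighter on memory.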

There are related questions, such as "Filtering a DataFrame based on multiple column criteria", but they only cover filtering on certain numerical criteria.

I can't follow anything past 'The information I have in a tells me how I can reduce the size of data'. How exactly does it tell you this? I really don't understand. – roman, commented Feb 13, 2016 at 15:00

Dimitris · Accepted Answer · 2016-02-15 18:24:35Z


I figured out a way to do it using the pandas library. It's probably not the best solution, but it worked for me. In my case I read the data with pandas, but for the posted example the arrays can be converted to DataFrames:

import pandas as pd

datas = pd.DataFrame(data)  # convert to DataFrame; columns are labelled 0 and 1
az = pd.DataFrame(a)        # columns are labelled 0, 1 and 2
datas = datas.rename(columns={0: 1, 1: 2})  # rename columns (integer labels, not strings) for comparison with array a
new_format = pd.merge(datas, az, on=[1, 2], how='right')  # do the comparison on columns 1 and 2

new_format = new_format.drop(columns=[1, 2])  # drop the old columns, keeping only the new format
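A self-contained sketch of the same merge that turns the result back into the (2, 2) NumPy array from the question. It swaps how='right' for how='left' (my choice, not the original answer's) so the output follows the row order of data, and passes merge's validate argument so duplicate pairs in a raise an error instead of multiplying rows.

```python
import numpy as np
import pandas as pd

data = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
a = np.array([[1, 1, 2], [2, 3, 4], [3, 5, 6], [4, 7, 8]])

# Label data's columns 1 and 2 so they line up with a's last two columns
datas = pd.DataFrame(data, columns=[1, 2])
az = pd.DataFrame(a)  # columns 0, 1, 2

# how='left' keeps data's row order; validate raises if a pair appears twice in a
merged = pd.merge(datas, az, on=[1, 2], how='left', validate='many_to_one')

# Column 0 holds the replacement codes; reshape to the question's (2, 2) layout
new_format = merged[0].to_numpy().reshape(2, 2)
```

With how='left', a row of data that has no match in a gets NaN in column 0, which makes missing pairs easy to spot.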
