\$\begingroup\$

I'm using a function to determine if resources can be used again or not. This is the numpy array I'm using.

    'Resource_id', 'Start_date', 'end_date', 'start_time', 'end_time', 'overload'
    [548, '2019-05-16', '2019-05-16', '08:45:00', '17:40:00', 2],
    [546, '2019-05-16', '2019-05-16', '08:45:00', '17:40:00', 2],
    [546, '2019-05-16', '2019-05-16', '08:45:00', '17:40:00', 2],
    [543, '2019-05-16', '2019-05-16', '08:45:00', '17:40:00', 1],

  1. The first step is to find all resources available on a date, for example 2019-05-16 from 8:30 to 17:30. To do this I used np.where, as in the example below:

     av_resource_np = resource_availability_np[np.where(
     (resource_availability_np[:,1] <= '2019-05-16')
     & (resource_availability_np[:,2] >= '2019-05-16')
     & (resource_availability_np[:,3] <= '17:30:00') 
     & (resource_availability_np[:,4] >= '08:30:00'))]
    
  2. Here I try to find unique resource ids and the sum of their overload factor using np.unique():

     unique_id, count_nb = np.unique(av_resource_np[:,(0,5)], axis=0, return_counts=True)
     availability_mat = np.column_stack((unique_id, count_nb ))
    

Which yields the following results:

'Resource_id' 'overload' 'Count'
548           2           1
546           2           2
543           1           1
  3. A simple filter then selects the resources that can't be used on this date. If a resource is already used on the same date at least as many times as its overload (count >= overload), then we can't use it again.

      rejected_resources = availability_mat [np.where(availability_mat [:, 2] >= availability_mat [:, 1])]
    

The result here should be resources 543 and 546, which can't be used again.

So this is the main idea behind my function. The problem is that it takes more than 60% of the whole program's runtime, and I would appreciate any advice on how to make it more efficient/faster.

Full code:

def get_available_rooms_on_date_x(date, start_time, end_time, resource_availability_np):

    av_resource_np = resource_availability_np[np.where(
    (resource_availability_np[:,1] <= date)
    & (resource_availability_np[:,2] >= date)
    & (resource_availability_np[:,3] <= end_time) 
    & (resource_availability_np[:,4] >= start_time))]

    unique_id, count_nb = np.unique(av_resource_np[:,(0,5)], axis=0, return_counts=True)

    availability_mat = np.column_stack((unique_id, count_nb ))
    
    rejected_resources = availability_mat [np.where(availability_mat [:, 2] >= availability_mat [:, 1])]
    
    return rejected_resources
\$\endgroup\$
  • \$\begingroup\$ Welcome to the Code Review Community. This question could be improved if you change the title to something like Room Reservation System and include the entire program. You state that the function takes 60% of the program execution time, it would help us optimize the code if we could see the rest of the program. The title should be about what the code does, and not what your concerns are about the code. Actual questions about the code should be in the body of the post. Please read How do I ask a good question? for more details. \$\endgroup\$ Commented Jan 12, 2021 at 12:28
  • \$\begingroup\$ I see you tagged this with pandas, but didn't mention it in your question or use it in your example code. Are you considering using a pandas dataframe instead of your array? \$\endgroup\$ Commented Jan 13, 2021 at 2:00
  • \$\begingroup\$ Also, is your numpy array a record array or an ndarray with an object dtype? \$\endgroup\$ Commented Jan 13, 2021 at 2:01
  • \$\begingroup\$ @PaulH, I was using this with pandas but it gave me worse performance, so I tried it with numpy. Also, the ndarray has the correct type for each column. \$\endgroup\$ Commented Jan 13, 2021 at 9:01
  • \$\begingroup\$ Could you add code that generates the ndarray? \$\endgroup\$ Commented Jan 13, 2021 at 15:22

1 Answer

\$\begingroup\$

Unless you have a really good reason (you don't seem to), start_date and start_time should never be represented independently, and should always be represented as a combined datetime. Same with the ends.
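
As a minimal sketch of that merge, assuming the raw data arrives as separate string columns named as in the question (the frame `raw` and the helper `combine_datetimes` are hypothetical names, not part of the original program), the date and time strings can be concatenated and parsed once up front:

```python
import pandas as pd

# Sample rows copied from the question; column names lower-cased for consistency.
raw = pd.DataFrame({
    'resource_id': [548, 546, 546, 543],
    'start_date': ['2019-05-16'] * 4,
    'end_date': ['2019-05-16'] * 4,
    'start_time': ['08:45:00'] * 4,
    'end_time': ['17:40:00'] * 4,
    'overload': [2, 2, 2, 1],
})


def combine_datetimes(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with one datetime column per interval endpoint."""
    out = frame[['resource_id', 'overload']].copy()
    # Join "YYYY-MM-DD" and "HH:MM:SS" with a space, then parse in one pass.
    out['start_datetime'] = pd.to_datetime(frame['start_date'] + ' ' + frame['start_time'])
    out['end_datetime'] = pd.to_datetime(frame['end_date'] + ' ' + frame['end_time'])
    return out


combined = combine_datetimes(raw)
print(combined.dtypes)
```

For single-day rows like the sample, an overlap test on the two combined columns selects the same rows as the four separate comparisons. For rows that span several days the two tests can differ, so it is worth checking which one you actually mean.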

Moving from Pandas to Numpy (a) wasn't particularly done correctly, because all of your dtypes were degraded to object since you didn't use record arrays, and (b) probably shouldn't be done at all. Just use Pandas correctly. Doing anything else is probably premature optimisation (maybe even anti-optimisation) and harms the legibility and functionality of your program.
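
To illustrate the dtype point, here is a small sketch; it assumes the array was produced from a mixed-type frame with `to_numpy()`, which is a guess, since the question never shows how the array was built. The frame `df` is made up for the demonstration.

```python
import pandas as pd

# A tiny mixed-type frame: integers and date strings side by side.
df = pd.DataFrame({
    'resource_id': [548, 546],
    'start_date': ['2019-05-16', '2019-05-16'],
    'overload': [2, 2],
})

# One plain 2-D ndarray has a single dtype, so mixed columns fall back to object.
as_array = df.to_numpy()
print(as_array.dtype)

# A record array keeps a separate dtype per field instead.
records = df.to_records(index=False)
print(records.dtype)
```

With an object array every comparison goes through Python objects element by element, which is one plausible reason the numpy version is not as fast as hoped.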

This:

"Here I try to find unique resource ids and the sum of their overload factor"

is a lie. Your unique() does a count, not a sum; let's assume that you actually wanted a count.
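
The difference is easy to see on the sample ids. This sketch uses two stand-alone arrays (`ids` and `overload`, names chosen here for illustration) and `np.bincount` to show what a sum per id would look like next to the count that `return_counts=True` actually produces:

```python
import numpy as np

ids = np.array([548, 546, 546, 543])
overload = np.array([2, 2, 2, 1])

# return_counts gives the number of rows per id, not a sum of any column.
unique_ids, inverse, counts = np.unique(ids, return_inverse=True, return_counts=True)

# A per-id sum needs a separate step, for example bincount with weights.
sums = np.bincount(inverse, weights=overload).astype(int)

for rid, c, s in zip(unique_ids, counts, sums):
    print(rid, 'count =', c, 'sum of overload =', s)
```

For id 546 the count is 2 while the sum of overload is 4, so the two are not interchangeable.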

unique shouldn't be used at all, though; what you're looking for is a simple Pandas groupby.

The name get_available_rooms_on_date_x also seems like a lie, since it doesn't return available rooms; it returns rejected rooms.

All together,

import numpy as np
import pandas as pd


def get_rejected_rooms_on_date(
    needed_start: pd.Timestamp,
    needed_end: pd.Timestamp,
    resource_availability: pd.DataFrame,
) -> pd.DataFrame:
    available = resource_availability.query(
        '(start_datetime <= @needed_end) & (end_datetime >= @needed_start)'
    )

    availability_mat = (
        available.groupby('resource_id')['overload']
        .agg(['first', 'count'])
        .rename(columns={'first': 'overload'})
    )

    rejected_resources = availability_mat.query('count >= overload')
    return rejected_resources


def demo() -> None:
    resource_availability = pd.DataFrame({
        'resource_id': (548, 546, 546, 543),
        'start_datetime': np.array(
            ('2019-05-16T08:45', '2019-05-16T08:45', '2019-05-16T08:45', '2019-05-16T08:45'),
            dtype='datetime64[s]',
        ),
        'end_datetime': np.array(
            ('2019-05-16T17:40', '2019-05-16T17:40', '2019-05-16T17:40', '2019-05-16T17:40'),
            dtype='datetime64[s]',
        ),
        'overload': (2, 2, 2, 1),
    })
    print('Available:')
    print(resource_availability)
    print()

    rejected = get_rejected_rooms_on_date(
        needed_start=pd.Timestamp('2019-05-16T08:30'),
        needed_end=pd.Timestamp('2019-05-16T17:30'),
        resource_availability=resource_availability,
    )
    print('Rejected:')
    print(rejected)


if __name__ == '__main__':
    demo()

Output:

Available:
   resource_id      start_datetime        end_datetime  overload
0          548 2019-05-16 08:45:00 2019-05-16 17:40:00         2
1          546 2019-05-16 08:45:00 2019-05-16 17:40:00         2
2          546 2019-05-16 08:45:00 2019-05-16 17:40:00         2
3          543 2019-05-16 08:45:00 2019-05-16 17:40:00         1

Rejected:
             overload  count
resource_id                 
543                 1      1
546                 2      2
\$\endgroup\$
