Reference link to the problem:
https://www.hackerrank.com/challenges/fraudulent-activity-notifications/problem
*The problem is the following:*
We need to monitor a client's expenditure over a period and issue notifications if the expenditure for a given day is at least twice the median of the expenditure for the preceding 'd' days.
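- So what is a median?
The median of a finite list of numbers is the "middle" number when those numbers are listed in order from smallest to greatest. When the list has an even number of elements, the median is the average of the two middle numbers. For example, the median of [1, 3, 5] is 3 and the median of [1, 2, 3, 4] is 2.5.
ref: https://en.wikipedia.org/wiki/Median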
My first solution, written before looking for optimizations, was the following:
/*
 * Complete the 'activityNotifications' function below.
 *
 * The function is expected to return an INTEGER.
 * The function accepts following parameters:
 * 1. INTEGER_ARRAY expenditure
 * 2. INTEGER d
 */
public static int activityNotifications(List<Integer> expenditure, int d) {
    int noticesNumber = 0;
    for (int currentDay = d; currentDay < expenditure.size(); currentDay++) {
        // Extract and sort the sublist of the last 'd' expenditures
        List<Integer> sublistToCheck = expenditure
                .subList(currentDay - d, currentDay)
                .stream()
                .sorted()
                .collect(Collectors.toList());
        // Calculate the median
        double medianValue;
        if (d % 2 == 1) {
            // Odd number of days: single middle value
            medianValue = sublistToCheck.get(d / 2);
        } else {
            // Even number of days: average of the two middle values
            medianValue = (sublistToCheck.get(d / 2 - 1)
                    + sublistToCheck.get(d / 2)) / 2.0;
        }
        // Check if the current day's expenditure is at least twice the median
        if (expenditure.get(currentDay) >= medianValue * 2) {
            noticesNumber++;
        }
    }
    return noticesNumber;
}
*Non-optimized algorithm analysis:*
- Initialization:
'noticesNumber' is initialized to 0.
The main loop starts at 'currentDay = d' because we need at least 'd' days of data before a notification can be issued.
- Extract and sort a sublist:
For each day 'currentDay' we extract the sublist from 'currentDay - d' (inclusive) to 'currentDay' (exclusive).
The sublist represents the expenditures of the last 'd' days.
We sort this sublist to make the median calculation straightforward.
- Median calculation:
If 'd' is odd, the median is the middle element of the sorted sublist.
If 'd' is even, the median is the average of the two middle elements.
Note that for even 'd' the average must be calculated using floating-point division, otherwise the fractional part is lost.
- Notification check:
We compare the current day's expenditure with twice the calculated median.
If the current day's expenditure is at least twice the median, we increment 'noticesNumber'.
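To make the steps above concrete, here is a small tracing sketch. It is not the author's submission: the class and method names ('WindowTrace', 'traceNotifications') are made up for illustration, it works on a plain int array instead of a List, and the input in 'main' is assumed to match the first sample from the problem page.

```java
import java.util.Arrays;

final class WindowTrace {

    // Median of a window, computed on a sorted copy so the input is not modified.
    static double median(int[] window) {
        int[] sorted = window.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return n % 2 == 1
                ? sorted[n / 2]
                : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    // Same day-by-day check as the sorting-based solution, printing each decision.
    static int traceNotifications(int[] expenditure, int d) {
        int notices = 0;
        for (int day = d; day < expenditure.length; day++) {
            int[] window = Arrays.copyOfRange(expenditure, day - d, day);
            double m = median(window);
            boolean notify = expenditure[day] >= 2 * m;
            System.out.println("day " + day
                    + ": window=" + Arrays.toString(window)
                    + " median=" + m
                    + " spent=" + expenditure[day]
                    + " notify=" + notify);
            if (notify) {
                notices++;
            }
        }
        return notices;
    }

    public static void main(String[] args) {
        // Assumed sample input: 9 days of spending with a trailing window of 5 days.
        int[] sample = {2, 3, 4, 2, 3, 6, 8, 4, 5};
        System.out.println("notices = " + traceNotifications(sample, 5));
    }
}
```

For this input the trace reports two notifications, on the days where 6 and 8 were spent, because both windows have a median of 3.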
*Time complexity analysis of the non-optimized solution:*
- Main loop
The main loop runs n-d times, where n is 'expenditure.size()'.
- Sublist extraction and sorting
For each iteration of the loop, we take a sublist of size 'd' from 'expenditure' and copy it through a stream. This has a time complexity of O(d).
The copy is then sorted using 'sorted()', which has a time complexity of O(dlogd).
- Median calculation
Calculating the median involves accessing one or two elements of the sorted sublist. The operation is O(1).
- Notification check
Comparing the current day's expenditure with twice the median is simple arithmetic, which is O(1).
- Total time complexity
Sublist extraction and sorting: O(d) + O(dlogd) = O(dlogd) for each iteration.
Median calculation and comparison: O(1) for each iteration.
Since the dominant term is O(dlogd) for each iteration,
the total time complexity of the entire loop is:
_(n-d) * O(dlogd) = O((n-d) * dlogd)_
which is bounded above by O(n*dlogd).
Analysis of Min-Max Heap optimized solution
public static int activityNotifications(List<Integer> expenditure, int d) {
    if (expenditure == null || expenditure.size() < d) return 0;
    int noticesNumber = 0;
    PriorityQueue<Integer> minHeap = new PriorityQueue<>();
    PriorityQueue<Integer> maxHeap = new PriorityQueue<>(Collections.reverseOrder());
    for (int i = 0; i < d; i++) {
        addNumber(expenditure.get(i), maxHeap, minHeap);
    }
    for (int i = d; i < expenditure.size(); i++) {
        double median = getMedian(maxHeap, minHeap);
        if (expenditure.get(i) >= 2 * median) {
            noticesNumber++;
        }
        if (!maxHeap.remove(expenditure.get(i - d))) {
            minHeap.remove(expenditure.get(i - d));
        }
        addNumber(expenditure.get(i), maxHeap, minHeap);
    }
    return noticesNumber;
}

private static void addNumber(int num, PriorityQueue<Integer> maxHeap, PriorityQueue<Integer> minHeap) {
    if (maxHeap.isEmpty() || num <= maxHeap.peek()) {
        maxHeap.add(num);
    } else {
        minHeap.add(num);
    }
    if (maxHeap.size() > minHeap.size() + 1) {
        minHeap.add(maxHeap.poll());
    } else if (minHeap.size() > maxHeap.size()) {
        maxHeap.add(minHeap.poll());
    }
}

private static double getMedian(PriorityQueue<Integer> maxHeap, PriorityQueue<Integer> minHeap) {
    if (maxHeap.size() == minHeap.size()) {
        return ((double) maxHeap.peek() + minHeap.peek()) / 2;
    } else {
        return maxHeap.peek();
    }
}
The solution above uses two heaps: a min-heap and a max-heap.
You can brush up on your heap knowledge with this article:
https://www.geeksforgeeks.org/difference-between-min-heap-and-max-heap/
- In a Min-Heap the minimum key element is present at the root.
- In a Max-Heap the maximum key element is present at the root.
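As a quick illustration of the two root properties, the sketch below builds both kinds of heap with java.util.PriorityQueue and reads the root with 'peek'. The class and method names are hypothetical, and the helpers assume they are given at least one value.

```java
import java.util.Collections;
import java.util.PriorityQueue;

final class HeapRootsDemo {

    // Root of a min-heap holding the given values (natural ordering).
    static int minRoot(int... values) {
        PriorityQueue<Integer> minHeap = new PriorityQueue<>();
        for (int v : values) {
            minHeap.add(v);
        }
        return minHeap.peek();
    }

    // Root of a max-heap holding the given values (reversed ordering).
    static int maxRoot(int... values) {
        PriorityQueue<Integer> maxHeap = new PriorityQueue<>(Collections.reverseOrder());
        for (int v : values) {
            maxHeap.add(v);
        }
        return maxHeap.peek();
    }

    public static void main(String[] args) {
        System.out.println("min-heap root: " + minRoot(4, 1, 7, 3)); // smallest value, 1
        System.out.println("max-heap root: " + maxRoot(4, 1, 7, 3)); // largest value, 7
    }
}
```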
Algorithm breakdown:
- Initialization
'noticesNumber' is initialized to 0.
Two heaps are used:
- maxHeap (a max-heap) stores the lower half of the numbers.
- minHeap (a min-heap) stores the upper half of the numbers.
- Initial population
...
for (int i = 0; i < d; i++) {
    addNumber(expenditure.get(i), maxHeap, minHeap);
}
...
private static void addNumber(int num,
                              PriorityQueue<Integer> maxHeap,
                              PriorityQueue<Integer> minHeap) {
    /*
     peek - Retrieves, but does not remove, the head of this queue,
     or returns null if this queue is empty.
    */
    if (maxHeap.isEmpty() || num <= maxHeap.peek()) {
        maxHeap.add(num);
    } else {
        minHeap.add(num);
    }
    if (maxHeap.size() > minHeap.size() + 1) {
        /* poll - Retrieves and removes the head of this queue,
           or returns null if this queue is empty */
        minHeap.add(maxHeap.poll());
    } else if (minHeap.size() > maxHeap.size()) {
        maxHeap.add(minHeap.poll());
    }
}
For the first 'd' days, add each expenditure to the appropriate heap using the 'addNumber' method.
- Processing subsequent days
For each day from 'd' to the end of the expenditure list:
Compute the median using the 'getMedian' method.
Check if the current day's expenditure is at least twice the median. If so, increment 'noticesNumber'.
Remove the expenditure from 'd' days ago from whichever heap contains it.
Add the current day's expenditure to the appropriate heap.
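The removal step relies on a detail of java.util.PriorityQueue that is easy to miss: 'remove(Object)' removes at most one matching element, which is what a sliding window needs when the same amount appears on several days. A minimal sketch to check this (the class and method names are hypothetical):

```java
import java.util.Arrays;
import java.util.PriorityQueue;

final class RemoveOneOccurrenceDemo {

    // Removes one occurrence of 'value' and returns how many copies are still in the heap.
    static long copiesLeftAfterRemove(PriorityQueue<Integer> heap, int value) {
        heap.remove(Integer.valueOf(value)); // remove(Object): at most one matching element
        return heap.stream().filter(v -> v == value).count();
    }

    public static void main(String[] args) {
        PriorityQueue<Integer> heap = new PriorityQueue<>(Arrays.asList(5, 5, 3));
        long left = copiesLeftAfterRemove(heap, 5);
        System.out.println("copies of 5 left: " + left + ", heap size: " + heap.size());
    }
}
```

The Java documentation also lists 'remove(Object)' as a linear-time operation, which matters for the complexity of this step.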
- Heap operations:
'addNumber': Adds a number to one of the heaps and balances the heaps if necessary
private static void addNumber(int num, PriorityQueue<Integer> maxHeap, PriorityQueue<Integer> minHeap) {
    if (maxHeap.isEmpty() || num <= maxHeap.peek()) {
        maxHeap.add(num);
    } else {
        minHeap.add(num);
    }
    if (maxHeap.size() > minHeap.size() + 1) {
        minHeap.add(maxHeap.poll());
    } else if (minHeap.size() > maxHeap.size()) {
        maxHeap.add(minHeap.poll());
    }
}
'getMedian': Computes the median based on the sizes and top elements of the heaps
private static double getMedian(PriorityQueue<Integer> maxHeap, PriorityQueue<Integer> minHeap) {
    if (maxHeap.size() == minHeap.size()) {
        return ((double) maxHeap.peek() + minHeap.peek()) / 2;
    } else {
        return maxHeap.peek();
    }
}
Time complexity analysis
- Heap operations
Insertion ('addNumber'): each insertion into a heap takes O(logd).
- The initial population of the heaps
For the first 'd' elements we call 'addNumber', which is O(logd) per insertion, so this step takes *O(dlogd)*.
private static void addNumber(int num, PriorityQueue<Integer> maxHeap, PriorityQueue<Integer> minHeap) {
    if (maxHeap.isEmpty() || num <= maxHeap.peek()) {
        maxHeap.add(num);
    } else {
        minHeap.add(num);
    }
    if (maxHeap.size() > minHeap.size() + 1) {
        minHeap.add(maxHeap.poll());
    } else if (minHeap.size() > maxHeap.size()) {
        maxHeap.add(minHeap.poll());
    }
}
**Time complexity of insertion into a heap:**
Inserting an element into a heap of size 'd' takes O(logd), because the heap needs to restore its properties (complete binary tree structure and heap order).
Rebalancing heaps:
When 'maxHeap' and 'minHeap' are unbalanced, rebalancing involves two operations:
- Removing the root of one heap, which takes O(logd)
- Inserting this element into the other heap, which also takes *O(logd)*
Since rebalancing involves one removal and one insertion, the total time for rebalancing is:
O(logd) + O(logd) = 2*O(logd) = O(logd)
In conclusion, each call to 'addNumber' involves:
- One insertion into a heap: O(logd)
- Potentially one rebalancing operation: O(logd)
Since these operations are sequential and not nested,
*the total time complexity for each call to 'addNumber' remains O(logd)*
- Processing subsequent days
For each of the remaining n - d days:
- Compute the median: O(1)
- Remove the old expenditure: O(d). 'PriorityQueue.remove(Object)' has to search the heap for the element, which the Java documentation lists as a linear-time operation; only the re-heapify that follows is O(logd)
- Add the new expenditure: O(logd)
- Each day's operations therefore take O(d) with 'java.util.PriorityQueue', and we perform them for n-d days, resulting in O((n-d)*d). With a structure that removes an arbitrary element in O(logd), this step would be O((n-d)logd)
Overall, the total time complexity of the min-max heap solution is:
Initial heap population: O(dlogd)
Processing the remaining days: O((n-d)*d) as written, O((n-d)logd) with logarithmic removal
=> O(dlogd) + O((n-d)*d) = O(n*d) as written
=> O(dlogd) + O((n-d)logd) = O(nlogd) with logarithmic removal
Either bound improves on the O(n*dlogd) time complexity of the non-optimized solution.
*Summary of the min-max heap solution:*
The min-max heap solution efficiently maintains the median for each sliding window of size 'd' using two heaps. Reading the median is O(1) and insertion is O(logd). Removing the expiring value is the bottleneck: the approach reaches O(nlogd), and becomes well-suited for large datasets, only when that removal is logarithmic as well.
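One way to get logarithmic removal is to keep each half of the window in a java.util.TreeMap that maps a value to its multiplicity. The sketch below is an alternative under that assumption, not the solution analysed above; the class name 'TreeMapWindowSketch' and its helpers are made up for illustration, and it assumes d is at least 1.

```java
import java.util.List;
import java.util.TreeMap;

final class TreeMapWindowSketch {
    // Lower and upper halves of the window, stored as value -> multiplicity.
    private final TreeMap<Integer, Integer> low = new TreeMap<>();
    private final TreeMap<Integer, Integer> high = new TreeMap<>();
    private int lowSize = 0;
    private int highSize = 0;

    private static void inc(TreeMap<Integer, Integer> half, int v) {
        half.merge(v, 1, Integer::sum);
    }

    // Assumes v is present; drops the key once its multiplicity reaches zero.
    private static void dec(TreeMap<Integer, Integer> half, int v) {
        if (half.merge(v, -1, Integer::sum) == 0) {
            half.remove(v);
        }
    }

    void add(int v) {
        if (lowSize == 0 || v <= low.lastKey()) {
            inc(low, v);
            lowSize++;
        } else {
            inc(high, v);
            highSize++;
        }
        rebalance();
    }

    // Assumes v is currently in the window.
    void remove(int v) {
        if (low.containsKey(v)) {
            dec(low, v);
            lowSize--;
        } else {
            dec(high, v);
            highSize--;
        }
        rebalance();
    }

    // Restores lowSize == highSize or lowSize == highSize + 1.
    private void rebalance() {
        while (lowSize > highSize + 1) {
            int v = low.lastKey();
            dec(low, v);
            lowSize--;
            inc(high, v);
            highSize++;
        }
        while (highSize > lowSize) {
            int v = high.firstKey();
            dec(high, v);
            highSize--;
            inc(low, v);
            lowSize++;
        }
    }

    double median() {
        return lowSize == highSize
                ? (low.lastKey() + high.firstKey()) / 2.0
                : low.lastKey();
    }

    static int activityNotifications(List<Integer> expenditure, int d) {
        TreeMapWindowSketch window = new TreeMapWindowSketch();
        for (int i = 0; i < d && i < expenditure.size(); i++) {
            window.add(expenditure.get(i));
        }
        int notices = 0;
        for (int i = d; i < expenditure.size(); i++) {
            if (expenditure.get(i) >= 2 * window.median()) {
                notices++;
            }
            window.remove(expenditure.get(i - d));
            window.add(expenditure.get(i));
        }
        return notices;
    }
}
```

Every TreeMap operation used here (merge, remove, containsKey, firstKey, lastKey) is logarithmic in the number of distinct values stored, so each day costs O(logd) and the whole run stays within O(nlogd).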
Counting sort solution
The implementation of the Counting sort optimization is the following:
import java.util.List;

public class Result {

    public static int activityNotifications(List<Integer> expenditure, int d) {
        int noticesNumber = 0;
        // Assuming expenditures are between 0 and 200,
        // as stated in the task description.
        int[] count = new int[201];
        // Initialize the count array with the first 'd' expenditures
        for (int i = 0; i < d; i++) {
            count[expenditure.get(i)]++;
        }
        for (int i = d; i < expenditure.size(); i++) {
            // The median can be fractional for even 'd', so keep it as a double
            double medianValue = getMedian(count, d);
            if (expenditure.get(i) >= 2 * medianValue) {
                noticesNumber++;
            }
            // Update the count array for the sliding window
            count[expenditure.get(i)]++;
            count[expenditure.get(i - d)]--;
        }
        return noticesNumber;
    }

    /*
     If d is even, we need to find the two middle values and take their average.
     If d is odd, we find the single middle value.
     We traverse the count array, accumulating counts until we reach the required positions to determine the median.
    */
    private static double getMedian(int[] count, int d) {
        int sum = 0;
        int median1 = -1;
        int median2 = -1;
        if (d % 2 == 0) {
            for (int i = 0; i < count.length; i++) {
                sum += count[i];
                if (median1 == -1 && sum >= d / 2) {
                    median1 = i;
                }
                if (sum >= d / 2 + 1) {
                    median2 = i;
                    break;
                }
            }
            // Divide by 2.0: integer division would truncate a median such as 2.5 to 2
            return (median1 + median2) / 2.0;
        } else {
            for (int i = 0; i < count.length; i++) {
                sum += count[i];
                if (sum > d / 2) {
                    return i;
                }
            }
        }
        return 0;
    }
}
Algorithm explanation:
- Initialization:
'noticesNumber' is initialized to 0.
A count array of size 201 keeps track of how often each expenditure value occurs; the task restricts expenditures to the range 0 to 200.
- The initial population of the count array:
for (int i = 0; i < d; i++) {
    count[expenditure.get(i)]++;
}
The first 'd' elements of the 'expenditure' list are used to populate the 'count' array.
- Processing subsequent days:
for (int i = d; i < expenditure.size(); i++) {
    double medianValue = getMedian(count, d);
    if (expenditure.get(i) >= 2 * medianValue) {
        noticesNumber++;
    }
    // Update the count array for the sliding window
    count[expenditure.get(i)]++;
    count[expenditure.get(i - d)]--;
}
For each day from 'd' to the end of the 'expenditure' list:
Compute the median using the 'getMedian' method.
Check if the current day's expenditure is at least twice the median. If so, increment 'noticesNumber'.
Update the 'count' array to reflect the new sliding window: add the current day and drop the day that is now 'd' days old.
*Median calculation*
private static double getMedian(int[] count, int d) {
    int sum = 0;
    int median1 = -1;
    int median2 = -1;
    if (d % 2 == 0) {
        for (int i = 0; i < count.length; i++) {
            sum += count[i];
            if (median1 == -1 && sum >= d / 2) {
                median1 = i;
            }
            if (sum >= d / 2 + 1) {
                median2 = i;
                break;
            }
        }
        // Divide by 2.0: integer division would truncate a median such as 2.5 to 2
        return (median1 + median2) / 2.0;
    } else {
        for (int i = 0; i < count.length; i++) {
            sum += count[i];
            if (sum > d / 2) {
                return i;
            }
        }
    }
    return 0;
}
The median method traverses the 'count' array, accumulating frequencies, to find the median.
For even 'd', it finds the two middle values and returns their average, which may be fractional.
For odd 'd', it finds the single middle value.
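Because the notification test only ever needs twice the median, the fractional case can also be sidestepped by staying in integers: for even 'd' twice the median is simply the sum of the two middle values. The sketch below shows that variant. It is an alternative formulation rather than the method described above, and the names 'DoubledMedianSketch', 'twiceMedian' and 'countsOf' are hypothetical.

```java
final class DoubledMedianSketch {

    // Frequency table for values in the assumed range 0..200.
    static int[] countsOf(int... values) {
        int[] count = new int[201];
        for (int v : values) {
            count[v]++;
        }
        return count;
    }

    // Returns 2 * median of the d values described by 'count', as an exact integer.
    static int twiceMedian(int[] count, int d) {
        int lowerRank = (d + 1) / 2; // 1-based rank of the lower middle element
        int upperRank = d / 2 + 1;   // 1-based rank of the upper middle element
        int lower = -1;
        int seen = 0;
        for (int value = 0; value < count.length; value++) {
            seen += count[value];
            if (lower == -1 && seen >= lowerRank) {
                lower = value;
            }
            if (seen >= upperRank) {
                return lower + value; // for odd d both ranks coincide, giving 2 * value
            }
        }
        throw new IllegalArgumentException("count holds fewer than d values");
    }

    public static void main(String[] args) {
        // Window 1, 2, 3, 4 has median 2.5, so twice the median is 5.
        System.out.println(twiceMedian(countsOf(1, 2, 3, 4), 4));
    }
}
```

With this helper the check in the main loop would read 'expenditure.get(i) >= twiceMedian(count, d)', with no floating-point arithmetic involved.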
Time complexity analysis
- The initial population of the count array
Involves iterating through the first 'd' elements of the expenditure list: **O(d)**
- Processing subsequent days:
For each of the remaining n-d days:
*Median calculation ('getMedian' method):*
Traversing the 'count' array to find the median takes O(R), where R is the number of possible expenditure values (201 in our case).
Updating the count array:
Updating the count array involves two constant-time operations for each day.
Time complexity for each day: O(R) + O(1) = O(R)
Since we perform these operations for n-d days, the total time complexity of the loop is O((n-d)*R)
- Total time complexity: O(n)
Combining the complexities, we get:
Initial population of the count array: O(d)
Processing the remaining days: O((n-d)*R)
Overall, the total time complexity is:
O(d) + O((n-d)*R) = O(n*R), since d <= n
and given that R is the constant 201, we can simplify to:
O(n)
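Comparison of the min-max heap solution to the counting sort solution
Based on time complexity:
The counting sort approach has a time complexity of O(n*R), which is O(n) for the fixed range R = 201 and does not depend on 'd'.
The min-max heap approach has a time complexity of O(nlogd) when the expiring value can be removed in O(logd). With 'java.util.PriorityQueue', where 'remove(Object)' is linear, the code shown above is O(n*d).
- Winner: for this task, as implemented here, the counting sort approach scales better, especially when 'd' is large. With logarithmic removal the two approaches are close: logd stays small even for large windows, so the heap variant does less work per day than a scan over 201 counters, at the price of more code.
The min-max heap approach is the one that generalizes: it makes no assumption about the range of values, so it remains usable when expenditures are not bounded by a small constant. For cases with a small fixed range of expenditures, such as this task, the counting sort approach is sufficient and simpler to implement.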
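To put rough numbers on the comparison, here is a back-of-the-envelope estimate of elementary steps. The values n = 2*10^5 and d = 10^5 are assumptions based on the limits I recall from the problem page, and "linear removal" refers to the documented linear cost of PriorityQueue.remove(Object).

```latex
\begin{align*}
\text{sorting each window:} \quad & (n-d)\, d \log_2 d \approx 10^5 \cdot 10^5 \cdot 17 \approx 1.7 \times 10^{11} \\
\text{two heaps, linear removal:} \quad & (n-d)\, d \approx 10^5 \cdot 10^5 = 10^{10} \\
\text{two heaps, logarithmic removal:} \quad & (n-d) \log_2 d \approx 10^5 \cdot 17 \approx 1.7 \times 10^{6} \\
\text{counting array:} \quad & (n-d)\, R \approx 10^5 \cdot 201 \approx 2 \times 10^{7}
\end{align*}
```

These are order-of-magnitude figures only; constant factors such as boxing and memory access patterns are ignored.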