I've implemented a multithreaded merge sort using Java's ForkJoin framework, and I wanted to gather feedback on its correctness, efficiency, and scalability.
Here's my implementation:
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ForkJoinTask;
import java.util.concurrent.RecursiveTask;
public class CustomRecursiveTaskForArraysSorting extends RecursiveTask<List<Integer>> {
private static final int THRESHOLD = 2;
private final List<Integer> workload;
public CustomRecursiveTaskForArraysSorting(List<Integer> workload) {
if (workload == null) {
throw new IllegalArgumentException("Workload cannot be null");
}
this.workload = workload;
}
@Override
protected List<Integer> compute() {
if (workload.size() > THRESHOLD) {
return ForkJoinTask.invokeAll(createSubtasks(workload))
.stream()
.map(ForkJoinTask::join)
.reduce(this::merge)
.orElseThrow();
} else {
return processing(workload);
}
}
private List<CustomRecursiveTaskForArraysSorting> createSubtasks(List<Integer> workload) {
List<CustomRecursiveTaskForArraysSorting> subtasks = new ArrayList<>();
List<Integer> firstHalf = workload.subList(0, workload.size() / 2);
List<Integer> secondHalf = workload.subList(workload.size() / 2, workload.size());
subtasks.add(new CustomRecursiveTaskForArraysSorting(new ArrayList<>(firstHalf)));
subtasks.add(new CustomRecursiveTaskForArraysSorting(new ArrayList<>(secondHalf)));
return subtasks;
}
private List<Integer> processing(List<Integer> workload) {
Collections.sort(workload);
return workload;
}
private List<Integer> merge(List<Integer> a, List<Integer> b) {
List<Integer> merged = new ArrayList<>();
int i = 0, j = 0;
while (i < a.size() && j < b.size()) {
if (a.get(i) < b.get(j)) {
merged.add(a.get(i++));
} else {
merged.add(b.get(j++));
}
}
merged.addAll(a.subList(i, a.size()));
merged.addAll(b.subList(j, b.size()));
return merged;
}
}
Key Features:
- I use RecursiveTask to divide the array into smaller parts recursively until the threshold size is met, after which the smaller arrays are sorted directly
- Merge functionality combines two sorted subarrays into a single sorted array
- Small arrays are sorted using Collections.sort()
Questions:
- Are there any edge cases or performance bottlenecks that I might have overlooked?
- Could this implementation be optimized further for larger datasets or higher concurrency?
- How does this compare to manual thread management in terms of performance and readability?
I appreciate any constructive feedback and suggestions to improve the implementation.