Timeline for Evaluation of a variation of quick-sort (pivot-selection)

Current License: CC BY-SA 4.0

34 events

when toggle format	what		by	license	comment
May 27, 2019 at 20:02	history	bumped	CommunityBot		This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
Jan 27, 2019 at 20:01	history	bumped	CommunityBot		This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
Dec 28, 2018 at 19:07	answer	added	Ravindra HV		timeline score: 1
Dec 26, 2018 at 7:53	history	edited	Jamal	CC BY-SA 4.0	deleted 324 characters in body
Dec 26, 2018 at 7:49	history	rollback	Jamal		Rollback to Revision 2
Dec 25, 2018 at 15:31	history	edited	Ravindra HV	CC BY-SA 4.0	Updated conclusion w.r.t algorithm complexity
Nov 26, 2018 at 2:17	history	edited	Ravindra HV	CC BY-SA 4.0	Corrected the worst case calculation. The explanation remains same.
Nov 17, 2018 at 18:51	comment	added	Ravindra HV		And if its already known, is there any specific reason its not already used in popular libraries - such as java - arrays - sort.
Nov 17, 2018 at 18:45	comment	added	Ravindra HV		Ok. So basically this variation of determining pivot is already known. Are you aware of any online reference?
Nov 17, 2018 at 18:42	comment	added	greybeard		(The most simple handling of "the range problem in arith. mean of two `int`s" may be `return lowVal + (int)(((long)highVal - lowVal) / 2);`)
Nov 17, 2018 at 18:42	comment	added	greybeard		`6)[…] logical extension of [pivot is arith. mean of min&max] would be [pivot is arith. mean of all keys in range]` which has been discussed in publications and would only be marginally more expensive to evaluate.
Nov 17, 2018 at 18:34	comment	added	greybeard		In my answer, I tried to stick to remarks about the code presented. Mentioning something in a review does not mean I think it should be changed or I don't like it. The preliminary remarks include a motivation why I care to suggest handling licensing information differently.
Nov 17, 2018 at 16:40	history	edited	Ravindra HV	CC BY-SA 4.0	Updated question to answer queries and concerns and also added updated understanding w.r.t worst case scenario
Nov 17, 2018 at 16:33	history	edited	Ravindra HV	CC BY-SA 4.0	Updated question to answer queries and concerns and also added updated understanding w.r.t worst case scenario
Nov 15, 2018 at 13:43	answer	added	greybeard		timeline score: 0
Nov 15, 2018 at 13:41	comment	added	greybeard		The apparent lack of enthusiasm for this question may be contributed to by difficulty of finding where in the code `the current approach to handle identical dataset is to simply consider the range as sorted`; especially as it seems to suggest itself to consider a range sorted where `highVal` equals `lowVal`.
Nov 13, 2018 at 21:50	comment	added	Ravindra HV		How well the implementation works is what is required to be evaluated.
Nov 13, 2018 at 21:46	comment	added	Ravindra HV		@greybeard - the current approach to handle identical dataset is to simply consider the range as sorted.
Nov 12, 2018 at 19:48	comment	added	greybeard		Your analysis of the case of only as many distinct values as there are bits in the key requires a way to efficiently handle identical keys.
Nov 12, 2018 at 19:01	comment	added	Ravindra HV		@greybeard - With the pairwise approach, it would indeed take n/2 steps rather than n. But as of now i’m more concerned regarding the worst case scenario that could result in nn iterations instead of somwhere around nlog(n). Thanks though.
Nov 11, 2018 at 21:48	comment	added	greybeard		(This exchange is bound to get too long for comments - too bad I don't feel chatty.) Without prior knowledge, you can "simultaneously" determine min&max comparing the smaller of each pair to min, greater to max. In each recursive call (no iteration in your code as of 18/11/11), you know the min of all elements no greater than pivot and the max of those no less.
Nov 11, 2018 at 16:03	comment	added	Ravindra HV		@greybeard Java uses dual-pivot-quicksort from jdk 1.7 onwards.. Also w.r.t 4/2n, don't see how it can be a constant (3/2) since we're required to determine the high-and low values per iteration.
Nov 10, 2018 at 18:26	comment	added	Ravindra HV		Have tried to present why powers of two is not a concern at least for large data-sets in the above description. Believe the code already handles identical datasets and if not can add a check for that. Looking for a confirmation on this understanding. If not it would be helpful in taking it further if anyone an suggest a different dataset that will result in n*n operations.
Nov 10, 2018 at 18:21	comment	added	Ravindra HV		The intent is to determine if the approach is good enough to stand on its own ( like the median of three approach ). For that, from what I’ve looked into it - number sequence in powers of two and as you suggested same numbers are two possible inputs.
Nov 10, 2018 at 17:15	comment	added	domen		If your intent is to learn, then sure, go ahead try to figure them out, but chances are there will always be a worst case scenario to be found unless you got to something like Median-of-medians which is provably good. I guess it boils down to whatever is your intention behind this (discovering a new pivot selection algorithm, trying to figure out how existing ones work, finding an optimal one), and what do you actually want the site visitors to do with your question.
Nov 9, 2018 at 17:22	history	edited	mdfst13	CC BY-SA 4.0	added 27 characters in body
Nov 9, 2018 at 13:56	comment	added	Ravindra HV		@domen got that. Which is why i’m Trying to identify all worst case scenarios. One is power of two sequence, which is noted above. The other as you mentioned is identical datasets, which with some changes to implementation (skip and continue), can still be overcome. Am looking for confirmation on the above understanding and also for worst case scenario data-set other than those two.
Nov 9, 2018 at 13:34	comment	added	domen		Pivot selection is `n` operations, and if bad pivot is selected, you have to perform that `n` times (each time reducing the unsorted array by 1 element) - `O(n*n)`.
Nov 9, 2018 at 10:16	comment	added	Ravindra HV		@domen yes all unique elements also will qualify for worst case. Need to add a check to skip it. Can you elaborate on the scenario where it’s n*n ? I can skip sorting for identical values. For number sequence in powers of two, have included my understanding in the post above. Thanks!
Nov 9, 2018 at 10:10	comment	added	Ravindra HV		@greybeard Large datasets are in scope. I do remember java using MoT and from memory it’s Arrays.sort(). Need to figure out 3/2 approach. I only logged recursion count to confirm n.log(n) behaviour.
Nov 9, 2018 at 9:39	comment	added	domen		"requires a maximum of 2nlog2(n)." Worst case this will be O(n*2). You need pivot selection that guarantees a better split (Median of medians for example) to have a worst case of O(nlog(n)). Doesn't the case of all numbers being the same also trigger worst case for you?
Nov 9, 2018 at 8:23	comment	added	greybeard		(Did you check the Java runtime you plan to use to compare does use MoT in `Arrays.sort()`?) Decide and document whether you care about "huge" sorts (cache small compared to data set). In that case, memory access pattern can dwarf number of "CPU operations" in impact on run time. `determinePivot()` as a separate method is a clean separation of concerns, but almost mandates additional passes - which I can't find entirely warranted. Your `determinePivot()` use 4/2n comparisons for min&max where 3/2 is "standard". "Recursion level min&max" won't change: you only need low part max & high part min.
Nov 8, 2018 at 21:10	review	First posts
Nov 8, 2018 at 21:30
Nov 8, 2018 at 21:08	history	asked	Ravindra HV	CC BY-SA 4.0

toggle format