I am working with a huge 2d dataset and need a range query for every point, returning the neighbours within a range as a set I already have tested using an index with KD Tree form sk learn, but the problem is, it returns the index as a list and the converting to a set takes too long. Is there a data structure, which returns the points from a range query as a set and not as a list?
-
1Are you sure that constructing a Set from a List takes too long? You make it sound like it takes as long as or even longer than the range query itself, in that case I suspect there is something wrong with your creation of a set. Maybe you could show us some timing numbers? Also, every spatial tree would also have to internally add the result points to a Set, so that cannot really be faster than when you iterate over a list to create a Set...TilmannZ– TilmannZ2017-05-28 11:11:26 +00:00Commented May 28, 2017 at 11:11
-
to your question here is the answer stackoverflow.com/questions/44224696/… :)Felix Ha– Felix Ha2017-05-30 17:07:34 +00:00Commented May 30, 2017 at 17:07
-
Well the reference tells only why conversion to a Set takes longer than conversion to a List. My question was whether it is really the bottleneck, usually range query should take longer than the creation of a set or list. Also, as I mentioned, if you want a Set, someone has to bear the cost of creating it, whether it is you or whether it is the KD-Tree doing it internally...TilmannZ– TilmannZ2017-05-31 18:30:53 +00:00Commented May 31, 2017 at 18:30
Add a comment
|
1 Answer
The result is not natively a list.
Get the source code of the k-d-tree, and modify so it directly writes to a set, instead of a list.
But I highly doubt this will solve your actual problem. Converting a small list to a set should barely be a performance issue... but well, you are using python. A traditional python set() will be a lot lot lot slower than an numpy array. But don't blame the data structure for not using a slow set.