Timeline for intersection between 2 files (values in file 1 which fall in range of values in file 2)
Current License: CC BY-SA 3.0
17 events
| when | what | by | license | comment | |
|---|---|---|---|---|---|
| Nov 28, 2017 at 15:08 | comment | added | Gnudiff | @igal I was not completely certain of that. However, in case you are right, there probably are some rows that somehow reset the result? | |
| Nov 28, 2017 at 13:33 | comment | added | igal | @Gnudiff I had the same initial thought as you, but we're being told that the program runs to completion without error. I would have expected it to hang or crash if it ran out of memory - don't you think? | |
| Nov 28, 2017 at 10:53 | comment | added | Gnudiff | If the files are very large, processing them in memory might take a long time and/or exhaust memory. In that case, you would probably be better off changing this solution into an sqlite one, putting the rows into an SQL db and running a query on them. For SQL purposes, the structure seems very simple. | |
| Nov 28, 2017 at 5:29 | comment | added | igal | @Anna1364 You can sign up for GitHub for free. Or you can share a public link from a cloud hosting server (e.g. Google Drive, DropBox, etc.). | |
| Nov 28, 2017 at 5:06 | comment | added | Anna1364 | @igal, I do not have any GitHub page! Any other idea where I can share my data with you? Your code gives exactly what I want, it just does not work with the entire data. Thanks for your help! | |
| Nov 27, 2017 at 6:12 | comment | added | igal | @Anna1364 I'd rather not post my email address. Can you put it somewhere public? | |
| Nov 26, 2017 at 21:51 | comment | added | igal | @Anna1364 If you post your data somewhere (e.g. GitHub or something) then I'll run the script on your data myself. | |
| Nov 26, 2017 at 21:49 | comment | added | igal | @Anna1364 Does it actually terminate without producing any output or does it just run for a really, really long time? I didn't put any effort into making it efficient. If your input is really large then it might take a long time or possibly hang or crash. | |
| Nov 26, 2017 at 21:15 | comment | added | Anna1364 | @igal, thanks so much. I tried your python code with a small subset of data which works perfectly fine. But there is a problem when I run it for the entire dataset! I have nearly 10 million SNPs, when I run the script for the entire data-set it does not produce any output! I wonder what might be wrong....? | |
| Nov 26, 2017 at 4:10 | history | edited | igal | CC BY-SA 3.0 | Removed double-quotes to match updated question. |
| Nov 26, 2017 at 2:28 | comment | added | igal | @iruvar Thank you for the feedback - updated. | |
| Nov 26, 2017 at 2:27 | history | edited | igal | CC BY-SA 3.0 | deleted 18 characters in body |
| Nov 26, 2017 at 2:25 | comment | added | iruvar | very good, +1. `int(start) <= int(position) and int(position) <= int(end)` is idiomatically `int(start) <= int(position) <= int(end)` | |
| Nov 26, 2017 at 2:22 | history | edited | igal | CC BY-SA 3.0 | Corrected solution. |
| Nov 26, 2017 at 2:05 | history | edited | igal | CC BY-SA 3.0 | added 987 characters in body |
| Nov 26, 2017 at 0:59 | history | edited | igal | CC BY-SA 3.0 | added 1729 characters in body; added 74 characters in body; added 8 characters in body |
| Nov 26, 2017 at 0:52 | history | answered | igal | CC BY-SA 3.0 |
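
The SQLite suggestion in the comments above could look roughly like the following. This is only a minimal sketch, not the code from the answer: the function name `positions_in_ranges`, the table names `snps` and `ranges`, and the column names are all hypothetical, and it assumes the positions and the (start, end) pairs have already been parsed into integers. For very large inputs, a file-backed database path could be passed to `sqlite3.connect` instead of `":memory:"`.

```python
import sqlite3


def positions_in_ranges(positions, ranges):
    """Return (position, start, end) for every position that falls in a range.

    positions: iterable of ints (hypothetical stand-in for file 1)
    ranges:    iterable of (start, end) int pairs (stand-in for file 2)
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE snps (position INTEGER)")
    conn.execute("CREATE TABLE ranges (range_start INTEGER, range_end INTEGER)")

    # Bulk-load both inputs; executemany accepts any iterable of tuples.
    conn.executemany("INSERT INTO snps VALUES (?)", ((p,) for p in positions))
    conn.executemany("INSERT INTO ranges VALUES (?, ?)", ranges)

    # BETWEEN is inclusive on both ends, like start <= position <= end.
    rows = conn.execute(
        "SELECT s.position, r.range_start, r.range_end "
        "FROM snps AS s JOIN ranges AS r "
        "ON s.position BETWEEN r.range_start AND r.range_end "
        "ORDER BY s.position, r.range_start"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(positions_in_ranges([5, 15, 25], [(1, 10), (20, 30)]))
    # prints [(5, 1, 10), (25, 20, 30)]
```

The SQL `BETWEEN` condition plays the same role as the chained comparison `int(start) <= int(position) <= int(end)` mentioned in iruvar's comment.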