Timeline for Project Reduplication of Deduplication - Unix & Linux

Current License: CC BY-SA 3.0

6 events

when	what		by	license	comment
Nov 25, 2016 at 8:18	comment	added	Dmitry Grigoryev		@Monozygotic Ok, thanks for letting me know and good luck!
Nov 25, 2016 at 3:55	comment	added	Monozygotic		@muru and Dmitry, I had a discussion with the other researchers, but we've decided not to add known duplicates. The main reason is that it's virtually impossible to eliminate all bias. Apparently a lot of research has been done on this by psychologists. So instead, we'll try to get as many annotations as possible, to make the received labels as trustworthy as possible. Different sites have different numbers of duplicates, so we might be able to use this info to identify cases where there's a lot of bias too. We'll have to experiment when we have all the data.
Nov 23, 2016 at 4:43	comment	added	Monozygotic		@muru, ha, I didn't know that! Great, that's encouraging. I'll speak to the other researchers tomorrow, so will bring this point up.
Nov 23, 2016 at 2:00	comment	added	muru		@Monozygotic nah, SE already does that to us via review audits, and most users are OK with it. People who want to help with this should be mostly fine with it too.
Nov 22, 2016 at 22:17	comment	added	Monozygotic		Interesting idea. I had not considered that. I was actually worried I would get comments about questions that have already been marked as duplicates on the site but not yet in my dataset, because the latest dump is always a little behind the site's actual status. I'm basically worried about annoying people if I show them known duplicates. You do have a point though, and this is definitely something I can discuss with the other researchers.
Nov 22, 2016 at 17:52	history	answered	Dmitry Grigoryev	CC BY-SA 3.0