I am reading external data with read.table() in R like this:
student_record <- read.table("Address of data",fill = TRUE,col.names=c("student_id","name"))
The student ID is a 20-character string in a format like STU01000001010001001, and I want to keep only the rows where the student ID satisfies all of the following conditions:
( 0 – 2 = STU) AND
(5 – 9 != 11111) AND
(10 – 11 != (00 or 10)) AND
(12 – 17 != 111111) AND
(18 – 19 = 04)
Here 0, 2 and so on are the 0-based indices of the characters in the student ID. How can I filter the records using conditions like these?
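To make the intended filter concrete, the conditions above could be spelled out with substr() roughly as follows. This is only a sketch: keep_id is a hypothetical helper name, the IDs are made up for illustration, and it assumes that a 0-based index i in the description corresponds to R position i + 1 (so index 0 – 2 becomes positions 1 to 3).

```r
# Hypothetical helper: TRUE for IDs that satisfy all five conditions.
# Assumes a 0-based index i in the description maps to R position i + 1.
keep_id <- function(id) {
  id <- as.character(id)                        # the column may have been read as a factor
  substr(id, 1, 3) == "STU" &                   # index 0-2 is STU
    substr(id, 6, 10) != "11111" &              # index 5-9 is not 11111
    !(substr(id, 11, 12) %in% c("00", "10")) &  # index 10-11 is neither 00 nor 10
    substr(id, 13, 18) != "111111" &            # index 12-17 is not 111111
    substr(id, 19, 20) == "04"                  # index 18-19 is 04
}

# Made-up IDs for illustration only (not taken from the real data)
ids <- c("STU01000000500000004",  # index 10-11 is "05" and the ID ends in 04: kept
         "STU01000001000000004",  # index 10-11 is "10": rejected
         "STU01000000500000001")  # ends in 01: rejected
keep_id(ids)
```

With the data frame from read.table(), the same helper would be used as student_record[keep_id(student_record$student_id), ].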
I executed this after read.table() to filter:
stu_record <- student_record[grepl("^STU.{2}(?!11111).(?!(00|10)).(?!111111).04", student_record[,1], perl=T),]
but the output does not seem to be correct: everything gets filtered out and I get an empty data frame.
When I executed this:
stu_record <- student_record[grepl("^STU.{2}(?!11111).(?!(00|10)).(?!111111)04", student_record[,1], perl=T),]
then I see records, but they do not seem to be correct: I can see records like STU13120600500000002, which should not appear because the last two characters (index 18 – 19) should be 04.
UPDATE: A few of the rows I see after executing the above command are shown below (the IDs are not filtered correctly, as the last two digits should be 04 but I see 01):
student_id Name
"STU01115000000000001" "A"
"STU01115000000000001" "B"
"STU01115000000000001" "C"
"STU01115000000000001" "D"
"STU01115000000000001" "E"
"STU01115000000000001" "F"
"STU01115000000000001" "G"
"STU01115000000000001" "H"
"STU01115000000000001" "I"
Some of the IDs that should have been kept but were filtered out are:
"STU01155000000000004" "F"
"STU01135000000000004" "G"
"STU01145000000000004" "H"
"STU01125000000000004" "I"
NOTE: Some indices in the string have no condition. For example, there is no filtering condition for index 3 and 4, so they can be anything.
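For reference, a pattern in which every field is matched at its full width might look like the sketch below. It assumes fixed-width fields as described above: the unconstrained index 3 and 4 are skipped with .{2}, and each negative lookahead is followed by a quantifier that consumes the whole field it guards. The name id_pattern and the sample IDs are made up for illustration.

```r
# Sketch of a fixed-width pattern (perl = TRUE is needed for lookaheads).
id_pattern <- paste0(
  "^STU",             # index 0-2 is STU
  ".{2}",             # index 3-4: no condition, any two characters
  "(?!11111).{5}",    # index 5-9 is not 11111
  "(?!00|10).{2}",    # index 10-11 is neither 00 nor 10
  "(?!111111).{6}",   # index 12-17 is not 111111
  "04$"               # index 18-19 is 04
)

# Made-up IDs for illustration only (not taken from the real data)
sample_ids <- c("STU01000000500000004",  # satisfies every condition
                "STU01000001000000004",  # index 10-11 is "10"
                "STU01000000500000001")  # ends in 01
grepl(id_pattern, sample_ids, perl = TRUE)
```

Only the first sample ID matches. A lookahead such as (?!11111) does not consume any characters by itself, which is why each one is paired here with .{n} of the matching width.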