I have a data.table with 1 Million rows with each cell looking like this:
ENST00000408384 // ENSEMBL // ncrna:miRNA chromosome:GRCh37:1:30366:30503:1 gene:ENSG00000221311 gene_biotype:miRNA transcript_biotype:miRNA // chr1 // 100 // 100 // 0 // --- // 0 /// ENST00000469289 // ENSEMBL // havana:known chromosome:GRCh38:1:30267:31109:1 gene:ENSG00000243485 gene_biotype:lincRNA transcript_biotype:lincRNA // chr1 // 100 // 100 // 0 // --- // 0 /// ENST00000473358 // ENSEMBL // havana:known chromosome:GRCh38:1:29554:31097:1 gene:ENSG00000243485 gene_biotype:lincRNA transcript_biotype:lincRNA // chr1 // 100 // 100 // 0 // --- // 0 /// OTTHUMT00000002840 // Havana transcript // novel transcript[gene_biotype:lincRNA transcript_biotype:lincRNA] // chr1 // 100 // 100 // 0 // --- // 0 /// OTTHUMT00000002841 // Havana transcript // novel transcript[gene_biotype:lincRNA transcript_biotype:lincRNA] // chr1 // 100 // 100 // 0 // --- // 0
I need to extract what comes immediately after "gene_biotype:" (in this case it would "miRNA"). How to do that?
I tried finding a solution with stringR and regex and gave up after several hours. Appreciate your help. Thanks.