I would like to remove columns which contain the string -- in any row.
Number  138  139  140  141  143  144  147  148  149  150  151  152  14   15   N…
nm4804  A    B    --   A    B    A    A    --   A    A    A    A    A    --   A
nm7574  B    A    A    A    A    A    A    A    A    A    A    A    A    --   A
nm8723  B    --   B    B    B    --   A    --   B    B    B    B    --   --   A
N…      B    A    A    A    A    B    A    --   A    A    B    --   --   --   A
I would like to count the frequency of -- in each column; if more than 50% of the values in a column are --, that column should be removed.
Desired result:
Number  138  140  141  143  147  149  150  151  N…
nm4804  A    A    --   B    A    A    A    A    A
nm7574  B    A    A    A    A    A    A    A    A
nm8723  B    B    A    B    --   B    B    B    A
N…      B    A    A    A    A    A    A    B    A
Data (thanks bgoldst)
df <- data.frame(Number=c('nm4804','nm7574','nm8723','N…'),
  `138`=c('A','B','B','B'), `139`=c('B','A','--','A'), `140`=c('--','A','B','A'),
  `141`=c('A','A','B','A'), `143`=c('B','A','B','A'), `144`=c('A','A','--','B'),
  `147`=c('A','A','A','A'), `148`=c('--','A','--','--'), `149`=c('A','A','B','A'),
  `150`=c('A','A','B','A'), `151`=c('A','A','B','B'), `152`=c('A','A','B','--'),
  `14`=c('A','A','--','--'), `15`=c('--','--','--','--'), `N…`=c('A','A','A','A'),
  check.names=FALSE, stringsAsFactors=FALSE)
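One possible way to apply the 50% rule is to compute, per column, the fraction of entries equal to "--" and keep only the columns at or below the threshold. This is a minimal sketch, not a definitive solution: the small data frame is just the first three rows and three of the columns from the data in the question, and the names missing_frac and result are made up for illustration.

```r
# Subset of the question's data (first three rows, columns 138, 148 and 15),
# rebuilt here so the example runs on its own.
df <- data.frame(
  Number = c("nm4804", "nm7574", "nm8723"),
  `138`  = c("A", "B", "B"),
  `148`  = c("--", "A", "--"),
  `15`   = c("--", "--", "--"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# df == "--" gives a logical matrix; colMeans() turns it into the
# fraction of "--" entries in each column.
missing_frac <- colMeans(df == "--")

# Keep the columns in which at most 50% of the entries are "--".
# drop = FALSE keeps the result a data frame even if one column remains.
result <- df[, missing_frac <= 0.5, drop = FALSE]
names(result)
```

Here column 148 (2 of 3 entries are "--") and column 15 (all entries are "--") are dropped, while Number and 138 are kept. To remove every column that contains "--" at all, the condition could be changed to missing_frac == 0.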
-- to indicate a missing value. See ?read.table and the argument na.strings.
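Following the na.strings suggestion, the "--" entries could be read in as NA and the columns filtered on the fraction of NA values. This is only a sketch under assumptions: the inline text stands in for the real input file (it is a three-row, three-column subset of the data above), and the names txt, na_frac and result are made up for illustration.

```r
# Inline stand-in for the real file; a subset of the question's data.
txt <- "Number 138 148 15
nm4804 A -- --
nm7574 B A --
nm8723 B -- --"

# na.strings = "--" makes read.table() treat "--" as a missing value (NA).
# check.names = FALSE keeps numeric column names such as 138 unchanged.
df <- read.table(text = txt, header = TRUE, na.strings = "--",
                 check.names = FALSE, stringsAsFactors = FALSE)

# Fraction of NA values in each column.
na_frac <- colMeans(is.na(df))

# Keep the columns in which at most 50% of the values are NA.
result <- df[, na_frac <= 0.5, drop = FALSE]
names(result)
```

With a real file, the text = txt argument would be replaced by the file path. Once the missing values are proper NAs, other tools such as complete.cases() or colSums(is.na(df)) also become available.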