I have a string that fails to evaluate as a match with itself. I am trying to do a simple subset based on one of 8 possible values in a column:

out <- df[df$`Var name` == "string",] 

I've had this work multiple times with different strings, but for some reason this one fails. Thinking there might be a character encoding issue, I tried to get the exact string from the source in the four ways listed below, with no success. Even when I explicitly print a cell I know contains that string and paste the output into a comparison, it fails:

> df[i,j]
[1] "string"
df[i,j]=="string"  # pasted from above line

I don't understand how I can paste the exact output I was just given and have it not match.

## attempts to get exact string to paste into subset statement    
# from dput 
"IF APPLICABLE – Which of the following best characterizes the expectations with"

# from calling a specific row/col (df[i, j])
[1] "IF APPLICABLE – Which of the following best characterizes the expectations with"

# from the source pane of rstudio
IF APPLICABLE – Which of the following best characterizes the expectations with

# from the source excel file
IF APPLICABLE – Which of the following best characterizes the expectations with

I don't have a clue what could be going on here. I am explicitly drawing the string straight from the data and yet it still fails to evaluate as true. Is there something going on in the background that I'm not seeing? Am I overlooking something ridiculously simple?

edit:

I subset the row another way; below is the dput output and an actual example of what I'm doing:

> dput(temp)
structure(list(`Item Stem` = "IF APPLICABLE – Which of the following best characterizes the expectations with", 
    `Item Response` = "It was required.", orgchar_group = "locale", 
    `Org Characteristic` = "Rural", N = 487, percent = 34.5145287030475, 
    `Graphs note` = NA_character_, `Report note` = NA_character_, 
    `Other note` = NA_character_, subsig = 1, overall = 0, varname = NA_character_, 
    statsig = NA_real_, use = NA_real_, difference = 9.16044821292665), .Names = c("Item Stem", 
"Item Response", "orgchar_group", "Org Characteristic", "N", 
"percent", "Graphs note", "Report note", "Other note", "subsig", 
"overall", "varname", "statsig", "use", "difference"), row.names = 288L, class = "data.frame")
> temp[1,1]
[1] "IF APPLICABLE – Which of the following best characterizes the expectations with"
> temp[1,1] == "IF APPLICABLE – Which of the following best characterizes the expectations with"
[1] FALSE
  • Maybe the original has non printable characters in it. Commented Jan 22, 2018 at 18:13
  • It must be system specific. I ran your code on my windows machine and on tio.run and it evaluates as TRUE. Commented Jan 22, 2018 at 18:16
  • Works for me. You'll have to come up with an example that actually fails, I guess. Commented Jan 22, 2018 at 18:16
  • tio.run/… Commented Jan 22, 2018 at 18:17
  • Reading up on non-printable characters. Based on the fact it works for two of you as pasted above I've got a feeling that may be it. Will update if/when I figure out that's it. Commented Jan 22, 2018 at 18:20

1 Answer

Turns out it was in fact a problem character, though a non-ASCII one rather than a strictly non-printable one. Shoutout to the commenters for helping me figure it out by 1) suggesting it and 2) showing that it worked for them.

I was able to figure it out using insights from here (& here) and here.

I used a grepl call (from @Tyler Rinker) to confirm that there was a non-ASCII character in my string, and a stringi call (from @hadley) to check how the string was encoded. I then used a base R solution from @Josh O'Brien to remove it. Turns out it was the dash: the data contains an en dash (–, U+2013), not the ASCII hyphen.

# working in the temp df
> x <- temp[1,1]
> grepl("[^ -~]", x)
[1] TRUE
> stringi::stri_enc_mark(x)
[1] "UTF-8"
> iconv(x, "UTF-8", "ASCII", sub="")  
[1] "IF APPLICABLE  Which of the following best characterizes the expectations with"
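
To see exactly which character fails the ASCII check, one option is to list the code points above the 7-bit ASCII range. This is only a sketch in base R: `x` is a hypothetical stand-in for `temp[1,1]`, shortened and with the dash written as the escape `\u2013` so the example does not depend on how the text was pasted.

```r
# Hypothetical stand-in for temp[1,1]; "\u2013" is the en dash.
x <- "IF APPLICABLE \u2013 Which of the following"

# Code points of every character in the string.
cp <- utf8ToInt(x)

# Keep only the code points outside the 7-bit ASCII range.
non_ascii <- cp[cp > 127]

# Show them as hex code points and as the characters themselves.
hex <- sprintf("U+%04X", non_ascii)
chars <- intToUtf8(non_ascii, multiple = TRUE)
print(hex)    # "U+2013"
print(chars)
```

Looking the code point up (here U+2013, EN DASH) tells you what the character actually is, which the `grepl` check alone does not.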

# set x as df$`Var name` and reassign it to fix
df$`Var name` <- iconv(df$`Var name`, "UTF-8", "ASCII", sub="")
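
Note that `sub=""` drops the dash entirely and leaves a double space, as in the `iconv` output above. If keeping a dash matters, a possible alternative is to swap the en dash for a plain hyphen. A minimal sketch, assuming the en dash is the only non-ASCII character involved; `stems` is a hypothetical stand-in for the column:

```r
# Hypothetical stand-in for df$`Var name`; "\u2013" is the en dash.
stems <- c("IF APPLICABLE \u2013 Which of the following", "Another stem")

# Replace every en dash with a plain ASCII hyphen instead of deleting it.
stems_ascii <- gsub("\u2013", "-", stems, fixed = TRUE)

# The cleaned value now matches a literal typed with an ordinary hyphen.
matches <- stems_ascii == "IF APPLICABLE - Which of the following"
print(matches)
```

Any other non-ASCII characters would be left untouched by this, so rerunning the `grepl("[^ -~]", ...)` check on the result is a reasonable sanity test.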

Still don't understand it enough to explain why it happened but it's fixed now.
