Change name of columns that have the same name in a df

Question

I have the following df where two columns ara labeled with the same name:

dput(df_test)
structure(list(X = c("Gen", "ABCB1", "ABCG2", "CES1"), X.1 = c("Prioridad del gen", 
"Candidato", "Candidato", "Candidato"), X.2 = c("Región codificante", 
"2110", "1526", "3533"), X.3 = c("Categoría Reg. Codif.", "intron", 
"intron", "intron"), X.4 = c("Alineamiento múltiple", "No", "No", 
"No"), X.5 = c("Cromosoma", "7", "4", "16"), X.6 = c("Posición inicial", 
"87153584", "89096060", "55855151"), X.7 = c("Posición final", 
"87153585", "89096061", "55855151"), X.8 = c("Tamaño (pb)", "2", 
"2", "1"), X.9 = c("Nº pb cob. ? 15X", "0", "1", "0"), X.10 = c("Nº pb cob. ? 15X", 
"2", "1", "1"), X.11 = c("% pb cob. ? 15X", "0%", "50%", "0%"
), X.12 = c("Cobertura media", "3", "14,50", "0"), X.13 = c("Nº pb sin cubrir", 
"0", "0", "1"), X.14 = c("Nº pb cob. [1-5]", "2", "0", "0"), 
    X.15 = c("Nº pb cob. [6-14]", "0", "1", "0"), X.16 = c("Nº pb cob. [15-29]", 
    "0", "1", "0"), X.17 = c("Nº pb cob. ? 30X", "0", "0", "0"
    )), class = "data.frame", row.names = c(NA, -4L))

Because the first raw is empty in the original file, the real header becomes part of the df instead of being used as the header. Hence, I use row_to_names to move up the raw containing the names:

df1 <- read.delim("file", header = T) %>% row_to_names(row_number = 1)

Now I need to rename the columns "Nº pb cob. ? 15X" as "Nº pb cob. ≥ 15X" and " Nº pb cob. ≤ 15X", respectively. I've tried with:

clean_ rename_at clean_names() after row_to_names() and didn't change anything.
rename_vars and rename_at didn't work out either.

df1 <- rename_at(df1, 10, ~"Num pb cob. ≥ 15X") Error in combine_names(): ! Can't rename duplicate variables to {name}. Run rlang::last_error() to see where the error occurred.

Could any one give me some advice¿?

Thanks!!

I recommend fixing the initial file loading rather than fiddling with the headers afterwards. Can’t you read the table with e.g. readr::read_tsv? — Konrad Rudolph
– Konrad Rudolph, Commented Feb 22, 2022 at 13:05
Please also add the first line of the csv file to teh question and not just dput(df_test) so we can replicate the parsing and modification of column names — danlooo
– danlooo, Commented Feb 22, 2022 at 13:13
Thanks Konrad, but I did not consider this option because I have 50 files with the same format, therefore, it would take me a lot of time. — Mireia Boluda
– Mireia Boluda, Commented Feb 22, 2022 at 13:15
With 50 files to fix, invest a little time in automating the fix. I believe @KonradRudolph's advice is very sound. It shouldn't be too difficult. We really need to see a (few) sample(s) of your raw files rather than the df you have after import. — Limey
– Limey, Commented Feb 22, 2022 at 13:43
@MireiaBoluda To clarify, my suggestion wasn’t to fix the files but rather to fix the way you’re reading them. As mentioned by danlooo, it would be helpful to see the first few lines of your files to understand why the reading is going so wrong. It’s entirely possible that my suggestion isn’t practical but we’d need to see the files to judge that. — Konrad Rudolph
– Konrad Rudolph, Commented Feb 22, 2022 at 13:47

Johan Rosa · Accepted Answer · 2022-02-22 14:37:34Z

I did not find a way to place those especial characters in a data frame variable name, so I used a minor variation.

The idea was to create a function that cleans your data, that way you can apply this function to all your files.

library(stringr)
library(purrr)

test <- structure(
  list(
    X = c("Gen", "ABCB1", "ABCG2", "CES1"), 
    X.1 = c("Prioridad del gen","Candidato", "Candidato", "Candidato"),
    X.2 = c("Región codificante","2110", "1526", "3533"), 
    X.3 = c("Categoría Reg. Codif.", "intron", "intron", "intron"),
    X.4 = c("Alineamiento múltiple", "No", "No", "No"),
    X.5 = c("Cromosoma", "7", "4", "16"),
    X.6 = c("Posición inicial", "87153584", "89096060", "55855151"),
    X.7 = c("Posición final", "87153585", "89096061", "55855151"), 
    X.8 = c("Tamaño (pb)", "2", "2", "1"), 
    X.9 = c("Nº pb cob. ? 15X", "0", "1", "0"), 
    X.10 = c("Nº pb cob. ? 15X", "2", "1", "1"), 
    X.11 = c("% pb cob. ? 15X", "0%", "50%", "0%"),
    X.12 = c("Cobertura media", "3", "14,50", "0"), 
    X.13 = c("Nº pb sin cubrir", "0", "0", "1"), 
    X.14 = c("Nº pb cob. [1-5]", "2", "0", "0"), 
    X.15 = c("Nº pb cob. [6-14]", "0", "1", "0"), 
    X.16 = c("Nº pb cob. [15-29]", "0", "1", "0"),
    X.17 = c("Nº pb cob. ? 30X", "0", "0", "0")), 
  class = "data.frame", row.names = c(NA, -4L))

# Function to clean the names as you need
clean_df_names <- function(df) {
  df_names <- df[1, ] %>%
    unlist(use.names = FALSE)
  
  repeated_names <- which(df_names == 'Nº pb cob. ? 15X')
  
  
  #name_symbols <- c('\u2265', '\u2264') # these are the unicode symbols, but can not be used in df names
  name_symbols <- c('>=', '<=')
  
  new_names <- purrr::map2_chr(
    df_names[repeated_names], name_symbols,
    ~stringr::str_replace(.x, '\\?', .y)
  )
  
  df_names[repeated_names] <- new_names
  
  new_df <- df[-1, ]
  
  setNames(new_df, df_names)
}

test <- clean_df_names(test)

str(test)
#> 'data.frame':    3 obs. of  18 variables:
#>  $ Gen                  : chr  "ABCB1" "ABCG2" "CES1"
#>  $ Prioridad del gen    : chr  "Candidato" "Candidato" "Candidato"
#>  $ Región codificante   : chr  "2110" "1526" "3533"
#>  $ Categoría Reg. Codif.: chr  "intron" "intron" "intron"
#>  $ Alineamiento múltiple: chr  "No" "No" "No"
#>  $ Cromosoma            : chr  "7" "4" "16"
#>  $ Posición inicial     : chr  "87153584" "89096060" "55855151"
#>  $ Posición final       : chr  "87153585" "89096061" "55855151"
#>  $ Tamaño (pb)          : chr  "2" "2" "1"
#>  $ Nº pb cob. >= 15X    : chr  "0" "1" "0"
#>  $ Nº pb cob. <= 15X    : chr  "2" "1" "1"
#>  $ % pb cob. ? 15X      : chr  "0%" "50%" "0%"
#>  $ Cobertura media      : chr  "3" "14,50" "0"
#>  $ Nº pb sin cubrir     : chr  "0" "0" "1"
#>  $ Nº pb cob. [1-5]     : chr  "2" "0" "0"
#>  $ Nº pb cob. [6-14]    : chr  "0" "1" "0"
#>  $ Nº pb cob. [15-29]   : chr  "0" "1" "0"
#>  $ Nº pb cob. ? 30X     : chr  "0" "0" "0"

^{Created on 2022-02-22 by the reprex package (v2.0.1)}

danlooo · Accepted Answer · 2022-02-22 13:05:04Z

0

You can do it manually using backticks:

library(tidyverse)
df <-tibble(`Nº pb cob. ? 15X` = seq(2))
df
#> # A tibble: 2 x 1
#>   `Nº pb cob. ? 15X`
#>                <int>
#> 1                  1
#> 2                  2
rename(df, `Nº pb cob. ≤ 15X` = `Nº pb cob. ? 15X`)
#> # A tibble: 2 x 1
#>   `Nº pb cob. ≤ 15X`
#>                <int>
#> 1                  1
#> 2                  2

^{Created on 2022-02-22 by the reprex package (v2.0.0)}

answered Feb 22, 2022 at 13:05

danlooo

10.8k2 gold badges13 silver badges27 bronze badges

2 Comments

Johan Rosa Over a year ago

Note that he have two columns with the same name.

Mireia Boluda Over a year ago

exactly, that is why rename does not work

Collectives™ on Stack Overflow

Change name of columns that have the same name in a df

2 Answers 2

1 Comment

2 Comments

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

1 Comment

2 Comments

Related