How do I replace NA values with zeros in an R dataframe?

Question

I have a data frame and some columns have NA values.

How do I replace these NA values with zeroes?

Small modification of stackoverflow.com/questions/7279089/… (which I found by searching "[r] replace NA with zero") ... (Ben Bolker, commented Nov 17, 2011 at 4:16)

aL3xa · Accepted Answer · 2011-11-17 16:16:47Z

1179

Answer recommended by R Language Collective

See my comment on @gsk3's answer. A simple example:

> m <- matrix(sample(c(NA, 1:10), 100, replace = TRUE), 10)
> d <- as.data.frame(m)
   V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1   4  3 NA  3  7  6  6 10  6   5
2   9  8  9  5 10 NA  2  1  7   2
3   1  1  6  3  6 NA  1  4  1   6
4  NA  4 NA  7 10  2 NA  4  1   8
5   1  2  4 NA  2  6  2  6  7   4
6  NA  3 NA NA 10  2  1 10  8   4
7   4  4  9 10  9  8  9  4 10  NA
8   5  8  3  2  1  4  5  9  4   7
9   3  9 10  1  9  9 10  5  3   3
10  4  2  2  5 NA  9  7  2  5   5

> d[is.na(d)] <- 0

> d
   V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1   4  3  0  3  7  6  6 10  6   5
2   9  8  9  5 10  0  2  1  7   2
3   1  1  6  3  6  0  1  4  1   6
4   0  4  0  7 10  2  0  4  1   8
5   1  2  4  0  2  6  2  6  7   4
6   0  3  0  0 10  2  1 10  8   4
7   4  4  9 10  9  8  9  4 10   0
8   5  8  3  2  1  4  5  9  4   7
9   3  9 10  1  9  9 10  5  3   3
10  4  2  2  5  0  9  7  2  5   5

There's no need to apply apply. =)

EDIT

You should also take a look at the norm package. It has a lot of nice features for missing data analysis. =)

edited Nov 17, 2011 at 16:16

answered Nov 17, 2011 at 11:48

aL3xa


8 Comments

Renato Dinhani Over a year ago

I had already tried this code yesterday, before you posted it, and it didn't work; that's why I posted the question. But I tried it now and it worked perfectly. I think I was doing something wrong.

Aaron - mostly inactive Over a year ago

@RenatoDinhaniConceição: if you tried something already, it's helpful to share that information when you ask the question; it helps to narrow down where the problem may be.

user798719 Over a year ago

d[is.na(d)] <- 0 does not make sense to me. It seems backwards? How does R process this statement?

Twitch_City Over a year ago

@user798719 - "<-" is R's assignment operator, and can be read as: do something on the right-hand side and then assign it to the location/name on the left. In this case, we aren't really "doing" anything, just making zeroes. The left side is saying: look at the d object; inside the d object (the square brackets), find all the elements that return TRUE (is.na(d) returns a logical for each element). Once they are found, replace them ("assign them") with the value 0. This leaves all of the non-NAs as they were, and only replaces the ones with missingness.
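To see what the left-hand side actually selects, here is a small sketch with made-up data (the data frame d below is hypothetical, not the one from the answer). It prints the logical mask that is.na() returns and then the data frame after the assignment:

```r
# A tiny data frame with a few NAs (hypothetical example data)
d <- data.frame(a = c(1, NA, 3), b = c(NA, 5, 6))

# is.na() has a data.frame method: it returns a logical matrix
# with the same shape as d, TRUE wherever a value is missing
mask <- is.na(d)
print(mask)

# Indexing d with that logical matrix selects exactly the NA cells,
# and the assignment writes 0 into those cells only
d[mask] <- 0
print(d)
```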

jtdoud Over a year ago

And... if you have a data frame and only want to apply the replacement to specific numeric vectors (leaving, say, strings with NA): df[19:28][is.na(df[19:28])] <- 0
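Selecting columns by position, as above, silently breaks if the column order changes. A sketch that picks the numeric columns by type instead, using the same subset-assign idiom (the data frame df and its columns are made up for illustration):

```r
# Hypothetical mixed-type data: two numeric columns and one character column
df <- data.frame(
  x = c(1, NA, 3),
  y = c(NA, 2L, NA),
  s = c("a", NA, "c"),
  stringsAsFactors = FALSE
)

# Pick the numeric columns by type instead of by position
num_cols <- vapply(df, is.numeric, logical(1))

# Same subset-assign idiom, restricted to those columns;
# the NA in the character column s is left untouched
df[num_cols][is.na(df[num_cols])] <- 0
df
```

Note that assigning the double 0 into the integer column y turns it into a double column.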


Captain Hat · 2024-01-03 10:56:30Z

The dplyr hybridized options are now around 30% faster than the Base R subset reassigns. On a 100M datapoint dataframe, mutate_all(~replace(., is.na(.), 0)) runs about half a second faster than the base R d[is.na(d)] <- 0 option. What one wants to avoid specifically is using an ifelse() or an if_else(). (The complete 600-trial analysis ran to over 4.5 hours, mostly due to including these approaches.) Please see the benchmark analyses below for the complete results.

If you are struggling with massive dataframes, data.table is the fastest option of all: 40% faster than the standard Base R approach. It also modifies the data in place, effectively allowing you to work with nearly twice as much of the data at once.

A clustering of other helpful tidyverse replacement approaches

Locationally:

index: mutate_at(c(5:10), ~replace(., is.na(.), 0))
direct reference: mutate_at(vars(var5:var10), ~replace(., is.na(.), 0))
fixed match: mutate_at(vars(contains("1")), ~replace(., is.na(.), 0))

or, in place of contains(), try ends_with() or starts_with()

pattern match: mutate_at(vars(matches("\\d{2}")), ~replace(., is.na(.), 0))

Conditionally:
(change just a single type and leave other types alone)

integers: mutate_if(is.integer, ~replace(., is.na(.), 0))
numbers: mutate_if(is.numeric, ~replace(., is.na(.), 0))
strings: mutate_if(is.character, ~replace(., is.na(.), 0))

The Complete Analysis (updated for dplyr 0.8.0: the functions use purrr-style ~ formulas in place of the deprecated funs() arguments)

Approaches tested:

# Base R: 
baseR.sbst.rssgn   <- function(x) { x[is.na(x)] <- 0; x }
baseR.replace      <- function(x) { replace(x, is.na(x), 0) }
baseR.for          <- function(x) { for(j in 1:ncol(x))
    x[[j]][is.na(x[[j]])] = 0
    x }  # return x; a for loop on its own returns NULL

# tidyverse
library(dplyr)
library(tidyr)
## dplyr
dplyr_if_else      <- function(x) { mutate_all(x, ~if_else(is.na(.), 0, .)) }
dplyr_coalesce     <- function(x) { mutate_all(x, ~coalesce(., 0)) }

## tidyr
tidyr_replace_na   <- function(x) { replace_na(x, as.list(setNames(rep(0, 10), paste0("var", 1:10)))) }

## hybrid 
hybrd.ifelse     <- function(x) { mutate_all(x, ~ifelse(is.na(.), 0, .)) }
hybrd.replace_na <- function(x) { mutate_all(x, ~replace_na(., 0)) }
hybrd.replace    <- function(x) { mutate_all(x, ~replace(., is.na(.), 0)) }
hybrd.rplc_at.idx<- function(x) { mutate_at(x, c(1:10), ~replace(., is.na(.), 0)) }
hybrd.rplc_at.nse<- function(x) { mutate_at(x, vars(var1:var10), ~replace(., is.na(.), 0)) }
hybrd.rplc_at.stw<- function(x) { mutate_at(x, vars(starts_with("var")), ~replace(., is.na(.), 0)) }
hybrd.rplc_at.ctn<- function(x) { mutate_at(x, vars(contains("var")), ~replace(., is.na(.), 0)) }
hybrd.rplc_at.mtc<- function(x) { mutate_at(x, vars(matches("\\d+")), ~replace(., is.na(.), 0)) }
hybrd.rplc_if    <- function(x) { mutate_if(x, is.numeric, ~replace(., is.na(.), 0)) }

# data.table   
library(data.table)
DT.for.set.nms   <- function(x) { for (j in names(x))
    set(x,which(is.na(x[[j]])),j,0) }
DT.for.set.sqln  <- function(x) { for (j in seq_len(ncol(x)))
    set(x,which(is.na(x[[j]])),j,0) }
DT.nafill        <- function(x) { nafill(x, fill=0)}
DT.setnafill     <- function(x) { setnafill(x, fill=0)}   # updates x by reference

The code for this analysis:

library(microbenchmark)
# 20% NA filled dataframe of 10 Million rows and 10 columns
set.seed(42) # to recreate the exact dataframe
dfN <- as.data.frame(matrix(sample(c(NA, as.numeric(1:4)), 1e7*10, replace = TRUE),
                            dimnames = list(NULL, paste0("var", 1:10)), 
                            ncol = 10))
# Running 600 trials with each replacement method 
# (each function works on its own copy() of dfN, so the original dataframe remains unmodified in all cases)
perf_results <- microbenchmark(
    hybrd.ifelse     = hybrd.ifelse(copy(dfN)),
    dplyr_if_else    = dplyr_if_else(copy(dfN)),
    hybrd.replace_na = hybrd.replace_na(copy(dfN)),
    baseR.sbst.rssgn = baseR.sbst.rssgn(copy(dfN)),
    baseR.replace    = baseR.replace(copy(dfN)),
    dplyr_coalesce   = dplyr_coalesce(copy(dfN)),
    tidyr_replace_na = tidyr_replace_na(copy(dfN)),
    hybrd.replace    = hybrd.replace(copy(dfN)),
    hybrd.rplc_at.ctn= hybrd.rplc_at.ctn(copy(dfN)),
    hybrd.rplc_at.nse= hybrd.rplc_at.nse(copy(dfN)),
    baseR.for        = baseR.for(copy(dfN)),
    hybrd.rplc_at.idx= hybrd.rplc_at.idx(copy(dfN)),
    DT.for.set.nms   = DT.for.set.nms(copy(dfN)),
    DT.for.set.sqln  = DT.for.set.sqln(copy(dfN)),
    times = 600L
)

Summary of Results

> print(perf_results)
Unit: milliseconds
              expr       min        lq     mean   median       uq      max neval
      hybrd.ifelse 6171.0439 6339.7046 6425.221 6407.397 6496.992 7052.851   600
     dplyr_if_else 3737.4954 3877.0983 3953.857 3946.024 4023.301 4539.428   600
  hybrd.replace_na 1497.8653 1706.1119 1748.464 1745.282 1789.804 2127.166   600
  baseR.sbst.rssgn 1480.5098 1686.1581 1730.006 1728.477 1772.951 2010.215   600
     baseR.replace 1457.4016 1681.5583 1725.481 1722.069 1766.916 2089.627   600
    dplyr_coalesce 1227.6150 1483.3520 1524.245 1519.454 1561.488 1996.859   600
  tidyr_replace_na 1248.3292 1473.1707 1521.889 1520.108 1570.382 1995.768   600
     hybrd.replace  913.1865 1197.3133 1233.336 1238.747 1276.141 1438.646   600
 hybrd.rplc_at.ctn  916.9339 1192.9885 1224.733 1227.628 1268.644 1466.085   600
 hybrd.rplc_at.nse  919.0270 1191.0541 1228.749 1228.635 1275.103 2882.040   600
         baseR.for  869.3169 1180.8311 1216.958 1224.407 1264.737 1459.726   600
 hybrd.rplc_at.idx  839.8915 1189.7465 1223.326 1228.329 1266.375 1565.794   600
    DT.for.set.nms  761.6086  915.8166 1015.457 1001.772 1106.315 1363.044   600
   DT.for.set.sqln  787.3535  918.8733 1017.812 1002.042 1122.474 1321.860   600

Boxplot of Results

library(ggplot2)
ggplot(perf_results, aes(x=expr, y=time/10^9)) +
    geom_boxplot() +
    xlab('Expression') +
    ylab('Elapsed Time (Seconds)') +
    scale_y_continuous(breaks = seq(0,7,1)) +
    coord_flip()

Color-coded Scatterplot of Trials (with y-axis on a log scale)

qplot(y=time/10^9, data=perf_results, colour=expr) + 
    labs(y = "log10 Scaled Elapsed Time per Trial (secs)", x = "Trial Number") +
    coord_cartesian(ylim = c(0.75, 7.5)) +
    scale_y_log10(breaks=c(0.75, 0.875, 1, 1.25, 1.5, 1.75, seq(2, 7.5)))

A note on the other high performers

When the datasets get larger, tidyr's replace_na had historically pulled out in front. With the current 100M data points to run through, it no longer leads: its median time (about 1.52 s) is essentially tied with coalesce() and roughly 0.3 s behind the Base R for loop (about 1.22 s). I am curious to see what happens for different sized dataframes.

Additional examples for the mutate and summarize _at and _all function variants can be found here: https://rdrr.io/cran/dplyr/man/summarise_all.html Additionally, I found helpful demonstrations and collections of examples here: https://blog.exploratory.io/dplyr-0-5-is-awesome-heres-why-be095fd4eb8a

Attributions and Appreciations

With special thanks to:

Tyler Rinker and Akrun for demonstrating microbenchmark.
alexis_laz for helping me understand the use of local(), and (with Frank's patient help, too) the role that silent coercion plays in speeding up many of these approaches.
ArthurYip for the poke to add the newer coalesce() function in and update the analysis.
Gregor for the nudge to figure out the data.table functions well enough to finally include them in the lineup.
Base R For loop: alexis_laz
data.table For Loops: Matt_Dowle
Roman for explaining what is.numeric() really tests.

(Of course, please reach over and give them upvotes, too if you find those approaches useful.)

Note on my use of numerics: if you have a pure integer dataset, all of these functions will run faster. Please see alexis_laz's work for more information. IRL, I can't recall encountering a data set containing more than 10-15% integers, so I am running these tests on fully numeric dataframes.

Hardware used: 3.9 GHz CPU with 24 GB RAM

@Frank - Thank you for finding that discrepancy. The references are all cleaned up and the results have been entirely rerun on a single machine and reposted.
@UweBlock - great question: it allowed me to do the subsetting left assign operation with all functions working on exactly the same dataframe. Since I had to wrap the local around that function, then in the name of science [One job, you had one job!] I wrapped it around all of them so that the playing field was unequivocally level. For more info - please see here: stackoverflow.com/questions/41604711/… I had trimmed down the rather longwinded previous answer - but that part of the discussion would be good to add back in. Thank you!
@ArthurYip - I've added the coalesce() option in and rerun all the times. Thank you for the nudge to update.
Update for dplyr 1.0.2, where mutate_at and mutate_all are superseded by across(): function(x) { mutate(x, across(everything(), ~replace_na(., 0))) }
across also supports inline anonymous functions which might provide a slight performance boost over ~ which has to be converted to a function: mutate(across(everything(), \(x) replace_na(x, 0))).

Community · 2017-05-23 12:10:46Z

163

For a single vector:

x <- c(1,2,NA,4,5)
x[is.na(x)] <- 0

For a data.frame, make a function out of the above, then apply it to the columns.
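A minimal sketch of that suggestion, assuming a small made-up data frame; the helper name zero_na is hypothetical:

```r
# The single-vector idiom from above, wrapped in a function
# (the name zero_na is just for illustration)
zero_na <- function(v) {
  v[is.na(v)] <- 0
  v
}

d <- data.frame(a = c(1, NA, 3), b = c(NA, 5, NA))

# lapply() runs zero_na on every column; assigning into d[]
# (rather than d) keeps the result a data.frame with the same names
d[] <- lapply(d, zero_na)
d
```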

Please provide a reproducible example next time as detailed here:

How to make a great R reproducible example?

edited May 23, 2017 at 12:10

CommunityBot

answered Nov 17, 2011 at 3:50

Ari B. Friedman

5 Comments

aL3xa Over a year ago

is.na is a generic function and has a method for objects of class data.frame, so this will also work on data frames!

aL3xa Over a year ago

When I ran methods(is.na) for the first time, I was like whaaa?!?. I love when stuff like that happens! =)

Mark Miller Over a year ago

Suppose you have a data frame named df instead of a single vector and you just want to replace missing observations in a single column named X3. You can do so with this line: df$X3[is.na(df$X3)] <- 0

Mark Miller Over a year ago

Suppose you only want to replace NA with 0 in columns 4-6 of a data frame named my.df. You can use: my.df[,4:6][is.na(my.df[,4:6])] <- 0

uh_big_mike_boi Over a year ago

How come you can pass 'x' to is.na(x)? Is there a way to tell which library routines in R are vectorized?

zx8754 · 2018-01-26 10:57:52Z

98

dplyr example:

library(dplyr)

df1 <- df1 %>%
    mutate(myCol1 = if_else(is.na(myCol1), 0, myCol1))

Note: This works per selected column; if we need to do this for all columns, see @reidjax's answer using mutate_each.

edited Jan 26, 2018 at 10:57

zx8754

answered May 8, 2014 at 16:15

ianmunoz

Sasha · 2021-04-14 20:16:21Z

74

It is also possible to use tidyr::replace_na.

    library(dplyr)
    library(tidyr)
    df <- df %>% mutate_all(~replace_na(., 0))

Edit (dplyr > 1.0.0):

df %>% mutate(across(everything(), .fns = ~replace_na(.,0)))

edited Apr 14, 2021 at 20:16

answered Jan 13, 2019 at 21:14

Sasha


4 Comments

Ömer A. Over a year ago

mutate_* verbs are now superseded by across()

Faustin Gashakamba Over a year ago

I am curious as to why I need to wrap the replace(is.na(.), 0) function inside mutate(). Why not feed it directly to the pipe?

Julien Over a year ago

mutate lets you create or replace variables.

ha-pu Over a year ago

As of dplyr 1.1.0 this is how you should write the replacement mutate(data, across(.cols = everything(), \(x) replace_na(x, 0)))

zx8754 · 2018-01-26 10:59:37Z

73

If we are trying to replace NAs when exporting, for example when writing to csv, then we can use:

  write.csv(data, "data.csv", na = "0")
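A quick sketch to check that only the written file is affected; the data frame d and the temporary path are illustrative:

```r
d <- data.frame(a = c(1, NA, 3))

# Write to a temporary file; only the text representation of NA changes
out <- tempfile(fileext = ".csv")
write.csv(d, out, na = "0", row.names = FALSE)

readLines(out)  # the missing value appears as 0 in the file
anyNA(d)        # TRUE: the data frame itself still contains the NA
```

One trade-off to keep in mind: once written this way, the file can no longer distinguish a genuine 0 from a value that was missing.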

edited Jan 26, 2018 at 10:59

zx8754

answered Feb 21, 2014 at 16:27

mrsoltys

1 Comment

CubicInfinity Over a year ago

Also works for readr::write_csv and read.csv or readr::read_csv. When reading, can be a vector of possible values.

krishan404 · 2015-09-24 13:49:59Z

60

I know the question is already answered, but doing it this way might be more useful to some:

Define this function:

na.zero <- function (x) {
    x[is.na(x)] <- 0
    return(x)
}

Now whenever you need to convert NAs in a vector to zeros, you can do:

na.zero(some.vector)

answered Sep 24, 2015 at 13:49

krishan404


1 Comment

Friede Over a year ago

return(x) is not needed, x is sufficient.

akuiper · 2016-09-16 21:34:34Z

33

With dplyr 0.5.0, you can use the coalesce() function, which integrates easily into a %>% pipeline as coalesce(vec, 0). This replaces all NAs in vec with 0:

Say we have a data frame with NAs:

library(dplyr)
df <- data.frame(v = c(1, 2, 3, NA, 5, 6, 8))

df
#    v
# 1  1
# 2  2
# 3  3
# 4 NA
# 5  5
# 6  6
# 7  8

df %>% mutate(v = coalesce(v, 0))
#   v
# 1 1
# 2 2
# 3 3
# 4 0
# 5 5
# 6 6
# 7 8

edited Sep 16, 2016 at 21:34

answered Sep 16, 2016 at 21:25

akuiper


3 Comments

Arthur Yip Over a year ago

I tested coalesce and it performs about the same as replace. the coalesce command is the simplest so far!

jangorecki Over a year ago

It would be useful if you showed how to apply this to all columns of a tibble with 2+ columns.

LMc Over a year ago

mutate(across(where(is.numeric), ~ coalesce(.x, 0)))

Ronak Shah · 2017-03-08 09:29:47Z

33

A more general approach is to use replace() on a matrix or vector to replace NA with 0.

For example:

> x <- c(1,2,NA,NA,1,1)
> x1 <- replace(x,is.na(x),0)
> x1
[1] 1 2 0 0 1 1
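Because replace() simply indexes and assigns, it also keeps a matrix's dimensions. A small sketch with a made-up matrix:

```r
m <- matrix(c(1, NA, 3, NA, 5, 6), nrow = 2)

# replace() works on any vector-like object, including matrices;
# the dim attribute is preserved, so the result is still 2 x 3
m0 <- replace(m, is.na(m), 0)
m0
```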

This is also an alternative to using ifelse() in dplyr

library(dplyr)
df <- data.frame(col = c(1,2,NA,NA,1,1))
df <- df %>%
   mutate(col = replace(col,is.na(col),0))

edited Mar 8, 2017 at 9:29

Ronak Shah

answered Feb 25, 2016 at 4:30

Charleslmh

3 Comments

Climbs_lika_Spyder Over a year ago

My column was a factor, so I had to add my replacement value as a level first: levels(A$x) <- append(levels(A$x), "notAnswered"); A$x <- replace(A$x, which(is.na(A$x)), "notAnswered")

lmo Over a year ago

which isn't needed here, you can use x1 <- replace(x,is.na(x),1).

Gerry Over a year ago

I tried many ways proposed in this thread to replace NA to 0 in just one specific column in a large data frame and this function replace() worked the most effectively while also the most simply.

Oliver Oliver · 2020-05-11 06:40:03Z

30

To replace all NAs in a dataframe you can use:

df %>% replace(is.na(.), 0)

answered May 11, 2020 at 6:40

Oliver Oliver


2 Comments

jogo Over a year ago

this is not a new solution

canderson156 Over a year ago

But it's a fast, easy and simple answer. I like it.

reidjax · 2016-09-26 20:32:03Z

Would've commented on @ianmunoz's post but I don't have enough reputation. You can combine dplyr's mutate_each and replace to take care of the NA to 0 replacement. Using the dataframe from @aL3xa's answer...

> m <- matrix(sample(c(NA, 1:10), 100, replace = TRUE), 10)
> d <- as.data.frame(m)
> d

    V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1   4  8  1  9  6  9 NA  8  9   8
2   8  3  6  8  2  1 NA NA  6   3
3   6  6  3 NA  2 NA NA  5  7   7
4  10  6  1  1  7  9  1 10  3  10
5  10  6  7 10 10  3  2  5  4   6
6   2  4  1  5  7 NA NA  8  4   4
7   7  2  3  1  4 10 NA  8  7   7
8   9  5  8 10  5  3  5  8  3   2
9   9  1  8  7  6  5 NA NA  6   7
10  6 10  8  7  1  1  2  2  5   7

> d %>% mutate_each( funs_( interp( ~replace(., is.na(.),0) ) ) )

    V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1   4  8  1  9  6  9  0  8  9   8
2   8  3  6  8  2  1  0  0  6   3
3   6  6  3  0  2  0  0  5  7   7
4  10  6  1  1  7  9  1 10  3  10
5  10  6  7 10 10  3  2  5  4   6
6   2  4  1  5  7  0  0  8  4   4
7   7  2  3  1  4 10  0  8  7   7
8   9  5  8 10  5  3  5  8  3   2
9   9  1  8  7  6  5  0  0  6   7
10  6 10  8  7  1  1  2  2  5   7

We're using standard evaluation (SE) here, which is why we need the underscore on funs_(). We also use lazyeval's interp() with a ~ formula, where . refers to each column in turn. Now there are zeros!

Steffen Moritz · 2016-11-10 18:21:37Z

13

Another example, using the imputeTS package:

library(imputeTS)
na.replace(yourDataframe, 0)  # renamed na_replace() in imputeTS 3.0 and later

answered Nov 10, 2016 at 18:21

Steffen Moritz

smci · 2018-04-06 00:10:41Z

12

If you want to replace NAs in factor variables, this might be useful:

old.levels <- levels(data.vector)
n <- length(old.levels) + 1

data.vector <- as.numeric(data.vector)
data.vector[is.na(data.vector)] <- n
# listing every code as a level keeps unused original levels in their place
data.vector <- factor(data.vector, levels = seq_len(n))
levels(data.vector) <- c(old.levels, "NAlevel")

It transforms a factor vector into a numeric vector and adds another, artificial numeric level, which is then converted back to a factor with one extra "NA level" named as you choose.
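Base R also has addNA(), which may be a shorter route to the same result: it turns NA into an explicit level, which can then be renamed. A minimal sketch with made-up data ("NAlevel" is just an example label):

```r
f <- factor(c("low", NA, "high", "low", NA))

# addNA() turns NA into an explicit factor level (ifany = TRUE only
# adds it when there actually are missing values) ...
g <- addNA(f, ifany = TRUE)

# ... and that level can then be renamed like any other
levels(g)[is.na(levels(g))] <- "NAlevel"
g
table(g)
```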

edited Apr 6, 2018 at 0:10

smci

answered Mar 17, 2016 at 8:55

user6075957

jangorecki · 2020-06-23 16:34:31Z

12

data.table has dedicated functions for this purpose, nafill and setnafill. Where possible, they distribute the columns across multiple threads.

library(data.table)

ans_df <- nafill(df, fill=0)

# or even faster, in-place
setnafill(df, fill=0)

edited Jun 23, 2020 at 16:34

answered Feb 3, 2019 at 15:46

jangorecki

STA · 2021-09-18 04:42:58Z

10

No need to use any library.

df <- data.frame(a=c(1,3,5,NA))

df$a[is.na(df$a)] <- 0

df

edited Sep 18, 2021 at 4:42

STA

answered Aug 31, 2021 at 10:06

gmcoding

1 Comment

ivo Welch May 22 at 3:52

This only handles a single variable in df.

LMc · 2021-03-31 17:48:39Z

dplyr >= 1.0.0

In newer versions of dplyr:

across() supersedes the family of "scoped variants" like summarise_at(), summarise_if(), and summarise_all().

df <- data.frame(a = c(LETTERS[1:3], NA), b = c(NA, 1:3))

library(tidyverse)

df %>% 
  mutate(across(where(anyNA), ~ replace_na(., 0)))

  a b
1 A 0
2 B 1
3 C 2
4 0 3

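This code will coerce 0 to be character in the first column (in tidyr 1.2.0 and later, replace_na() no longer changes a column's type, so this raises an error instead). To replace NA based on column type, you can use a purrr-like formula in where: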

df %>% 
  mutate(across(where(~ anyNA(.) & is.character(.)), ~ replace_na(., "0")))

Brian Willis · 2013-04-08 01:44:10Z

7

You can use replace()

For example:

> x <- c(-1,0,1,0,NA,0,1,1)
> x1 <- replace(x,5,1)
> x1
[1] -1  0  1  0  1  0  1  1

> x1 <- replace(x,5,mean(x,na.rm=T))
> x1
[1] -1.00  0.00  1.00  0.00  0.29  0.00 1.00  1.00
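Rather than hard-coding position 5, is.na() can locate the missing entries, as the comments also point out. A sketch using the same vector:

```r
x <- c(-1, 0, 1, 0, NA, 0, 1, 1)

# Replace every NA (wherever it is) with the mean of the observed values
x1 <- replace(x, is.na(x), mean(x, na.rm = TRUE))
x1
```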

edited Apr 8, 2013 at 1:44

Brian Willis

answered Mar 30, 2013 at 6:52

Zahra

2 Comments

dardisco Over a year ago

True, but only practical when you know the index of NAs in your vector. It's fine for small vectors as in your example.

lmo Over a year ago

@dardisco x1 <- replace(x,is.na(x),1) will work without explicitly listing the index values.

MS Berends · 2022-03-14 20:43:23Z

7

The cleaner package has an na_replace() generic, that at default replaces numeric values with zeroes, logicals with FALSE, dates with today, etc.:

library(dplyr)
library(cleaner)

starwars %>% na_replace()
na_replace(starwars)

It even supports vectorised replacements:

mtcars[1:6, c("mpg", "hp")] <- NA
na_replace(mtcars, mpg, hp, replacement = c(999, 123))

Documentation: https://msberends.github.io/cleaner/reference/na_replace.html

edited Mar 14, 2022 at 20:43

answered Jul 9, 2020 at 7:04

MS Berends

Antti · 2016-10-10 11:25:07Z

Another dplyr-pipe-compatible option is tidyr's replace_na(), which works for several columns:

library(dplyr)
library(tidyr)

m <- matrix(sample(c(NA, 1:10), 100, replace = TRUE), 10)
d <- as.data.frame(m)

myList <- setNames(as.list(rep(0, ncol(d))), names(d))

df <- d %>% replace_na(myList)

You can easily restrict to e.g. numeric columns:

d$str <- c("string", NA)

myList <- myList[sapply(d, is.numeric)]

df <- d %>% replace_na(myList)

Fábio · 2017-04-11 19:11:55Z

3

This simple function extracted from Datacamp could help:

replace_missings <- function(x, replacement) {
  is_miss <- is.na(x)
  x[is_miss] <- replacement

  message(sum(is_miss), " missings replaced by the value ", replacement)
  x
}

Then

replace_missings(df, replacement = 0)

answered Apr 11, 2017 at 19:11

Fábio

davsjob · 2019-06-10 21:14:47Z

3

An easy way to write it is with if_na from hablar:

library(dplyr)
library(hablar)

df <- tibble(a = c(1, 2, 3, NA, 5, 6, 8))

df %>% 
  mutate(a = if_na(a, 0))

answered Jun 10, 2019 at 21:14

davsjob

Maël · 2023-05-08 08:07:10Z

Another option is to use collapse::replace_NA. By default, replace_NA replaces NAs with 0s.

library(collapse)
replace_NA(df)

For only some columns:

replace_NA(df, cols = c("V1", "V5")) 
#Alternatively, one can use a function, indices or a logical vector to select the columns

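It also appears to be faster than the approaches in the other answers (see this answer for a comparison), although the timing below comes from a separate microbenchmark() call (100 runs) rather than from the same benchmark as the other methods: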

set.seed(42) # to recreate the exact dataframe
dfN <- as.data.frame(matrix(sample(c(NA, as.numeric(1:4)), 1e7*10, replace = TRUE),
                            dimnames = list(NULL, paste0("var", 1:10)), 
                            ncol = 10))

library(microbenchmark)
microbenchmark(collapse = replace_NA(dfN))

# Unit: milliseconds
#      expr      min      lq     mean  median       uq     max neval
#  collapse 508.9198 621.405 751.3413 714.835 859.5437 1298.69   100

symkly · 2019-10-31 08:05:50Z

0

If you want to store the result in a new column after replacing the NAs in a specific column (in this case column V3), you can also do it like this:

my.data.frame$the.new.column.name <- ifelse(is.na(my.data.frame$V3), 0, my.data.frame$V3)

answered Oct 31, 2019 at 8:05

symkly

polkas · 2020-09-23 19:42:37Z

I want to add another solution, which uses the popular Hmisc package.

library(Hmisc)
data(airquality)
# imputing with 0 - all columns
# although my favorite one for simple imputations is Hmisc::impute(x, "random")
> dd <- data.frame(Map(function(x) Hmisc::impute(x, 0), airquality))
> str(dd[[1]])
 'impute' Named num [1:153] 41 36 12 18 0 28 23 19 8 0 ...
 - attr(*, "names")= chr [1:153] "1" "2" "3" "4" ...
 - attr(*, "imputed")= int [1:37] 5 10 25 26 27 32 33 34 35 36 ...
> dd[[1]][1:10]
  1   2   3   4   5   6   7   8   9  10 
 41  36  12  18  0*  28  23  19   8  0*

As you can see, all the imputation metadata are stored as attributes, so they can be used later.

John Haberstroh · 2021-07-29 04:23:13Z

This is not exactly a new solution, but I like to write inline lambdas that handle things that I can't quite get packages to do. In this case,

df %>%
   (function(x) { x[is.na(x)] <- 0; return(x) })

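Because R uses copy-on-modify semantics (an ordinary function works on its own copy of an argument, rather than mutating the caller's object the way a function can mutate a list passed to it in Python), this solution does not modify the original variable df. It therefore behaves much like most of the other solutions, but with much less need for intricate knowledge of particular packages.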

Note the parens around the function definition! Though it seems a bit redundant to me, since the function definition is surrounded in curly braces, it is required that inline functions are defined within parens for magrittr.
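The same idea also works with R's native pipe (R 4.1 and later), which likewise needs the function literal wrapped in parentheses and then called. A minimal sketch with made-up data, also showing that the original object keeps its NAs:

```r
d <- data.frame(a = c(1, NA), b = c(NA, 4))

# Native pipe with an anonymous function: (\(x) ...) defines it,
# and the trailing () makes the right-hand side a call, as |> requires
d0 <- d |> (\(x) { x[is.na(x)] <- 0; x })()

# Copy-on-modify: d0 has zeros, but the original d still has its NAs
d0
anyNA(d)
```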

jaeyeon · 2022-12-09 22:34:07Z

0

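Note that na_if() maps in the opposite direction from the question: it converts every value equal to its second argument (0, "zero", or whatever your data uses) into NA. That makes it a flexible way to mark sentinel values as missing, whatever the size of the data frame, but it does not replace NA with 0.

library(dplyr) # make sure dplyr version is >= 1.0.0

df %>%
    mutate(across(everything(), ~ na_if(.x, 0))) # turns 0 into NA; if the sentinel is `zero`, use na_if(.x, "zero")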

answered Dec 9, 2022 at 22:34

jaeyeon

Quinten · 2023-01-15 16:57:37Z

Another option using sapply to replace all NA with zeros. Here is some reproducible code (data from @aL3xa):

set.seed(7) # for reproducibility
m <- matrix(sample(c(NA, 1:10), 100, replace = TRUE), 10)
d <- as.data.frame(m)
d
#>    V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
#> 1   9  7  5  5  7  7  4  6  6   7
#> 2   2  5 10  7  8  9  8  8  1   8
#> 3   6  7  4 10  4  9  6  8 NA  10
#> 4   1 10  3  7  5  7  7  7 NA   8
#> 5   9  9 10 NA  7 10  1  5 NA   5
#> 6   5  2  5 10  8  1  1  5 10   3
#> 7   7  3  9  3  1  6  7  3  1  10
#> 8   7  7  6  8  4  4  5 NA  8   7
#> 9   2  1  1  2  7  5  9 10  9   3
#> 10  7  5  3  4  9  2  7  6 NA   5
d[sapply(d, \(x) is.na(x))] <- 0
d
#>    V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
#> 1   9  7  5  5  7  7  4  6  6   7
#> 2   2  5 10  7  8  9  8  8  1   8
#> 3   6  7  4 10  4  9  6  8  0  10
#> 4   1 10  3  7  5  7  7  7  0   8
#> 5   9  9 10  0  7 10  1  5  0   5
#> 6   5  2  5 10  8  1  1  5 10   3
#> 7   7  3  9  3  1  6  7  3  1  10
#> 8   7  7  6  8  4  4  5  0  8   7
#> 9   2  1  1  2  7  5  9 10  9   3
#> 10  7  5  3  4  9  2  7  6  0   5

Created on 2023-01-15 with reprex v2.0.2

Please note: Since R 4.1.0 you can use \(x) instead of function(x).

Antreas Stefopoulos · 2025-02-17 14:12:33Z

0

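library(Rcpp)

# Replaces NA with 0 *in place* in a double matrix (no copy is made).
# A data.frame must first be converted with as.matrix(); an integer
# matrix is coerced to a new double copy, so the caller's object would
# then NOT be modified. R_IsNA() matches NA but not NaN (unlike is.na()).
cppFunction('
void fastReplaceNA(NumericMatrix mat) {
    int n = mat.nrow();
    int m = mat.ncol();

    // R stores matrices column by column, so loop over columns outermost
    for (int j = 0; j < m; j++) {
        for (int i = 0; i < n; i++) {
            if (R_IsNA(mat(i, j))) {
                mat(i, j) = 0;
            }
        }
    }
}
')

mat <- matrix(c(1, NA, 3, NA), nrow = 2)  # example double matrix
fastReplaceNA(mat)
mat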

answered Feb 17 at 14:12

Antreas Stefopoulos

wesleysc352 · 2020-12-30 04:05:39Z

-1

In a data.frame, it is not necessary to create a new column with mutate.

library(tidyverse)    
k <- c(1,2,80,NA,NA,51)
j <- c(NA,NA,3,31,12,NA)
        
df <- data.frame(k,j)%>%
   replace_na(list(j=0))#convert only column j, for example

answered Dec 30, 2020 at 4:05

wesleysc352

Tyler2P · 2023-03-29 18:36:32Z

-1

I use this personally and it works fine:

players_wd$APPROVED_WD[is.na(players_wd$APPROVED_WD)] <- 0

edited Mar 29, 2023 at 18:36

Tyler2P

answered Sep 9, 2022 at 13:35

Aymen Azoui

1 Comment

Tyler2P Over a year ago

Your answer could be improved by adding more information on what the code does and how it helps the OP.
