I am trying to open and read multiple NetCDF files and save the result as one list of data frames. In my working directory, the folder "main_folder" contains five folders (x1, x2, x3, x4, and x5), and each of these contains a different number of subfolders. For example, x1 contains the subfolders "y1" to "y20"; y1 contains n1 NetCDF files, y2 contains n2 NetCDF files, and so on. The same holds for x2, x3, x4, and x5.

From folder x1, I want to open and read all NetCDF files, extract the variables, and combine them into one data frame, df1. From folder x2, I want to build a second data frame, df2, and so on. At the end I will have five data frames, one per folder, and I want to put these five data frames into a list.

I wrote some code that works except for one problem: the second data frame in the list contains the data of df1 with the data of df2 appended to it, and the fifth contains the data of df1 + df2 + df3 + df4 + df5. How can I solve this problem? Here is my code:

setwd("E:/main_folder")
#1#  list all files in the main_folder
folders<- as.list(list.files("E:/main_folder"))

#2# make list of subfiles 
subfiles<- lapply(folders, function(x) as.list(list.files(paste("E:/main_folder",x, sep="/"))))

#3# list the netcdf files from each subfiles
files1<- lapply(subfiles[[1]], function(x) list.files(paste( folders[1],x, sep = "/"),pattern='*.nc',full.names=TRUE))
files2<- lapply(subfiles[[2]], function(x) list.files(paste( folders[2],x, sep = "/"),pattern='*.nc',full.names=TRUE))
files3<- lapply(subfiles[[3]], function(x) list.files(paste( folders[3],x, sep = "/"),pattern='*.nc',full.names=TRUE))
files4<- lapply(subfiles[[4]], function(x) list.files(paste( folders[4],x, sep = "/"),pattern='*.nc',full.names=TRUE))
files5<- lapply(subfiles[[5]], function(x) list.files(paste( folders[5],x, sep = "/"),pattern='*.nc',full.names=TRUE))

#4# join all files in one list
filelist<- list(files1,files2,files3,files4,files5)



#5# Read the NetCDF and get the desired variables 
df <- data.frame()
MissionsData <- list()
for (i in seq_along(filelist)) {
  n <- length(filelist[[i]])
  for (j in 1:n) {
    for (m in 1:length(filelist[[i]][[j]])) {
      nc <- nc_open(filelist[[i]][[j]][[m]])
      lat <- ncvar_get(nc, "glat.00")
      lon <- ncvar_get(nc, "glon.00")
      ssh <- ncvar_get(nc, "ssh.53")
      jdn <- ncvar_get(nc, "jday.00")

      df <- rbind(df, data.frame(lat, lon, ssh, jdn))
      nc_close(nc)
    }
  }

  MissionsData[[i]] <- df
}

In addition, can I do step #3# in one go instead of typing each line manually?

1 Answer

#3 Nesting the code inside another `lapply` should do the job:

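filelist = lapply(seq_along(subfiles), function(i){
    # use the i-th folder for the i-th set of subfolders (not always folders[1])
    lapply(subfiles[[i]], function(x) list.files(paste(folders[[i]], x, sep = "/"),
    pattern = '\\.nc$', full.names = TRUE))  # pattern is a regex, not a glob
})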
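If the subfolder level is not needed afterwards, a possible alternative for step #3 is to let `list.files()` recurse by itself. This is only a sketch: the temporary directory tree and the file names below are made up for illustration and are not from the question.

```r
# Build a small throwaway tree (hypothetical names): main/x1/{y1,y2}, main/x2/y1
main <- file.path(tempdir(), "main_folder_demo")
unlink(main, recursive = TRUE)
for (d in c("x1/y1", "x1/y2", "x2/y1")) {
  dir.create(file.path(main, d), recursive = TRUE)
}
file.create(file.path(main, c("x1/y1/a.nc", "x1/y2/b.nc",
                              "x1/y2/notes.txt", "x2/y1/c.nc")))

# Immediate subfolders of the main folder (x1, x2, ...)
top <- list.dirs(main, full.names = TRUE, recursive = FALSE)

# One character vector of .nc paths per top-level folder
filelist <- lapply(top, list.files, pattern = "\\.nc$",
                   recursive = TRUE, full.names = TRUE)
names(filelist) <- basename(top)

lengths(filelist)
```

Each element of `filelist` is then already a flat vector of paths, so no `unlist()` is needed before reading the files.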

# This might work for #5.
# It was written without a reproducible example, so I didn't test it.
library(ncdf4)

MissionsData = lapply(filelist, function(x){
    # The j and m indexes are only used for looping,
    # so I just unlist these files into one character vector
    files_i = unlist(x, recursive = TRUE)
    df_list = lapply(files_i, function(file_i){
        nc = nc_open(file_i)
        lat = ncvar_get(nc, "glat.00")
        lon = ncvar_get(nc, "glon.00")
        ssh = ncvar_get(nc, "ssh.53")
        jdn = ncvar_get(nc, "jday.00")
        nc_close(nc)
        return(data.frame(lat,lon,ssh,jdn))
    })
    # bind this folder's per-file data frames; this is the value lapply collects
    do.call(rbind, df_list)
})
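
For reference, the accumulation described in the question comes from `df` being created once, before the outer loop, so each `MissionsData[[i]]` receives everything read so far. A minimal sketch with made-up in-memory data (no NetCDF files involved; `fake_read` is a hypothetical stand-in for the `nc_open`/`ncvar_get` calls) suggests that resetting `df` at the start of each outer iteration is enough if you prefer to keep the loops:

```r
# Hypothetical stand-in for reading one file: returns a one-row data frame
fake_read <- function(path) data.frame(file = path, stringsAsFactors = FALSE)

# Two "folders" with two and one made-up file names
filelist <- list(c("x1_a.nc", "x1_b.nc"), c("x2_a.nc"))

MissionsData <- vector("list", length(filelist))
for (i in seq_along(filelist)) {
  df <- data.frame()              # reset for every top-level folder
  for (f in filelist[[i]]) {
    df <- rbind(df, fake_read(f))
  }
  MissionsData[[i]] <- df
}

sapply(MissionsData, nrow)
```

The `lapply` version above avoids the issue in a different way: each call builds its own `df_list`, so nothing is shared between folders.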

3 Comments

Thanks for your answer. It works very nicely.
One more question: when I have too many NetCDF files, it returns an error. In your answer, should nc_close(nc) be added before return(data.frame(lat,lon,ssh,jdn))?
@Jisika Yes, that seems like the right place to put it, I updated the answer.
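
On the error with many files: one possible cause (an assumption, since the error message is not shown) is file handles left open when a read fails partway through. A hedged sketch of a per-file reader that uses `on.exit()`, so that `nc_close()` runs even if an `ncvar_get()` call errors. `read_nc_vars` is a hypothetical helper name, the variable names are the ones from the question, and each variable is assumed to be one-dimensional and of the same length:

```r
# Hypothetical helper; needs the ncdf4 package installed when it is called
read_nc_vars <- function(path,
                         vars = c(lat = "glat.00", lon = "glon.00",
                                  ssh = "ssh.53", jdn = "jday.00")) {
  nc <- ncdf4::nc_open(path)
  # Registered right after opening: runs on normal return and on error
  on.exit(ncdf4::nc_close(nc), add = TRUE)
  # as.vector() drops array dimensions so each column is a plain vector
  # (assumes every variable is one-dimensional)
  cols <- lapply(vars, function(v) as.vector(ncdf4::ncvar_get(nc, v)))
  as.data.frame(cols)
}
```

It could then replace the inner function in the answer, for example `do.call(rbind, lapply(files_i, read_nc_vars))`.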