Antony

Copy files from one HDFS folder to another HDFS location, filtered by modification time, using a shell script

I have one year of data in an HDFS location, and I want to copy only the last 3 months of it into another folder. Can this be done directly with an hdfs command, or do I need to write a shell script that selects the last 3 months of data?

Below is the command I tried for copying the data (source path first, then destination):

hdfs dfs -cp /hive/warehouse/data.db/all_history/copy_55 /user/can_anns/all_history_copy/

I also tried the loop below, which copies only the entries dated 2019-03-01:

for i in $(hdfs dfs -ls /hive/warehouse/data.db/all_history/ | grep "2019-03-01" | awk '{print $8}'); do echo "${i}"; hdfs dfs -cp "${i}" /user/can_anns/all_history_copy/; done

  1. Is there an hdfs command that copies only the data from a particular time interval?
  2. If a shell script is needed for this, how can it be written?
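
To make the second question concrete, here is a rough sketch of the kind of script I have in mind. It is not a tested solution. It assumes that `hdfs dfs -ls` prints eight whitespace-separated columns with the modification date (YYYY-MM-DD) in column 6 and the path in column 8, and that paths contain no spaces. The function names `filter_since` and `copy_since` are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: copy HDFS entries modified on or after a cutoff date.
# Assumption: `hdfs dfs -ls` output has 8 columns, date in $6, path in $8.

# Read `hdfs dfs -ls` output on stdin and print the paths whose
# modification date is on or after the cutoff given as $1 (YYYY-MM-DD).
# ISO dates sort correctly as plain strings, so a string comparison is enough.
# Lines with fewer than 8 fields (such as "Found N items") are skipped.
filter_since() {
  awk -v cutoff="$1" 'NF >= 8 && $6 >= cutoff { print $8 }'
}

# Copy every entry directly under source $1 that was modified on or
# after cutoff $3 into destination $2.
copy_since() {
  local src=$1 dest=$2 cutoff=$3 path
  hdfs dfs -ls "$src" | filter_since "$cutoff" | while IFS= read -r path; do
    echo "copying ${path}"
    hdfs dfs -cp "${path}" "$dest"
  done
}
```

With GNU date available, the call for the last 3 months would presumably be `copy_since /hive/warehouse/data.db/all_history /user/can_anns/all_history_copy "$(date -d '3 months ago' +%F)"`. The `-d '3 months ago'` form is specific to GNU date and is an assumption about the environment.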