deleted 218 characters in body; edited title
Antony

Copy files from an HDFS folder to another HDFS location, filtered by modification time, using a shell script

I have one year of data in an HDFS location and I want to copy the data for the last 3 months into another folder. Is it possible to copy only the last 3 months of data directly with an hdfs command, or do we need to write a shell script for this?

Below is the command I tried for copying the data.

hdfs dfs -cp  /hive/warehouse/data.db/all_history/copy_55(SOURCE) /user/can_anns/all_history_copy/(DESTINATION)

I also tried the following:

for i in `hdfs dfs -ls /hive/warehouse/data.db/all_history/|grep "2019-03-01"|awk '{print $8}'`;do echo ${i}; hdfs dfs -cp ${i} /user/can_anns/all_history_copy/; done

  1. Is there an hdfs command that copies only the data for a particular time interval?
  2. If we have to write a shell script to do this, how can it be achieved?

Below is the hdfs command I tried, but I get a syntax error. It works up to the printing part and fails because of the cp -p command.

hdfs dfs -ls /hive/warehouse/data.db/all_history/copy_55 | tr -s " " | cut -d" " -f6-8 | awk 'BEGIN{ MIN=259200; LAST=60*MIN; "date +%s" | getline NOW } { cmd="date -d'\''"$1" "$2"'\'' +%s"; cmd | getline WHEN; DIFF=NOW-WHEN; if(DIFF < LAST){print $3; system("hdfs dfs -cp -p "$3 /user/can_anns/all_history_copy/}}}'
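
One likely cause of the syntax error in the command above is that the string passed to system() is never closed, and neither is the call's parenthesis. In awk, a command string is built by placing quoted literals and fields side by side. The sketch below only prints the copy commands, so the concatenation can be checked before anything is executed. The function name cp_commands is hypothetical, the input is assumed to be the three "date time path" columns left by the cut step, and paths containing spaces are not handled.

```shell
#!/bin/sh
# Sketch: build one "hdfs dfs -cp -p <path> <dest>" command per input line.
# Assumes each input line has three columns: date, time, path.
# cp_commands is a hypothetical helper name, not part of any HDFS tooling.
cp_commands() {
    # Adjacent strings and fields are concatenated by awk; every literal
    # must be closed with a double quote before the next field starts.
    awk -v dest="$1" '{ print "hdfs dfs -cp -p " $3 " " dest }'
}
```

Once the printed commands look right, they could be piped to sh, or the print could be replaced by system() with the same concatenated string.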

Copy files from an HDFS folder to another HDFS location, filtered by modification date, using a shell script

I have one year of data in an HDFS location and I want to copy the data for the last 6 months into another HDFS location. Is it possible to copy only the last 6 months of data directly with an hdfs command, or do we need to write a shell script for this?

I have tried hdfs commands for this, but they didn't work.

I tried the shell script below. It works fine up to creating TempFile, but then it throws an error

$ sh scriptnew.sh
scriptnew.sh: line 8: syntax error: unexpected end of file

and the script does not execute any further.

Below is the shell script I used.

#!/bin/bash
hdfs dfs -ls /hive/warehouse/data.db/all_history/ |awk 'BEGIN{ SIXMON=60*60*24*180; "date +%s" | getline NOW } { cmd="date -d'\''"$6" "$7"'\'' +%s"; cmd | getline WHEN; DIFF=NOW-SIXMON; if(WHEN > DIFF){print $8}}' >> TempFile.txt
cat TempFile.txt |while read line
do
    echo $i
    hdfs dfs -cp -p $line /user/can_anns/all_history_copy/;
done

What might be the error, and how can I resolve it?
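
A few things may be worth checking, offered as guesses rather than a diagnosis. "Unexpected end of file" is often reported when a script was saved with Windows (CRLF) line endings, because do and done are then not recognised. Separately, the loop echoes $i although the loop variable is line. The sketch below is one possible rewrite under stated assumptions: the listing has the usual 8 columns with the date in column 6 and the path in column 8, GNU date is available, and the names filter_recent and copy_recent are hypothetical. It compares ISO dates as strings instead of calling date once per line, and it does not handle paths containing spaces.

```shell
#!/bin/sh
# Sketch: copy entries modified on or after a cutoff date.
# Assumed listing columns: perms repl owner group size date time path.
SRC=/hive/warehouse/data.db/all_history
DEST=/user/can_anns/all_history_copy

# Print the path (column 8) of each listing line whose date (column 6)
# is on or after the cutoff. YYYY-MM-DD dates sort correctly as strings;
# appending "" forces awk to compare them as strings.
filter_recent() {
    awk -v cutoff="$1" 'NF >= 8 && ($6 "") >= (cutoff "") { print $8 }'
}

# List the source directory, keep recent entries, and copy each one.
copy_recent() {
    cutoff=$(date -d '6 months ago' +%F)   # GNU date assumed
    hdfs dfs -ls "$SRC" | filter_recent "$cutoff" | while IFS= read -r path; do
        echo "$path"
        hdfs dfs -cp -p "$path" "$DEST/"
    done
}

# Only touch HDFS when explicitly asked to.
if [ "${1:-}" = "--run" ]; then
    copy_recent
fi
```

Running the script with --run would perform the copy; without it, only the functions are defined, which makes filter_recent easy to try on a saved listing first.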
