I have a large folder containing many sub-directories each holding many .txt files. I want to concatenate all of these files into one .txt file. I am able to do it for each of the sub-directories with cat *.txt>merged.txt, but I am trying to do it for all of the files in the large folder. How do I do this?
3 Answers
Try:
find /path/to/source -type f -name '*.txt' -exec cat {} + >mergedfile
This finds all '*.txt' files under /path/to/source, recursing into sub-directories, and concatenates them into a single mergedfile.
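As a rough illustration, the command can be tried on a throwaway tree. The directory and file names below are made up for the demo, and the output file is deliberately written outside the tree being searched:

```shell
# Build a small scratch tree (all names here are hypothetical).
demo=$(mktemp -d)
mkdir -p "$demo/src/a" "$demo/src/b"
printf 'one\n'  >"$demo/src/a/1.txt"
printf 'two\n'  >"$demo/src/b/2.txt"
printf 'skip\n' >"$demo/src/b/notes.md"   # not a .txt file, so it is ignored

# Merge every .txt file; the output lives outside src/, so find never sees it.
find "$demo/src" -type f -name '*.txt' -exec cat {} + >"$demo/merged.out"

# Print the number of merged lines; their order follows find's traversal order.
wc -l <"$demo/merged.out"
```

The order in which the files are concatenated is whatever order find happens to visit them in, so it should not be relied on.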
To concatenate each sub-directory's files into a file within that sub-directory, do:
find . -mindepth 1 -type d -execdir sh -c 'cat "$1"/*.txt >> "$1"/mergedfile' _ {} \;
-
">>" can be ">" in the first find call. Commented Jun 4, 2018 at 5:36
-
@Kusalananda Won't that truncate the mergedfile if ARG_MAX is exceeded? – αғsнιη, commented Jun 4, 2018 at 5:49
-
The > redirects the output of find, not cat. The cat command ends at the +, and you can't do redirections in -exec without using a child shell (sh -c). In your second example, you won't need it either, as you do one directory at a time. Commented Jun 4, 2018 at 5:52
-
Actually, that second example won't work. Since -execdir is already executing with the directory as the working directory, you should get rid of $1/ in the command. Commented Jun 4, 2018 at 5:56
-
@DerekMahar The _ is the 0th argument to sh -c '...' and {} is the 1st. If you remove the _, {} becomes the 0th argument, while the sh -c '...' script still operates on $1, which is now unset. We don't pass {} as the 0th argument because $0 is conventionally the script name, which the shell prefixes to its error and warning messages to show where things went wrong. – αғsнιη, commented Mar 29, 2023 at 14:30
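The point about the _ placeholder can be checked directly. This is only a small sketch; ./subdir stands in for whatever find would substitute for {}:

```shell
# With a placeholder: "_" becomes $0 and the real argument becomes $1.
with=$(sh -c 'printf "%s|%s" "$0" "$1"' _ ./subdir)

# Without it: the argument lands in $0 and $1 is empty.
without=$(sh -c 'printf "%s|%s" "$0" "$1"' ./subdir)

printf '%s\n%s\n' "$with" "$without"
# prints:
#   _|./subdir
#   ./subdir|
```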
If you are using Bash and the number of text files is manageable (i.e. the expanded list of pathnames does not exceed the system's argument-length limit, ARG_MAX, which is very large but not infinite), you can easily achieve this with the globstar feature:
$ shopt -s globstar
$ cat **/*.txt > merged.txt
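To get a rough idea of whether the expansion would fit, the total length of the matching pathnames can be compared with ARG_MAX. This is only an estimate under stated assumptions: the real limit also has to cover the environment, and some systems impose additional per-argument limits. The scratch tree is made up for the demo:

```shell
# Scratch tree with two empty .txt files (hypothetical names).
demo=$(mktemp -d)
mkdir -p "$demo/a" "$demo/b"
: >"$demo/a/1.txt"
: >"$demo/b/2.txt"

# Upper bound, in bytes, on the arguments plus environment passed to a command.
limit=$(getconf ARG_MAX)

# Total bytes of all matching pathnames, one per line: a rough size
# for the argument list that a recursive glob would hand to cat.
needed=$(cd "$demo" && find . -type f -name '*.txt' | wc -c | tr -d ' ')

if [ "$needed" -lt "$limit" ]; then
  echo "fits: $needed of $limit bytes"
else
  echo "too long for a single cat call: $needed of $limit bytes"
fi
```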
A more general, although less elegant, approach is to use find as the driver and make it call cat on each file, appending the output:
$ find . -type f -name '*.txt' -exec sh -c 'cat "$1" >> merged.out' _ {} \;
Calling sh makes it possible to use the >> redirection inside -exec, so that each cat appends to the output file. Make sure the output file has a different extension or lies outside of the tree you're merging, or find may try to concatenate the output file into itself.
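A minimal sketch of this per-file approach on a scratch tree, assuming made-up file names that include spaces. Each pathname is handed to the inner shell as a positional parameter, which keeps awkward names intact, and the output file sits outside the tree being merged:

```shell
# Scratch tree with a space in both a directory and a file name (hypothetical).
demo=$(mktemp -d)
mkdir -p "$demo/src/sub dir"
printf 'alpha\n' >"$demo/src/sub dir/first file.txt"
printf 'beta\n'  >"$demo/src/plain.txt"

# The output goes next to src/, not inside it, so find cannot pick it up.
out="$demo/merged.out"
: >"$out"

# In the inner shell, $1 is the output path and $2 is the file find matched.
find "$demo/src" -type f -name '*.txt' \
    -exec sh -c 'cat "$2" >>"$1"' _ "$out" {} \;

wc -l <"$out"
```

This starts one sh and one cat per file, so it is slower than a single cat call, but it does not depend on the argument-length limit.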
If you have to do the concatenation in a particular order, the following concatenates the files in lexicographical order (sorted by pathname) in bash:
shopt -s globstar
for name in **/*.txt; do
  [ -f "$name" ] && cat <"$name"
done >merged.out
This is similar to the find command
find . -type f -name '*.txt' -exec cat {} ';' >merged.out
except that the ordering may be different, symbolic links to regular files would be included (add a && [ ! -L "$name" ] if you don't want them) and hidden files (and files in hidden directories) would be excluded (use shopt -s dotglob to add them back).
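The two adjustments mentioned above (skipping symbolic links and including hidden files) might be combined as in the sketch below. The scratch tree and its names are made up; the loop runs in an explicit bash because globstar and dotglob are bash features, and LC_ALL=C is an assumption of the demo, chosen only to make the sort order byte-wise and reproducible:

```shell
# Scratch tree: a visible file, a file in a hidden directory, and a symlink.
demo=$(mktemp -d)
mkdir -p "$demo/tree/b" "$demo/tree/a/.hidden"
printf '1\n' >"$demo/tree/a/x.txt"
printf '2\n' >"$demo/tree/a/.hidden/y.txt"
printf '3\n' >"$demo/tree/b/z.txt"
ln -s a/x.txt "$demo/tree/link.txt"

# The output file is written outside tree/, so the glob cannot match it.
(
  cd "$demo/tree" &&
  LC_ALL=C bash -c '
    shopt -s globstar dotglob
    for name in **/*.txt; do
      # keep regular files, but skip symbolic links to them
      if [ -f "$name" ] && [ ! -L "$name" ]; then
        cat <"$name"
      fi
    done
  ' >"$demo/merged.out"
)

wc -l <"$demo/merged.out"
```

The symlink is left out, so the three regular files contribute one line each.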
-
What does your first command do that the equivalent one in my answer doesn't? – αғsнιη, commented Jun 4, 2018 at 6:14
-
@αғsнιη Absolutely nothing, now that you've changed your answer. I will modify that part. Thanks for letting me know. Commented Jun 4, 2018 at 6:26
-
Does bash guarantee that **/*.txt sorts the pathnames in lexicographical order? – Derek Mahar, commented Mar 29, 2023 at 22:27
-
@DerekMahar Yes, the list resulting from expanding a globbing pattern is guaranteed to be lexicographically sorted. From the POSIX standard: "If the pattern matches any existing filenames or pathnames, the pattern shall be replaced with those filenames and pathnames, sorted according to the collating sequence in effect in the current locale." Commented Mar 29, 2023 at 22:36