Is there a way to apply the dos2unix command so that it runs against all of the files in a folder and its subfolders?
man dos2unix doesn't show any -r or similar option that would make this straightforward.
See also: Stack Overflow: How can I run dos2unix on an entire directory? – Gabriel Staples, Jul 19, 2023
6 Answers
find /path -type f -print0 | xargs -0 dos2unix --
Don't worry, dos2unix skips binaries by default. – Walf, May 26, 2017
Better said, dos2unix tries to skip binaries (by default). I wouldn't trust that it always succeeds. I'd feel much safer running a separate command for each file extension, as suggested in this answer. – Henke, Mar 6, 2023
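If you want to convince yourself that the null-separated pipeline handles awkward file names, here is a minimal, self-contained sketch (assuming GNU sed on Linux; sed 's/\r$//' stands in for dos2unix so nothing real is touched, and every path below is a throwaway name):

```shell
#!/bin/sh
# Sketch: exercise the find -print0 | xargs -0 pattern in a throwaway
# directory, using `sed 's/\r$//'` as a stand-in for dos2unix.
set -eu

tmp=$(mktemp -d)
mkdir -p "$tmp/sub dir"                        # space in the name on purpose
printf 'one\r\ntwo\r\n' > "$tmp/a.txt"
printf 'three\r\n'      > "$tmp/sub dir/b.txt"

# Null-separated paths survive spaces and other special characters.
find "$tmp" -type f -print0 | xargs -0 sed -i 's/\r$//'

# Count files that still contain a carriage return (should be 0).
cr=$(printf '\r')
remaining=$(grep -rl "$cr" "$tmp" | wc -l)
echo "files still containing CRLF: $remaining"
rm -rf "$tmp"
```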
Using bash:
shopt -s globstar
dos2unix **
The globstar shell option in bash enables the use of the ** glob. This works just like * but matches across / in pathnames (hence matching names in subdirectories too). This would work in a directory containing a moderate number of files in its subdirectories (not many thousands).
In the zsh and yash shells (with set -o extended-glob in yash), you would do
dos2unix **/*
In the zsh shell, you could restrict the matching by the globbing pattern to only regular files by using a glob modifier:
dos2unix **/*(.)
This works. If you want to restrict the conversion to certain file extensions, use dos2unix **/*.ext. This is the same as dos2unix *.ext but (recursively) includes files in subdirectories. – Andreas H., Jul 24, 2024
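To see exactly which files a ** glob would hand to dos2unix (hidden files are skipped unless dotglob is also set), here is a small sketch; the directory layout is made up for illustration:

```shell
#!/bin/sh
# Sketch: what bash's globstar ** actually matches, in a throwaway dir.
# Note: ** does NOT match hidden files unless `dotglob` is also set.
set -eu
tmp=$(mktemp -d)
mkdir -p "$tmp/a/b"
touch "$tmp/top.txt" "$tmp/a/mid.txt" "$tmp/a/b/deep.txt" "$tmp/.hidden.txt"

# Count the files that `dos2unix **/*.txt` would see:
matched=$(bash -O globstar -c 'cd "$1" && set -- **/*.txt && echo $#' _ "$tmp")
echo "** matched $matched .txt files"   # .hidden.txt is skipped
rm -rf "$tmp"
```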
Skipping binaries and hidden files was important for me. This one worked well:
find . -type f -not -path '*/\.*' -exec grep -Il '.' {} \; | xargs -d '\n' -L 1 dos2unix -k
Which translates to: find all non-hidden files recursively in the current directory; then, using grep, list all non-binary (-I), non-empty files; then pipe that into xargs (delimited by newlines), one file at a time, to dos2unix, keeping the original timestamp (-k).
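To check what the grep -Il '.' filter keeps and drops, here is a self-contained sketch (assuming GNU grep; all file names below are invented, and dd supplies the NUL bytes that make a file look binary):

```shell
#!/bin/sh
# Sketch: the "non-hidden, non-binary, non-empty" filter on throwaway files.
set -eu
tmp=$(mktemp -d)
printf 'text\r\n' > "$tmp/plain.txt"                       # normal text file
dd if=/dev/zero bs=1 count=8 of="$tmp/blob.bin" 2>/dev/null # binary (NULs)
mkdir -p "$tmp/.git"
printf 'x\n' > "$tmp/.git/config"                          # hidden path

# Same filter as in the answer above: only plain.txt should survive.
kept=$(find "$tmp" -type f -not -path '*/\.*' -exec grep -Il '.' {} \; | wc -l)
echo "files kept: $kept"
rm -rf "$tmp"
```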
You can use find to find all of the files in a directory structure that you want to run through your dos2unix command:
find /path/to/the/files -type f -exec dos2unix {} \;
Take a look at the man page for find; there are a lot of options you can use to specify what gets evaluated.
This didn't work. – Alex Kinman, Apr 28, 2016
Be VERY careful running this if there is a .git directory anywhere down the file tree... it corrupted my local git repository. – Aaron_H, Jul 20, 2017
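A hedged aside: find's -exec … {} + variant batches many files into a single dos2unix invocation (similar to what xargs does), instead of spawning one process per file as \; does. A sketch using echo in place of dos2unix so it runs anywhere:

```shell
#!/bin/sh
# Sketch: compare -exec \; (one process per file) with -exec + (batched).
set -eu
tmp=$(mktemp -d)
touch "$tmp/1.txt" "$tmp/2.txt" "$tmp/3.txt"

per_file=$(find "$tmp" -type f -exec echo {} \; | wc -l)  # one echo per file
batched=$(find "$tmp" -type f -exec echo {} + | wc -l)    # one echo total
echo "per-file invocations: $per_file, batched invocations: $batched"
rm -rf "$tmp"
```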
How to recursively run dos2unix (or any other command) on your desired directory or path using multiple processes
This answer also implicitly covers "how to use xargs".
I've combined the best from this answer, this answer, and this answer, to make my own answer, with 3 separate solutions depending on what you need:
1. Run dos2unix (or any other command) on all files in an entire directory:

find . -type f -print0 | xargs -0 -n 50 -P $(nproc) dos2unix

(NB: do NOT run the above command in a git repository, or it will botch some of the contents of your .git dir and force you to re-clone the repository from scratch! For git repositories, you must exclude the .git dir; see the solutions below for that.)

2. Run dos2unix (or any other command) on all files, or all checked-in files, in an entire git repository:

# A) Use `git ls-files` to find just the files *checked-in* to the repo.
git ls-files -z | xargs -0 -n 50 -P $(nproc) dos2unix

# Or B): use `find` to find all files in this dir, period, but exclude the
# `.git` dir so we don't damage the repo.
# - See my answer on excluding directories using `find`:
#   https://stackoverflow.com/a/69830768/4561887
find . -not \( -path "./.git" -type d -prune \) -type f -print0 \
    | xargs -0 -n 50 -P $(nproc) dos2unix

3. Run dos2unix (or any other command) on all files, or all checked-in files, in a specified directory or directories within a git repository:

# 1. Only in this one directory: "path/to/dir1":
# A) Use `git ls-files` to find just the files checked-in to the repo.
git ls-files -z -- path/to/dir1 | xargs -0 -n 50 -P $(nproc) dos2unix
# Or B): use `find` to find all files in this repo dir, period.
find path/to/dir1 -type f -print0 | xargs -0 -n 50 -P $(nproc) dos2unix

# 2. In all 3 of these directories:
# A) Use `git ls-files` to find just the files checked-in to the repo.
git ls-files -z -- path/to/dir1 path/to/dir2 path/to/dir3 \
    | xargs -0 -n 50 -P $(nproc) dos2unix
# Or B): use `find` to find all files in these 3 repo dirs, period. Note
# that by specifying specific folders you automatically exclude the
# `.git` dir, which is what you need to do.
find path/to/dir1 path/to/dir2 path/to/dir3 -type f -print0 \
    | xargs -0 -n 50 -P $(nproc) dos2unix
Speed:
Unfortunately, I didn't write down the time it took when I ran it, but I know that the git ls-files -z | xargs -0 -n 50 -P $(nproc) dos2unix command above converted about 1.5M files in my massive git repo in under 3 minutes. The multi-process invocation helped a ton, keeping my computer's total CPU processing power (20 cores) as high as 90% utilized overall throughout the run.
Explanation:
- dos2unix is the command we are running via xargs.
- The -print0 in find, the -0 in xargs, and the -z in git ls-files all mean "zero-separate" (null-separate) the file path listings. This way, file paths containing spaces and other special characters are separated unambiguously by the binary zero between them.
- nproc prints the number of CPU cores your computer has (ex: 8). Passing -P $(nproc) therefore spawns as many processes running the command (dos2unix in our case) as you have cores, attempting to optimize the run time with one worker process per CPU core.
- xargs runs individual commands built from the input piped to it in a stream. -n 50 passes 50 file paths to each spawned process; this reduces the overhead of spawning a new dos2unix process, since each one handles many files at once rather than just one or a few.
- find . finds files (-type f) in the current directory (.).
- git ls-files lists all files checked in to your git repository. -- ends the options passed to git ls-files, marking for its parser that no more options will follow; everything after -- is a list of file or folder paths.
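The -n batching is easy to see with echo in place of dos2unix (the argument names here are arbitrary placeholders):

```shell
#!/bin/sh
# Sketch: show how `xargs -n` groups arguments per spawned process.
# With 5 inputs and -n 2, echo is invoked 3 times (2 + 2 + 1 args).
set -eu
batches=$(printf '%s\n' a b c d e | xargs -n 2 echo | wc -l)
echo "echo was invoked $batches times"
```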
References:
- The 3 answers I linked to above.
- Where I learned about nproc: How to obtain the number of CPUs/cores in Linux from the command line?
- My answer on How do I exclude a directory when using find?
See also:
- How to find out line endings in a text file? Use file instead of dos2unix in the commands above if you just want to see what the line endings currently are for all files in a given directory.
- My answer: What are the file limits in Git (number and size)?
- GitHub: Configuring Git to handle line endings
- Another xargs example of mine, with the addition of the -I{} option to specify argument placement: How to unzip multiple files at once... using parallel operations (one CPU core per process, with as many processes as you have cores), into output directories with the same names as the zip files.
- Sometimes you need to use bash -c with xargs in order to get proper substitution, such as with dirname. See: Stack Overflow: Why using dirname in find command gives dots for each match?
  - I used that trick in some of the xargs commands to extract .zip files in my repo here: https://github.com/ElectricRCAircraftGuy/FatFs. See the readme for those xargs commands.
Use a wildcard, like this (if you're in the folder):
dos2unix *
or, if you're outside the folder:
dos2unix /path/to/folder/*
This doesn't run on sub-folders. – Chiwda, Jan 10, 2023
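As the comment notes, a plain * glob stops at the top level; a quick sketch (all paths are throwaway names):

```shell
#!/bin/sh
# Sketch: a plain * glob only matches the top level; files in
# subdirectories are not touched.
set -eu
tmp=$(mktemp -d)
mkdir -p "$tmp/sub"
touch "$tmp/top.txt" "$tmp/sub/deep.txt"

n=0
for f in "$tmp"/*; do
    if [ -f "$f" ]; then n=$((n + 1)); fi   # counts only regular files
done
echo "regular files matched by *: $n"       # sub/deep.txt is not matched
rm -rf "$tmp"
```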