I have tens of thousands of directories. Each directory is named by number, like 1, 2, 3,... Each directory contains a large .dat file called data.dat and each file has a section that looks like this:
Configurations for Sm:
Sm Nd H O
0 1 4 0 1.00 7.14%
1 0 3 0 3.00 7.14%
0 0 5 0 1.00 7.14%
I care about the first two numbers on each line. I want:
- All of the lines that start with 0 1 (in this example, that's the first line of numbers) to end up in a new file called 0-1.dat with the file name (number) at the start of the line. An example is below, called "example."
- Likewise, all of the lines that begin with 1 0 (here the second line) should end up in a file called 1-0.dat with the file number at the start of the line.
- All lines that begin with 0 0 (here the third line) should go to a file called 0-0.dat.
Complications for finding the lines I need are:
- Sometimes one of the lines might be missing or the lines might be in different order.
- Also, each file has many sections called Configurations for X, where X is some string. So I do need to somehow use the identifier Configurations for Sm: and search the first set of numbers below it.
Example of what I want to achieve, where the first number on the line is the directory name/number containing the file from which the line was extracted:
Example
In file called 0-1.txt:
1 0 1 4 0 1.00 7.14%
2 0 1 7 1 1.00 7.14%
3 0 1 ....
In file called 1-0.txt:
1 1 0 1 0 1.00 7.14%
2 1 0 4 2 1.00 7.14%
3 1 0 ....
I currently have:
find . -name data.dat -exec grep "Configurations for Sm:" {} + > 0-1.txt
All this does, though, is put anything that would come after Configurations for Sm: in a separate file. I just cannot figure out how to do what I need to do: find the lines below Configurations for Sm: by their number contents. If anyone has any hints or could direct me to an online resource, I would be very grateful. Thank you.
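For reference, here is a rough sketch of the kind of thing I am after, using find with awk instead of grep. It is only an illustration under several assumptions, not a verified solution: the function name extract_sm and its two arguments (the root directory and an output directory) are made up for this example, the output names follow the .txt names from the example above, and "the first set of numbers" is taken to mean the run of numeric lines after the header, ending at the first non-numeric line that follows them.

```shell
#!/bin/sh
# Sketch only. extract_sm is a hypothetical helper name.
#   $1 = directory that holds the numbered directories (assumption)
#   $2 = existing directory that receives 0-1.txt, 1-0.txt, 0-0.txt (assumption)
extract_sm() {
    find "$1" -name data.dat -exec awk -v out="$2" '
        FNR == 1 {
            # New input file: reset state and take the parent directory name.
            insec = 0; seen = 0
            n = split(FILENAME, parts, "/")
            dir = parts[n - 1]
        }
        /^[ \t]*Configurations for / {
            # Any section header: enter the section only if it is the Sm one.
            insec = ($0 ~ /^[ \t]*Configurations for Sm:/)
            seen = 0
            next
        }
        insec {
            if ($1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/) {
                seen = 1
                key = $1 "-" $2
                if (key == "0-1" || key == "1-0" || key == "0-0")
                    print dir, $0 >> (out "/" key ".txt")
            } else if (seen) {
                # Assumed end of section: first non-numeric line after the numbers.
                insec = 0
            }
        }
    ' {} +
}
```

It would be called as, for example, extract_sm . /path/to/results, with the results directory created beforehand. Because the awk part appends with >>, old output files would need to be removed before a rerun, and since find does not visit directories in numeric order, the results may need a final sort -n.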
Configurations for Sm:in any of your initial data.dat files ?