For example, the cut command can take a parameter -f, which according to the man page will:
select only these fields; also print any line that contains no delimiter character, unless the -s option is specified
In this context, what is a field?
The term "field" is often associated with tools such as cut and awk. A field is similar to a column's worth of data: you take a line and split it on a specific character. Typically that character is a space (awk splits on any run of whitespace by default, while cut defaults to a tab).
However, as is the case with most tools, it's configurable. For example:
awk -F"," ... - would separate by commas (i.e. ,).
cut -d"," ... - would separate by commas (i.e. ,).
This first one shows how awk automatically will split on spaces.
$ echo "The rain in Spain." | awk '{print $1" "$4}'
The Spain.
This one shows how cut will split on spaces too.
$ echo "The rain in Spain." | cut -d" " -f1,4
The Spain.
Here we have a CSV list of column data, and we're using cut to return columns 1 & 4.
$ echo "col1,col2,col3,co4" | cut -d"," -f1,4
col1,co4
Awk too can do this:
$ echo "col1,col2,col3,co4" | awk -F"," '{print $1","$4}'
col1,co4
Awk is also a little more adept at dealing with a variety of separation characters. Here it's dealing with Tabs along with Spaces where they're inter-mixed at the same time:
$ echo -e "The\t rain\t\t in Spain." | awk '{print $1" "$4}'
The Spain.
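To make the difference concrete, here is a minimal sketch (the sample string and variable names are made up for illustration). cut treats every single delimiter as a field boundary, so two spaces in a row produce an empty field, while awk's default splitting collapses the run:

```shell
# Hypothetical input: two spaces between "a" and "b".
line='a  b'

# cut: each space is a boundary, so field 2 is the empty string
# between the two spaces and "b" ends up in field 3.
cut_field2=$(printf '%s\n' "$line" | cut -d' ' -f2)
cut_field3=$(printf '%s\n' "$line" | cut -d' ' -f3)

# awk: runs of whitespace count as one separator, so "b" is field 2.
awk_field2=$(printf '%s\n' "$line" | awk '{print $2}')

printf 'cut -f2: [%s]\n' "$cut_field2"
printf 'cut -f3: [%s]\n' "$cut_field3"
printf 'awk $2:  [%s]\n' "$awk_field2"
```

This is usually why cut output looks "off by one" on data that was aligned with multiple spaces.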
With respect to the -s switch, it simply tells cut not to print any lines that do not contain the delimiter character specified via the -d switch.
Say we had this file.
$ cat sample.txt
This is a space string.
This is a space and tab string.
Thisstringcontainsneither.
NOTE: There are spaces and tabs in the 2nd string above.
Now when we process these strings using cut with and without the -s switch:
$ cut -d" " -f1-6 sample.txt
This is a space string.
This is a space
Thisstringcontainsneither.
$ cut -d" " -f1-6 -s sample.txt
This is a space string.
This is a space
In the 2nd example you can see that the -s switch has omitted any strings from the output that do not contain the delimiter, Space.
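If you want to reproduce this without a file, here is a small sketch that builds similar input with printf (the three lines are invented, not the sample.txt above). It only asks for field 1, which makes the effect of -s easy to see:

```shell
# Hypothetical input: line 2 contains a tab, line 3 has no space at all.
input=$(printf 'one two three\nfour\tfive six\nnodelimiterhere\n')

# Without -s, a line with no delimiter is passed through unchanged.
without_s=$(printf '%s\n' "$input" | cut -d' ' -f1)

# With -s, a line with no delimiter is suppressed.
with_s=$(printf '%s\n' "$input" | cut -d' ' -f1 -s)

printf -- '--- without -s ---\n%s\n' "$without_s"
printf -- '--- with -s ---\n%s\n' "$with_s"
```

Note that the tab in the second line is not a delimiter here, so it stays inside field 1.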
In the POSIX shell, a field is any part of a line delimited by any of the characters in IFS, the "input field separator (or internal field separator)." The default value of this is a space, followed by a horizontal tab, followed by a newline. With Bash you can run printf '%q\n' "$IFS" to see its value.
Use echo "$IFS" | cat -vet to see what the default value looks like in the shell (with single quotes the variable would not be expanded).
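However, IFS is a shell variable and does not affect cut, which was the question asked.
cut -d "$IFS" will error because cut accepts only a single delimiter character, whereas awk -F"[ \t\n]" works as expected.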
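As a rough sketch of where IFS does and does not matter (the record and the variable names are made up): the shell's read builtin splits on IFS, while cut only looks at the character given to -d.

```shell
# Hypothetical colon-separated record.
record='alice:x:1000'

# Shell field splitting: IFS is set only for this one read call.
IFS=: read -r name pass user_id <<EOF
$record
EOF

# cut does not consult IFS; the delimiter comes from -d.
cut_id=$(printf '%s\n' "$record" | cut -d: -f3)

printf 'read: name=%s user_id=%s\n' "$name" "$user_id"
printf 'cut:  field 3=%s\n' "$cut_id"
```

Both approaches give the same third field here, but for different reasons.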
It depends on the utility in question, but for cut, a "field" starts at the beginning of a line of text, and includes everything up to the first tab. The second field runs from the character after the first tab, up to the next tab. And so on for third, fourth, ... Everything between tabs, or between start-of-line and tab, or between tab and end-of-line.
Unless you specify a field delimiter with the "-d" option: cut -d: -f2 would get you everything between first and second colon (':') characters.
Other utilities have different definitions, but a tab character is common. awk is a good fallback if cut is too strict, as awk divides fields based on one or more whitespace characters. That's a little bit more natural in a lot of situations, but you have to know a bit of syntax. To print the second field according to awk:
awk '{print $2}'
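A quick sketch of how the two defaults disagree on the same line (the tab-separated row is invented for illustration). cut with no -d splits only on tabs, so spaces stay inside a field; awk splits on any whitespace:

```shell
# Hypothetical row: three tab-separated fields, two of which contain a space.
row=$(printf 'first field\tsecond field\tthird')

# cut default delimiter is a tab, so field 2 keeps its embedded space.
cut_second=$(printf '%s\n' "$row" | cut -f2)

# awk default splitting also breaks on the space, so $2 is just "field".
awk_second=$(printf '%s\n' "$row" | awk '{print $2}')

printf 'cut -f2: [%s]\n' "$cut_second"
printf 'awk $2:  [%s]\n' "$awk_second"
```

If you want awk to behave like cut here, awk -F'\t' restricts it to tabs.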
sort is the one that tricks me. My current sort man page says something like "non-blank to blank transition" for a field separator. For some reason it takes a few tries to get sort fields defined correctly. join apparently uses "delimited by whitespace" fields, which is what awk purports to do by default.
The moral of the story is to be careful, and experiment if you don't know.
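For sort, one way to avoid the blank-transition guesswork is to name the separator with -t and pin the key to a single field with -k. A small sketch, with made-up data:

```shell
# Hypothetical colon-separated data; field 3 is a number.
data=$(printf 'carol:x:30\nalice:x:4\nbob:x:200\n')

# -t: sets the field separator, -k3,3 limits the key to field 3,
# and the trailing n compares that key numerically.
numeric=$(printf '%s\n' "$data" | sort -t: -k3,3n)

# Without n the same key is compared as text, so "200" sorts before "30".
textual=$(printf '%s\n' "$data" | sort -t: -k3,3)

printf -- '--- numeric ---\n%s\n' "$numeric"
printf -- '--- textual ---\n%s\n' "$textual"
```

Writing -k3,3 rather than -k3 matters: -k3 alone means "from field 3 to the end of the line".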
The term "field" is not related to Linux in general, but to specific programs. So cut uses a different kind of field than sort.
With cut, you define what is a field yourself, by specifying a field delimiter with the option -d, which separates the fields in each line.
If your data is separated by colons in the lines, you can combine -d and -f to get fields (or columns) 2, 3 and 6 like this:
echo 'a:b:c::d:e:f' | cut -d : -f 2-3,6
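Here is what that command produces, as a small sketch that also pokes at the empty field between the two adjacent colons (the variable names are just for illustration):

```shell
line='a:b:c::d:e:f'

# Fields 2, 3 and 6, joined again with the same delimiter.
picked=$(printf '%s\n' "$line" | cut -d : -f 2-3,6)

# The "::" creates an empty field 4, which pushes "d" to field 5.
fourth=$(printf '%s\n' "$line" | cut -d : -f 4)
fifth=$(printf '%s\n' "$line" | cut -d : -f 5)

printf 'fields 2-3,6: %s\n' "$picked"
printf 'field 4: [%s]\n' "$fourth"
printf 'field 5: [%s]\n' "$fifth"
```

So field 6 is "e", not "f", because the empty field still counts.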
When you use the cut command, it takes two main arguments:
-d : which stands for delimiter
-f : which stands for the field(s) to be cut from the input file
Ex. cut -d "|" -f1,2 input_filename
Here each input line is split on the delimiter "|" and only the first 2 fields are cut from it; the output fields are joined with the same delimiter.
If you have the following line in your file
Alex|120000|Admin|1999
Then it will cut the first 2 fields, which are
Alex|120000
cut is great for simple cases, where the delimiter is a single character and you want to output a subset of the input fields, in the same order (even if I specify -f3,2,1, it acts the same as -f1,2,3).
awk one-liners are much more flexible, e.g. when the input field separator might be any whitespace (awk's default), or when you want to output fields in a different order or with a particular format.
For example wc -l myfile | awk '{print $1}' or ls -l file1 file2 | awk '{printf "%s,%s:%s\n", $9, $7, $3}' are very simple, but would be hard to do with cut.
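A short sketch of the ordering point, using an invented comma-separated row: cut prints the selected fields in input order no matter how the list is written, while awk prints them in whatever order you ask for.

```shell
# Hypothetical comma-separated row.
row='one,two,three'

# cut: -f3,1 is treated as the set {1,3} and printed in input order.
cut_out=$(printf '%s\n' "$row" | cut -d, -f3,1)

# awk: fields are printed in exactly the order written.
awk_out=$(printf '%s\n' "$row" | awk -F, '{print $3 "," $1}')

printf 'cut -f3,1: %s\n' "$cut_out"
printf 'awk $3,$1: %s\n' "$awk_out"
```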
I agree with earlier posters that fields/keys in sort are tough to figure out!
Fields in join seem to work the same as in cut, although join options are easy to get wrong.