What constitutes a 'field' for the cut command?

Question

For example, the cut command can take a parameter -f, which, according to the man page, will:

select only these fields; also print any line that contains no delimiter character, unless the -s option is specified

In this context, what is a field?

slm · Accepted Answer · 2014-03-29 03:04:26Z

The term "field" is often associated with tools such as cut and awk. A field is similar to a column's worth of data, once you take each line and split it on a specific character. By default awk splits on any run of blanks (spaces or tabs), while cut splits on a single Tab.

However, as is the case with most tools, it's configurable. For example:

awk: awk -F"," ... separates on commas.
cut: cut -d"," ... separates on commas.

Examples

This first one shows how awk splits on spaces by default.

$ echo "The rain in Spain." | awk '{print $1" "$4}'
The Spain.

This one shows cut splitting on spaces too, once the delimiter is set with -d.

$ echo "The rain in Spain." | cut -d" " -f1,4
The Spain.

Here we have a CSV list of column data that we're using cut to return columns 1 & 4.

$ echo "col1,col2,col3,co4" | cut -d"," -f1,4
col1,co4

Awk too can do this:

$ echo "col1,col2,col3,co4" | awk -F"," '{print $1","$4}'
col1,co4

Awk is also a little more adept at dealing with a variety of separator characters. Here it handles tabs and spaces that are intermixed:

$ echo -e "The\t rain\t\t in Spain." | awk '{print $1" "$4}'
The Spain.
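By contrast, cut treats every single delimiter as a field boundary, so repeated spaces produce empty fields. The following is a minimal sketch of that difference; the sample line and the use of tr -s to squeeze repeated spaces are illustrative additions, not part of the original answer.

```shell
# Sample line (made up for illustration) with TWO spaces between "a" and "b".
line='a  b c'

# cut counts the empty string between the two spaces as field 2.
cut_out=$(printf '%s\n' "$line" | cut -d' ' -f2)

# awk's default splitting collapses the run of spaces, so field 2 is "b".
awk_out=$(printf '%s\n' "$line" | awk '{print $2}')

# Squeezing repeated spaces with tr -s first makes cut agree with awk.
squeezed_out=$(printf '%s\n' "$line" | tr -s ' ' | cut -d' ' -f2)

printf 'cut:         [%s]\n' "$cut_out"
printf 'awk:         [%s]\n' "$awk_out"
printf 'tr -s + cut: [%s]\n' "$squeezed_out"
```

The brackets in the output make the empty field from plain cut visible.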

What about the -s switch to cut?

This switch simply tells cut not to print any lines that do not contain the delimiter character (the one specified via the -d switch, or Tab if -d is not given).

Example

Say we had this file.

$ cat sample.txt 
This is a space string.
This is a space   and   tab string.
Thisstringcontainsneither.

NOTE: There are spaces and tabs in the 2nd string above.

Now when we process these strings using cut with and without the -s switch:

$ cut -d" " -f1-6 sample.txt 
This is a space string.
This is a space  
Thisstringcontainsneither.

$ cut -d" " -f1-6 -s sample.txt 
This is a space string.
This is a space

In the 2nd example you can see that the -s switch has omitted any strings from the output that do not contain the delimiter, Space.
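The same behaviour can be reproduced without a file. This is a small sketch using made-up input fed through printf, one line containing the comma delimiter and one without it.

```shell
# Two illustrative input lines: only the first contains the delimiter ",".
input='a,b,c
no-delimiter-here'

# Without -s, the delimiter-free line is passed through unchanged.
without_s=$(printf '%s\n' "$input" | cut -d',' -f2)

# With -s, lines lacking the delimiter are suppressed entirely.
with_s=$(printf '%s\n' "$input" | cut -d',' -f2 -s)

printf 'without -s:\n%s\n' "$without_s"
printf 'with -s:\n%s\n' "$with_s"
```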

l0b0 · Accepted Answer · 2018-04-29 05:37:37Z


A field according to POSIX is any part of a line delimited by any of the characters in IFS, the "input field separator (or internal field separator)." The default value of this is space, followed by a horizontal tabulator, followed by a newline. With Bash you can run printf '%q\n' "$IFS" to see its value.

edited Apr 29, 2018 at 5:37

answered Mar 29, 2014 at 10:21 by l0b0

Do an echo "$IFS" | cat -vet to see what the default value looks like in the shell.

– C0deDaedalus, Apr 28, 2018 at 9:30

IFS is used by the shell for most purposes (not all), but not by other programs and specifically not by cut which was the question asked.

– dave_thompson_085, Apr 29, 2018 at 10:08

Unlike awk, cut supports only one single-character delimiter at a time, so cut -d "$IFS" will error, whereas awk -F"[ \t\n]" works as expected.

– JGurtz, Oct 22, 2019 at 1:42

user732user732 · Accepted Answer · 2014-03-29 01:29:14Z

It depends on the utility in question, but for cut, a "field" starts at the beginning of a line of text, and includes everything up to the first tab. The second field runs from the character after the first tab, up to the next tab. And so on for third, fourth, ... Everything between tabs, or between start-of-line and tab, or between tab and end-of-line.

That is, unless you specify a field delimiter with the -d option: cut -d: -f2 would get you everything between the first and second colon (':') characters.
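Both cases can be checked quickly from the command line. The sketch below uses made-up input lines: the first relies on cut's default Tab delimiter, the second sets a colon with -d.

```shell
# Default delimiter is Tab: the space inside "two three" does NOT split the field.
tab_out=$(printf 'one\ttwo three\tfour\n' | cut -f2)

# Explicit delimiter: field 2 is the text between the first and second colon.
colon_out=$(printf 'root:x:0:0\n' | cut -d: -f2)

printf 'tab-delimited field 2:   [%s]\n' "$tab_out"
printf 'colon-delimited field 2: [%s]\n' "$colon_out"
```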

Other utilities have different definitions, but a tab character is common. awk is a good fallback if cut is too strict, as awk divides fields based on one or more whitespace characters. That's a little more natural in a lot of situations, but you have to know a bit of syntax. To print the second field according to awk:

awk '{print $2}'

sort is the one that tricks me. My current sort man page says something like "non-blank to blank transition" for a field separator. For some reason it takes a few tries to get sort fields defined correctly. join apparently uses "delimited by whitespace" fields, which is what awk purports to do by default.
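One way to sidestep the blank-transition rule in sort is to name the separator explicitly with -t and to give the key both a start and an end field with -k. This is only a sketch on made-up name:number data, not a general recipe.

```shell
# Illustrative data: name and a number, separated by a colon.
data='carol:30
alice:4
bob:25'

# -t: sets the field separator; -k2,2n sorts numerically on field 2 only.
sorted=$(printf '%s\n' "$data" | sort -t: -k2,2n)

printf '%s\n' "$sorted"
```

Writing the key as -k2,2 rather than -k2 keeps the sort key from running on to the end of the line.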

The moral of the story is to be careful, and experiment if you don't know.

Volker Siegel · Accepted Answer · 2014-03-29 01:32:07Z


The term "field" is not related to Linux in general, but to specific programs. So cut uses a different kind of field than sort.

With cut, you define what is a field yourself, by specifying a field delimiter with the option -d, which separates the fields in each line.

If your data is separated by colons in the lines, you can combine -d and -f to get fields (or columns) 2, 3 and 6 like this:

echo 'a:b:c::d:e:f' | cut -d : -f 2-3,6
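Running that command shows two things: the selected fields are joined with the same delimiter on output, and the empty string between the two adjacent colons counts as a field of its own (field 4). A short sketch that captures both:

```shell
# Fields are: 1=a 2=b 3=c 4=(empty) 5=d 6=e 7=f
out=$(echo 'a:b:c::d:e:f' | cut -d : -f 2-3,6)

# Field 4 is the empty string between the two adjacent colons.
empty=$(echo 'a:b:c::d:e:f' | cut -d : -f 4)

printf 'fields 2-3,6: [%s]\n' "$out"
printf 'field 4:      [%s]\n' "$empty"
```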


Community · Accepted Answer · 2020-06-11 14:16:50Z


When you use the cut command, it takes two main arguments:

-d : which stands for the delimiter

-f : which stands for the field(s) to be cut from the input file

Ex. cut - d "|"  - f1, 2 input_filename

Here each line is split on the delimiter "|", and only the first 2 fields are cut from the input file; they remain separated by "|" in the output.

If you have following lines in your file

Alex|120000|Admin|1999

Then it will cut 2 fields which are

Alex|120000
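For reference, a runnable form of the command above presumably needs -d and -f written as single tokens and no space inside the field list. The sketch below assumes that reading and feeds the sample line on standard input instead of the hypothetical input_filename.

```shell
# The sample record from the answer, piped in rather than read from a file.
out=$(printf 'Alex|120000|Admin|1999\n' | cut -d'|' -f1,2)

printf '%s\n' "$out"
```

The pipe character is quoted so that the shell passes it to cut instead of treating it as a pipeline.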

edited Jun 11, 2020 at 14:16 by CommunityBot

answered Apr 29, 2018 at 5:49 by Shah Honey

Your example is completely broken due to incorrect spaces, and even if correct this adds nothing to answers given 4 years ago.

– dave_thompson_085, Apr 29, 2018 at 10:09

Laurence Renshaw · Accepted Answer · 2014-04-04 03:28:19Z

cut is great for simple cases, where the delimiter is a single character and you want to output a subset of the input fields, in the same order (even if I specify -f3,2,1, it acts the same as -f1,2,3).

awk one-liners are much more flexible, e.g. when the input field separator might be any whitespace (awk's default), or when you want to output fields in a different order or with a particular format.

For example wc -l myfile | awk '{print $1}' or ls -l file1 file2 | awk '{printf "%s,%s:%s\n", $9, $7, $3}' are very simple, but would be hard to do with cut.
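The ordering point is easy to see side by side. In this sketch the sample line is made up, and OFS is set so that awk joins its output fields with commas the way cut does.

```shell
line='one,two,three'

# cut ignores the order in the -f list and emits fields in input order.
cut_out=$(printf '%s\n' "$line" | cut -d, -f3,2,1)

# awk prints fields in whatever order the program names them.
awk_out=$(printf '%s\n' "$line" | awk -F, -v OFS=, '{print $3, $2, $1}')

printf 'cut -f3,2,1: %s\n' "$cut_out"
printf 'awk $3,$2,$1: %s\n' "$awk_out"
```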

I agree with earlier posters that fields/keys in sort are tough to figure out! Fields in join seem to work the same as in cut, although join options are easy to get wrong.
