I have a problem with the join command. "The default join field is the first, delimited by whitespace" (quoted from join --help). However, my tab-delimited files have a field that contains whole sentences, so I want to join the two files using -t\t (I also tried -t "\t", which reported errors under Cygwin but not under CentOS). Unexpectedly, the command printed the fields on two consecutive lines. I have already processed both files with dos2unix and sort.

Example output is shown below. Lines 1 and 3 come from file1, and lines 2 and 4 come from file2. Lines 1 and 2 should be joined into a single line. However, with -t\t they appear on two consecutive lines (as below); without -t they appear on the same line.

LM00089 0.6281  0       Q27888  L-lactate dehydrogenase
LM00089 gi|2497622|sp|Q27888|LDH_CAEEL  0.6281  0.422
LM00136 0.3219  0.376741        O62619  Pyruvate kinase
LM00136 gi|27923979|sp|O62619|KPYK_DROME        0.3219  0.111

Is this a bug, or did I make a mistake?

  • If you are using bash, you may try join -t $'\t' Commented Sep 1, 2012 at 15:52
  • It is not copy-paste safe, but you can type '<ctrl-v><tab>' to get a tab character on the bash command line Commented Sep 1, 2012 at 17:30
  • Thank you, enzotib and Cougar. Both solutions work well. Commented Sep 2, 2012 at 4:55

1 Answer

-t \t passes t as the separator: an unquoted backslash makes the shell take the next character literally (except when the next character is a newline). -t "\t" passes the two characters \t as the separator; different versions of join may behave differently when you pass multiple characters.
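To see what the shell actually hands to a command in each case, a small sketch like the following may help. It uses printf '%s' as a stand-in for join, since %s prints its argument without interpreting backslash escapes; the variable names are only illustrative.

```shell
# Capture the argument exactly as the shell passes it to a command.
# printf '%s' prints its argument verbatim, without interpreting escapes.
unquoted=$(printf '%s' \t)     # unquoted backslash is removed: just "t"
quoted=$(printf '%s' "\t")     # two characters: a backslash and "t"

printf 'unquoted: [%s] length %d\n' "$unquoted" "${#unquoted}"
printf 'quoted:   [%s] length %d\n' "$quoted" "${#quoted}"
# prints:
# unquoted: [t] length 1
# quoted:   [\t] length 2
```

Neither form produces a tab character, which is why join does not split the fields where you expect.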

To pass a tab from bash, use -t $'\t'. The $'…' syntax mimics the feature of C and many other languages where \ followed by a letter designates a control character, and \ can be followed by octal digits.

Another way is to put a literal tab in your script (between single or double quotes). This isn't very readable.

If you need portability to all POSIX shells such as dash, use

tab=$(printf '\t')
join -t "$tab" …

or directly join -t "$(printf '\t')" ….
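As a rough end-to-end check of the portable form, here is a minimal sketch. The two sample files are made up for illustration (loosely modelled on the data in the question), and it assumes mktemp -d is available, which is common but not required by POSIX.

```shell
# Hypothetical sample data: two tab-separated files, sorted on field 1.
dir=$(mktemp -d)
printf 'LM00089\t0.6281\tL-lactate dehydrogenase\n' >  "$dir/file1"
printf 'LM00136\t0.3219\tPyruvate kinase\n'         >> "$dir/file1"
printf 'LM00089\tQ27888\n' >  "$dir/file2"
printf 'LM00136\tO62619\n' >> "$dir/file2"

# Command substitution strips only trailing newlines, so the tab survives.
tab=$(printf '\t')

# Join on the first field, splitting fields on tabs only, so the
# sentence-like third field of file1 stays in one piece.
joined=$(join -t "$tab" "$dir/file1" "$dir/file2")
printf '%s\n' "$joined"

rm -r "$dir"
```

Each output line should contain the key, then the remaining fields of file1, then the remaining fields of file2, all separated by tabs.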

  • The variable can be avoided with join -t "$(printf '\t')" Commented Nov 18, 2018 at 20:44
