I have a big file with over 1 million columns and 800 rows. The first row holds the chromosome name (Ha412HOChr01, Ha412HOChr02, ..., Ha412HOChr17) together with the SNP position on that chromosome. In total there are 17 chromosomes. I want to extract the columns belonging to each chromosome (Ha412HOChr01, ..., Ha412HOChr17) and store them in a separate file per chromosome.

"Ha412HOChr01:180159" "Ha412HOChr01:210724" "Ha412HOChr01:303270" "Ha412HOChr01:303280"....... "Ha412HOChr17:303402"
0 1 0 0 ......0
0 1 0 0 ......0
0 1 0 0 ......0
0 2 0 0 ......0
0 1 1 1 ......1
0 2 0 0 ......0

My desired output, for example for chromosome 1:

out.chrom1
"Ha412HOChr01:180159" "Ha412HOChr01:210724" "Ha412HOChr01:303270" "Ha412HOChr01:303280" 
0 1 0 0 
0 1 0 0
0 1 0 0 
0 2 0 0 
0 1 1 1 
0 2 0 0 
0 0 0 0 
0 2 0 0
0 1 2 2 
  • Have you tried my answer unix.stackexchange.com/a/545356/195582? Do you need more info? Commented Oct 9, 2019 at 7:00
  • I could not get Miller running. Commented Oct 10, 2019 at 15:21
  • What's your operating system? Commented Oct 10, 2019 at 16:33

1 Answer

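If your field separator is a single space, you can use Miller (https://github.com/johnkerl/miller) and run

mlr --csv --fs " " cut -r -f "Ha412HOChr01:" input.txt

to obtain the "Ha412HOChr01" columns. The -r flag makes cut treat the -f argument as a regular expression, so every column whose name matches Ha412HOChr01: is kept: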

Ha412HOChr01:180159 Ha412HOChr01:210724 Ha412HOChr01:303270 Ha412HOChr01:303280
0 1 0 0
0 1 0 0
0 1 0 0
0 2 0 0
0 1 1 1
0 2 0 0

You can then wrap this command in a for loop over the 17 chromosome names to create all your files.
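A minimal sketch of such a loop, assuming the input is space-separated as above and that the output files should be named out.chrom1 ... out.chrom17 as in the question. The function name split_by_chrom is hypothetical, and the regex is anchored with ^ here (an addition to the command above) so that only names starting with the chromosome prefix match:

```shell
#!/bin/sh
# split_by_chrom INPUT
# Hypothetical helper: writes one file per chromosome (out.chrom1 ... out.chrom17)
# into the current directory, using the same mlr cut command as above.
split_by_chrom() {
    input=$1
    # seq -w pads to equal width: 01 02 ... 17
    for i in $(seq -w 1 17); do
        chrom="Ha412HOChr$i"
        # ${i#0} strips one leading zero, so 01 -> 1 and 10 stays 10
        mlr --csv --fs " " cut -r -f "^${chrom}:" "$input" > "out.chrom${i#0}"
    done
}
```

Calling `split_by_chrom input.txt` reads the input file 17 times, once per chromosome. With over 1 million columns that may be slow, but it keeps each step identical to the single command shown above.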
