
adjusted due to data changing


edit approved Apr 11, 2019 at 12:20

Ignoring the header (which you can tack on later):

awk -F, 'NR > 1 {print > $2}' use_rep

which will print each line to a file named by the second column:

~ head *[0-9]*
==> 100K+ <==
86440,100K+
116858,100K+
22222,100K+
38906,100K+

==> 200K+ <==
22565,200K+
7453,200K+

==> 500K+ <==
885,500K+

==> <100K <==
10762,<100K
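To try the first command without the original `use_rep`, here is a minimal sketch. It builds a small sample file from the rows shown above (the real file is larger and its row order is unknown), runs the same awk command in a scratch directory, and then tacks the header on afterwards. The scratch directory and the `.tmp` suffix are assumptions for illustration, not part of the original answer.

```shell
#!/bin/sh
# Sketch: split a sample use_rep by its second column, then add the header later.
# Assumption: the sample rows below are copied from the output above.
set -e
dir=$(mktemp -d)
cd "$dir"    # work in a scratch directory so no real files are touched

cat > use_rep <<'EOF'
user_id,rep
86440,100K+
116858,100K+
22222,100K+
38906,100K+
22565,200K+
7453,200K+
885,500K+
10762,<100K
70524,<100K
EOF

# Same command as above: each data line goes to a file named by column 2.
awk -F, 'NR > 1 {print > $2}' use_rep

# "Tack on later": prepend the header line to every file created above.
header=$(head -n 1 use_rep)
tail -n +2 use_rep | cut -d, -f2 | sort -u | while IFS= read -r name; do
  { printf '%s\n' "$header"; cat "$name"; } > "$name.tmp"
  mv "$name.tmp" "$name"
done
```

Rewriting each file through a temporary copy keeps the sketch portable; `sed -i` could do the same in one step, but its `-i` option behaves differently in GNU and BSD sed.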

To add the header to each file, something like this should work:

awk -F, 'NR == 1 {header = $0; next} # save header, skip this line
  !a[$2]++ { print header > $2 } # print header if second field has not been seen before
  { print > $2 }' use_rep

Result:

~ head *[0-9]*
==> 100K+ <==
user_id,rep
86440,100K+
116858,100K+
22222,100K+
38906,100K+

==> 200K+ <==
user_id,rep
22565,200K+
7453,200K+

==> 500K+ <==
user_id,rep
885,500K+

==> <100K <==
user_id,rep
10762,<100K
70524,<100K
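One caveat, offered as an assumption about larger inputs: every distinct value in the second column keeps its output file open until awk exits. With four ratings that is harmless, but with many distinct values some awk implementations may hit the open-file limit. A hedged variant writes the header with `>` the first time a value is seen (which creates or truncates the file), appends rows with `>>`, and calls `close()` after each write. The sample data and scratch directory are again only for illustration.

```shell
#!/bin/sh
# Sketch: same header-per-file split, but never keeps more than one output file open.
set -e
dir=$(mktemp -d)
cd "$dir"    # scratch directory; the sample below stands in for the real use_rep

cat > use_rep <<'EOF'
user_id,rep
86440,100K+
22565,200K+
116858,100K+
885,500K+
10762,<100K
7453,200K+
70524,<100K
EOF

awk -F, '
  NR == 1     { header = $0; next }             # save the header, skip this line
  !seen[$2]++ { print header > $2; close($2) }  # first row for this value: create/truncate file, write header
              { print >> $2; close($2) }        # append the row, then release the descriptor
' use_rep
```

Reopening a file for every line is slower, so the original version remains the better fit when, as here, there are only a handful of distinct values.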


answered Apr 11, 2019 at 10:56 by muru
