1

Im trying to obtain from this file:

A4690021|product.actor|Laurel Cronin
A4690021|product.actor|Bob Hoskins
A4690021|product.actor|Caroline Goodall
A4690021|product.actor|Julia Roberts
A4690021|product.actor|Maggie Smith
A4690021|product.actor|Amber Scott
A4690021|product.actor|Charlie Korsmo
A4690021|product.actor|Robin Williams
A4690021|product.actor|Dustin Hoffman
A4690021|product.director|Steven Spielberg
A4690021|product.bestSellers|Offline Best Seller
A4690021|product.bestSellers|Online Best Seller
A4690021|product.parentCategory.id|999.54215013.999.54216013
A4690021|product.storeName|Cine
A4690021|product.parentCat.displayName|Infantil
A19129625|product.author|. VV.AA.

This result:

A4690021|product.actor|Laurel Cronin,Bob Hoskins,Caroline Goodall,Julia Roberts,Maggie Smith,Amber Scott,Charlie Korsmo,Robin Williams,Dustin Hoffman
A4690021|product.director|Steven Spielberg
A4690021|product.bestSellers|Offline Best Seller,
A4690021|product.parentCategory.id|999.54215013.999.54216013
A4690021|product.storeName|Cine
A4690021|product.parentCat.displayName|Infantil
A19129625|product.author|. VV.AA.

Trying to use this awk script:

awk -F'|' '{a[$1"|"$2]=a[$1"|"$2]","$3}END{for(x in a)print x""a[x]}' SEARCH_ECISTORE_PRD_MULTI_ES_s.csv

But i don't know why i'm obtaining this:

A4690021|product.storeName,Cine
A4690021|product.parentCategory.id,999.54215013.999.54216013
A19129625|product.author,. VV.AA.
A4690021|product.director,Steven Spielberg
,Dustin Hoffmanllactor,Laurel Cronin
A4690021|product.parentCat.displayName,Infantil
,Online Best SellerstSellers,Offline Best Seller

I'm having an issue when reading actors, it seems $3 contains the proper value but when aggregating them they kind of merge in strange ways..

1
  • I suspect stray characters in your source file. Was it prepared on Windows? If not, can you run od -c SEARCH_ECISTORE_PRD_MULTI_ES_s.csv and report the results. Commented Aug 1, 2016 at 19:52

3 Answers 3

0

To fix the field dividers for the last field:

$ awk -F'|' '{a[$1"|"$2]=a[$1"|"$2]","$3} END{for(x in a)print x"|"substr(a[x],2)}' file.csv
A4690021|product.storeName|Cine
A4690021|product.parentCategory.id|999.54215013.999.54216013
A19129625|product.author|. VV.AA.
A4690021|product.director|Steven Spielberg
A4690021|product.actor|Laurel Cronin,Bob Hoskins,Caroline Goodall,Julia Roberts,Maggie Smith,Amber Scott,Charlie Korsmo,Robin Williams,Dustin Hoffman
A4690021|product.parentCat.displayName|Infantil
A4690021|product.bestSellers|Offline Best Seller,Online Best Seller

Line-wrapping issue

Let's convert the source file to DOS/Windows format:

$ unix2dos <file.csv >file.dos
$ awk -F'|' '{a[$1"|"$2]=a[$1"|"$2]","$3} END{for(x in a)print x"|"substr(a[x],2)}' file.dos
A4690021|product.storeName|Cine
A4690021|product.parentCategory.id|999.54215013.999.54216013
A19129625|product.author|. VV.AA.
A4690021|product.director|Steven Spielberg
,Dustin Hoffmanllactor|Laurel Cronin
A4690021|product.parentCat.displayName|Infantil
,Online Best SellerstSellers|Offline Best Seller

This looks like your problematic output.

The solution is to run dos2unix or other utility on it to remove the DOS/Windows line-endings.

0

Possible duplicate (because I do some time the same)

sed ':a;$!N;s/\(\(.*|\)[^|]\+\)\n\2/\1, /;ta;P;D' SEARCH_ECISTORE_PRD_MULTI_ES_s.csv

will produce

A4690021|product.actor|Laurel Cronin, Bob Hoskins, Caroline Goodall, Julia Roberts, Maggie Smith, Amber Scott, Charlie Korsmo, Robin Williams, Dustin Hoffman
A4690021|product.director|Steven Spielberg
A4690021|product.bestSellers|Offline Best Seller, Online Best Seller
A4690021|product.parentCategory.id|999.54215013.999.54216013
A4690021|product.storeName|Cine
A4690021|product.parentCat.displayName|Infantil
A19129625|product.author|. VV.AA.
0

With GNU datamash:

datamash -t\| -g 1,2 collapse 3 <SEARCH_ECISTORE_PRD_MULTI_ES_s.csv

Result:

A4690021|product.actor|Laurel Cronin,Bob Hoskins,Caroline Goodall,Julia Roberts,Maggie Smith,Amber Scott,Charlie Korsmo,Robin Williams,Dustin Hoffman
A4690021|product.director|Steven Spielberg
A4690021|product.bestSellers|Offline Best Seller,Online Best Seller
A4690021|product.parentCategory.id|999.54215013.999.54216013
A4690021|product.storeName|Cine
A4690021|product.parentCat.displayName|Infantil
A19129625|product.author|. VV.AA.

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.