
I have the following data string from a table. The data format is correct; each row wraps onto a new line because of the amount of data:

const data = 
       "0 | john | doe | 
        US | Employed 
        1 | bob| dylan | 
        US | Unemployed "

How can I efficiently map this data to the correct format? I want an array with one object per row, like this:

[{rowId: 0, name: "john", surname: "doe"}, {...}]

I initially tried splitting on the pipe character, but the result looked like this:

["0 ", " john ", " doe ", "US ", " Employed 1", " bob", " dylan ", "US ", " Unemployed"]
  • What format is the source data in? Like CSV? Commented Jul 15, 2021 at 8:46
  • Will every row have the same number of elements, regardless of line breaks, and are they strictly ordered? If so just iterate through your array after splitting and count out the elements. Commented Jul 15, 2021 at 8:47
  • @pilchard every row will have same number of columns and yes order will never change. Commented Jul 15, 2021 at 8:51
  • you can trim strings Commented Jul 15, 2021 at 8:51
  • @pilchard they are line breaks Commented Jul 15, 2021 at 9:45

4 Answers


Example below

const data =
  "0 | john | doe |   US | Employed  1 | bob| dylan |  US | Unemployed ";

const arr = data
  .split("|")
  .flatMap(el => el.split(" "))
  .filter(el => el !== "");

const output = [];
for (let i = 0; i < arr.length; i += 5) {
  output.push({
    rowId: arr[i],
    name: arr[i + 1],
    surname: arr[i + 2],
    country: arr[i + 3],
    status: arr[i + 4],
  });
}

console.log(output);


7 Comments

You've conveniently removed the line breaks?
@pilchard, there is no line break in the data, because the OP uses double quotes for the string. The OP also mentions using split("|") and getting an array from it.
That could easily be a typing issue and not reflective of the actual data. I would guess that there is a combination of wrap and line breaks at work, I've asked the OP to clarify.
@pilchard, the OP mentioned splitting by pipe and getting ["0 ", " john ", " doe ", "US ", " Employed 1", " bob", " dylan ", "US ", " Unemployed"]. That's why I think there is no line break.
Yeah, that's a fair point, I'll edit mine on the same assumption

Assuming every row has the same number of columns and the order never changes:

// const data = `0 | john | doe | US | Employed | 1 | bob| dylan | US | Unemployed`; <= My assumption of data was incorrect.

const data = `0 | john | doe | US | Employed 1 | bob| dylan | US | Unemployed`;

const cleanedUp = data.split("|").flatMap(d => d.split(' ')).filter(d => d!== '');
console.log(cleanedUp);
const result = [];
for (let i = 0; i < cleanedUp.length; i += 5) { // step by 5: each row has five values
  result.push({
    rowId: cleanedUp[i],
    name: cleanedUp[i + 1],
    surname: cleanedUp[i + 2]
  })
}

console.log(result)

The for loop treats every 5 elements as one row, assuming each row starts with a numeric value such as 0, 1, 2, and so on.
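As a rough sketch of the same "count out five values per row" idea, the loop could be wrapped in a function that also checks the token count and converts rowId to a number, as in the question's desired output. The names parseRows and FIELDS are made up for illustration, and this assumes no field value contains a space.

```javascript
// Hypothetical helper: parseRows and FIELDS are illustrative names, not from the answer above.
const FIELDS = ["rowId", "name", "surname", "country", "status"];

function parseRows(data) {
  // Tokenize on pipes and any whitespace (including line breaks), dropping empty tokens.
  // Assumption: no field value contains internal spaces.
  const tokens = data.split(/[|\s]+/).filter(t => t !== "");

  // Every row must contribute exactly FIELDS.length values.
  if (tokens.length % FIELDS.length !== 0) {
    throw new Error(`Expected a multiple of ${FIELDS.length} values, got ${tokens.length}`);
  }

  const rows = [];
  for (let i = 0; i < tokens.length; i += FIELDS.length) {
    const row = {};
    FIELDS.forEach((key, j) => { row[key] = tokens[i + j]; });
    row.rowId = Number(row.rowId); // the question's desired output uses a numeric rowId
    rows.push(row);
  }
  return rows;
}

const sample = "0 | john | doe | US | Employed 1 | bob| dylan | US | Unemployed";
console.log(parseRows(sample));
```

The length check is there so that a missing or extra value fails loudly instead of silently shifting every later row.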

6 Comments

I did try something like this, but as you can see in my split output, the last column of a row does not end with |. How can I know where the next row starts?
Since you mentioned each row will have the same number of columns, after we do a split using |, the for loop will consider 5 elements as one row. Assuming each row starts with a numeric value like 0,1,2.. and so on
Also, the output of split in your post is incorrect as per the data you provided.
I see, I guess i += 5 will determine each row. Thank you
@DaveDave, please clarify the data format between records, such as Employed | 1: is there actually a | in between?

Here is an example that uses a separate chunk() function to group the split values into arrays of the correct size, which are then passed to an Employee() constructor function that turns each one into an object.

The initial split is on \n, \r and |; the resulting array is then trimmed and filtered for empty strings.

const
  data =
    `0 | john | doe | 
  US | Employed 
  1| bob | dylan |
  US | Unemployed `,

  chunk = (arr, size) => {
    const res = [];
    for (let i = 0; i < arr.length; i += size) {
      res.push(arr.slice(i, i + size));
    }
    return res;
  },

  Employee = ([rowId, name, surname, country, status]) => ({ rowId, name, surname, country, status }),

  splitData = data
    .split(/[\r\n\|]/g)               //split by newlines/returns and pipe
    .map(s => s.trim())               // trim whitespace
    .filter(s => s !== ''),           // filter out empty strings

  chunkedData = chunk(splitData, 5),  // chunk into subarrays of length 5

  result = chunkedData.map(Employee); // map to object using constructor declared above

console.log(result)

Alternatively, given that your structure is fixed, you could use a regex to capture each row, then proceed with splitting by | and object mapping.

This works regardless of whether you have line breaks or not.

const
  data1 = `0 | john | doe | 
  US | Employed 
  1| bob | dylan |
  US | Unemployed `,

  data2 = "0 | john | doe | US | Employed  1 | bob | dylan |  US | Unemployed ",

  Employee = ([rowId, name, surname, country, status]) => ({ rowId, name, surname, country, status }),

  splitData = data => data
    .match(/(?:.+?[\s\n\r]*\|[\s\n\r]*){4}.+?[\s\n\r]+/gm)   // capture each row with regex
    .map(row => row.split('|').map(s => s.trim())),         // map each row, split and trim

  result1 = splitData(data1).map(Employee), // map to object using constructor declared above
  result2 = splitData(data2).map(Employee); // map to object using constructor declared above

console.log('Multiline: \n');
console.log(result1);

console.log('\nSingle line: \n');
console.log(result2);
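If the row regex is hard to read, it may help to assemble it from named pieces. The sketch below is an equivalent rewrite rather than a different technique: since \s already matches \n and \r, each [\s\n\r] is written as plain \s. The names field, separator, rowEnd and rowPattern are invented for this illustration.

```javascript
// Same row pattern as above, assembled from named parts (names are illustrative).
const field = String.raw`.+?`;           // one value, matched as short as possible
const separator = String.raw`\s*\|\s*`;  // a pipe with optional whitespace or line breaks around it
const rowEnd = String.raw`\s+`;          // whitespace or a line break that closes the row

// Four "value + pipe" pairs, then a fifth value and the trailing whitespace.
const rowPattern = new RegExp(`(?:${field}${separator}){4}${field}${rowEnd}`, "g");

const wrapped = "0 | john | doe | \n  US | Employed \n  1| bob | dylan |\n  US | Unemployed ";
const rows = wrapped
  .match(rowPattern)                               // one match per row
  .map(row => row.split("|").map(s => s.trim()));  // split each row and trim the values

console.log(rows);
// [ [ '0', 'john', 'doe', 'US', 'Employed' ], [ '1', 'bob', 'dylan', 'US', 'Unemployed' ] ]
```

Note that the pattern needs at least one whitespace character after the last value of a row, so the final row is only matched when the string ends in whitespace, as the sample data does.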

1 Comment

I can never understand regex without putting it in regex101! I do like the Employee constructor pattern. +1

New answer:

If rows are split by newline after "enough" columns (5), use a regex to split:

const data =
  `0 | john | doe | 
        US | Employed 
        1 | bob| dylan | 
        US | Unemployed `;

const asObjects = data.split(/((?:[^|]+\|){4}[^|]+)(?:\n)/)
  .filter(v => v) // remove empty strings
  .map(v => v.split('|').map(v => v.trim()))
  .map(row => ({
    rowId: row[0],
    name: row[1],
    surname: row[2]
  }))
console.log(asObjects)

Explanation: string.split() includes any capture groups (parentheses) in its result but omits non-capturing groups (?:x).
So keep the five columns and the "|" between them, "x|x|x|x|x": (([^|]+\|){4}[^|]+)
But make the inner group non-capturing to avoid a duplicate: ((?:[^|]+\|){4}[^|]+)
And drop the newline that separates the rows by adding (?:\n)
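The capture-group behaviour of split() is easier to see on a tiny string. This is only a small illustration with a made-up input, separate from the table data:

```javascript
// A capturing group keeps the matched separator in the result...
const kept = "a1b2c".split(/(\d)/);
console.log(kept);     // [ 'a', '1', 'b', '2', 'c' ]

// ...while a non-capturing group drops it.
const dropped = "a1b2c".split(/(?:\d)/);
console.log(dropped);  // [ 'a', 'b', 'c' ]
```

In the answer's regex the captured part is the whole row, which is why each row survives the split while the separating newline does not.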


Old answer (assumption doesn't apply): assuming correct data, with every entry on a new line and the fields in a consistent order, separated by " | ":

const asObjects = data.split('\n').map(rawRow => rawRow.split(' | ')).map(row => ({ rowId: row[0], name: row[1], surname: row[2] }))

2 Comments

every entry is not on a newline, the rows wrap (for some reason?)
Hi, you can also use SO code snippets. meta.stackoverflow.com/a/356679/14032355
