7

I have a string (HTML content) and an array of position (index) objects. The string length is about 1.6 million characters and there are about 700 position objects.

For example:

var content = '<html><body><div class="c1">this is some text</div>....';
var positions = [{start: 20, end: 25}, {start: 35, end: 37}....]

I have to insert an opening span tag into every start position within the string and a close span tag into every end position within the string.

What is the most efficient way to do this?

So far I have tried sorting the positions array in reverse order, then looping through it and using slice / substring to insert the tags, e.g.:

content = content.slice(0, endPosition) + "</span>" + content.substring(endPosition);
content = content.slice(0, startPosition) + "<span>" + content.slice(startPosition);

(Notice how I have started the loop from the end in order to avoid messing up the start/end positions).

But this takes about 3 seconds, which seems slow and inefficient to me.

What is a more efficient way to do this?

2
  • Does "position" mean a line number, or an index into the string? Commented Nov 13, 2018 at 10:20
  • position means index Commented Nov 13, 2018 at 10:21

4 Answers 4

4

Instead of modifying the big string each time, try accumulating processed "chunks" in a new buffer:

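// Assumes positions are sorted by start index and do not overlap
const content = '0123456789'
const positions = [
  [1, 3],
  [5, 7]
]

const buf = []
let lastPos = 0

for (const [s, e] of positions) {
  buf.push(
    content.slice(lastPos, s), // untouched text before the range
    '<SPAN>',
    content.slice(s, e),       // the text to wrap
    '</SPAN>'
  )
  lastPos = e
}

// Append whatever follows the last range
buf.push(content.slice(lastPos))

const res = buf.join('')
console.log(res)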
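
For the question's data shape, where the positions are `{start, end}` objects that may not arrive sorted, the same chunk-accumulating idea could be sketched as below. The helper name `wrapRanges` and its tag parameters are hypothetical (they do not appear in the thread), and the sketch assumes the ranges do not overlap.

```javascript
// Hypothetical helper: wraps each {start, end} range of `content` in tags.
// Assumes the ranges do not overlap; the input array is not mutated.
function wrapRanges(content, positions, openTag = '<span>', closeTag = '</span>') {
  // Sort a copy in ascending order so chunks are emitted left to right.
  const sorted = [...positions].sort((a, b) => a.start - b.start);
  const buf = [];
  let lastPos = 0;

  for (const { start, end } of sorted) {
    buf.push(content.slice(lastPos, start), openTag, content.slice(start, end), closeTag);
    lastPos = end;
  }

  // Everything after the last range is copied unchanged.
  buf.push(content.slice(lastPos));
  return buf.join('');
}

console.log(wrapRanges('0123456789', [{ start: 5, end: 7 }, { start: 1, end: 3 }]));
// prints "0<span>12</span>34<span>56</span>789"
```

Each character of the original string is copied once into a chunk and once more in the final join, instead of the whole string being rebuilt for every inserted tag.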


3 Comments

I think you have to reverse the positions array and loop over it backwards; otherwise adding the span tags will shift the content, and the remaining positions will no longer be correct.
@joshuamiller: I don't think so, the original string remains unchanged, no shifting
true, missed that part
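
The point made in these comments can be checked with a small sketch: a forward pass that only slices the unchanged original string gives the same result as the reverse-order insertion from the question. The function names `forwardChunks` and `reverseInsert` are illustrative, not from the thread.

```javascript
const content = '0123456789';
const positions = [[1, 3], [5, 7]];

// Forward pass: slices always index into the unchanged original string.
function forwardChunks(text, ranges) {
  const buf = [];
  let lastPos = 0;
  for (const [s, e] of ranges) {
    buf.push(text.slice(lastPos, s), '<span>', text.slice(s, e), '</span>');
    lastPos = e;
  }
  buf.push(text.slice(lastPos));
  return buf.join('');
}

// Reverse pass: the rebuilding approach from the question, last range first.
function reverseInsert(text, ranges) {
  let out = text;
  for (const [s, e] of [...ranges].reverse()) {
    out = out.slice(0, e) + '</span>' + out.slice(e);
    out = out.slice(0, s) + '<span>' + out.slice(s);
  }
  return out;
}

console.log(forwardChunks(content, positions) === reverseInsert(content, positions)); // true
console.log(content); // still "0123456789"
```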
1

We can split the content into an array of characters, loop over it once to insert the <span> and </span> tags, and then join it back into a string:

var content = '<html><body><div class="c1">this is some text</div>....';
var positions = [{start: 20, end: 25}, {start: 35, end: 37}];
var arr = content.split('');

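// Sets give constant-time lookups; indexOf on an array would scan every position for every character
var arrPositions = {
  starts: new Set(positions.map(p => p.start)),
  ends: new Set(positions.map(p => p.end))
}

var result = arr.map((char, i) => {
  // Close before opening, in case one range ends where the next one starts
  var prefix = '';
  if (arrPositions.ends.has(i)) prefix += '</span>';
  if (arrPositions.starts.has(i)) prefix += '<span>';
  return prefix + char;
}).join('') + (arrPositions.ends.has(content.length) ? '</span>' : '') // a range may end at the very end of the string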

console.log(result)


1

You can do:

const content = '<div class="c1">It is a long established fact that a reader will be distracted by the readable content of a page when looking at its layout. The point of using Lorem Ipsum is that it has a more-or-less normal distribution of letters, as opposed to using Content here, content here, making it look like readable English. Many desktop publishing packages and web page editors now use Lorem Ipsum as their default model text, and a search for lorem ipsum will uncover many web sites still in their infancy. Various versions have evolved over the years, sometimes by accident, sometimes on purpose (injected humour and the like).</div>';
const positions = [{start: 24,end: 40}, {start: 160,end: 202}];
const {array, lastPosition} = positions
  .reduce((a, c) => {
    a.array.push(
      content.slice(a.lastPosition, c.start), '<span class="blue">', content.slice(c.start, c.end), '</span>'
    );
    a.lastPosition = c.end;
    return a;
  }, {array: [], lastPosition: 0});

// Append the tail after the loop so that an empty positions array still yields the original content
const result = array.concat(content.slice(lastPosition)).join('');

document.write(result);

CSS used by the snippet:

.blue {color: blue;}


1

You can do this:

const content = 'this is some text. this is some text. this is some text. this is some text. this is some text. this is some text. this is some text. this is some text. ';
const positions = [{start: 20, end: 26}, {start: 35, end: 37}];

// Using Sets removes duplicate positions.
const starts = new Set();
const ends = new Set();

const START_TAG = '<span>';
const END_TAG = '</span>';

const string_length = content.length;

positions.forEach(function(position) {
   // Check whether the index positions are in bounds (an end may equal the string length).
   if (position.start > -1 && position.start < string_length) starts.add(position.start);
   if (position.end > -1 && position.end <= string_length) ends.add(position.end);
});

// Insert from the highest index down so that earlier insertions do not shift
// the positions still to be processed. At equal indices the start tag goes in
// first, so the end tag inserted afterwards lands in front of it.
const insertions = [
  ...[...starts].map(index => ({index, tag: START_TAG})),
  ...[...ends].map(index => ({index, tag: END_TAG}))
].sort((a, b) => b.index - a.index || (a.tag === START_TAG ? -1 : 1));

let updated_string = content;

insertions.forEach(function({index, tag}) {
  updated_string = updated_string.slice(0, index) + tag + updated_string.slice(index);
});

console.log(updated_string);
