0

I scrape sites for a database with a Chrome extension and need assistance with a JavaScript clean-up function.

e.g.

https://www.alibaba.com/product-detail/_60789694386.html?spm=a2700.galleryofferlist.normalList.1.5be41470uWBNGm&s=p

My target output is:

_60789694386.html

Everything past .html needs to be removed, but since it is different in each URL, I'm lost.

The output is a .csv file, on which I run a JavaScript script to clean up the data.

   this.values[8] = this.values[8].replace("https://www.alibaba.com/product-detail/","");

this.values[8] is how I target the column in the script (column 8 holds the URL).

7 Answers

3

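Well, you can use split.

var final = this.values[8].split('.html')[0] + '.html';

split gives you an array of the pieces between each occurrence of a separator string, in your case '.html'. Take the first piece and append '.html' again, because split drops the separator itself.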

1

Consider using substr

this.values[8] = this.values[8].substr(0,this.values[8].indexOf('?'))
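One caveat with the indexOf approach: when a URL has no query string, indexOf('?') returns -1 and the call above yields an empty string. A minimal sketch that guards against that case follows; stripQuery is a hypothetical helper name, not something from the original script.

```javascript
// stripQuery is a hypothetical helper name, not part of the original script.
function stripQuery(url) {
  var q = url.indexOf('?');
  // indexOf returns -1 when there is no '?', so keep the whole string then.
  return q === -1 ? url : url.slice(0, q);
}

console.log(stripQuery('https://www.alibaba.com/product-detail/_60789694386.html?spm=a2700&s=p'));
```

It could then be applied to the column with this.values[8] = stripQuery(this.values[8]), before or after removing the product-detail prefix.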

0

You can use the split method to cut the text at the ?, as in this example.

var link = "https://www.alibaba.com/product-detail/_60789694386.html?spm=a2700.galleryofferlist.normalList.1.5be41470uWBNGm&s=p"
var result = link.split('?')[0].replace("https://www.alibaba.com/product-detail/","");
console.log(result);

0

Not sure I understood your problem, but try this:

var s = 'https://www.alibaba.com/product-detail/_60789694386.html?spm=a2700.galleryofferlist.normalList.1.5be41470uWBNGm&s=p'
s = s.substring(0, s.indexOf('?'));
console.log( s );

0

For when you don't care about readability...

this.values[8] = new URL(this.values[8]).pathname.split("/").pop().replace(".html","");
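Note that the one-liner above also strips ".html", while the question's target output keeps it. Here is a slightly more readable sketch of the same URL-based idea that keeps the file name intact. lastPathSegment is a hypothetical helper name, and this assumes the clean-up script runs somewhere the URL constructor is available (browsers and current Node.js).

```javascript
// lastPathSegment is a hypothetical helper name used for illustration.
function lastPathSegment(rawUrl) {
  // pathname excludes the query string and the fragment.
  var path = new URL(rawUrl).pathname;
  // The last "/"-separated piece is the file name, e.g. "_60789694386.html".
  return path.split('/').pop();
}

console.log(lastPathSegment('https://www.alibaba.com/product-detail/_60789694386.html?spm=a2700&s=p'));
```

new URL throws a TypeError for strings that are not absolute URLs, so empty or malformed cells in the CSV would need a try/catch around the call.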

0

An alternative without split:

var link = "https://www.alibaba.com/product-detail/_60789694386.html?spm=a2700.galleryofferlist.normalList.1.5be41470uWBNGm&s=p"
var result = link.replace('https://www.alibaba.com/product-detail/', '').replace(/\?.*$/, '');
console.log(result);

0

You can use a regex to get it done. As far as I know, something like this works:

    var v = "https://www.alibaba.com/product-detail/_60789694386.html?spm=a2700.galleryofferlist.normalList.1.5be41470uWBNGm&s=p";
    // Grab everything after the last slash (file name plus query string).
    var result = v.match(/[^\/]+$/)[0];
    // Cut at the "?" to drop the query string.
    result = result.substring(0, result.indexOf('?'));
    console.log(result);    // logs _60789694386.html
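The two steps can probably be folded into one pattern with a capture group. This is only a sketch: extractHtmlName is a hypothetical helper name, and it assumes the part you want always ends in ".html" and is followed by "?", "#", or the end of the string.

```javascript
// extractHtmlName is a hypothetical helper name used for illustration.
function extractHtmlName(url) {
  // Capture a run of characters without "/", "?" or "#" that ends in ".html",
  // followed by a query string, a fragment, or the end of the URL.
  var m = url.match(/([^\/?#]+\.html)(?:[?#]|$)/);
  // Return null instead of throwing when the URL has no ".html" file name.
  return m ? m[1] : null;
}

console.log(extractHtmlName('https://www.alibaba.com/product-detail/_60789694386.html?spm=a2700&s=p'));
```

Returning null for non-matching rows makes it easy to spot URLs in the CSV that do not follow the expected shape.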
