Using Jsoup, I can extract the page source of most websites (what you get by right-clicking a web page and choosing "View Page Source"). But for any YouTube video page, I am unable to extract the proper page source. I tried the following code, but it failed:

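import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class App {
  public static void main(String[] args) throws IOException {

    String webUrl = "https://www.youtube.com/watch?v=Zu6o23Pu0Do";
    // Fetch and parse the page, sending a desktop-browser User-Agent
    Document doc = Jsoup.connect(webUrl)
            .userAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36")
            .get();

    // Prints Jsoup's re-serialized document, not the raw response bytes
    System.out.println(doc);

  }
}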
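One thing worth checking: `System.out.println(doc)` prints Jsoup's re-parsed, re-serialized tree rather than the bytes the server sent, and the server may also redirect the request or return different markup than it gives your browser. In Jsoup, `Jsoup.connect(webUrl).execute()` returns a `Connection.Response` whose `statusCode()`, `url()` (the final URL after redirects) and `body()` (the raw HTML) expose this. The sketch below performs the same check with the JDK's `java.net.http.HttpClient` (Java 11+), swapped in so it runs without a Jsoup dependency. The class and helper names are hypothetical, and a cross-host redirect (for example, to a cookie-consent page) is only one possible cause, not a confirmed diagnosis.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical diagnostic: fetch the raw HTML (comparable to "View Page Source")
// and report whether the request ended up on a different host.
class RawSourceCheck {

    // True when the final URI (after redirects) is on a different host than
    // the one requested, e.g. an interstitial page instead of the video page.
    static boolean redirectedToOtherHost(URI requested, URI finalUri) {
        String requestedHost = requested.getHost();
        String finalHost = finalUri.getHost();
        return requestedHost != null && finalHost != null
                && !requestedHost.equalsIgnoreCase(finalHost);
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        URI requested = URI.create("https://www.youtube.com/watch?v=Zu6o23Pu0Do");

        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL) // follow redirects except HTTPS -> HTTP
                .build();
        HttpRequest request = HttpRequest.newBuilder(requested).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());

        System.out.println("Status:    " + response.statusCode());
        System.out.println("Final URI: " + response.uri()); // may differ from the request URI
        System.out.println("Redirected to another host: "
                + redirectedToOtherHost(requested, response.uri()));
        // Raw, unmodified body: diff this against the browser's "View Page Source"
        System.out.println(response.body());
    }
}
```

If the final URI or status differs from what the browser shows, the "unusual data" is probably a different page altogether rather than a parsing problem.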

Does anybody have any advice on how to fix this?

I am getting output like the following:

[Screenshot: sample output]

  • Is the connection timing out? Are you getting an error? Commented Jan 2, 2020 at 13:17
  • No, the connection is not timing out, and there is no error. I'm just getting unusual data that is not in the original page. Commented Jan 3, 2020 at 15:07
  • I just ran your code in my IDE and it came back with the document. Check out my pastebin. Could you paste all of your code into one block and append it to your question? The image you posted is very hard to read. - pastebin.com/QqY2Lp69 Commented Jan 3, 2020 at 15:25
  • I added my full code, and I am getting the following output. The URL is here - pastebin.com/jRkiu3Mt Commented Jan 3, 2020 at 22:06
  • I am also facing the same issue. I am getting an empty title while trying to fetch metadata for YouTube pages. @FunnyBoss Commented May 11, 2020 at 5:19
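For the empty-title case in the last comment, one thing to try is reading the `og:title` meta tag from the raw HTML instead of the `<title>` element, since a heavily scripted page may set its visible title in the browser. With Jsoup that would be `doc.select("meta[property=og:title]").attr("content")`. Whether the tag is present depends on what the server returned for that particular request, so treat this as something to check rather than a guaranteed fix. The JDK-only sketch below does the same lookup with a regex; the class and method names are hypothetical, and it assumes double-quoted attributes with `property` appearing before `content` and does not decode HTML entities, so a real HTML parser is more robust.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: read <meta property="..." content="..."> from raw HTML.
// Rough sketch only: assumes double-quoted attributes, `property` before
// `content`, and leaves HTML entities (e.g. &amp;) undecoded.
class MetaTitle {

    static Optional<String> metaContent(String html, String property) {
        Pattern pattern = Pattern.compile(
                "<meta\\s[^>]*property=\"" + Pattern.quote(property) + "\"[^>]*content=\"([^\"]*)\"",
                Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(html);
        // Return the first match's content attribute, if any
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String html = "<head><meta property=\"og:title\" content=\"Example video\"></head>";
        System.out.println(metaContent(html, "og:title").orElse("(no og:title found)"));
    }
}
```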

1 Answer


You're not setting a user agent, which could be triggering anti-scraping measures on the website. I'm going to assume the problem is that your connection times out when you run this. Try setting the following user agent on connect() and see if it works for you:

.userAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36")
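If the connection really is timing out, Jsoup's `Connection` also accepts `.timeout(int millis)` next to `.userAgent(...)`. As a point of comparison, here is a minimal sketch of the same request settings using the JDK's `java.net.http.HttpRequest` (Java 11+) instead of Jsoup. The class and method names are hypothetical, and the 10-second timeout is an arbitrary example value.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

// Hypothetical sketch: a GET request carrying an explicit User-Agent and a
// timeout, roughly what .userAgent(...) and .timeout(...) do on Jsoup's connect().
class UserAgentRequest {

    static final String BROWSER_UA =
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_2) AppleWebKit/537.36 "
            + "(KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36";

    static HttpRequest buildRequest(String url, String userAgent, Duration timeout) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("User-Agent", userAgent) // replaces the client's default User-Agent
                .timeout(timeout)                // send() throws HttpTimeoutException if exceeded
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildRequest(
                "https://www.youtube.com/watch?v=Zu6o23Pu0Do", BROWSER_UA, Duration.ofSeconds(10));
        System.out.println(request.headers().firstValue("User-Agent").orElse("(none)"));
        System.out.println(request.timeout().orElse(null));
    }
}
```

Pass the built request to `HttpClient.send(...)` to actually fetch the page; a timeout then surfaces as an exception instead of a silent hang, which makes it easy to tell a timeout apart from the "unusual data" case.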


1 Comment

I added that, but it's still not working. I also updated the output in my original post.
