When I download https://www.wired.com/category/security/ using either wget or curl, the result is gibberish/encrypted.

Is it possible (and if so what is the correct way) to save that web page (unencrypted / plain HTML) from the command line?

1 Answer


Executive summary:

The downloaded file is gzip-compressed; decompress it to get the plain HTML.

Detailed answer

Running:

wget https://www.wired.com/category/security/

results in a downloaded index.html file.

Running the file command on the downloaded file shows:

$ file index.html 
index.html: gzip compressed data, from Unix
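
To double-check that it is the server sending a compressed response, you can inspect the response headers. A quick check (the output depends on how the server answers a plain request, so treat it as a sanity check rather than proof):

$ curl -sI https://www.wired.com/category/security/ | grep -i content-encoding

If this prints something like content-encoding: gzip, the server is compressing the response itself.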

Renaming the file and decompressing it turns it into an HTML document:

$ mv index.html index.html.gz
$ gunzip index.html.gz 
$ file index.html 

index.html: HTML document, UTF-8 Unicode text, with very long lines, with overstriking
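
As an aside, the rename step isn't strictly necessary: gzip can decompress to standard output regardless of the file's name (gunzip on its own complains about the missing .gz suffix). Something along these lines works too, with page.html being an arbitrary output name:

$ gzip -dc index.html > page.html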

Extra info - why did wget download a compressed file?

As explained in How To Optimize Your Site With GZIP Compression:

Instead of downloading a large text file, modern HTTP servers and clients use compressed HTTP responses, which reduce the size of the transferred files.
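
Given the original question, the simplest route is to let the client undo the compression for you. curl has a --compressed option that requests a compressed response and transparently decompresses it; recent wget releases (1.19.2 and later, if I remember correctly) offer a similar --compression option. A sketch, assuming the server plays along:

$ curl --compressed -o index.html https://www.wired.com/category/security/
$ wget --compression=auto https://www.wired.com/category/security/

Either should leave you with plain HTML in index.html.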

  • why would this page save compressed? Commented Jun 8, 2017 at 14:03
  • @JB0x2D1 - updated my answer Commented Jun 8, 2017 at 14:16
  • For me, wget saves plain HTML for the given URI. It is decompressed after transport. Do you possibly have some extra wget option set that causes it to save raw data? Commented Jun 9, 2017 at 10:51
