When I download https://www.wired.com/category/security/ using either wget or curl, the result is gibberish/encrypted.

Is it possible (and if so what is the correct way) to save that web page (unencrypted / plain HTML) from the command line?

1 Answer


Executive summary:

The downloaded file is gzip-compressed; decompress it to get the plain HTML.

Detailed answer

Running:

wget https://www.wired.com/category/security/

results in a downloaded index.html file.

Running the file command on the downloaded file shows:

$ file index.html 
index.html: gzip compressed data, from Unix
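
To double-check that it is the server sending a compressed response, you can inspect the response headers. A quick check (the output depends on how the server answers a plain request, so treat it as a sanity check rather than proof):

$ curl -sI https://www.wired.com/category/security/ | grep -i content-encoding

If this prints something like content-encoding: gzip, the server is compressing the response itself.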

Renaming the file and decompressing it turns it into an HTML document:

$ mv index.html index.html.gz
$ gunzip index.html.gz 
$ file index.html 

index.html: HTML document, UTF-8 Unicode text, with very long lines, with overstriking
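
As an aside, the rename step isn't strictly necessary: gzip can decompress to standard output regardless of the file's name (gunzip on its own complains about the missing .gz suffix). Something along these lines works too, with page.html being an arbitrary output name:

$ gzip -dc index.html > page.html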

Extra info - why did wget download a compressed file?

As explained in How To Optimize Your Site With GZIP Compression:

Instead of downloading a large text file, modern HTTP servers and clients use compressed HTTP responses, which reduce the size of the transferred files.
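
Given the original question, the simplest route is to let the client undo the compression for you. curl has a --compressed option that requests a compressed response and transparently decompresses it; recent wget releases (1.19.2 and later, if I remember correctly) offer a similar --compression option. A sketch, assuming the server plays along:

$ curl --compressed -o index.html https://www.wired.com/category/security/
$ wget --compression=auto https://www.wired.com/category/security/

Either should leave you with plain HTML in index.html.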

  • why would this page save compressed? Commented Jun 8, 2017 at 14:03
  • @JB0x2D1 - updated my answer Commented Jun 8, 2017 at 14:16
  • For me, wget saves plain HTML for the given URI. It is decompressed after transport. Do you possibly have some extra wget option set that causes it to save raw data? Commented Jun 9, 2017 at 10:51
