Download entire websites with wget
I don't have internet at home by choice, so for me it's a necessity to have things accessible offline. Most webpages can be saved with ctrl-s from a bloated browser relatively nicely, but there are some resources which are spread across multiple pages and don't have a download option. (I'm looking at you, ansible documentation.) In such cases it's very convenient to mirror the whole website with wget. Depending on how bloated a website is, it might not look good when browsed locally, but I've found the problem can usually be alleviated by opening the files with w3m. Anyway, here's the command:
wget -c -r -k -np -N -e robots=off 'url'
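For reference, here's roughly what each flag does (the manual has the exact semantics):
-c              resume partially downloaded files instead of starting over
-r              follow links and download pages recursively
-k              convert links in the saved pages so they work offline
-np             never ascend above the starting directory
-N              only re-download files that are newer than the local copy
-e robots=off   ignore robots.txt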
If that fails because of DDoS protection, use this instead:
wget -c -r -k -np -N -e robots=off --random-wait --wait 1 'url'
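The two extra options just slow the crawl down:
--wait 1         pause about a second between requests
--random-wait    vary that pause randomly so the requests look less automated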
If that also fails, you can change the headers wget sends to match your browser's, but at that point I wouldn't want anything to do with such a website.
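If you do go down that road anyway, it looks something like this; the user-agent string below is only an example, copy whatever your own browser actually sends:
wget -c -r -k -np -N -e robots=off --random-wait --wait 1 \
  --user-agent='Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0' \
  --header='Accept-Language: en-US,en;q=0.5' 'url'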
Both commands can be run again to update the local mirror or to finish a partial one, so don't worry about stopping the process and starting again later if it's taking too much time. I usually run the commands over tor by simply adding torsocks -i in front (full command below). Although I don't trust exit nodes, it's better than nothing. I also encourage you to read the manual and to abstain from running random commands you find on the internet :)
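For completeness, the tor variant looks like this, assuming torsocks is installed and a tor daemon is running locally:
torsocks -i wget -c -r -k -np -N -e robots=off --random-wait --wait 1 'url'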