How to save websites offline with HTTrack

If you want a copy of a website for offline browsing, HTTrack can help you out.

Having universal Wi-Fi, 3G or wired internet access is great, but do we rely on connectivity too much?

What do you do when you know you'll need to get vital information in a place you can't go online, or if you know you're going to be hampered by a flaky connection?

The information will be out of reach until you can get back to somewhere with a decent internet connection.

The solution could be HTTrack. It temporarily replicates all or parts of a website on your PC so that the content will be available when there's no connection. It can also resume a download that was interrupted previously by a dodgy internet connection.

Website mirroring

The principle that we'll be using is called mirroring, which boils down to grabbing every file on a website in one sweeping motion. However, as we prepare to get stuck in and find out how it's done, there are a few social niceties that we need to observe.

Before you begin downloading sites in earnest, think about other people. Small websites may not use the most robust of hosting services, and you may swamp the webserver by bombarding it with requests for information.

Another problem is that some sites have monthly bandwidth limits. If your downloading activities exceed these limits, the site will become unavailable to everyone. So, in cases where you want to download someone's private website, you should ask if it's OK first.

The other thing to be aware of is that with huge amounts of free disk space, you could easily be tempted to download lots of sites on the off-chance that they may be useful. Try to resist this urge, because large websites can take a while to download, eat into free disk space and tie up bandwidth.

Some large, commercial sites ban mirroring programs from accessing them. They may recognise your subsequent browsing attempts from the contents of a cookie, which is easily deleted, but if the site you want to mirror requires you to log in with a username and password, you could find your account banned.

While we'll show you how to alter the browser ID used by HTTrack to help get around this, it's always best to follow the rules and reduce the speed at which HTTrack requests information.
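For readers who prefer the command line, the same kind of throttling can be sketched in Python. This is only a minimal sketch under stated assumptions, not the method used in the walkthrough below: it assumes the httrack command-line program is installed, and the helper name polite_httrack_command, the example URL, the output folder and the default limits are all made up for illustration. The options used are -O (output path), -A (maximum transfer rate in bytes per second), -c (number of simultaneous connections) and -F (the user-agent string, or browser ID); check 'httrack --help' on your own version before relying on them.

```python
import shlex


def polite_httrack_command(url, output_dir, max_rate_bytes=25000,
                           connections=2, user_agent=None):
    """Build (but do not run) an httrack command line with throttling.

    The function name and default values are illustrative assumptions.
    """
    cmd = [
        "httrack", url,
        "-O", output_dir,        # folder for the mirror and its cache
        f"-A{max_rate_bytes}",   # cap the transfer rate (bytes per second)
        f"-c{connections}",      # limit simultaneous connections
    ]
    if user_agent:
        cmd += ["-F", user_agent]  # browser ID sent in the HTTP headers
    return cmd


# Hypothetical example values; nothing is downloaded here.
cmd = polite_httrack_command("http://www.example.com/", "mirror/example")
print(shlex.join(cmd))
```

Building the command as a list, rather than as one long string, avoids quoting problems if the user-agent string contains spaces. The list can be passed to subprocess.run once you're happy with the limits.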

With those provisos in mind, let's begin.

Basic HTTrack

At the time of writing, the current version of HTTrack is 3.43-9. It runs on all the recent versions of Windows, and the Download page at the project site also has packages for a range of Linux distributions and Mac OS X.

For Windows use, download the version with an installer and run the executable. When the installation wizard appears, click 'Next'. Accept the licence agreement and click 'Next' again. Accept the installation directory and press 'Next' to accept the desktop shortcut before pressing 'Next' again. The resultant page confirms the installation choices you made. Click 'Next' and then 'Install'.


Once complete, click 'Finish' to run HTTrack. When the program appears, select your default language and click 'OK'. A wizard will pop up – this will guide you through the process of creating a new project, into which you'll download a website.

As with virtually all wizards, this process begins by pressing 'Next'. Enter a name for your project and a category. Later on, previous categories will be available from the associated dropdown menu. Click 'Next' to continue.

Now make sure the Action dropdown menu is set to 'Download web site(s)'. Click 'Add URL' and an input box will pop up. Enter the URL of the website you want to mirror (without 'http://'). Also enter the username and password you'd normally use to access the site, if applicable.


You can provide a subpage for a site instead of just the domain name if you're only interested in a particular part. This is also a good way of getting a feel for HTTrack's operation without filling up all of your free disk space. Click 'Next' once again.

In the 'Remote connect' box, leave the dropdown menu on 'Do not use remote access connection'. This is a relic from the old dial-up days when you connected, downloaded information, then dropped the line again to reduce phone costs. Now click 'Finish' and the mirroring process will begin.
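The wizard steps above have a rough command-line equivalent, which can be sketched in Python as follows. This is an illustration rather than part of the wizard: it assumes the httrack command-line program is installed and on your PATH, and the names mirror_command and run_mirror, the 'websites' base folder and the example address are hypothetical. The project name simply becomes the output folder passed to -O.

```python
import shutil
import subprocess
from pathlib import Path


def mirror_command(site, project_name, base_dir="websites"):
    """Return the httrack command for mirroring a site or subpage.

    'site' may be typed without 'http://', as in the wizard.
    """
    url = site if "://" in site else "http://" + site
    project_dir = Path(base_dir) / project_name  # one folder per project
    return ["httrack", url, "-O", str(project_dir)]


def run_mirror(site, project_name, base_dir="websites"):
    """Run the mirror and return httrack's exit code."""
    if shutil.which("httrack") is None:
        raise RuntimeError("httrack was not found on PATH")
    result = subprocess.run(mirror_command(site, project_name, base_dir))
    return result.returncode


# Hypothetical example: a subpage instead of the whole domain.
print(mirror_command("www.example.com/docs/", "example-docs"))
```

Calling run_mirror with the same arguments would start the download, so try it on a small subpage first.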

Depending on the complexity of the website, this process can take anywhere from under a minute to several hours. If your connection fails, or you have to cancel because of time or bandwidth issues, you'll be left with an incomplete site. Luckily, HTTrack can recover from this. If you cancel a download, you'll be sent back to the welcome screen.


To resume mirroring, click 'Next', select the interrupted project from the dropdown list and click 'Next' again. The following screen shows that HTTrack knows the download was interrupted. The action selector is now set to 'Continue interrupted download'. Click 'Next' and then 'Finish'.
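Resuming can also be sketched from the command line. The snippet below is a hedged sketch, assuming the httrack command-line program is installed and that the project folder contains the 'hts-cache' directory HTTrack normally keeps alongside a mirror; the function names and the folder path are hypothetical. It relies on the --continue option, run from inside the project folder.

```python
import subprocess
from pathlib import Path


def resume_command():
    """Command that asks httrack to continue an interrupted mirror."""
    return ["httrack", "--continue"]


def resume_mirror(project_dir):
    """Resume the mirror stored in project_dir; return the exit code."""
    project_dir = Path(project_dir)
    # Assumption: an interrupted project leaves an 'hts-cache' folder.
    if not (project_dir / "hts-cache").is_dir():
        raise FileNotFoundError(f"no HTTrack cache found in {project_dir}")
    # httrack reads the cache from the current working directory.
    return subprocess.run(resume_command(), cwd=project_dir).returncode


print(resume_command())
```

A call such as resume_mirror("websites/example-docs") would then pick up where the earlier download stopped, provided that folder holds an interrupted project.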