How to save websites offline with HTTrack

HTTrack gives you useful control over the mirroring process. When you select or create a new project and press 'Next', you'll see a button called 'Options' on the screen where you enter the site's URL. Click it and a multi-tabbed window will appear.

Perhaps the most immediately useful tab is Limits. Click this to see the sub-options. The 'Max mirroring depth' setting defines how deeply HTTrack follows links. A value of three, for example, means that it will mirror the page at the URL you entered, plus two levels of sub-pages. By default, the field is empty, meaning that the program will mirror as deeply as possible on the target website.
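
To make the depth count concrete, here's a minimal Python sketch of a depth-limited crawl over a made-up link graph. It isn't HTTrack's own code: the page names and the pages_within_depth function are invented for illustration, and it assumes the start page counts as level one, as described above.

```python
from collections import deque

# Hypothetical site: each page maps to the pages it links to.
SITE = {
    "index.html": ["news.html", "about.html"],
    "news.html": ["story1.html", "story2.html"],
    "about.html": ["index.html"],
    "story1.html": ["comments.html"],
    "story2.html": [],
    "comments.html": [],
}

def pages_within_depth(site, start, max_depth):
    """Return the pages reachable from start, counting start as level 1."""
    seen = {start}
    queue = deque([(start, 1)])
    while queue:
        page, level = queue.popleft()
        if level == max_depth:
            continue  # at the limit: keep the page, but don't follow its links
        for link in site.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append((link, level + 1))
    return sorted(seen)

# Depth 3: the start page plus two levels of sub-pages.
print(pages_within_depth(SITE, "index.html", 3))
```

In this toy site, 'comments.html' sits on a fourth level, so a depth of three leaves it out.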

Allied to this setting is the 'Maximum external depth', which performs the same limiting function, but for external data sources such as hosted pictures. Setting the maximum size for a non-HTML file is useful to prevent you downloading useless but large files, such as zip archives. As with the 'Maximum site size' setting, this is measured in bytes, not kilobytes.
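
Because these fields expect bytes, it's easy to enter a value that's a thousand times too small. The short Python sketch below converts a size in megabytes to bytes; the megabytes_to_bytes helper is invented for illustration, and it assumes 1MB means 1,024 x 1,024 bytes (use 1,000,000 if you prefer decimal units).

```python
def megabytes_to_bytes(megabytes):
    """Convert megabytes to whole bytes, taking 1 MB as 1024 * 1024 bytes."""
    return int(megabytes * 1024 * 1024)

# Example caps: 5 MB per non-HTML file, 200 MB for the whole mirror.
print(megabytes_to_bytes(5))    # 5242880
print(megabytes_to_bytes(200))  # 209715200
```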

There are also a couple of very useful overall limits you can set on this tab. The first is the maximum overall time HTTrack should take. This gives you the ability to leave the program to run for a period during which you know there'll be little other network traffic, so as not to interfere with other people's net use. The second parameter is the 'Maximum transfer rate'. By setting this, you effectively throttle back the overall amount of bandwidth that HTTrack uses at any one time.
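
Taken together, the two limits put a ceiling on how much data a session can fetch. As a rough worked example, the Python sketch below multiplies the time limit by the rate limit. The function name is invented, and it assumes the transfer rate is entered in bytes per second.

```python
def max_bytes_transferred(hours, bytes_per_second):
    """Upper bound on data fetched when both time and rate are capped."""
    seconds = int(hours * 3600)
    return seconds * bytes_per_second

# A two-hour run throttled to 25,000 bytes per second.
print(max_bytes_transferred(hours=2, bytes_per_second=25_000))  # 180000000
```

In other words, two hours at that rate can't fetch more than about 180MB, which is worth checking against the size of the site you want to mirror.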

The 'Maximum number of connections per second' setting exerts a more advanced form of control. When you begin the mirroring process, you'll see a number of downloads happening simultaneously. This setting limits how many new connections HTTrack opens each second to the site you're mirroring. The default is 10, but for slow sites a lower value is kinder on the overworked server.

Finally, the 'Maximum number of links' setting is the highest number of clickable links that HTTrack will analyse overall when crawling the site looking for elements to download. The default is 100,000, which should be fine for most sites. If you mirror very large websites and find that you're left with dead links, try adding an extra 5,000.
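
If you'd rather script a mirror than click through the tabs, HTTrack also ships as a command-line program, httrack, with options that correspond to most of the Limits settings. The Python sketch below only builds the command line and prints it; nothing is run. The limits_args helper, the example URL and the output folder are placeholders, and the option letters are as listed in the httrack manual, so check them against your installed version.

```python
import shlex

def limits_args(depth=None, ext_depth=None, max_file_bytes=None,
                max_site_bytes=None, max_seconds=None,
                max_bytes_per_second=None, connections_per_second=None):
    """Translate Limits-style values into httrack command-line options."""
    options = [
        ("-r", depth),                    # mirror depth
        ("-%e", ext_depth),               # external links depth
        ("-m", max_file_bytes),           # largest non-HTML file, in bytes
        ("-M", max_site_bytes),           # largest overall mirror, in bytes
        ("-E", max_seconds),              # longest mirror time, in seconds
        ("-A", max_bytes_per_second),     # transfer rate cap, bytes/second
        ("-%c", connections_per_second),  # new connections per second
    ]
    # Only emit options that were actually given a value.
    return [f"{flag}{value}" for flag, value in options if value is not None]

# Placeholder URL and output folder.
command = ["httrack", "https://www.example.com/", "-O", "mirror"]
command += limits_args(depth=3, max_file_bytes=5_242_880,
                       max_seconds=7200, max_bytes_per_second=25_000)
print(shlex.join(command))
```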

Go with the flow

The Flow Control tab also enables you to determine how HTTrack operates. You can specify the number of parallel connections the program can make to the target domain at any one time. It also enables you to specify how long to wait before retrying a connection to a site if it doesn't work at first, and how many times to retry before declaring the site dead.

The settings on this page are useful if a site being mirrored gets some of its content from a defunct or slow domain. You can use a combination of timeout and minimum transfer rate settings to cut out chronically slow information sources.
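
The same Flow Control settings exist as options on the httrack command line. The Python sketch below gathers them in a small class and turns them into arguments. The FlowControl class and its values are examples invented for illustration, not HTTrack's defaults, and the option letters are taken from the httrack manual.

```python
from dataclasses import dataclass

@dataclass
class FlowControl:
    """Example flow-control values; these are not HTTrack's defaults."""
    connections: int = 4             # parallel connections (-c)
    timeout_seconds: int = 30        # drop a silent link after this long (-T)
    retries: int = 2                 # retries after a timeout or error (-R)
    min_bytes_per_second: int = 500  # slowest rate tolerated for a link (-J)
    abandon_host: int = 3            # 0=never, 1=timeout, 2=slow, 3=either (-H)

    def to_args(self):
        """Return the settings as httrack command-line options."""
        return [
            f"-c{self.connections}",
            f"-T{self.timeout_seconds}",
            f"-R{self.retries}",
            f"-J{self.min_bytes_per_second}",
            f"-H{self.abandon_host}",
        ]

print(FlowControl().to_args())
print(FlowControl(connections=2, retries=5).to_args())
```

The last option is the one that deals with chronically slow sources: a value of 3 tells HTTrack to abandon a host that either times out or falls below the minimum rate.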

Rules

To prevent you from downloading files you don't want, HTTrack enables you to filter a site being mirrored by file type. This is especially useful if, say, you're only interested in the images or videos stored.

Under 'Options', select the Scan Rules tab. This uses a simple allow/deny language to specify content. The rule is that if there's a plus sign before an entry, HTTrack downloads it; if there's a minus sign, it doesn't.

HTTrack treats everything it mirrors as a link to a file, so it refers to files as links. As an example, to exclude a particular file type, click 'Exclude links'. A pop-up will now appear. Select a filtering criterion from the list. In this case, let's try filtering out all files that end in '.mov', since these can be sizeable. Select the criterion 'Filenames with extension:' and in the String input box, enter .mov. Click 'Add' and the string '-*.mov' will appear in the list of filtering criteria.
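
To see how a list of plus and minus rules plays out, here's a simplified Python model. It assumes, as HTTrack's filter documentation describes, that rules are read from first to last and that a later matching rule overrides an earlier one. It uses Python's fnmatch wildcards, which are only an approximation of HTTrack's own pattern syntax, and the is_allowed function, the second rule and the example addresses are all invented for illustration.

```python
from fnmatch import fnmatchcase

def is_allowed(url, rules, default=True):
    """Apply +pattern / -pattern rules in order; the last match wins."""
    allowed = default
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if fnmatchcase(url, pattern):
            allowed = (sign == "+")
    return allowed

# Exclude every .mov file, then re-allow one particular clip.
rules = ["-*.mov", "+*trailer.mov"]
print(is_allowed("www.example.com/clips/holiday.mov", rules))  # False
print(is_allowed("www.example.com/clips/trailer.mov", rules))  # True
print(is_allowed("www.example.com/index.html", rules))         # True
```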

For convenience, there are three lists of file types defined on this page – images, archives and movies. Each has a tickbox next to it. Click one, and the list of file types it filters will appear in the Filter list. Untick it and the entry will disappear again.

If you want to edit an entry in the Filter list, perhaps to change its inclusion rule from a plus to a minus, click it and use the left and right cursor keys to move through the text. When you click 'OK', all of your changes will be saved for later use.

To read the site you've just downloaded, or even one whose download you cancelled, click the button marked 'Browse mirrored website'. This appears when HTTrack finishes mirroring or after you click 'Cancel'. It launches your system's default browser and displays a specially constructed index of your mirrored sites.

At any other time, you can simply open Windows Explorer and navigate to the directory where HTTrack stores your websites. In this directory is a file called 'index.html'. Double-click this to open it and you'll see the list of downloaded sites.
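
If you open that index often, a few lines of Python can do it for you. This is a minimal sketch using the standard webbrowser module; the two function names are invented, and the folder name is a hypothetical example, so substitute the base path you chose for your own projects.

```python
import webbrowser
from pathlib import Path

def mirror_index_uri(mirror_root):
    """Return a file:// address for index.html inside the mirror folder."""
    index = Path(mirror_root).expanduser().resolve() / "index.html"
    return index.as_uri()

def open_mirror_index(mirror_root):
    """Open the index of mirrored sites in the default browser."""
    return webbrowser.open(mirror_index_uri(mirror_root))

# Hypothetical location of the mirror folder.
print(mirror_index_uri("~/My Web Sites"))
```

Calling open_mirror_index("~/My Web Sites") would then launch the browser on the list of downloaded sites.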