How to archive files with Tar in the Linux terminal

Image caption: Most desktop environments can display the contents of archives, for example when you double-click a tarball in LXDE (the preferred desktop for Raspberry Pi).

Understanding Tar

Most people are probably pretty familiar with creating, sending or receiving Zip files. Zip takes a collection of files and stores them in a Zip archive file, compressing the data in the process.

As well as storing the contents of the files, it also stores all their metadata, which is extra information associated with an object. In the case of files, it includes the modification times, their owners and permissions and, of course, the name of each file.

The standard archival program for Unix-like operating systems including Linux and Mac OS X is Tar, so called because Tar was originally used to store backups on tape drives (Tape ARchive).

It works differently from Zip in two ways. It can send all of the archived data to its standard output, and it doesn't compress the data by default, because many tape drives already had hardware compression built in.

The lack of compression code may seem like a disadvantage, but it's actually a convenience. As Tar is able to pipe its data via an external compression program, it can use any compressor it likes – even one that wasn't in existence when the Tar program was developed.

Compression programs work on one file or stream of data and produce one compressed file or stream, so this splits the job into two parts: archival and compression. While this may seem more complex, Tar is perfectly capable of handling the details itself.

Let's say we have a directory called foo and we want to create an archive of it, which is often referred to as a tarball. We can use any one of these commands:

tar cf foo.tar foo

tar czf foo.tar.gz foo

tar cjf foo.tar.bz2 foo

tar cJf foo.tar.xz foo

The c option tells the Tar program that we are creating an archive, while f tells it that we are storing the archive in a file using the given name. Therefore, the first command creates an uncompressed archive that is called foo.tar.

The subsequent commands add an extra option that tells Tar which particular type of compression to use: z uses gzip compression, j uses bzip2 compression and J uses xz compression. (Watch the capitalisation!)
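To see the difference compression makes, here is a minimal sketch that builds a sample foo directory and archives it with and without gzip. The file names and the temporary working directory are made up for illustration; the j and J options work the same way, provided the bzip2 and xz programs are installed.

```shell
# Work in a throwaway directory so nothing in the current directory is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Build a small sample directory called foo (the file names are made up).
mkdir foo
printf 'hello tar\n' > foo/hello.txt
# 2,000 identical lines compress very well, so the size difference is obvious.
yes 'the same line again and again' | head -n 2000 > foo/repetitive.txt

tar cf foo.tar foo        # uncompressed archive
tar czf foo.tar.gz foo    # gzip-compressed archive

# Print the size of each archive in bytes.
wc -c foo.tar foo.tar.gz
```

The exact sizes depend on the Tar and gzip versions in use, but the gzip archive should come out far smaller than the plain one for data as repetitive as this.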

Tar arguments and options

There are also long versions of these arguments that make the commands more readable, but most of us are lazy and use the version that is shorter to type. However, we could also have used this command line if we wanted to:

tar --create --gzip --file foo.tar.gz foo

The file extension is not required, but it's a convention that makes it easier for people to see exactly what type of archive it is – the system itself needs no such help as it can work all this out on its own. Unpacking an archive is simply a matter of replacing c with x, or --create with --extract. However, you don't need to give the compression type, as Tar figures it out:

tar xf foo.tar.gz
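By default the archive is unpacked into the current directory. The sketch below assumes you would rather unpack somewhere else, and uses the C (--directory) option, which this article doesn't otherwise cover, to extract into a directory called restored. The directory and file names are made up for illustration.

```shell
# Work in a throwaway directory so nothing in the current directory is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Create a sample directory and archive it.
mkdir foo
printf 'hello tar\n' > foo/hello.txt
tar czf foo.tar.gz foo

# Unpack into a separate directory so the original foo is not overwritten.
# No z is needed: tar works out the compression type for itself.
mkdir restored
tar xf foo.tar.gz -C restored

# diff -r prints nothing and exits with status 0 when the two trees match.
diff -r foo restored/foo && echo "contents match"
```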

Another option you may want to add is v or --verbose, to show you what Tar is doing.

If you have been given a tarball, you may want to see what is inside it without unpacking it. If you have created an archive, particularly a backup, you may want to check it's correct before relying on it. The t or --list option lists the contents of the archive. Because Tar has to read through the whole archive to do so, a clean listing is also a basic check that the archive is readable.

tar tvf foo.tar.gz
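Listing an archive is also useful in scripts, because Tar exits with a non-zero status when it cannot read the archive. The sketch below relies on that behaviour; the deliberately truncated file damaged.tar.gz is made up for the demonstration.

```shell
# Work in a throwaway directory so nothing in the current directory is touched.
workdir=$(mktemp -d)
cd "$workdir"

mkdir foo
printf 'hello tar\n' > foo/hello.txt
tar czf foo.tar.gz foo

# List the member names only (add v for permissions, owner, size and date).
tar tf foo.tar.gz

# A successful listing means tar could read the archive from start to finish.
if tar tf foo.tar.gz > /dev/null 2>&1; then
    echo "foo.tar.gz is readable"
fi

# Simulate damage by keeping only the first 20 bytes of the archive.
head -c 20 foo.tar.gz > damaged.tar.gz
if ! tar tf damaged.tar.gz > /dev/null 2>&1; then
    echo "damaged.tar.gz failed the check"
fi
```

A readable listing shows the archive can be unpacked; it doesn't prove that the files inside are the ones you meant to back up.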

Those are the main Tar options, but the program has many more, such as r or --append to add files to an existing uncompressed archive instead of creating a new one, and A or --concatenate to join another archive onto the end of it.
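As a quick illustration of adding to an archive, this sketch uses r (--append) to add one more file to an existing tarball. It assumes GNU tar or bsdtar, and the file extra.txt is made up. Appending only works on uncompressed archives, which is why the example uses foo.tar and not foo.tar.gz.

```shell
# Work in a throwaway directory so nothing in the current directory is touched.
workdir=$(mktemp -d)
cd "$workdir"

mkdir foo
printf 'hello tar\n' > foo/hello.txt
printf 'a late addition\n' > extra.txt

tar cf foo.tar foo          # create an uncompressed archive of foo
tar rf foo.tar extra.txt    # r (--append) adds extra.txt to the end of foo.tar

tar tf foo.tar              # the listing now includes extra.txt
```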

Future proofing

We mentioned that Tar can handle any new compression format that comes along because it passes compression to another program. There are command line options to do this automatically for gzip, bzip2 and xz, but what if someone comes up with a new compressor?

Say something like sdc - super-duper compressor? You could create an uncompressed tarball and then use sdc to compress it, but that's wasteful and slow. Instead use a pipe:

tar cf - foo | sdc >foo.tar.sdc

unsdc foo.tar.sdc | tar xvf -

Here, we use the --create option with Tar and give - as the file name. A file name of - means standard output when creating and standard input when extracting, so the archive data is piped to the sdc compressor program without relying on Tar's built-in default destination, which on some systems is a tape device rather than standard output. The second command reverses the process, decompressing the archive and sending it to Tar for extraction.
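Since sdc is imaginary, here is the same pipeline as a runnable sketch with gzip standing in for it. The file name - is given explicitly so that Tar writes to standard output and reads from standard input whatever its default destination is. The restored directory is made up for the example.

```shell
# Work in a throwaway directory so nothing in the current directory is touched.
workdir=$(mktemp -d)
cd "$workdir"

mkdir foo
printf 'hello tar\n' > foo/hello.txt

# "f -" makes tar write the archive to standard output, which gzip compresses.
tar cf - foo | gzip > foo.tar.gz

# Reverse the pipeline: gzip -dc decompresses to standard output and
# tar reads the archive from standard input, unpacking into ./restored.
mkdir restored
gzip -dc foo.tar.gz | tar xf - -C restored

# diff -r exits with status 0 when the original and restored trees match.
diff -r foo restored/foo && echo "round trip OK"
```

Any compressor that reads standard input and writes standard output can take gzip's place in this pipeline.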

This article was provided to TechRadar by Linux Format.
