+\1f
+File: tar.info, Node: Formats, Next: Media, Prev: Date input formats, Up: Top
+
+8 Controlling the Archive Format
+********************************
+
+For historical reasons, there are several formats of tar archives.
+All of them are based on the same principles, but have some subtle
+differences that often make them incompatible with each other.
+
+ GNU tar is able to create and handle archives in a variety of
+formats. The most frequently used formats are (in alphabetical order):
+
+gnu
+ Format used by GNU `tar' versions up to 1.13.25. This format is
+ derived from an early POSIX standard, with some added improvements
+ such as sparse file handling and incremental archives.
+ Unfortunately these features were implemented in a way
+ incompatible with other archive formats.
+
+ Archives in `gnu' format are able to hold file names of unlimited
+ length.
+
+oldgnu
+ Format used by GNU `tar' versions prior to 1.12.
+
+v7
+ Archive format compatible with the V7 implementation of tar. This
+ format imposes a number of limitations. The most important of them
+ are:
+
+ 1. The maximum length of a file name is limited to 99 characters.
+
+ 2. The maximum length of a symbolic link is limited to 99
+ characters.
+
+ 3. It is impossible to store special files (block and character
+ devices, fifos, etc.).
+
+ 4. The maximum value of a user or group ID is limited to 2097151
+ (7777777 octal).
+
+ 5. V7 archives do not contain symbolic ownership information
+ (user and group name of the file owner).
+
+ This format has traditionally been used by Automake when producing
+ Makefiles. This practice will change in the future. In the
+ meantime, however, it means that projects containing file names
+ more than 99 characters long will not be able to use GNU `tar'
+ 1.24 and Automake prior to 1.9.
+
+ustar
+ Archive format defined by the POSIX.1-1988 specification. It stores
+ symbolic ownership information. It is also able to store special
+ files. However, it imposes several restrictions as well:
+
+ 1. The maximum length of a file name is limited to 256
+ characters, provided that the file name can be split at a
+ directory separator into two parts, the first of them being at
+ most 155 bytes long. So, in most cases the maximum file name
+ length will be shorter than 256 characters.
+
+ 2. The maximum length of a symbolic link name is limited to 100
+ characters.
+
+ 3. The maximum size of a file the archive is able to accommodate
+ is 8GB.
+
+ 4. The maximum value of UID/GID is 2097151.
+
+ 5. The maximum number of bits in device major and minor numbers
+ is 21.
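+
+ The splitting rule in item 1 can be sketched as a small shell
+ function. This is an illustration only, not GNU `tar' code: the
+ function name `fits_ustar' is hypothetical, and the 100-byte limit
+ on the second part is the size of the ustar name field, which the
+ figures above imply (256 = 155 + 1 + 100) but do not state.
+

```shell
# Hypothetical helper: succeed if the given path could be stored in a
# ustar header. Assumes a 100-byte name field and a 155-byte prefix field.
# ${#var} counts characters; run under LC_ALL=C to count bytes instead.
fits_ustar() {
    path=$1
    # Names of up to 100 bytes need no splitting.
    [ "${#path}" -le 100 ] && return 0

    name=$path
    # Try every directory separator as a split point, left to right.
    while case $name in */*) true ;; *) false ;; esac; do
        name=${name#*/}            # part after the candidate separator
        prefix=${path%"/$name"}    # part before it
        if [ -n "$prefix" ] && [ -n "$name" ] &&
           [ "${#prefix}" -le 155 ] && [ "${#name}" -le 100 ]; then
            return 0
        fi
    done
    return 1
}

fits_ustar "src/file.txt" && echo "fits"
```

+
+ Under these assumptions, a 150-byte directory name followed by a
+ 100-byte file name is accepted, while a 101-byte name containing
+ no directory separator is rejected.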
+
+star
+ Format used by Jörg Schilling's `star' implementation. GNU `tar'
+ is able to read `star' archives but currently does not produce
+ them.
+
+posix
+ Archive format defined by the POSIX.1-2001 specification. This is
+ the most flexible and feature-rich format. It does not impose any
+ restrictions on file sizes or file name lengths. This format is
+ quite recent, so not all tar implementations are able to handle it
+ properly. However, it is designed in such a way that any tar
+ implementation able to read `ustar' archives will be able to read
+ most `posix' archives as well. The only exception is that any
+ additional information (such as long file names) will in that
+ case be extracted as plain text files along with the files it
+ refers to.
+
+ This archive format will be the default format for future versions
+ of GNU `tar'.
+
+
+ The following table summarizes the limitations of each of these
+formats:
+
+Format  UID            File Size      File Name      Devn
+--------------------------------------------------------------------
+gnu     1.8e19         Unlimited      Unlimited      63
+oldgnu  1.8e19         Unlimited      Unlimited      63
+v7      2097151        8GB            99             n/a
+ustar   2097151        8GB            256            21
+posix   Unlimited      Unlimited      Unlimited      Unlimited
+
+ The default format for GNU `tar' is defined at compilation time.
+You may check it by running `tar --help' and examining the last lines
+of its output. Usually, GNU `tar' is configured to create archives in
+`gnu' format; however, future versions will switch to `posix'.
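+
+ As a rough illustration, the compiled-in default can be overridden
+for a single invocation. The sketch below relies on the `--format'
+option of GNU `tar' (short form `-H'), which is not described in this
+section, and assumes that GNU `tar' is installed as `tar'. The
+directory and file names are hypothetical.
+

```shell
# Sketch: assumes GNU tar is available as `tar`; names are hypothetical.
work=$(mktemp -d)
mkdir "$work/src"
echo "hello" > "$work/src/file.txt"

# Archive the same tree in two different formats.
tar --format=ustar -cf "$work/ustar.tar" -C "$work" src
tar --format=posix -cf "$work/posix.tar" -C "$work" src

# Both archives list the same members.
tar -tf "$work/ustar.tar"
tar -tf "$work/posix.tar"

# The compiled-in default is reported in the last lines of the help text.
tar --help | tail -n 5
```
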
+
+* Menu:
+
+* Compression:: Using Less Space through Compression
+* Attributes:: Handling File Attributes
+* Portability:: Making `tar' Archives More Portable
+* cpio:: Comparison of `tar' and `cpio'
+
+\1f
+File: tar.info, Node: Compression, Next: Attributes, Up: Formats
+
+8.1 Using Less Space through Compression
+========================================
+
+* Menu:
+
+* gzip:: Creating and Reading Compressed Archives
+* sparse:: Archiving Sparse Files
+
+\1f
+File: tar.info, Node: gzip, Next: sparse, Up: Compression
+
+8.1.1 Creating and Reading Compressed Archives
+----------------------------------------------
+
+GNU `tar' is able to create and read compressed archives. It supports
+a wide variety of compression programs, namely: `gzip', `bzip2',
+`lzip', `lzma', `lzop', `xz' and traditional `compress'. The latter is
+supported mostly for backward compatibility, and we recommend against
+using it, because it is far less effective than the other compression
+programs(1).
+
+ Creating a compressed archive is simple: you just specify a
+"compression option" along with the usual archive creation commands.
+The compression option is `-z' (`--gzip') to create a `gzip' compressed
+archive, `-j' (`--bzip2') to create a `bzip2' compressed archive,
+`--lzip' to create an lzip compressed archive, `-J' (`--xz') to create
+an XZ archive, `--lzma' to create an LZMA compressed archive, `--lzop'
+to create an LZOP archive, and `-Z' (`--compress') to use the
+`compress' program. For example:
+
+ $ tar cfz archive.tar.gz .
+
+ You can also let GNU `tar' select the compression program based on
+the suffix of the archive file name. This is done using the
+`--auto-compress' (`-a') command line option. For example, the
+following invocation will use `bzip2' for compression:
+
+ $ tar cfa archive.tar.bz2 .
+
+whereas the following one will use `lzma':
+
+ $ tar cfa archive.tar.lzma .
+
+ For a complete list of file name suffixes recognized by GNU `tar',
+see *note auto-compress::.
+
+ Reading a compressed archive is even simpler: you don't need to
+specify any additional options, as GNU `tar' recognizes its format
+automatically. Thus, the following commands will list and extract the
+archive created in the previous example:
+
+ # List the compressed archive
+ $ tar tf archive.tar.gz
+ # Extract the compressed archive
+ $ tar xf archive.tar.gz
+
+ The format recognition algorithm is based on "signatures", special
+byte sequences at the beginning of a file that are specific to certain
+compression formats. If this approach fails, `tar' falls back to using
+the archive name suffix to determine its format (*note auto-compress::,
+for a list of recognized suffixes).
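+
+ To make the idea of a signature concrete, the sketch below prints
+the first two bytes of a `gzip' compressed archive, which are 0x1f
+0x8b for any `gzip' stream, and then lists a copy of the archive whose
+name carries no recognizable suffix. It assumes GNU `tar', `gzip' and
+`od' are installed; all file names are hypothetical.
+

```shell
# Sketch: assumes GNU tar and gzip are installed; names are hypothetical.
work=$(mktemp -d)
echo "data" > "$work/file.txt"
tar -czf "$work/archive.tar.gz" -C "$work" file.txt

# Extract the first two bytes as hex; gzip streams begin with 1f 8b.
magic=$(head -c 2 "$work/archive.tar.gz" | od -An -tx1 | tr -d ' \n')
echo "$magic"

# The suffix is now useless, so detection must rely on the content.
cp "$work/archive.tar.gz" "$work/misnamed.bin"
tar -tf "$work/misnamed.bin"
```
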
+
+ The only case when you have to specify a decompression option while
+reading the archive is when reading from a pipe or from a tape drive
+that does not support random access. However, in this case GNU `tar'
+will indicate which option you should use. For example:
+
+ $ cat archive.tar.gz | tar tf -
+ tar: Archive is compressed. Use -z option
+ tar: Error is not recoverable: exiting now
+
+ If you see such diagnostics, just add the suggested option to the
+invocation of GNU `tar':
+
+ $ cat archive.tar.gz | tar tfz -
+
+ Note also that there are several restrictions on operations on
+compressed archives. First of all, compressed archives cannot be
+modified, i.e., you cannot update (`--update', alias `-u') them or
+delete (`--delete') members from them or add (`--append', alias `-r')
+members to them. Likewise, you cannot append another `tar' archive to
+a compressed archive using `--concatenate' (`-A'). Secondly,
+multi-volume archives cannot be compressed.
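+
+ One possible way around the first restriction, shown here only as
+a sketch, is to decompress the archive, modify the plain `tar' file
+and compress it again. The example assumes GNU `tar' and `gzip' are
+installed; the file names are hypothetical.
+

```shell
# Sketch: assumes GNU tar and gzip are installed; names are hypothetical.
work=$(mktemp -d)
echo "one" > "$work/a.txt"
echo "two" > "$work/b.txt"
tar -czf "$work/archive.tar.gz" -C "$work" a.txt

# Decompress, append to the plain archive, then compress again.
gzip -d "$work/archive.tar.gz"               # leaves archive.tar
tar -rf "$work/archive.tar" -C "$work" b.txt # --append on the plain archive
gzip "$work/archive.tar"                     # back to archive.tar.gz

tar -tf "$work/archive.tar.gz"
```

+
+ The cost is that the whole archive is rewritten, which may be slow
+for large archives.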
+
+ The following options allow you to select a particular compressor
+program:
+
+`-z'
+`--gzip'
+`--ungzip'
+ Filter the archive through `gzip'.
+
+`-J'
+`--xz'
+ Filter the archive through `xz'.
+
+`-j'
+`--bzip2'
+ Filter the archive through `bzip2'.
+
+`--lzip'
+ Filter the archive through `lzip'.
+
+`--lzma'
+ Filter the archive through `lzma'.
+
+`--lzop'
+ Filter the archive through `lzop'.
+
+`-Z'
+`--compress'
+`--uncompress'
+ Filter the archive through `compress'.
+
+ When any of these options is given, GNU `tar' searches for the
+compressor binary in the current `PATH' and invokes it. The name of
+the compressor program is specified at compilation time using a
+corresponding `--with-COMPNAME' option to `configure', e.g.
+`--with-bzip2' to select a specific `bzip2' binary. *Note lbzip2::, for
+a detailed discussion.
+
+ The output produced by `tar --help' shows the actual compressor
+names along with each of these options.
+
+ You can use any of these options on physical devices (tape drives,
+etc.) and remote files as well as on normal files; data to or from such
+devices or remote files is reblocked by another copy of the `tar'
+program to enforce the specified (or default) record size. The default
+compression parameters are used. Most compression programs allow you
+to override these by setting a program-specific environment variable.
+For example, when using `gzip' you can use `GZIP' as in the example
+below:
+
+ $ GZIP=--best tar cfz archive.tar.gz subdir
+
+Another way would be to use the `-I' option instead (see below), e.g.:
+
+ $ tar -cf archive.tar.gz -I 'gzip --best' subdir
+
+Finally, the third, traditional, way to achieve the same result is to
+use a pipe:
+
+ $ tar cf - subdir | gzip --best -c - > archive.tar.gz
+
+ About corrupted compressed archives: compressed files have no
+redundancy, for maximum compression. The adaptive nature of the
+compression scheme means that the compression tables are implicitly
+spread all over the archive. If you lose a few blocks, the dynamic
+construction of the compression tables becomes unsynchronized, and there
+is little chance that you could recover later in the archive.
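+
+ Since damage is so hard to recover from, it may be worth testing a
+compressed archive before relying on it. The sketch below uses the
+`-t' option of `gzip' and simulates damage by truncating a copy of the
+archive. It assumes GNU `tar' and `gzip' are installed; the file names
+are hypothetical.
+

```shell
# Sketch: assumes GNU tar and gzip are installed; names are hypothetical.
work=$(mktemp -d)
echo "payload" > "$work/file.txt"
tar -czf "$work/good.tar.gz" -C "$work" file.txt

# Simulate damage by keeping only the first 20 bytes of the archive.
head -c 20 "$work/good.tar.gz" > "$work/bad.tar.gz"

# gzip -t decompresses to nowhere and reports whether the stream is intact.
gzip -t "$work/good.tar.gz" && echo "good.tar.gz: OK"
gzip -t "$work/bad.tar.gz" 2>/dev/null || echo "bad.tar.gz: damaged"
```
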
+
+ Other compression options provide better control over creating
+compressed archives. These are:
+
+`--auto-compress'
+`-a'
+ Select the compression program to use based on the archive file
+ name suffix. The following suffixes are recognized:
+
+ Suffix Compression program
+ --------------------------------------------------------------
+ `.gz' `gzip'
+ `.tgz' `gzip'
+ `.taz' `gzip'
+ `.Z' `compress'
+ `.taZ' `compress'
+ `.bz2' `bzip2'
+ `.tz2' `bzip2'
+ `.tbz2' `bzip2'
+ `.tbz' `bzip2'
+ `.lz' `lzip'
+ `.lzma' `lzma'
+ `.tlz' `lzma'
+ `.lzo' `lzop'
+ `.xz' `xz'
+
+`--use-compress-program=PROG'
+`-I PROG'
+ Use external compression program PROG. Use this option if you are
+ not happy with the compression program associated with the suffix
+ at compile time or if you have a compression program that GNU `tar'
+ does not support. There are two requirements with which PROG
+ should comply:
+
+ First, when called without options, it should read data from
+ standard input, compress it and output it on standard output.
+
+ Secondly, if called with `-d' argument, it should do exactly the
+ opposite, i.e., read the compressed data from the standard input
+ and produce uncompressed data on the standard output.
+
+ The `--use-compress-program' option, in particular, lets you
+implement your own filters, not necessarily dealing with
+compression/decompression. For example, suppose you wish to implement
+PGP encryption on top of compression, using `gpg' (*note gpg:
+(gpg)Top.). The following script does that:
+
+ #! /bin/sh
+ case $1 in
+ -d) gpg --decrypt - | gzip -d -c;;
+ '') gzip -c | gpg -s;;
+ *) echo "Unknown option $1">&2; exit 1;;
+ esac
+
+ Suppose you name it `gpgz' and save it somewhere in your `PATH'.
+Then the following command will create a compressed archive signed with
+your private key:
+
+ $ tar -cf foo.tar.gpgz -Igpgz .
+
+Likewise, the command below will list its contents:
+
+ $ tar -tf foo.tar.gpgz -Igpgz .
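+
+ The two requirements on PROG can also be tried out without `gpg'.
+The following sketch writes a hypothetical filter named `gz9', which
+simply wraps `gzip', and uses it to create and list an archive. It
+assumes GNU `tar' and `gzip' are installed; the script name, the
+archive suffix and the file names are made up for illustration.
+

```shell
# Sketch: assumes GNU tar and gzip are installed; names are hypothetical.
work=$(mktemp -d)

# A filter that compresses by default and decompresses when given -d.
cat > "$work/gz9" <<'EOF'
#!/bin/sh
case $1 in
  -d) exec gzip -d -c ;;
  '') exec gzip -9 -c ;;
  *)  echo "Unknown option $1" >&2; exit 1 ;;
esac
EOF
chmod +x "$work/gz9"

echo "data" > "$work/file.txt"

# tar runs the filter without options when creating the archive...
tar -cf "$work/foo.tar.gz9" -I "$work/gz9" -C "$work" file.txt
# ...and with -d when reading it back.
tar -tf "$work/foo.tar.gz9" -I "$work/gz9"
```
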
+
+* Menu:
+
+* lbzip2:: Using lbzip2 with GNU `tar'.
+
+ ---------- Footnotes ----------
+
+ (1) It also had patent problems in the past.
+
+\1f
+File: tar.info, Node: lbzip2, Up: gzip
+
+8.1.1.1 Using lbzip2 with GNU `tar'.
+....................................
+
+`Lbzip2' is a multithreaded utility for handling `bzip2' compression,
+written by Laszlo Ersek. It makes use of multiple processors to speed
+up its operation and in general works considerably faster than `bzip2'.
+For a detailed description of `lbzip2' see
+`http://freshmeat.net/projects/lbzip2' and lbzip2: parallel bzip2
+utility
+(http://www.linuxinsight.com/lbzip2-parallel-bzip2-utility.html).
+
+ Recent versions of `lbzip2' are mostly command line compatible with
+`bzip2', which makes it possible to automatically invoke it via the
+`--bzip2' GNU `tar' command line option. To do so, GNU `tar' must be
+configured with the `--with-bzip2' command line option, like this:
+
+ $ ./configure --with-bzip2=lbzip2 [OTHER-OPTIONS]
+
+ Once configured and compiled this way, `tar --help' will show the
+following:
+
+ $ tar --help | grep -- --bzip2
+ -j, --bzip2 filter the archive through lbzip2
+
+which means that running `tar --bzip2' will invoke `lbzip2'.
+