APT Files

Jason Gunthorpe jgg@debian.org

$Id: files.sgml,v 1.2 1998/07/12 02:11:09 jgg Exp $

This document describes the complete implementation and format of the installed APT directory structure. It also serves as a guide to how APT views the Debian archive.

Copyright © Jason Gunthorpe, 1998.

"APT" and this document are free software; you can redistribute them and/or modify them under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

For more details, on Debian GNU/Linux systems, see the file /usr/doc/copyright/GPL for the full license.

Introduction

General

This document serves two purposes. The first is to document the installed directory structure and the format and purpose of each file. The second purpose is to document how APT views the Debian archive and deals with multiple package files.

The var directory structure is as follows:

    /var/state/apt/
        lists/
            partial/
        xstatus
    /var/cache/apt/
        pkgcache.bin
        srcpkgcache.bin
        archives/
            partial/
    /etc/apt/
        sources.list
        cdromdevs.list
    /usr/lib/apt/
        methods/
            cdrom
            ftp
            http

As specified in FHS 2.0, /var/state/apt is used for application data that is not expected to be modified by the user. /var/cache/apt is used for regeneratable data and is where the package cache and downloaded .debs go.

Files

Distribution Source list (sources.list)

The distribution source list is used to locate archives of the Debian distribution. It is designed to support any number of active sources and to support a mix of source media. The file lists one source per line, with the fastest source listed first. The format of each line is:

type uri args

The first item, type, indicates the format for the remainder of the line. It is designed to indicate the structure of the distribution the line is talking about. Currently the only defined value is deb, which indicates a standard Debian archive with a dists dir.

The deb Type

The deb type describes a typical two-level Debian archive, dists/distribution/component. Typically distribution is one of stable, unstable or frozen, while component is one of main, contrib, non-free or non-us. The format for the deb line is as follows:

deb uri distribution component [component ...]

For the deb type, uri must specify the base of the Debian distribution. APT will automatically generate the proper longer URIs to get the information it needs. distribution can also specify an exact path; in this case the components must be omitted and distribution must end in a slash.
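The rules for the deb line can be illustrated with a minimal sketch in Python. It is not APT's own parser: the names DebSource, parse_deb_line and package_uris are hypothetical, the URI is assumed to contain no spaces (so cdrom URIs are out of scope), and the expanded layout base/dists/distribution/component/binary-ARCH/Packages is an assumption extrapolated from the layout described in this document.

```python
from typing import List, NamedTuple


class DebSource(NamedTuple):
    """One parsed deb line (hypothetical structure, for illustration only)."""
    uri: str
    distribution: str
    components: List[str]


def parse_deb_line(line: str) -> DebSource:
    """Split a 'deb uri distribution [component ...]' line into its parts."""
    fields = line.split()
    if len(fields) < 3 or fields[0] != "deb":
        raise ValueError("expected: deb uri distribution [component ...]")
    uri, distribution, components = fields[1], fields[2], fields[3:]
    if distribution.endswith("/"):
        # An exact path ends in a slash and must not list components.
        if components:
            raise ValueError("an exact path takes no components")
    elif not components:
        raise ValueError("at least one component is required")
    return DebSource(uri, distribution, components)


def package_uris(source: DebSource, arch: str = "i386") -> List[str]:
    """Expand a source into the longer Packages URIs (assumed layout)."""
    base = source.uri.rstrip("/")
    if source.distribution.endswith("/"):
        return [f"{base}/{source.distribution}Packages"]
    return [
        f"{base}/dists/{source.distribution}/{component}/binary-{arch}/Packages"
        for component in source.components
    ]
```

A line such as "deb http://www.debian.org/archive stable main contrib" would expand to one Packages URI per component under these assumptions.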

Since only one distribution can be specified per deb line, it may be necessary to list a number of deb lines for the same URI. APT will sort the URI list after it has generated a complete set to allow connection reuse. It is important to order things in the source list from most preferred to least preferred (fastest to slowest).

URI specification

URIs in the source list support a large number of access schemes.

cdrom
    The cdrom scheme is special in that If-Modified-Since queries are never performed and that APT knows how to match a cdrom to the name it was given when first inserted. It does this by examining the date and size of the package file. APT also knows all of the possible prefix paths for the cdrom drives and that the user should be prompted to insert a CD if it cannot be found. The path is relative to an arbitrary mount point (of APT's choosing) and must not start with a slash. The first pathname component is the given name; it is purely descriptive and of the user's choice. However, if a file in the root of the cdrom is called 'cdname', its contents will be used instead of prompting. The name serves as a tag for the cdrom and should be unique. APT will track CD-ROMs based on their tag and package file properties.

    cdrom:Debian 1.3/debian

http
    This scheme specifies an HTTP server for the Debian archive. HTTP is preferred over FTP because If-Modified-Since queries against the Package file are possible. Newer HTTP protocols may even support reget, which would make http the protocol of choice.

    http://www.debian.org/archive

ftp
    This scheme specifies an FTP connection to the server. FTP is limited because there is no support for IMS and because it is hard to proxy over firewalls.

    ftp://ftp.debian.org/debian

file
    The file scheme allows an arbitrary directory in the file system to be considered as a Debian archive. This is useful for NFS mounts and local mirrors/archives.

    file:/var/debian

mirror
    The mirror scheme is special in that it does not specify the location of a Debian archive but specifies the location of a list of mirrors to use to access the archive. Some technique will be used to determine the best choice for a mirror. The mirror file is specified in the Mirror File section. If/when URIs take off they should obsolete this field.

    mirror:http://www.debian.org/archivemirrors

smb
    A possible future expansion may be to have direct support for smb (Samba servers).

    smb://ftp.kernel.org/pub/mirrors/debian

Hashing the URI

All permanent information acquired from any of the sources is stored in the lists directory. Thus, there must be a way to relate the filename in the lists directory to a line in the source list. To simplify things this is done by quoting the URI, treating _ as a quotable character, and converting / to _. The URI spec says quoting is done by converting a sensitive character into %xx, where xx is the hexadecimal representation from the ASCII character set. Examples:

    http://www.debian.org/archive/dists/stable/binary-i386/Packages
    /var/state/apt/lists/www.debian.org_archive_dists_stable_binary-i386_Packages

    cdrom:Debian 1.3/debian/Packages
    /var/state/apt/lists/Debian%201.3_debian_Packages
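The mapping from URI to list file name can be sketched in a few lines of Python. This is an illustration rather than APT's implementation: the function name uri_to_list_name is hypothetical, and dropping the scheme prefix is an assumption inferred from the examples, which show file names without it.

```python
from urllib.parse import quote


def uri_to_list_name(uri: str) -> str:
    """Turn a source URI into a flat file name for the lists directory."""
    # Drop the access scheme and any leading slashes (inferred from the examples).
    scheme, sep, rest = uri.partition(":")
    rest = rest.lstrip("/") if sep else uri
    # Percent-quote sensitive characters but leave '/' alone for now.
    quoted = quote(rest, safe="/")
    # quote() never escapes '_', so escape it by hand before '/' becomes '_'.
    quoted = quoted.replace("_", "%5F")
    return quoted.replace("/", "_")
```

Escaping the underscore first is what keeps the mapping unambiguous: every _ in the result stands for a path separator, and a literal underscore in the URI appears as %5F.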

The other alternative that was considered was to use a deep directory structure, but this poses two problems: it makes it very difficult to prune directories back when sources are no longer used, and it complicates the handling of the partial directory. The flat naming scheme gives a very simple way to deal with all of the situations that can arise. The equals sign was chosen on the suggestion of Manoj because it is very infrequently used in filenames. Also note that the same rules described in the Archive Directory section regarding the partial subdirectory apply here as well.

Extra Status File (xstatus)

The extra status file serves the same purpose as the normal dpkg status file (/var/lib/dpkg/status) except that it stores information unique to deity. This includes the autoflag, target distribution and version, and any other unique features that come up over time. It duplicates nothing from the normal dpkg status file. Please see other APT documentation for a discussion of the exact internal behavior of these fields. The Package field is placed directly before the new fields to indicate which package they apply to. The new fields are as follows:

X-Auto
    The Auto flag can be Yes or No and controls whether the package is in auto mode.

X-TargetDist
    The TargetDist item indicates which distribution versions are offered for installation from. It should be stable, unstable or frozen.

X-TargetVersion
    The target version item is set if the user selects a specific version. It overrides the TargetDist selection if both are present.

Binary Package Cache (pkgcache.bin)

Please see cache.sgml for a complete description of what this file is. The cache file is updated whenever the contents of the lists directory change. If the cache is erased, corrupted or of a non-matching version, it will be automatically rebuilt by all of the tools that need it. srcpkgcache.bin contains a cache of all of the package files in the source list. This allows regeneration of the cache when the status files change to use a prebuilt version for greater speed.

Downloads Directory (archives)

The archives directory is where all downloaded .deb archives go. When the file transfer is initiated the deb is placed in partial/. Once the file is fully downloaded and its MD5 hash and size are verified it is moved from partial/ into archives/. Any files found in archives/ can be assumed to be verified.
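The verify-then-move step might look like the following sketch. The function name promote_download and its parameters are hypothetical; the expected MD5 and size are assumed to come from the package lists, and the cache directory is passed in rather than hard-coded.

```python
import hashlib
import os


def promote_download(name: str, expected_md5: str, expected_size: int,
                     archives_dir: str) -> bool:
    """Move archives_dir/partial/name into archives_dir once it verifies."""
    partial_path = os.path.join(archives_dir, "partial", name)
    final_path = os.path.join(archives_dir, name)

    # Cheap check first: a size mismatch means the file is incomplete or wrong.
    if os.path.getsize(partial_path) != expected_size:
        return False

    # Hash the file in chunks so large .debs are not read into memory at once.
    digest = hashlib.md5()
    with open(partial_path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    if digest.hexdigest() != expected_md5.lower():
        return False

    # A rename within one filesystem is atomic, so archives/ never holds
    # a half-written or unverified file.
    os.replace(partial_path, final_path)
    return True
```

Keeping the rename as the last step is what lets other tools treat anything outside partial/ as already verified.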

No directory structure is transferred from the receiving site and all .deb file names conform to Debian conventions. No short (msdos) filename should be placed in archives. If the need arises .debs should be unpacked, scanned and renamed to their correct internal names. This is mostly to prevent file name conflicts but other programs may depend on this if convenient. Downloaded .debs must be found in one of the package lists with an exact name + version match.

The Methods Directory (/usr/lib/apt/methods)

Like dselect, APT will support pluggable acquisition methods to complement its internally supported methods. The files in this directory are executables named after the URI type. APT will sort the required URIs and spawn these programs, giving a full sorted, quoted list of URIs.

The interface is simple: the program will be given a list of URIs on the command line. The arguments will be pairs of strings, the first being the actual URI and the second being the filename to write the data to. The current directory will be set properly by APT and it is expected the method will put files relative to the current directory. The output of these programs is strictly specified. The programs must accept nothing from stdin (stdin will be an invalid fd) and they must output status information to stdout according to the format below. Stderr will be redirected to the logging facility.
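A method's first job is to turn its argument list into (URI, filename) pairs. A minimal sketch, assuming the arguments arrive exactly as described above with the program name already removed; the function name pair_arguments is hypothetical:

```python
from typing import List, Tuple


def pair_arguments(args: List[str]) -> List[Tuple[str, str]]:
    """Group a flat argument list into (uri, output_filename) pairs."""
    if len(args) % 2 != 0:
        raise ValueError("arguments must come in URI/filename pairs")
    # Even positions hold URIs, odd positions hold the matching file names.
    return list(zip(args[0::2], args[1::2]))
```

In a real method this would be called as pair_arguments(sys.argv[1:]), and each output file would be written relative to the current directory.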

Each line sent to stdout must start with a single tag letter and a space. Strings after the tag letter do not need quoting; they are taken as is until the end of the line. The tag letters, listed in expected order, are as follows:

F - Change URI
    This specifies a change in URI. All information after this will be applied to the new URI. When the URI is changed it is assumed that the old URI has completed unless an error is set. The format is F URI

S - Object Size
    This specifies the expected size of the object. APT will use this to compute percent done figures. If it is not sent then a kilobyte meter will be used instead of a percent display. The format is S INTEGER

E - Error Information
    Exactly one line of error information can be set for each URI. The information will be summarized for the user. If an E tag is sent before any F tags then the error is assumed to be a fatal method error and all URI fetches for that method are aborted with that error string. The format is E String

I - Informative progress information
    The I tag allows the method to specify the status of the connection. Typically the GUI will show the last received I line. The format is I String

    As a general rule an I tag should be emitted before a lengthy operation only. Things that always take a short period are not suited for I tags. I tags should change whenever the method's state changes. Some standard forms, in order of occurrence, are:

        Connecting to SITE
        Connecting to SITE (1.1.1.1)
        Waiting for file
        Authenticating
        Downloading
        Resuming (size)
        Computing MD5

    I lines should never print out information that APT is already aware of, such as file names.

R - Set final path
    The R tag allows the method to tell APT that the file is present in the local file system. APT might copy it into the download directory. The format is R String

M - MD5Sum of the file
    The method is expected to compute the MD5 hash on the fly as the download progresses. The final MD5 of the file is to be output when the file is completed. If the MD5 is not output it will not be checked! Some methods, such as the file method, will not check MD5s because they are most commonly used on mirrors or local CD-ROMs; a paranoid option may be provided in future to force checking. The format is M MD5-String

L - Log output
    This tag indicates a string that should be dumped to some log file. The string is for debugging and is not meant to be seen by the user. The format is L String

    Log lines should only be used in a completed method if they have special relevance to what is happening.
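The tag format is simple enough that the reading side can be sketched briefly. The following Python is an illustration of the rules above, not APT's code: FetchStatus and parse_status are hypothetical names, and ignoring non-error lines that arrive before the first F tag is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class FetchStatus:
    """What a method has reported so far about one URI."""
    uri: str
    size: Optional[int] = None
    error: Optional[str] = None
    info: Optional[str] = None
    final_path: Optional[str] = None
    md5: Optional[str] = None
    log: List[str] = field(default_factory=list)


def parse_status(lines: Iterable[str]) -> Tuple[Dict[str, FetchStatus], Optional[str]]:
    """Return per-URI status plus the fatal method error, if any."""
    results: Dict[str, FetchStatus] = {}
    current: Optional[FetchStatus] = None
    for raw in lines:
        line = raw.rstrip("\n")
        if len(line) < 2 or line[1] != " ":
            raise ValueError(f"malformed status line: {line!r}")
        tag, value = line[0], line[2:]
        if tag == "F":
            # Everything that follows applies to this URI.
            current = results.setdefault(value, FetchStatus(value))
        elif current is None:
            if tag == "E":
                # An error before any F tag aborts the whole method.
                return results, value
            # Assumption: other early lines have no URI to attach to; skip them.
        elif tag == "S":
            current.size = int(value)
        elif tag == "E":
            current.error = value
        elif tag == "I":
            current.info = value  # only the latest I line is kept
        elif tag == "R":
            current.final_path = value
        elif tag == "M":
            current.md5 = value
        elif tag == "L":
            current.log.append(value)
        else:
            raise ValueError(f"unknown tag letter: {tag!r}")
    return results, None
```

Because the value is taken verbatim after the first two characters, strings containing spaces need no quoting, which matches the rule stated above.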

APT monitors the progress of the transfer by watching the file size. This means the method must not create any temp files and must use a fairly small buffer. The method is also responsible for If-Modified-Since (IMS) queries for the object. It should check ../outputname to get the time stamp but not the size. The size may be different because the file was uncompressed after it was transferred. A method must never change the file in ..; it may only change the output file in the current directory.
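For an HTTP-style method, the time stamp lookup could be sketched as follows. The function name ims_header_value is hypothetical, and formatting the time as an HTTP date is an assumption about what the method would then send in an If-Modified-Since header.

```python
import os
from email.utils import formatdate
from typing import Optional


def ims_header_value(output_name: str, parent: str = "..") -> Optional[str]:
    """Return an HTTP date for the existing copy in the parent directory."""
    existing = os.path.join(parent, output_name)
    try:
        # Only the time stamp is used; the size may differ after uncompression.
        mtime = os.stat(existing).st_mtime
    except FileNotFoundError:
        return None  # nothing fetched before, so no IMS query is possible
    return formatdate(mtime, usegmt=True)
```

The function only reads the file in the parent directory, in keeping with the rule that a method never modifies it.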

The APT 'http' program is the reference implementation of this specification; it implements all of the features a method is expected to provide.

The Mirror List

The mirror list is stored on the primary Debian web server (www.debian.org) and contains a machine readable list of all known Debian mirrors. The mirror URI type will cause this list to be downloaded and considered. It has the same form as the source list. When the source list specifies mirror as the target, the mirror list is scanned to find the necessary parts for the requested distributions and components. This means the user could have a line like:

    deb mirror:http://www.debian.org/mirrorlist stable main non-us

which would likely cause APT to choose two separate sites to download from, one for main and another for non-us.
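Scanning a mirror list for the requested parts can be sketched as below. Since the mirror list has the same form as the source list, the sketch reads deb lines; it takes the first mirror that offers each component, which matches the first-found policy this document gives for initial versions. The function name choose_mirrors and the example.org hosts are hypothetical.

```python
from typing import Dict, Iterable, List


def choose_mirrors(mirror_lines: Iterable[str], distribution: str,
                   components: List[str]) -> Dict[str, str]:
    """Map each requested component to the first mirror URI offering it."""
    chosen: Dict[str, str] = {}
    for line in mirror_lines:
        fields = line.split()
        # Skip anything that is not a deb line for the requested distribution.
        if len(fields) < 4 or fields[0] != "deb" or fields[2] != distribution:
            continue
        uri, offered = fields[1], fields[3:]
        for component in components:
            if component in offered and component not in chosen:
                chosen[component] = uri
    return chosen


mirrors = [
    "deb http://mirror-a.example.org/debian stable main contrib",
    "deb http://mirror-b.example.org/debian-non-US stable non-us",
]
selection = choose_mirrors(mirrors, "stable", ["main", "non-us"])
```

With this input, main and non-us resolve to two different sites, as in the example line above.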

Some form of network measurement will have to be used to gauge the performance of each of the mirrors. This will be discussed later; initial versions will use the first found URI.

The Release File

This file plays an important role in how APT presents the archive to the user. Its main purpose is to present a descriptive name for the source of each version of each package. It is also used to detect when new versions of Debian are released. It augments the package file it is associated with by providing meta information about the entire archive which the Packages file describes.

The full name of the distribution for presentation to the user is formed as 'label version archive', with a possible extended name being 'label version archive component'.

The file is formed as the package file (RFC-822) with the following tags defined:

Archive
    This is the common name we give our archives, such as stable or unstable.

Component
    Refers to the sub-component of the archive: main, contrib etc.

Version
    This is a version string with the same properties as in the Packages file. It represents the release level of the archive.

Origin
    This specifies who is providing this archive. In the case of Debian the string will read 'Debian'. Other providers may use their own string.

Label
    This carries the encompassing name of the distribution. For Debian proper this field reads 'Debian'. For derived distributions it should contain their proper name.

Architecture
    When the archive has packages for a single architecture then the architecture is listed here. If a mixed set of systems is represented then this should contain the keyword mixed.

NotAutomatic
    A Yes/No flag indicating that the archive is extremely unstable and its versions should never be automatically selected. This is to be used by experimental.

Description
    Description is used to describe the release. For instance experimental would contain a warning that the packages have problems.

The location of the Release file in the archive is very important: it must be located in the same location as the Packages file so that it can be found in all situations. The following is an example for the current stable release, 1.3.1r6:

    Archive: stable
    Component: main
    Version: 1.3.1r6
    Origin: Debian
    Label: Debian
    Architecture: i386

This is an example of experimental:

    Archive: experimental
    Version: 0
    Origin: Debian
    Label: Debian
    Architecture: mixed
    NotAutomatic: Yes

And unstable:

    Archive: unstable
    Component: main
    Version: 2.1
    Origin: Debian
    Label: Debian
    Architecture: i386
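Reading a Release file and forming the presentation name can be sketched as follows. The names parse_release and display_name are hypothetical, and the parser assumes one "Tag: value" pair per line without RFC-822 continuation lines, which is enough for files like the examples in this section.

```python
from typing import Dict


def parse_release(text: str) -> Dict[str, str]:
    """Parse simple 'Tag: value' lines into a dictionary."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        tag, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a 'Tag: value' line: {line!r}")
        fields[tag.strip()] = value.strip()
    return fields


def display_name(fields: Dict[str, str], extended: bool = False) -> str:
    """Form 'label version archive', optionally followed by the component."""
    parts = [fields["Label"], fields["Version"], fields["Archive"]]
    if extended and "Component" in fields:
        parts.append(fields["Component"])
    return " ".join(parts)


stable = parse_release(
    "Archive: stable\n"
    "Component: main\n"
    "Version: 1.3.1r6\n"
    "Origin: Debian\n"
    "Label: Debian\n"
    "Architecture: i386\n"
)
short_name = display_name(stable)
long_name = display_name(stable, extended=True)
```

For the stable example this yields "Debian 1.3.1r6 stable", and "Debian 1.3.1r6 stable main" in the extended form.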