summaryrefslogtreecommitdiff
path: root/doc/cache.sgml
diff options
context:
space:
mode:
authorArch Librarian <arch@canonical.com>2004-09-20 16:50:36 +0000
committerArch Librarian <arch@canonical.com>2004-09-20 16:50:36 +0000
commit578bfd0aed2ec993f4ad85fa6a7094a852261422 (patch)
tree737f52267f6468a4b6754f8c6665824767352746 /doc/cache.sgml
Base revisions
Author: jgg Date: 1998-07-02 02:58:12 GMT Base revisions
Diffstat (limited to 'doc/cache.sgml')
-rw-r--r--doc/cache.sgml761
1 files changed, 761 insertions, 0 deletions
diff --git a/doc/cache.sgml b/doc/cache.sgml
new file mode 100644
index 000000000..43f8d4fff
--- /dev/null
+++ b/doc/cache.sgml
@@ -0,0 +1,761 @@
+<!doctype debiandoc system>
+<!-- -*- mode: sgml; mode: fold -*- -->
+<book>
+<title>APT Cache File Format</title>
+
+<author>Jason Gunthorpe <email>jgg@debian.org</email></author>
+<version>$Id: cache.sgml,v 1.1 1998/07/02 02:58:12 jgg Exp $</version>
+
+<abstract>
+This document describes the complete implementation and format of the APT
+Cache file. The APT Cache file is a way for APT to parse and store a
+large number of package files for display in the UI. It's primary design
+goal is to make display of a single package in the tree very fast by
+pre-linking important things like dependencies and provides.
+
+The specification doubles as documentation for one of the in-memory
+structures used by the package library and the APT GUI.
+
+</abstract>
+
+<copyright>
+Copyright &copy; Jason Gunthorpe, 1997.
+<p>
+APT and this document are free software; you can redistribute them and/or
+modify them under the terms of the GNU General Public License as published
+by the Free Software Foundation; either version 2 of the License, or (at your
+option) any later version.
+
+<p>
+For more details, on Debian GNU/Linux systems, see the file
+/usr/doc/copyright/GPL for the full license.
+</copyright>
+
+<toc sect>
+
+<chapt>Introduction
+<!-- Purpose {{{ -->
+<!-- ===================================================================== -->
+<sect>Purpose
+
+<p>
+This document describes the implementation of an architecture
+dependent binary cache file. The goal of this cache file is two fold,
+firstly to speed loading and processing of the package file array and
+secondly to reduce memory consumption of the package file array.
+
+<p>
+The implementation is aimed at an environment with many primary package
+files, for instance someone that has a Package file for their CD-ROM, a
+Package file for the latest version of the distribution on the CD-ROM and a
+package file for the development version. Always present is the information
+contained in the status file which might be considered a separate package
+file.
+
+<p>
+Please understand, this is designed as a -CACHE FILE- it is not ment to be
+used on any system other than the one it was created for. It is not ment to
+be authoritative either, ie if a system crash or software failure occures it
+must be perfectly acceptable for the cache file to be in an inconsistant
+state. Furthermore at any time the cache file may be erased without losing
+any information.
+
+<p>
+Also the structures and storage layout is optimized for use by the APT
+GUI and may not be suitable for all purposes. However it should be possible
+to extend it with associate cache files that contain other information.
+
+<p>
+To keep memory use down the cache file only contains often used fields and
+fields that are inexepensive to store, the Package file has a full list of
+fields. Also the client may assume that all items are perfectly valid and
+need not perform checks against their correctness. Removal of information
+from the cache is possible, but blanks will be left in the file, and
+unused strings will also be present. The recommended implementation is to
+simply rebuild the cache each time any of the data files change. It is
+possible to add a new package file to the cache without any negative side
+effects.
+
+<sect1>Note on Pointer access
+<p>
+Every item in every structure is stored as the index to that structure.
+What this means is that once the files is mmaped every data access has to
+go through a fixup stage to get a real memory pointer. This is done
+by taking the tndex, multiplying it by the type size and then adding
+it to the start address of the memory block. This sounds complex, but
+in C it is a single array dereference. Because all items are aligned to
+their size and indexs are stored as multiples of the size of the structure
+the format is immediately portable to all possible architectures - BUT the
+generated files are -NOT-.
+
+<p>
+This scheme allows code like this to be written:
+<example>
+ void *Map = mmap(...);
+ Package *PkgList = (Package *)Map;
+ Header *Head = (Header *)Map;
+ char *Strings = (char *)Map;
+ cout << (Strings + PkgList[Head->HashTable[0]]->Name) << endl;
+</example>
+<p>
+Notice the lack of casting or multiplication. The net result is to return
+the name of the first package in the first hash bucket, without error
+checks.
+
+<p>
+The generator uses allocation pools to group similarly sized structures in
+large blocks to eliminate any alignment overhead. The generator also
+assures that no structures overlap and all indexes are unique. Although
+at first glance it may seem like there is the potential for two structures
+to exist at the same point the generator never allows this to happen.
+(See the discussion of free space pools)
+ <!-- }}} -->
+
+<chapt>Structures
+<!-- Header {{{ -->
+<!-- ===================================================================== -->
+<sect>Header
+<p>
+This is the first item in the file.
+<example>
+ struct Header
+ {
+ // Signature information
+ unsigned long Signature;
+ short MajorVersion;
+ short MinorVersion;
+ bool Dirty;
+
+ // Size of structure values
+ unsigned short HeaderSz;
+ unsigned short PackageSz;
+ unsigned short PackageFileSz;
+ unsigned short VersionSz;
+ unsigned short DependencySz;
+ unsigned short ProvidesSz;
+
+ // Structure counts
+ unsigned long PackageCount;
+ unsigned long VersionCount;
+ unsigned long DependsCount;
+ unsigned long PackageFileCount;
+
+ // Offsets
+ unsigned long FileList; // PackageFile
+ unsigned long StringList; // StringItem
+
+ // Pool structures
+ unsigned long PoolStart[6];
+ unsigned long PoolSize[6];
+ unsigned long PoolAln[6];
+
+ // Package name lookup
+ unsigned long HashTable[512]; // Package
+ };
+</example>
+<taglist>
+<tag>Signature<item>
+This must contain the hex value 0x98FE76DC which is designed to verify
+that the system loading the image has the same byte order and byte size as
+the system saving the image
+
+<tag>MajorVersion
+<tag>MinorVersion<item>
+These contain the version of the cache file, currently 0.2.
+
+<tag>Dirty<item>
+Dirty is true if the cache file was opened for reading, the client expects
+to have written things to it and have not fully synced it. The file should
+be erased and rebuilt if it is true.
+
+<tag>HeaderSz
+<tag>PackageSz
+<tag>PackageFileSz
+<tag>VersionSz
+<tag>DependencySz
+<tag>ProvidesSz<item>
+*Sz contains the sizeof() that particular structure. It is used as an
+extra consistancy check on the structure of the file.
+
+If any of the size values do not exactly match what the client expects then
+the client should refuse the load the file.
+
+<tag>PackageCount
+<tag>VersionCount
+<tag>DependsCount
+<tag>PackageFileCount<item>
+These indicate the number of each structure contianed in the cache.
+PackageCount is especially usefull for generating user state structures.
+See Package::Id for more info.
+
+<tag>FileList<item>
+This contains the index of the first PackageFile structure. The PackageFile
+structures are singely linked lists that represent all package files that
+have been merged into the cache.
+
+<tag>StringList<item>
+This contains a list of all the unique strings (string item type strings) in
+the cache. The parser reads this list into memory so it can match strings
+against it.
+
+<tag>PoolStart
+<tag>PoolSize
+<tag>PoolAln<item>
+The Pool structures manage the allocation pools that the generator uses.
+Start indicates the first byte of the pool, Size is the number of bytes
+remaining in the pool and Aln (alignment) is the structure size of the pool.
+An Aln of 0 indicates the slot is empty. There should be the same number of
+slots as there are structure types. The generator stores this information
+so future additions can make use of any unused pool blocks.
+
+<tag>HashTable<item>
+HashTable is a hash table that provides indexing for all of the packages.
+Each package name is inserted into the hash table using the following has
+function:
+<example>
+ unsigned long Hash(string Str)
+ {
+ unsigned long Hash = 0;
+ for (const char *I = Str.begin(); I != Str.end(); I++)
+ Hash += *I * ((Str.end() - I + 1));
+ return Hash % _count(Head.HashTable);
+ }
+</example>
+<p>
+By iterating over each entry in the hash table it is possible to iterate over
+the entire list of packages. Hash Collisions are handled with a singely linked
+list of packages based at the hash item. The linked list contains only
+packages that macth the hashing function.
+
+</taglist>
+ <!-- }}} -->
+<!-- Package {{{ -->
+<!-- ===================================================================== -->
+<sect>Package
+<p>
+This contians information for a single unique package. There can be any
+number of versions of a given package. Package exists in a singly
+linked list of package records starting at the hash index of the name in
+the Header->HashTable.
+<example>
+ struct Pacakge
+ {
+ // Pointers
+ unsigned long Name; // Stringtable
+ unsigned long VersionList; // Version
+ unsigned long TargetVer; // Version
+ unsigned long CurrentVer; // Version
+ unsigned long TargetDist; // StringTable (StringItem)
+ unsigned long Section; // StringTable (StringItem)
+
+ // Linked lists
+ unsigned long NextPackage; // Package
+ unsigned long RevDepends; // Dependency
+ unsigned long ProvidesList; // Provides
+
+ // Install/Remove/Purge etc
+ unsigned char SelectedState; // What
+ unsigned char InstState; // Flags
+ unsigned char CurrentState; // State
+
+ // Unique ID for this pkg
+ unsigned short ID;
+ unsigned short Flags;
+ };
+</example>
+
+<taglist>
+<tag>Name<item>
+Name of the package.
+
+<tag>VersionList<item>
+Base of a singely linked list of version structures. Each structure
+represents a unique version of the package. The version structures
+contain links into PackageFile and the original text file as well as
+detailed infromation about the size and dependencies of the specific
+package. In this way multiple versions of a package can be cleanly handled
+by the system. Furthermore, this linked list is guarenteed to be sorted
+from Highest version to lowest version with no duplicate entries.
+
+<tag>TargetVer
+<tag>CurrentVer<item>
+This is an index (pointer) to the sub version that is being targeted for
+upgrading. CurrentVer is an index to the installed version, either can be
+0.
+
+<tag>TargetDist<item>
+This indicates the target distribution. Automatic upgrades should not go
+outside of the specified dist. If it is 0 then the global target dist should
+be used. The string should be contained in the StringItem list.
+
+<tag>Section<item>
+This indicates the deduced section. It should be "Unknown" or the section
+of the last parsed item.
+
+<tag>NextPackage<item>
+Next link in this hash item. This linked list is based at Header.HashTable
+and contains only packages with the same hash value.
+
+<tag>RevDepends<item>
+Reverse Depends is a linked list of all dependencies linked to this package.
+
+<tag>ProvidesList<item>
+This is a linked list of all provides for this package name.
+
+<tag>SelectedState
+<tag>InstState
+<tag>CurrentState<item>
+These corrispond to the 3 items in the Status field found in the status
+file. See the section on defines for the possible values.
+<p>
+SelectedState is the state that the user wishes the package to be
+in.
+<p>
+InstState is the installation state of the package. This normally
+should be Ok, but if the installation had an accident it may be otherwise.
+<p>
+CurrentState indicates if the package is installed, partially installed or
+not installed.
+
+<tag>ID<item>
+ID is a value from 0 to Header->PackageCount. It is a unique value assigned
+by the generator. This allows clients to create an array of size PackageCount
+and use it to store state information for the package map. For instance the
+status file emitter uses this to track which packages have been emitted
+already.
+
+<tag>Flags<item>
+Flags are some usefull indicators of the package's state.
+
+</taglist>
+
+ <!-- }}} -->
+<!-- PackageFile {{{ -->
+<!-- ===================================================================== -->
+<sect>PackageFile
+<p>
+This contians information for a single package file. Package files are
+referenced by Version structures. This is a singly linked list based from
+Header.FileList
+<example>
+ struct PackageFile
+ {
+ // Names
+ unsigned long FileName; // Stringtable
+ unsigned long Version; // Stringtable
+ unsigned long Distribution; // Stringtable
+ unsigned long Size;
+
+ // Linked list
+ unsigned long NextFile; // PackageFile
+ unsigned short ID;
+ unsigned short Flags;
+ time_t mtime; // Modification time
+ };
+</example>
+<taglist>
+
+<tag>FileName<item>
+Refers the the physical disk file that this PacakgeFile represents.
+
+<tag>Version<item>
+Version is the given version, ie 1.3.1, 2.4_revision_1 etc.
+
+<tag>Distribution<item>
+Distribution is the symbolic name for this PackageFile, hamm,bo,rexx etc
+
+<tag>Size<item>
+Size is provided as a simple check to ensure that the package file has not
+been altered.
+
+<tag>ID<item>
+See Package::ID.
+
+<tag>Flags<item>
+Provides some flags for the PackageFile, see the section on defines.
+
+<tag>mtime<item>
+Modification time for the file at time of cache generation.
+
+</taglist>
+
+ <!-- }}} -->
+<!-- Version {{{ -->
+<!-- ===================================================================== -->
+<sect>Version
+<p>
+This contians the information for a single version of a package. This is a
+singley linked list based from Package.Versionlist.
+
+<p>
+The version list is always sorted from highest version to lowest version by
+the generator. Also there may not be any duplicate entries in the list (same
+VerStr).
+
+<example>
+ struct Version
+ {
+ unsigned long VerStr; // Stringtable
+ unsigned long File; // PackageFile
+ unsigned long Section; // StringTable (StringItem)
+
+ // Lists
+ unsigned long NextVer; // Version
+ unsigned long DependsList; // Dependency
+ unsigned long ParentPkg; // Package
+ unsigned long ProvidesList; // Provides
+
+ unsigned long Offset;
+ unsigned long Size;
+ unsigned long InstalledSize;
+ unsigned short ID;
+ unsigned char Priority;
+ };
+</example>
+<taglist>
+
+<tag>VerStr<item>
+This is the complete version string.
+
+<tag>File<item>
+References the PackageFile that this version came out of. File can be used
+to determine what distribution the Version applies to. If File is 0 then
+this is a blank version. The structure should also have a 0 in all other
+fields excluding VerStr and Possibly NextVer.
+
+<tag>Section<item>
+This string indicates which section it is part of. The string should be
+contained in the StringItem list.
+
+<tag>NextVer<item>
+Next step in the linked list.
+
+<tag>DependsList<item>
+This is the base of the dependency list.
+
+<tag>ParentPkg<item>
+This links the version to the owning package, allowing reverse dependencies
+to determine the package.
+
+<tag>ProvidesList<item>
+Head of the linked list of Provides::NextPkgProv, forward provides.
+
+<tag>Offset<item>
+The byte offset of the first line of this item in the specified
+PackageFile
+
+<tag>Size
+<tag>InstalledSize<item>
+The archive size for this version. For debian this is the size of the .deb
+file. Installed size is the uncompressed size for this version
+
+<tag>ID<item>
+See Package::ID.
+
+<tag>Priority<item>
+This is the parsed priority value of the package.
+</taglist>
+
+ <!-- }}} -->
+<!-- Dependency {{{ -->
+<!-- ===================================================================== -->
+<sect>Dependency
+<p>
+Dependency contains the information for a single dependency record. The records
+are split up like this to ease processing by the client. The base of list
+linked list is Version.DependsList. All forms of dependencies are recorded
+here including Conflicts, Suggests and Recommends.
+
+<p>
+Multiple depends on the same package must be grouped together in
+the Dependency lists. Clients should assume this is always true.
+
+<example>
+ struct Dependency
+ {
+ unsigned long Version; // Stringtable
+ unsigned long Package; // Package
+ unsigned long NextDepends; // Dependency
+ unsigned long NextRevDepends; // Reverse dependency linking
+ unsigned long ParentVer; // Upwards parent version link
+
+ // Specific types of depends
+ unsigned char Type;
+ unsigned char CompareOp;
+ unsigned short ID;
+ };
+</example>
+<taglist>
+<tag>Version<item>
+The string form of the version that the dependency is applied against.
+
+<tag>Package<item>
+The index of the package file this depends applies to. If the package file
+does not already exist when the dependency is inserted a blank one (no
+version records) should be created.
+
+<tag>NextDepends<item>
+Linked list based off a Version structure of all the dependencies in that
+version.
+
+<tag>NextRevDepends<item>
+Reverse dependency linking, based off a Package structure. This linked list
+is a list of all packages that have a depends line for a given package.
+
+<tag>ParentVer<item>
+Parent version linking, allows the reverse dependency list to link
+back to the version and package that the dependency are for.
+
+<tag>Type<item>
+Describes weather it is depends, predepends, recommends, suggests, etc.
+
+<tag>CompareOp<item>
+Describes the comparison operator specified on the depends line. If the high
+bit is set then it is a logical or with the previous record.
+
+<tag>ID<item>
+See Package::ID.
+
+</taglist>
+
+ <!-- }}} -->
+<!-- Provides {{{ -->
+<!-- ===================================================================== -->
+<sect>Provides
+<p>
+Provides handles virtual packages. When a Provides: line is encountered
+a new provides record is added associating the package with a virtual
+package name. The provides structures are linked off the package structures.
+This simplifies the analysis of dependencies and other aspects A provides
+refers to a specific version of a specific package, not all versions need to
+provide that provides.
+
+<p>
+There is a linked list of provided package names started from each
+version that provides packages. This is the forwards provides mechanism.
+<example>
+ struct Provides
+ {
+ unsigned long ParentPkg; // Package
+ unsigned long Version; // Version
+ unsigned long ProvideVersion; // Stringtable
+ unsigned long NextProvides; // Provides
+ unsigned long NextPkgProv; // Provides
+ };
+</example>
+<taglist>
+<tag>ParentPkg<item>
+The index of the package that head of this linked list is in. ParentPkg->Name
+is the name of the provides.
+
+<tag>Version<item>
+The index of the version this provide line applies to.
+
+<tag>ProvideVersion<item>
+Each provides can specify a version in the provides line. This version allows
+dependencies to depend on specific versions of a Provides, as well as allowing
+Provides to override existing packages. This is experimental.
+
+<tag>NextProvides<item>
+Next link in the singly linked list of provides (based off package)
+
+<tag>NextPkgProv<item>
+Next link in the singly linked list of provides for 'Version'.
+
+</taglist>
+
+ <!-- }}} -->
+<!-- StringItem {{{ -->
+<!-- ===================================================================== -->
+<sect>StringItem
+<p>
+StringItem is used for generating single instances of strings. Some things
+like Section Name are are usefull to have as unique tags. It is part of
+a linked list based at Header::StringList.
+<example>
+ struct StringItem
+ {
+ unsigned long String; // Stringtable
+ unsigned long NextItem; // StringItem
+ };
+</example>
+<taglist>
+<tag>String<item>
+The string this refers to.
+
+<tag>NextItem<item>
+Next link in the chain.
+</taglist>
+ <!-- }}} -->
+<!-- StringTable {{{ -->
+<!-- ===================================================================== -->
+<sect>StringTable
+<p>
+All strings are simply inlined any place in the file that is natural for the
+writer. The client should make no assumptions about the positioning of
+strings. All stringtable values point to a byte offset from the start of the
+file that a null terminated string will begin.
+ <!-- }}} -->
+<!-- Defines {{{ -->
+<!-- ===================================================================== -->
+<sect>Defines
+<p>
+Several structures use variables to indicate things. Here is a list of all
+of them.
+
+<sect1>Definitions for Dependency::Type
+<p>
+<example>
+#define pkgDEP_Depends 1
+#define pkgDEP_PreDepends 2
+#define pkgDEP_Suggests 3
+#define pkgDEP_Recommends 4
+#define pkgDEP_Conflicts 5
+#define pkgDEP_Replaces 6
+</example>
+</sect1>
+
+<sect1>Definitions for Dependency::CompareOp
+<p>
+<example>
+#define pkgOP_OR 0x10
+#define pkgOP_LESSEQ 0x1
+#define pkgOP_GREATEREQ 0x2
+#define pkgOP_LESS 0x3
+#define pkgOP_GREATER 0x4
+#define pkgOP_EQUALS 0x5
+</example>
+The lower 4 bits are used to indicate what operator is being specified and
+the upper 4 bits are flags. pkgOP_OR indicates that the next package is
+or'd with the current package.
+</sect1>
+
+<sect1>Definitions for Package::SelectedState
+<p>
+<example>
+#define pkgSTATE_Unkown 0
+#define pkgSTATE_Install 1
+#define pkgSTATE_Hold 2
+#define pkgSTATE_DeInstall 3
+#define pkgSTATE_Purge 4
+</example>
+</sect1>
+
+<sect1>Definitions for Package::InstState
+<p>
+<example>
+#define pkgSTATE_Ok 0
+#define pkgSTATE_ReInstReq 1
+#define pkgSTATE_Hold 2
+#define pkgSTATE_HoldReInstReq 3
+</example>
+</sect1>
+
+<sect1>Definitions for Package::CurrentState
+<p>
+<example>
+#define pkgSTATE_NotInstalled 0
+#define pkgSTATE_UnPacked 1
+#define pkgSTATE_HalfConfigured 2
+#define pkgSTATE_UnInstalled 3
+#define pkgSTATE_HalfInstalled 4
+#define pkgSTATE_ConfigFiles 5
+#define pkgSTATE_Installed 6
+</example>
+</sect1>
+
+<sect1>Definitions for Package::Flags
+<p>
+<example>
+#define pkgFLAG_Auto (1 << 0)
+#define pkgFLAG_New (1 << 1)
+#define pkgFLAG_Obsolete (1 << 2)
+#define pkgFLAG_Essential (1 << 3)
+#define pkgFLAG_ImmediateConf (1 << 4)
+</example>
+</sect1>
+
+<sect1>Definitions for Version::Priority
+<p>
+Zero is used for unparsable or absent Priority fields.
+<example>
+#define pkgPRIO_Important 1
+#define pkgPRIO_Required 2
+#define pkgPRIO_Standard 3
+#define pkgPRIO_Optional 4
+#define pkgPRIO_Extra 5
+</example>
+</sect1>
+
+<sect1>Definitions for PackageFile::Flags
+<p>
+<example>
+#define pkgFLAG_NotSource (1 << 0)
+</example>
+</sect1>
+
+ <!-- }}} -->
+
+<chapt>Notes on the Generator
+<!-- Notes on the Generator {{{ -->
+<!-- ===================================================================== -->
+<p>
+The pkgCache::MergePackageFile function is currently the only generator of
+the cache file. It implements a conversion from the normal textual package
+file into the cache file.
+
+<p>
+The generator assumes any package declaration with a
+Status: line is a 'Status of the package' type of package declaration.
+A Package with a Target-Version field should also really have a status field.
+The processing of a Target-Version field can create a place-holder Version
+structure that is empty to refer to the specified version (See Version
+for info on what a empty Version looks like). The Target-Version syntax
+allows the specification of a specific version and a target distribution.
+
+<p>
+Different section names on different versions is supported, but I
+do not expect to use it. To simplify the GUI it will mearly use the section
+in the Package structure. This should be okay as I hope sections do not change
+much.
+
+<p>
+The generator goes through a number of post processing steps after producing
+a disk file. It sorts all of the version lists to be in descending order
+and then generates the reverse dependency lists for all of the packages.
+ID numbers and count values are also generated in the post processing step.
+
+<p>
+It is possible to extend many of the structures in the cache with extra data.
+This is done by using the ID member. ID will be a unique number from 0 to
+Header->??Count. For example
+<example>
+struct MyPkgData;
+MyPkgData *Data = new MyPkgData[Header->PackageCount];
+Data[Package->ID]->Item = 0;
+</example>
+This provides a one way reference between package structures and user data. To
+get a two way reference would require a member inside the MyPkgData structure.
+
+<p>
+The generators use of free space pools tend to make the package file quite
+large, and quite full of blank space. This could be fixed with sparse files.
+
+ <!-- }}} -->
+
+<chapt>Future Directions
+<!-- Future Directions {{{ -->
+<!-- ===================================================================== -->
+<p>
+Some good directions to take the cache file is into a cache directory that
+contains many associated caches that cache other important bits of
+information. (/var/cache/apt, FHS2)
+
+<p>
+Caching of the info/*.list is an excellent place to start, by generating all
+the list files into a tree structure and reverse linking them to the package
+structures in the main cache file major speed gains in dpkg might be achived.
+
+ <!-- }}} -->
+
+</book>