summaryrefslogtreecommitdiff
path: root/apt-pkg/pkgcachegen.h
AgeCommit message (Collapse)Author
2020-12-17Do not require libxxhash-dev for including pkgcachegen.hJulian Andres Klode
2020-12-15Use XXH3 for cache, hash table hashingJulian Andres Klode
XXH3 is faster than both our CRC32c implementation as well as DJB hash for hash table hashing, so meh, let's switch to it.
2020-02-24Make map_pointer<T> typesafeJulian Andres Klode
Instead of just using uint32_t, which would allow you to assign e.g. a map_pointer<Version> to a map_pointer<Package>, use our own smarter struct that has strict type checking. We allow creating a map_pointer from a nullptr, and we allow comparing map_pointer to nullptr, which also deals with comparisons against 0 which are often used, as 0 will be implictly converted to nullptr.
2020-02-24Wrap AllocateInMap with a templated versionJulian Andres Klode
2020-02-24Replace map_pointer_t with map_pointer<T>Julian Andres Klode
This is a first step to a type safe cache, adding typing information everywhere. Next, we'll replace map_pointer<T> implementation with a type safe one.
2020-02-18Use a 32-bit djb VersionHash instead of CRC-16Julian Andres Klode
2020-01-14Remove includes of (md5|sha1|sha2).h headersJulian Andres Klode
Remove it everywhere, except where it is still needed.
2020-01-08Avoid extra out-of-cache hash table deduplication for package namesJulian Andres Klode
We were de-duplicating package name strings in StoreString, but also deduplicating most of them by them being in groups, so we had extra hash table lookups that could be avoided in NewGroup(). To continue deduplicating names across binary packages and source packages, insert groups for source packages as well. This is also a good first step in allowing efficient lookup of packages by source package - we can extend Group later by a list of SourceVersion objects, or alternatively, simply add a by-source chain into pkgCache::Version. This change improves performance by about 10% (913 to 814 ms), while having no significant overhead on the cache size: --- before +++ after @@ -1,7 +1,7 @@ -Total package names: 109536 (2.191 k) -Total package structures: 118689 (4.748 k) +Total package names: 119642 (2.393 k) +Total package structures: 118687 (4.747 k) Normal packages: 83309 - Pure virtual packages: 3365 + Pure virtual packages: 3363 Single virtual packages: 17811 Mixed virtual packages: 1973 Missing: 12231 @@ -10,21 +10,21 @@ Total distinct descriptions: 149291 (3.583 k) Total dependencies: 484135/156650 (12,2 M) Total ver/file relations: 57421 (1.378 k) Total Desc/File relations: 18219 (437 k) -Total Provides mappings: 29963 (719 k) +Total Provides mappings: 29959 (719 k) Total globbed strings: 226993 (5.332 k) Total slack space: 26,8 k -Total space accounted for: 38,1 M +Total space accounted for: 38,3 M Total buckets in PkgHashTable: 50503 - Unused: 5727 - Used: 44776 - Utilization: 88.6601% - Average entries: 2.65073 + Unused: 5728 + Used: 44775 + Utilization: 88.6581% + Average entries: 2.65074 Longest: 60 Shortest: 1 Total buckets in GrpHashTable: 50503 - Unused: 5727 - Used: 44776 - Utilization: 88.6601% - Average entries: 2.44631 - Longest: 10 + Unused: 4649 + Used: 45854 + Utilization: 90.7946% + Average entries: 2.60919 + Longest: 11 Shortest: 1
2019-06-11Make APT::StringView publicJulian Andres Klode
2019-02-26pkgcachegen: Remove deprecated functionsJulian Andres Klode
2017-07-12Reformat and sort all includes with clang-formatJulian Andres Klode
This makes it easier to see which headers includes what. The changes were done by running git grep -l '#\s*include' \ | grep -E '.(cc|h)$' \ | xargs sed -i -E 's/(^\s*)#(\s*)include/\1#\2 include/' To modify all include lines by adding a space, and then running ./git-clang-format.sh.
2017-07-12Drop cacheiterators.h includeJulian Andres Klode
Including cacheiterators.h before pkgcache.h fails because pkgcache.h depends on cacheiterators.h.
2016-11-22Do not use MD5SumValue for Description_md5()Julian Andres Klode
Our profile says we spend about 5% of the time transforming the hex digits into the binary format used by HashsumValue, all for comparing them against the other strings. That makes no sense at all. According to callgrind, this reduces the overall instruction count from 5,3 billion to 5 billion in my example, which roughly matches the 5%.
2016-01-27deal better with (very) small apt::cache-start valuesDavid Kalnischkies
It is a bit academic to support values which aren't big enough to fit even the hashtables without resizing, but cleaning up ensures that we do the right thing (aka not segfaulting) even if something goes wrong in these deep layers. You still can't have very very small values through… Git-Dch: Ignore
2016-01-26convert Version() and Architecture() to APT::StringViewDavid Kalnischkies
Part of hidden classes, so conversion is abi-free. Git-Dch: Ignore
2016-01-26remove unused Description methods in listparsersDavid Kalnischkies
These virtual methods are implemented in hidden classes, so we can drop them without breaking the ABI. Git-Dch: Ignore
2016-01-26Delete copy constructor and operator= for DynamicJulian Andres Klode
This would mess up reference counting and should not be allowed (it could be implemented correctly, but it would not be efficient and we do not need it). Gbp-Dch: ignore
2016-01-23Pass the old map size to ReMap()Julian Andres Klode
This allows us to check if a value to be remapped was inside the cache or not, which will become useful at a later point. Gbp-Dch: ignore
2016-01-08pkgCacheGenerator::hash: Do not call tolower_ascii()Julian Andres Klode
Gbp-Dch: ignore
2016-01-08pkgCacheGenerator::StoreString: Get rid of std::stringJulian Andres Klode
Instead of storing a string -> map_stringitem_t mapping, create our own data type that can point to either a normal string or a string inside the cache. This avoids the creation of any string and improves performance slightly (about 4%).
2016-01-08pkgCacheGenerator: Use StringView for toStringJulian Andres Klode
This removes some minor overhead. Gbp-Dch: ignore
2016-01-07Switch performance critical code to use APT::StringViewJulian Andres Klode
This improves performance of the cache generation on my ARM platform (4x Cortex A15) by about 10% to 20% from 2.35-2.50 to 2.1 seconds.
2015-12-29pkgCacheGenerator: Allow passing down an already created cacheJulian Andres Klode
If we already have opened a cache, there is no point in having to open it again.
2015-12-27pkgcachegen.h: Hack around unordered_map not existing before C++11Julian Andres Klode
This is for public users only, which cannot use the class at all, except for the static methods.
2015-12-27pkgcachegen: Use std::unordered_map instead of std::mapJulian Andres Klode
std::unordered_map is faster than std::map in our use case, reducing cache generation time by about 10% in my benchmark.
2015-11-27add messages to our deprecation warnings in libaptDavid Kalnischkies
Git-Dch: Ignore
2015-08-11Drop C++11 elements from headersJulian Andres Klode
2015-08-10eliminate dead file-provides code in cache generationDavid Kalnischkies
The code was never active in production, it just sits there collecting dust and given that it is never tested probably doesn't even work anymore the way it was supposed to be (whatever that was exactly in the first place). So just remove it before I have to "fix" it again next time. Git-Dch: Ignore
2015-08-10elimate duplicated code in pkgIndexFile subclassesDavid Kalnischkies
Trade deduplication of code for a bunch of new virtuals, so it is actually visible how the different indexes behave cleaning up the interface at large in the process. Git-Dch: Ignore
2015-08-10add volatile sources support in libapt-pkgDavid Kalnischkies
Sources are usually defined in sources.list (and co) and are pretty stable, but once in a while a frontend might want to add an additional "source" like a local .deb file to install this package (No support for 'real' sources being added this way as this is a multistep process). We had a hack in place to allow apt-get and apt to pull this of for a short while now, but other frontends are either left in the cold by this and/or the code for it looks dirty with FIXMEs plastering it and has on top of this also some problems (like including these 'volatile' sources in the srcpkgcache.bin file). So the biggest part in this commit is actually the rewrite of the cache generation as it is now potentially a three step process. The biggest problem with adding support now through is that this makes a bunch of previously mostly unusable by externs and therefore hidden classes public, so a bit of further tuneing on this now public API is in order…
2015-08-10just-in-time creation for (explicit) negative depsDavid Kalnischkies
Now that we deal with provides in a more dynamic fashion the last remaining problem is explicit dependencies like 'Conflicts: foo' which have to apply to all architectures, but creating them all at the same time requires us to know all architectures ending up in the cache which isn't needed to be the same set as all foreign architectures. The effect is visible already now through as this prevents the creation of a bunch of virtual packages for arch:all packages and as such also many dependencies, just not very visible if you don't look at the stats… Git-Dch Ignore
2015-08-10just-in-time creation for (implicit) ProvidesDavid Kalnischkies
Expecting the worst is easy to code, but has its disadvantages e.g. by creating package structures which otherwise would have never existed. By creating the provides instead at the time a package structure is added we are well prepared for the introduction of partial architectures, massive amounts of M-A:foreign (and :allowed) and co as far as provides are concerned at least. We have something relatively similar for dependencies already. Many tests are added for both M-A states and the code cleaned to properly support implicit provides for foreign architectures and architectures we 'just' happen to parse. Git-Dch: Ignore
2015-08-10hide implicit deps in apt-cache again by defaultDavid Kalnischkies
Before MultiArch implicits weren't a thing, so they were hidden by default by definition. Adding them for MultiArch solved many problems, but having no reliable way of detecting which dependency (and provides) is implicit or not causes problems everytime we want to output dependencies without confusing our observers with unneeded implementation details. The really notworthy point here is actually that we keep now a better record of how a dependency came to be so that we can later reason about it more easily, but that is hidden so deep down in the library internals that change is more the problems it solves than the change itself.
2015-08-10make all d-pointer * const pointersDavid Kalnischkies
Doing this disables the implicit copy assignment operator (among others) which would cause hovac if used on the classes as it would just copy the pointer, not the data the d-pointer points to. For most of the classes we don't need a copy assignment operator anyway and in many classes it was broken before as many contain a pointer of some sort. Only for our Cacheset Container interfaces we define an explicit copy assignment operator which could later be implemented to copy the data from one d-pointer to the other if we need it. Git-Dch: Ignore
2015-08-10apply various style suggestions by cppcheckDavid Kalnischkies
Some of them modify the ABI, but given that we prepare a big one already, these few hardly count for much. Git-Dch: Ignore
2015-06-16add d-pointer, virtual destructors and de-inline de/constructorsDavid Kalnischkies
To have a chance to keep the ABI for a while we need all three to team up. One of them missing and we might loose, so ensuring that they are available is a very tedious but needed task once in a while. Git-Dch: Ignore
2015-06-15populate the Architecture field for PackageFilesDavid Kalnischkies
This is mainly visible in the policy, so that you can now pin by b= and let it only effect Packages files of this architecture and hence the packages coming from it (which do not need to be from this architecture, but very likely are in a normal repository setup). If you should pin by architecture in this way is a different question… Closes: 687255
2015-06-12store Release files data in the CacheDavid Kalnischkies
We used to read the Release file for each Packages file and store the data in the PackageFile struct even through potentially many Packages (and Translation-*) files could use the same data. The point of the exercise isn't the duplicated data through. Having the Release files as first-class citizens in the Cache allows us to properly track their state as well as allows us to use the information also for files which aren't in the cache, but where we know to which Release file they belong (Sources are an example for this). This modifies the pkgCache structs, especially the PackagesFile struct which depending on how libapt users access the data in these structs can mean huge breakage or no visible change. As a single data point: aptitude seems to be fine with this. Even if there is breakage it is trivial to fix in a backportable way while avoiding breakage for everyone would be a huge pain for us. Note that not all PackageFile structs have a corresponding ReleaseFile. In particular the dpkg/status file as well as *.deb files have not. As these have only a Archive property need, the Component property takes over this duty and the ReleaseFile remains zero. This is also the reason why it isn't needed nor particularily recommended to change from PackagesFile to ReleaseFile blindly. Sticking with the earlier is usually the better option.
2014-11-08mark internal interfaces as hiddenDavid Kalnischkies
We have a bunch of classes which are of no use for the outside world, but were still exported and so needed to preserve ABI/API. Marking them as hidden to not export them any longer is a big API break in theory, but in practice nobody is using them – as if they would its a bug.
2014-11-08use a abi version check similar to the gcc checkDavid Kalnischkies
Git-Dch: Ignore
2014-10-03rename StringType VERSION to VERSIONNUMBERDavid Kalnischkies
aptitude has a define for VERSION, so to not generate a FTBFS we just rename our enum element to a slightly less generic name. Git-Dch: Ignore
2014-09-27fix: Member variable 'X' is not initialized in the constructor.David Kalnischkies
Reported-By: cppcheck Git-Dch: Ignore
2014-09-27drop stored StringItems in favor of in-memory mappingsDavid Kalnischkies
Strings like Section names or architectures are needed vary often. Instead of writing them each time we need them, we deploy sharing for these special strings. Until now, this was done with a linked list of strings in which we would search, which was stored in the cache. It turns out we can do this just as well in memory as well with a bunch of std::map's. In memory means here that it isn't available anymore if we have a partly invalid cache, but that isn't much of a problem in practice as the status file is compared to the other files we parse very small and includes mostly duplicates, so the space we would gain by storing is more or less equal to the size of the stored linked list…
2014-06-18cleanup datatypes mix used in binary cacheDavid Kalnischkies
We had a wild mixture of (unsigned) int, long and long long here without much sense, so this commit adds a few typedefs to get some sense in the typesystem and ensures that a ID isn't sometimes computed as int, stored as long and compared with a long long… as this could potentially bite us later on as the size of the archive only increases over time.
2014-05-09parse and retrieve multiple Descriptions in one recordDavid Kalnischkies
It seems unlikely for now that proper archives will carry multiple Description-* stanzas in the Packages (or Translation-*) file, but sometimes apt eats its own output as shown by the usage of the CD team and it would be interesting to let apt output multiple translations e.g. in 'apt-cache show'.
2014-03-21mark optional (private) symbols as hiddenDavid Kalnischkies
This methods should not be used by anyone expect the library itself as they are helpers for the specific class and therefore perfect candidates for hidding. Git-Dch: Ignore
2014-03-13abstract version hash comparison a bitDavid Kalnischkies
In #737085 we see that apt can be confused if informations about versions only differ slightly. This commit adds a way of at least adding a few more data points with the next abi break to help a bit with it. Git-Dch: Ignore
2014-03-13follow method attribute suggestions by gccDavid Kalnischkies
Git-Dch: Ignore Reported-By: gcc -Wsuggest-attribute={pure,const,noreturn}
2014-03-13cleanup headers and especially #includes everywhereDavid Kalnischkies
Beside being a bit cleaner it hopefully also resolves oddball problems I have with high levels of parallel jobs. Git-Dch: Ignore Reported-By: iwyu (include-what-you-use)
2014-03-13warning: unused parameter ‘foo’ [-Wunused-parameter]David Kalnischkies
Reported-By: gcc -Wunused-parameter Git-Dch: Ignore