summaryrefslogtreecommitdiff
path: root/apt-pkg/pkgcachegen.cc
AgeCommit message (Collapse)Author
2016-01-08pkgCacheGenerator::StoreString: Get rid of std::stringJulian Andres Klode
Instead of storing a string -> map_stringitem_t mapping, create our own data type that can point to either a normal string or a string inside the cache. This avoids the creation of any string and improves performance slightly (about 4%).
2016-01-08Replace compare() == 0 checks with this == other checksJulian Andres Klode
This improves performance, as we now can ignore unequal strings based on their length already. Gbp-Dch: ignore
2016-01-08pkgCacheGenerator: Use StringView for toStringJulian Andres Klode
This removes some minor overhead. Gbp-Dch: ignore
2016-01-08pkgCacheGenerator::StoreString: Move the string into the mapJulian Andres Klode
Moving the string is likely faster than copying it. We could probably avoid strings alltogether in the future using some more crazy code, but I have not looked at that yet. Gbp-Dch: ignore
2016-01-07Switch performance critical code to use APT::StringViewJulian Andres Klode
This improves performance of the cache generation on my ARM platform (4x Cortex A15) by about 10% to 20% from 2.35-2.50 to 2.1 seconds.
2015-12-29Do not sync the cache fileJulian Andres Klode
Integrity is taken care of by the checksum now.
2015-12-29Add support for calculating hashes over the entire cacheJulian Andres Klode
2015-12-29pkgCacheGenerator: Allow passing down an already created cacheJulian Andres Klode
If we already have opened a cache, there is no point in having to open it again.
2015-12-27pkgcachegen: Use std::unordered_map instead of std::mapJulian Andres Klode
std::unordered_map is faster than std::map in our use case, reducing cache generation time by about 10% in my benchmark.
2015-11-27add messages to our deprecation warnings in libaptDavid Kalnischkies
Git-Dch: Ignore
2015-11-20do not segfault in cache generation on mmap failureDavid Kalnischkies
Out of memory and similar circumstanzas could cause MMap::Map to fail and especially the mmap/malloc calls in it. With some additional checking we can avoid segfaults and similar in such situations – at least in theory as if this is a real out of memory everything we do to handle the error could just as well run into a memory problem as well… But at least in theory (if MMap::Map is made to fail always) we can deal with it so good that a user actually never sees a failure (as the cache it tries to load with it fails and is discarded, so that DynamicMMap takes over and a new one is build) instead of segfaulting. Closes: 803417
2015-11-05apply various suggestions made by cppcheckDavid Kalnischkies
Reported-By: cppcheck Git-Dch: Ignore
2015-09-14do not ignore differently versioned self-providesDavid Kalnischkies
Reported-By: Konomi on IRC
2015-09-14avoid using global PendingError to avoid failing too often too soonDavid Kalnischkies
Our error reporting is historically grown into some kind of mess. A while ago I implemented stacking for the global error which is used in this commit now to wrap calls to functions which do not report (all) errors via return, so that only failures in those calls cause a failure to propergate down the chain rather than failing if anything (potentially totally unrelated) has failed at some point in the past. This way we can avoid stopping the entire acquire process just because a single source produced an error for example. It also means that after the acquire process the cache is generated – even if the acquire process had failures – as we still have the old good data around we can and should generate a cache for (again). There are probably more instances of this hiding, but all these looked like the easiest to work with and fix with reasonable (aka net-positive) effects.
2015-09-14implement dpkgs vision of interpreting pkg:<arch> dependenciesDavid Kalnischkies
How the Multi-Arch field and pkg:<arch> dependencies interact was discussed at DebConf15 in the "MultiArch BoF". dpkg and apt (among other tools like dose) had a different interpretation in certain scenarios which we resolved by agreeing on dpkg view – and this commit realizes this agreement in code. As was the case so far libapt sticks to the idea of trying to hide MultiArch as much as possible from individual frontends and instead translates it to good old SingleArch. There are certainly situations which can be improved in frontends if they know that MultiArch is upon them, but these are improvements – not necessary changes needed to unbreak a frontend. The implementation idea is simple: If we parse a dependency on foo:amd64 the dependency is formed on a package 'foo:amd64' of arch 'any'. This package is provided by package 'foo' of arch 'amd64', but not by 'foo' of arch 'i386'. Both of those foo packages provide each other through (assuming foo is M-A:foreign) to allow a dependency on 'foo' to be satisfied by either foo of amd64 or i386. Packages can also declare to provide 'foo:amd64' which is translated to providing 'foo:amd64:any' as well. This indirection over provides was chosen as the alternative would be to teach dependency resolvers how to deal with architecture specific dependencies – which violates the design idea of avoiding resolver changes, especially as architecture-specific dependencies are a cornercase with quite a few subtil rules. Handling it all over versioned provides as we already did for M-A in general seems much simpler as it just works for them. This switch to :any has actually a "surprising" benefit as well: Even frontends showing a package name via .Name() [which doesn't show the architecture] will display the "architecture" for dependencies in which it was explicitely requested, while we will not show the 'strange' :any arch in FullName(true) [= pretty-print] either. Before you had to specialcase these and by default you wouldn't get these details shown. The only identifiable disadvantage is that this complicates error reporting and handling. apt-get's ShowBroken has existing problems with virtual packages [it just shows the name without any reason], so that has to be worked on eventually. The other case is that detecting if a package is completely unknown or if it was at least referenced somewhere needs to acount for this "split" – not that it makes a practical difference which error is shown… but its one of the improvements possible.
2015-08-27ignore AllowMem parameter in cache generationDavid Kalnischkies
The parameter name suggests that it should forbid the building of the entire cache in memory, but this isn't how it was previously and as AllowMem is false by default it actually prevents previous usecases from working like being root and configuring apt to build no caches at all. This should be fixed at some point to actually work, but that is hard to pull off as it means switching the default and some callers (including apt itself) actually did call it explicitly with false in certain cases for no apparent reason (at least now where it is common to have enough memory to throw at every problem and even if not is a slow apt usally better than an apt erroring out). Closes: 796459
2015-08-27Fix more instances of missing remapping handlingJulian Andres Klode
After fixing Bug#796999, we noticed that there were some more instances of iterators which had no associated Dynamic object, causing them to not be updated when the cache was remapped. This happened in two places: In NewPackage() and in NewProvidesAllArch(). Gbp-Dch: ignore
2015-08-27pkgcachegen: Account for remapping when parsing depends from NewPackageJulian Andres Klode
In both the Ver and Dep variables, we need to account for remapping, as otherwise we would still reference the old bug. Reproduction environment: * An i386 system with amd64 foreign architecture * A sources.list with deb http://snapshot.debian.org/archive/debian/20150826T102846Z/ unstable main deb http://snapshot.debian.org/archive/debian/20150826T102846Z/ experimental main Thanks: Jakub Wilk for the bug report and the backtraces Closes: #796999
2015-08-17Cleanup includes after running iwyuMichael Vogt
2015-08-13Deprecate SPtrArray<T> and convert everyone to unique_ptr<T[]>Julian Andres Klode
More standardization
2015-08-13Mark SPtr as deprecated, and convert users to std::unique_ptrJulian Andres Klode
Switch to std::unique_ptr, as this is safer than SPtr.
2015-08-11Drop C++11 elements from headersJulian Andres Klode
2015-08-10drop obsolete explicit :none handling in pkgCacheGenDavid Kalnischkies
We archieve the same without the special handling now, so drop this code. Makes supporting this abdomination a little longer bearable as well. Git-Dch: Ignore
2015-08-10eliminate dead file-provides code in cache generationDavid Kalnischkies
The code was never active in production, it just sits there collecting dust and given that it is never tested probably doesn't even work anymore the way it was supposed to be (whatever that was exactly in the first place). So just remove it before I have to "fix" it again next time. Git-Dch: Ignore
2015-08-10elimate duplicated code in pkgIndexFile subclassesDavid Kalnischkies
Trade deduplication of code for a bunch of new virtuals, so it is actually visible how the different indexes behave cleaning up the interface at large in the process. Git-Dch: Ignore
2015-08-10add volatile sources support in libapt-pkgDavid Kalnischkies
Sources are usually defined in sources.list (and co) and are pretty stable, but once in a while a frontend might want to add an additional "source" like a local .deb file to install this package (No support for 'real' sources being added this way as this is a multistep process). We had a hack in place to allow apt-get and apt to pull this of for a short while now, but other frontends are either left in the cold by this and/or the code for it looks dirty with FIXMEs plastering it and has on top of this also some problems (like including these 'volatile' sources in the srcpkgcache.bin file). So the biggest part in this commit is actually the rewrite of the cache generation as it is now potentially a three step process. The biggest problem with adding support now through is that this makes a bunch of previously mostly unusable by externs and therefore hidden classes public, so a bit of further tuneing on this now public API is in order…
2015-08-10just-in-time creation for (explicit) negative depsDavid Kalnischkies
Now that we deal with provides in a more dynamic fashion the last remaining problem is explicit dependencies like 'Conflicts: foo' which have to apply to all architectures, but creating them all at the same time requires us to know all architectures ending up in the cache which isn't needed to be the same set as all foreign architectures. The effect is visible already now through as this prevents the creation of a bunch of virtual packages for arch:all packages and as such also many dependencies, just not very visible if you don't look at the stats… Git-Dch Ignore
2015-08-10just-in-time creation for (implicit) ProvidesDavid Kalnischkies
Expecting the worst is easy to code, but has its disadvantages e.g. by creating package structures which otherwise would have never existed. By creating the provides instead at the time a package structure is added we are well prepared for the introduction of partial architectures, massive amounts of M-A:foreign (and :allowed) and co as far as provides are concerned at least. We have something relatively similar for dependencies already. Many tests are added for both M-A states and the code cleaned to properly support implicit provides for foreign architectures and architectures we 'just' happen to parse. Git-Dch: Ignore
2015-08-10hide implicit deps in apt-cache again by defaultDavid Kalnischkies
Before MultiArch implicits weren't a thing, so they were hidden by default by definition. Adding them for MultiArch solved many problems, but having no reliable way of detecting which dependency (and provides) is implicit or not causes problems everytime we want to output dependencies without confusing our observers with unneeded implementation details. The really notworthy point here is actually that we keep now a better record of how a dependency came to be so that we can later reason about it more easily, but that is hidden so deep down in the library internals that change is more the problems it solves than the change itself.
2015-08-10link DependencyData structs togetherDavid Kalnischkies
Cache generation needs a way of quickly iterating over the unique potion of the dependencies to be able to share them. By linking them together we can reduce the speed penality (~ 80%) with only a small reduction in saved size (~ 20%). Git-Dch: Ignore
2015-08-10split-up Dependency structDavid Kalnischkies
Having dependency data separated from the link between version/package and the dependency allows use to work on sharing the depdency data a bit as it turns out that many dependencies are in fact duplicates. How many are duplicates various heavily with the sources configured, but for a single Debian release the ballpark is 2 duplicates for each dependency already (e.g. libc6 counts 18410 dependencies, but only 45 unique). Add more releases and the duplicates count only rises to get ~6 for 3 releases. For each architecture a user has configured which given the shear number of dependencies amounts to MBs of duplication. We can cut down on this number, but pay a heavy price for it: In my many releases(3) + architectures(3) test we have a 10% (~ 0.5 sec) increase in cache creationtime, but also 10% less cachesize (~ 10 MB). Further work is needed to rip the whole benefits from this through, so this is just the start. Git-Dch: Ignore
2015-08-10make all d-pointer * const pointersDavid Kalnischkies
Doing this disables the implicit copy assignment operator (among others) which would cause hovac if used on the classes as it would just copy the pointer, not the data the d-pointer points to. For most of the classes we don't need a copy assignment operator anyway and in many classes it was broken before as many contain a pointer of some sort. Only for our Cacheset Container interfaces we define an explicit copy assignment operator which could later be implemented to copy the data from one d-pointer to the other if we need it. Git-Dch: Ignore
2015-06-16add d-pointer, virtual destructors and de-inline de/constructorsDavid Kalnischkies
To have a chance to keep the ABI for a while we need all three to team up. One of them missing and we might loose, so ensuring that they are available is a very tedious but needed task once in a while. Git-Dch: Ignore
2015-06-15populate the Architecture field for PackageFilesDavid Kalnischkies
This is mainly visible in the policy, so that you can now pin by b= and let it only effect Packages files of this architecture and hence the packages coming from it (which do not need to be from this architecture, but very likely are in a normal repository setup). If you should pin by architecture in this way is a different question… Closes: 687255
2015-06-15implement default apt-get file --release-info modeDavid Kalnischkies
Selecting targets based on the Release they belong to isn't to unrealistic. In fact, it is assumed to be the most used case so it is made the default especially as this allows to bundle another thing we have to be careful with: Filenames and only showing targets we have acquired. Closes: 752702
2015-06-12store Release files data in the CacheDavid Kalnischkies
We used to read the Release file for each Packages file and store the data in the PackageFile struct even through potentially many Packages (and Translation-*) files could use the same data. The point of the exercise isn't the duplicated data through. Having the Release files as first-class citizens in the Cache allows us to properly track their state as well as allows us to use the information also for files which aren't in the cache, but where we know to which Release file they belong (Sources are an example for this). This modifies the pkgCache structs, especially the PackagesFile struct which depending on how libapt users access the data in these structs can mean huge breakage or no visible change. As a single data point: aptitude seems to be fine with this. Even if there is breakage it is trivial to fix in a backportable way while avoiding breakage for everyone would be a huge pain for us. Note that not all PackageFile structs have a corresponding ReleaseFile. In particular the dpkg/status file as well as *.deb files have not. As these have only a Archive property need, the Component property takes over this duty and the ReleaseFile remains zero. This is also the reason why it isn't needed nor particularily recommended to change from PackagesFile to ReleaseFile blindly. Sticking with the earlier is usually the better option.
2014-11-08guard pkg/grp hashtable creation changesDavid Kalnischkies
The change itself is no problem ABI wise, but the remove of the old undynamic hashtables is, so we bring it back for older abis and happily use the now available free space to backport more recent additions like the dynamic hashtable itself. Git-Dch: Ignore
2014-11-08replace ignore-deprecated #pragma dance with _PragmaDavid Kalnischkies
For compatibility we use/provide and fill quiet some deprecated methods and fields, which subsequently earns us a warning for using them. These warnings therefore have to be disabled for these codeparts and that is what this change does now in a slightly more elegant way. Git-Dch: Ignore
2014-10-03rename StringType VERSION to VERSIONNUMBERDavid Kalnischkies
aptitude has a define for VERSION, so to not generate a FTBFS we just rename our enum element to a slightly less generic name. Git-Dch: Ignore
2014-09-27fix: Prefer prefix ++/-- operators for non-primitive typesDavid Kalnischkies
Git-Dch: Ignore Reported-By: cppcheck
2014-09-27de-duplicate version strings in the cacheDavid Kalnischkies
Turns out that version numbers aren't as random as you might guess. In my cache for example, I have: Total package names: 69513 (1390 k) Total package structures: 188259 (9036 k) Total distinct versions: 186345 (13.4 M) Total dependencies: 2052242 (57.5 M) which amounts to 1035873 (10,1 M) strings. Reusing version strings reduces this to 161465 (3.479 k). This comes at a cost of course: Generation is slightly slower, but we are still faster than what we started with and it makes room (also cache size wise) for further changes.
2014-09-27drop stored StringItems in favor of in-memory mappingsDavid Kalnischkies
Strings like Section names or architectures are needed vary often. Instead of writing them each time we need them, we deploy sharing for these special strings. Until now, this was done with a linked list of strings in which we would search, which was stored in the cache. It turns out we can do this just as well in memory as well with a bunch of std::map's. In memory means here that it isn't available anymore if we have a partly invalid cache, but that isn't much of a problem in practice as the status file is compared to the other files we parse very small and includes mostly duplicates, so the space we would gain by storing is more or less equal to the size of the stored linked list…
2014-09-27deprecate Pkg->Name in favor of Grp->NameDavid Kalnischkies
They both store the same information, so this field just takes up space in the Package struct for no good reason. We mark it "just" as deprecated instead of instantly removing it though as it isn't misleading like Section was and is potentially used in the wild more often.
2014-06-18correct 'apt-cache stats' to include moreDavid Kalnischkies
It still doesn't reflect the size the cache has on the disk compared to what is given as total size (90 vs 103 MB), but by counting all structs in we are at least a bit closer to the reality. Git-Dch: ignore
2014-06-18cleanup datatypes mix used in binary cacheDavid Kalnischkies
We had a wild mixture of (unsigned) int, long and long long here without much sense, so this commit adds a few typedefs to get some sense in the typesystem and ensures that a ID isn't sometimes computed as int, stored as long and compared with a long long… as this could potentially bite us later on as the size of the archive only increases over time.
2014-06-18increase hashtable size for packages/groups by factor 5David Kalnischkies
It also makes the size configureable, so it can be adapted in the future without the need for an abi break - and even by users… The increase was long overdue as it gives a >10% decrease in runtime of e.g. 'apt-get check -s'. Some (useless) benchmark with 69933 groups and 187796 packages without a pre-built cache: time apt-get check -so APT::Cache-HashTableSize=1 → 20m time apt-get check -so APT::Cache-HashTableSize=1000 → 6,41s time apt-get check -so APT::Cache-HashTableSize=2000 → 5,64s (old) time apt-get check -so APT::Cache-HashTableSize=3000 → 5,30s time apt-get check -so APT::Cache-HashTableSize=5000 → 5,08s time apt-get check -so APT::Cache-HashTableSize=6000 → 5,05s time apt-get check -so APT::Cache-HashTableSize=7000 → 5,02s time apt-get check -so APT::Cache-HashTableSize=8000 → 5,00s time apt-get check -so APT::Cache-HashTableSize=9000 → 4,98s time apt-get check -so APT::Cache-HashTableSize=10000 → 4,96s (new) time apt-get check -so APT::Cache-HashTableSize=15000 → 4,90s time apt-get check -so APT::Cache-HashTableSize=20000 → 4,86s time apt-get check -so APT::Cache-HashTableSize=30000 → 4,77s time apt-get check -so APT::Cache-HashTableSize=40000 → 4,74s time apt-get check -so APT::Cache-HashTableSize=50000 → 4,73s time apt-get check -so APT::Cache-HashTableSize=60000 → 4,71s The gap increases further for operations which have more package lookups. Factor 5 was chosen as higher values do not provide any really significant timing advantage anymore compared to the memory increase in my testing and there is always the possibility to increase it now if that changes. (also most users will not have 3 releases and 4 architectures in the cache, so theirs will be much smaller and faster).
2014-06-18Merge remote-tracking branch 'mvo/feature/hash-stats' into debian/experimentalMichael Vogt
Conflicts: apt-pkg/acquire-item.cc apt-pkg/acquire-item.h apt-pkg/deb/debmetaindex.h apt-pkg/pkgcache.cc test/integration/test-apt-ftparchive-src-cachedb
2014-06-18[API-Break] rename pkgCache::Package::NextPackage to pkgCache::Package::NextMichael Vogt
This is a internal struct not a external interface so the actual breakage should be small.
2014-05-10invalid cache if architecture set doesn't matchDavid Kalnischkies
The cache heavily depends on the architecture(s) it is build for, especially if you move from single- to multiarch. Adding a new architecture to dpkg therefore has to be detected and must invalidate the cache so that we don't operate on incorrect data. The incorrect data will prevent us from doing otherwise sensible actions (it doesn't allow bad things to happen) and the recovery is simple and automatic in most cases, so this hides pretty well and is also not as serious as it might sound at first. Closes: 745036
2014-05-09parse and retrieve multiple Descriptions in one recordDavid Kalnischkies
It seems unlikely for now that proper archives will carry multiple Description-* stanzas in the Packages (or Translation-*) file, but sometimes apt eats its own output as shown by the usage of the CD team and it would be interesting to let apt output multiple translations e.g. in 'apt-cache show'.