path: root/apt-pkg/pkgcachegen.cc
Age · Commit message · Author
2015-08-10 · hide implicit deps in apt-cache again by default · David Kalnischkies
Before MultiArch, implicits weren't a thing, so they were hidden by default by definition. Adding them for MultiArch solved many problems. However, there was no reliable way of detecting which dependencies (and provides) are implicit, and that causes problems every time we want to output dependencies without confusing our observers with unneeded implementation details. The really noteworthy point here is that we now keep a better record of how a dependency came to be, so that we can later reason about it more easily. That record is hidden so deep in the library internals that what users see is the problems it solves rather than the change itself.
2015-08-10 · link DependencyData structs together · David Kalnischkies
Cache generation needs a way of quickly iterating over the unique portion of the dependencies to be able to share them. By linking them together we can reduce the speed penalty (~ 80%) with only a small reduction in saved size (~ 20%). Git-Dch: Ignore
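A minimal C++ sketch of the idea uses a heavily simplified in-memory model. The names `DependencyData`, `DependencyDataChain` and `Share` are hypothetical, and the real cache stores offsets into the mmap rather than `std::string`. Each unique record points to the next unique one, so a search for a shareable record walks only the unique data and never the per-version links.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified shared dependency record.
struct DependencyData {
   std::string target;     // depended-on package name
   std::string version;    // version string, empty if unversioned
   int op = 0;             // comparison operator (hypothetical encoding)
   int type = 0;           // Depends, Conflicts, ... (hypothetical encoding)
   int32_t nextData = -1;  // next unique record in the chain, -1 ends it
};

struct DependencyDataChain {
   std::vector<DependencyData> records;  // every unique record, stored once
   int32_t head = -1;                    // first record in the chain

   // Walk only the unique records by following nextData. Reuse an equal
   // record, or prepend a new one. Returns the index of the shared record.
   int32_t Share(const std::string &target, const std::string &version,
                 int op, int type) {
      for (int32_t i = head; i != -1; i = records[i].nextData) {
         const DependencyData &d = records[i];
         if (d.target == target && d.version == version && d.op == op && d.type == type)
            return i;
      }
      DependencyData fresh;
      fresh.target = target;
      fresh.version = version;
      fresh.op = op;
      fresh.type = type;
      fresh.nextData = head;  // link the new record in front of the chain
      records.push_back(fresh);
      head = static_cast<int32_t>(records.size() - 1);
      return head;
   }
};
```

Prepending keeps insertion cheap. A larger implementation might keep several shorter chains to shorten the walk, but this sketch does not show that.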
2015-08-10 · split-up Dependency struct · David Kalnischkies
Separating the dependency data from the link between version/package and the dependency lets us start sharing the dependency data, because many dependencies turn out to be duplicates. How many are duplicates varies heavily with the sources configured. For a single Debian release the ballpark is already 2 duplicates for each dependency (e.g. libc6 counts 18410 dependencies, but only 45 unique). Adding more releases only raises the duplicate count, to ~6 for 3 releases. The duplication then repeats for each architecture a user has configured, which, given the sheer number of dependencies, amounts to MBs of duplicated data. We can cut down on this number, but pay a heavy price for it. In my test with many releases (3) and architectures (3), cache creation time increases by 10% (~ 0.5 sec), but the cache is also 10% smaller (~ 10 MB). Further work is needed to reap the full benefits, so this is just the start. Git-Dch: Ignore
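The split can be illustrated with a sketch under stated assumptions. All names are hypothetical, and a `std::map` stands in for whatever lookup the generator really uses. The per-version `Dependency` only links a version to a shared `DependencyData`, so three versions depending on the same `libc6` constraint store that constraint once.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Shared part: identical across many versions, stored only once.
struct DependencyData {
   std::string target;
   std::string version;
   int op;
   int type;
};

// Per-version part: just links a version to a shared DependencyData.
struct Dependency {
   std::size_t parentVersion;  // index of the owning version
   std::size_t data;           // index into the shared DependencyData table
};

class DependencyStore {
   using Key = std::tuple<std::string, std::string, int, int>;
   std::map<Key, std::size_t> index;  // dedup lookup: data -> position
public:
   std::vector<DependencyData> data;
   std::vector<Dependency> links;

   void Add(std::size_t parentVersion, const std::string &target,
            const std::string &version, int op, int type) {
      Key key(target, version, op, type);
      auto it = index.find(key);
      std::size_t pos;
      if (it == index.end()) {
         // First occurrence: store the data and remember where it went.
         pos = data.size();
         data.push_back(DependencyData{target, version, op, type});
         index.emplace(key, pos);
      } else {
         pos = it->second;  // duplicate: reuse the existing record
      }
      links.push_back(Dependency{parentVersion, pos});
   }
};
```

The link array still grows with every dependency, but the larger data records grow only with the number of distinct dependencies.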
2015-08-10 · make all d-pointer * const pointers · David Kalnischkies
Doing this disables the implicit copy assignment operator (among others). If that operator were used on these classes, it would cause havoc, because it copies the pointer instead of the data the d-pointer points to. Most of the classes don't need a copy assignment operator anyway, and in many classes it was already broken, as many contain a pointer of some sort. Only for our Cacheset Container interfaces do we define an explicit copy assignment operator, which could later be implemented to copy the data from one d-pointer to the other if we need it. Git-Dch: Ignore
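The sketch below shows the effect with hypothetical class names. A `* const` data member makes the implicitly declared copy and move assignment operators deleted, and the type traits confirm this at compile time. The implicit copy constructor is not affected and would still copy only the pointer.

```cpp
#include <cassert>
#include <type_traits>

class Private;  // opaque implementation, defined elsewhere in a real library

class Example {
   Private * const d;  // const pointer: cannot be reseated after construction
public:
   Example() : d(nullptr) {}  // a real class would allocate its Private here
};

// The const member makes the implicit copy and move assignment deleted,
// so `a = b;` no longer compiles.
static_assert(!std::is_copy_assignable<Example>::value, "copy assignment is deleted");
static_assert(!std::is_move_assignable<Example>::value, "move assignment is deleted");
// Copy construction is still implicitly available and copies only the pointer.
static_assert(std::is_copy_constructible<Example>::value, "copy construction still allowed");
```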
2015-06-16 · add d-pointer, virtual destructors and de-inline de/constructors · David Kalnischkies
To have a chance of keeping the ABI stable for a while, we need all three to work together. If one of them is missing we might lose, so ensuring that they are available is a very tedious but needed task once in a while. Git-Dch: Ignore
2015-06-15 · populate the Architecture field for PackageFiles · David Kalnischkies
This is mainly visible in the policy. You can now pin by b= and have the pin affect only Packages files of this architecture, and hence the packages coming from them. Those packages do not need to be from this architecture, but very likely are in a normal repository setup. Whether you should pin by architecture in this way is a different question… Closes: 687255
2015-06-15 · implement default apt-get file --release-info mode · David Kalnischkies
Selecting targets based on the Release they belong to isn't too unrealistic. In fact, it is assumed to be the most common case, so it is made the default. This also lets us bundle two other things we have to be careful with: filenames, and only showing targets we have acquired. Closes: 752702
2015-06-12 · store Release files data in the Cache · David Kalnischkies
We used to read the Release file for each Packages file and store the data in the PackageFile struct, even though potentially many Packages (and Translation-*) files could use the same data. The point of the exercise isn't the duplicated data, though. Making the Release files first-class citizens in the Cache lets us properly track their state. It also lets us use the information for files which aren't in the cache but whose Release file we know (Sources are an example of this). This modifies the pkgCache structs, especially the PackagesFile struct. Depending on how libapt users access the data in these structs, this can mean huge breakage or no visible change. As a single data point, aptitude seems to be fine with this. Even if there is breakage, it is trivial to fix in a backportable way, while avoiding breakage for everyone would be a huge pain for us. Note that not all PackageFile structs have a corresponding ReleaseFile. In particular, the dpkg/status file and *.deb files have none. As these only need an Archive property, the Component property takes over this duty and the ReleaseFile remains zero. This is also why it is neither needed nor particularly recommended to switch from PackagesFile to ReleaseFile blindly. Sticking with the former is usually the better option.
2014-11-08 · guard pkg/grp hashtable creation changes · David Kalnischkies
The change itself is no problem ABI-wise, but the removal of the old non-dynamic hashtables is. We therefore bring them back for older ABIs and happily use the now available free space to backport more recent additions, such as the dynamic hashtable itself. Git-Dch: Ignore
2014-11-08 · replace ignore-deprecated #pragma dance with _Pragma · David Kalnischkies
For compatibility we use, provide and fill quite a few deprecated methods and fields, which in turn earns us warnings for using them. These warnings therefore have to be disabled in those code parts, and this change now does that in a slightly more elegant way. Git-Dch: Ignore
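The following sketch shows the `_Pragma` approach, assuming GCC or Clang. The macro names `IGNORE_DEPRECATED_PUSH`/`IGNORE_DEPRECATED_POP` and the `LegacyPackage` struct are hypothetical. A literal `#pragma` cannot appear inside a macro expansion, but `_Pragma` can, so the push/ignore/pop dance collapses into two macro uses around the compatibility code.

```cpp
#include <cassert>

// Hypothetical macro names. _Pragma is the operator form of #pragma and
// may be produced by a macro expansion.
#if defined(__GNUC__)
#define IGNORE_DEPRECATED_PUSH \
   _Pragma("GCC diagnostic push") \
   _Pragma("GCC diagnostic ignored \"-Wdeprecated-declarations\"")
#define IGNORE_DEPRECATED_POP _Pragma("GCC diagnostic pop")
#else
#define IGNORE_DEPRECATED_PUSH
#define IGNORE_DEPRECATED_POP
#endif

// Hypothetical struct with a field kept only for compatibility.
struct LegacyPackage {
   [[deprecated("use the group name instead")]] int Name = 0;
   int ID = 0;
};

// Compatibility code that must keep filling the deprecated field without
// triggering -Wdeprecated-declarations.
inline void FillCompat(LegacyPackage &pkg, int value) {
   IGNORE_DEPRECATED_PUSH
   pkg.Name = value;
   IGNORE_DEPRECATED_POP
   pkg.ID = value;
}
```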
2014-10-03 · rename StringType VERSION to VERSIONNUMBER · David Kalnischkies
aptitude has a define for VERSION, so to avoid an FTBFS we rename our enum element to a slightly less generic name. Git-Dch: Ignore
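The clash can be shown with a sketch that uses a hypothetical client-side `VERSION` macro (its value is made up). The enumerators other than `VERSIONNUMBER` are placeholders, not a verified copy of the real enum. Once the macro is defined, any later enumerator spelled `VERSION` is replaced by the preprocessor, and the header no longer compiles.

```cpp
#include <cassert>
#include <string>

// A client program (hypothetical value) defines a generic macro...
#define VERSION "1.0"

// ...so a library enumerator spelled VERSION would be rewritten into a
// string literal by the preprocessor and fail to compile:
//   enum StringType { MIXED, PKGNAME, VERSION, SECTION };  // error
// A less generic enumerator name avoids the clash.
enum StringType { MIXED, PKGNAME, VERSIONNUMBER, SECTION };

// Both names now coexist in the same translation unit.
inline std::string ClientVersion() { return VERSION; }
```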
2014-09-27 · fix: Prefer prefix ++/-- operators for non-primitive types · David Kalnischkies
Git-Dch: Ignore Reported-By: cppcheck
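The sketch below shows why the prefix form is preferred, using a hypothetical iterator class that counts its copies. Postfix `++` must return the previous state and therefore copies the iterator, while prefix `++` advances in place. For plain pointers and integers the compiler removes the difference, which is why the check only targets non-primitive types.

```cpp
#include <cassert>

// Hypothetical iterator over an int array, standing in for class-type
// cache iterators.
class IntIterator {
   const int *pos;
public:
   static int copies;  // counts copies made through the copy constructor
   explicit IntIterator(const int *p) : pos(p) {}
   IntIterator(const IntIterator &o) : pos(o.pos) { ++copies; }
   IntIterator &operator=(const IntIterator &) = default;

   // Prefix: advance in place and return *this, no copy needed.
   IntIterator &operator++() { ++pos; return *this; }
   // Postfix: must keep and return the old state, so it copies.
   IntIterator operator++(int) { IntIterator old(*this); ++pos; return old; }

   int operator*() const { return *pos; }
   bool operator!=(const IntIterator &o) const { return pos != o.pos; }
};
int IntIterator::copies = 0;
```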
2014-09-27 · de-duplicate version strings in the cache · David Kalnischkies
Turns out that version numbers aren't as random as you might guess. In my cache, for example, I have:
Total package names: 69513 (1390 k)
Total package structures: 188259 (9036 k)
Total distinct versions: 186345 (13.4 M)
Total dependencies: 2052242 (57.5 M)
which amounts to 1035873 (10,1 M) strings. Reusing version strings reduces this to 161465 (3.479 k). This comes at a cost, of course: generation is slightly slower. We are still faster than what we started with, though, and it makes room (also cache-size-wise) for further changes.
2014-09-27 · drop stored StringItems in favor of in-memory mappings · David Kalnischkies
Strings like Section names or architectures are needed very often. Instead of writing them each time we need them, we share these special strings. Until now, this was done with a linked list of strings, stored in the cache, that we would search. It turns out we can do this just as well in memory with a bunch of std::maps. "In memory" here means that the mappings are no longer available if we have a partly invalid cache. That isn't much of a problem in practice. The status file is very small compared to the other files we parse and mostly contains duplicates, so the space we would save by storing the mappings is roughly equal to the size of the stored linked list…
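A minimal sketch of in-memory sharing with `std::map` follows. It assumes that strings are appended to a flat buffer and identified by their offset, and the class and method names are hypothetical. The first request for a section or architecture writes the string, and later requests only look up its offset. Because the maps live in memory, they start out empty every time the generator starts, which matches the trade-off described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for the generator's string storage: strings are
// appended to a flat buffer (like the mmap'ed cache) and referred to by offset.
class StringPool {
   std::string buffer;                             // backing storage
   std::map<std::string, uint32_t> section, arch;  // in-memory sharing maps

   uint32_t Append(const std::string &s) {
      uint32_t offset = static_cast<uint32_t>(buffer.size());
      buffer += s;
      buffer += '\0';  // NUL-terminate each stored string
      return offset;
   }
   uint32_t Share(std::map<std::string, uint32_t> &m, const std::string &s) {
      auto it = m.find(s);
      if (it != m.end())
         return it->second;  // already written: reuse its offset
      uint32_t offset = Append(s);
      m.emplace(s, offset);
      return offset;
   }
public:
   uint32_t StoreSection(const std::string &s) { return Share(section, s); }
   uint32_t StoreArch(const std::string &s) { return Share(arch, s); }
   std::size_t Size() const { return buffer.size(); }
   const char *At(uint32_t offset) const { return buffer.c_str() + offset; }
};
```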
2014-09-27 · deprecate Pkg->Name in favor of Grp->Name · David Kalnischkies
They both store the same information, so this field just takes up space in the Package struct for no good reason. We "just" mark it as deprecated instead of removing it right away, though, as it isn't misleading like Section was and is probably used more often in the wild.
2014-06-18 · correct 'apt-cache stats' to include more · David Kalnischkies
It still doesn't reflect the size the cache has on disk compared to what is given as the total size (90 vs 103 MB), but by counting all structs we are at least a bit closer to reality. Git-Dch: ignore
2014-06-18 · cleanup datatypes mix used in binary cache · David Kalnischkies
We had a wild mixture of (unsigned) int, long and long long here without much sense. This commit adds a few typedefs to bring some order into the type system. It also ensures that an ID isn't sometimes computed as int, stored as long and compared with a long long…, as this could bite us later on, since the size of the archive only increases over time.
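The typedef approach can be sketched as follows. The type names follow a `map_*_t` style but are assumptions here, not a verified copy of apt's headers. With one name per role, an ID keeps a single width from computation through storage to comparison, and overflow can be checked against that one type.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical typedefs: one name per role instead of a mix of int/long/long long.
typedef uint32_t map_id_t;        // IDs of packages, versions, ...
typedef uint32_t map_pointer_t;   // offsets inside the mmap'ed cache
typedef uint64_t map_filesize_t;  // sizes of (potentially large) files

// Hypothetical record using the role-specific types.
struct PackageRecord {
   map_id_t ID;
   map_pointer_t Next;
};

// Hand out the next ID, refusing to wrap around silently.
inline bool NextID(map_id_t &counter, map_id_t &out) {
   if (counter == std::numeric_limits<map_id_t>::max())
      return false;
   out = counter++;
   return true;
}

static_assert(std::is_same<decltype(PackageRecord::ID), map_id_t>::value,
              "IDs use one type everywhere");
```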
2014-06-18 · increase hashtable size for packages/groups by factor 5 · David Kalnischkies
It also makes the size configurable, so it can be adapted in the future without an ABI break, even by users… The increase was long overdue, as it gives a >10% decrease in the runtime of e.g. 'apt-get check -s'. Some (useless) benchmark with 69933 groups and 187796 packages without a pre-built cache:
time apt-get check -so APT::Cache-HashTableSize=1 → 20m
time apt-get check -so APT::Cache-HashTableSize=1000 → 6,41s
time apt-get check -so APT::Cache-HashTableSize=2000 → 5,64s (old)
time apt-get check -so APT::Cache-HashTableSize=3000 → 5,30s
time apt-get check -so APT::Cache-HashTableSize=5000 → 5,08s
time apt-get check -so APT::Cache-HashTableSize=6000 → 5,05s
time apt-get check -so APT::Cache-HashTableSize=7000 → 5,02s
time apt-get check -so APT::Cache-HashTableSize=8000 → 5,00s
time apt-get check -so APT::Cache-HashTableSize=9000 → 4,98s
time apt-get check -so APT::Cache-HashTableSize=10000 → 4,96s (new)
time apt-get check -so APT::Cache-HashTableSize=15000 → 4,90s
time apt-get check -so APT::Cache-HashTableSize=20000 → 4,86s
time apt-get check -so APT::Cache-HashTableSize=30000 → 4,77s
time apt-get check -so APT::Cache-HashTableSize=40000 → 4,74s
time apt-get check -so APT::Cache-HashTableSize=50000 → 4,73s
time apt-get check -so APT::Cache-HashTableSize=60000 → 4,71s
The gap increases further for operations with more package lookups. I chose factor 5 because, in my testing, higher values no longer give a significant timing advantage compared to the extra memory. The size can now be increased if that changes. Also, most users will not have 3 releases and 4 architectures in the cache, so their caches will be much smaller and faster.
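The sketch below shows why the table size matters. It uses a hypothetical djb2-style hash, and apt's own hash function may differ. All names that share a bucket form one chain, and a lookup walks that chain. With `APT::Cache-HashTableSize=1` every name lands in the same bucket, which is consistent with the 20 minute run above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical djb2-style string hash reduced to a configurable table size.
inline unsigned long HashName(const std::string &s, std::size_t tableSize) {
   unsigned long h = 5381;
   for (unsigned char c : s)
      h = h * 33 + c;  // unsigned arithmetic wraps, which is well defined
   return h % tableSize;
}

// Count how many names fall into each bucket and return the longest chain,
// which bounds the cost of a single lookup.
inline std::size_t LongestChain(const std::vector<std::string> &names,
                                std::size_t tableSize) {
   std::vector<std::size_t> buckets(tableSize, 0);
   std::size_t longest = 0;
   for (const std::string &n : names) {
      std::size_t len = ++buckets[HashName(n, tableSize)];
      if (len > longest)
         longest = len;
   }
   return longest;
}
```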
2014-06-18 · Merge remote-tracking branch 'mvo/feature/hash-stats' into debian/experimental · Michael Vogt
Conflicts:
apt-pkg/acquire-item.cc
apt-pkg/acquire-item.h
apt-pkg/deb/debmetaindex.h
apt-pkg/pkgcache.cc
test/integration/test-apt-ftparchive-src-cachedb
2014-06-18 · [API-Break] rename pkgCache::Package::NextPackage to pkgCache::Package::Next · Michael Vogt
This is an internal struct, not an external interface, so the actual breakage should be small.
2014-05-10 · invalidate cache if architecture set doesn't match · David Kalnischkies
The cache heavily depends on the architecture(s) it is built for, especially if you move from single- to multiarch. Adding a new architecture to dpkg therefore has to be detected and must invalidate the cache, so that we don't operate on incorrect data. The incorrect data only prevents us from doing otherwise sensible actions (it doesn't allow bad things to happen), and recovery is simple and automatic in most cases. The problem therefore hides pretty well and is also not as serious as it might sound at first. Closes: 745036
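A sketch of the check, assuming the cache header stores the architectures it was built for as one comma-separated string. The helper names are hypothetical. A cache built for a different set is treated as invalid and rebuilt instead of being used. Note that this simple comparison is order-sensitive.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical header field: the architectures the cache was built for,
// joined into one comma-separated string.
inline std::string JoinArchitectures(const std::vector<std::string> &archs) {
   std::string joined;
   for (const std::string &a : archs) {
      if (!joined.empty())
         joined += ',';
      joined += a;
   }
   return joined;
}

// The cache is only usable if it was built for exactly the configured set.
inline bool CacheMatchesArchitectures(const std::string &storedInHeader,
                                      const std::vector<std::string> &configured) {
   return storedInHeader == JoinArchitectures(configured);
}
```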
2014-05-09 · parse and retrieve multiple Descriptions in one record · David Kalnischkies
It seems unlikely for now that proper archives will carry multiple Description-* stanzas in the Packages (or Translation-*) file. Sometimes, however, apt eats its own output, as the CD team's usage shows, and it would be interesting to let apt output multiple translations, e.g. in 'apt-cache show'.
2014-03-13 · abstract version hash comparison a bit · David Kalnischkies
In #737085 we see that apt can be confused if the information about versions differs only slightly. This commit adds a way to include a few more data points with the next ABI break, which should help a bit with it. Git-Dch: Ignore
2014-03-13 · cleanup headers and especially #includes everywhere · David Kalnischkies
Besides being a bit cleaner, it hopefully also resolves oddball problems I have with high levels of parallel jobs. Git-Dch: Ignore Reported-By: iwyu (include-what-you-use)
2014-03-13 · warning: unused parameter ‘foo’ [-Wunused-parameter] · David Kalnischkies
Reported-By: gcc -Wunused-parameter Git-Dch: Ignore
2014-03-13 · warning: cannot optimize loop, the loop counter may overflow [-Wunsafe-loop-optimizations] · David Kalnischkies
Git-Dch: Ignore Reported-By: gcc -Wunsafe-loop-optimizations
2014-03-13 · warning: cast from type A to type B casts away qualifiers [-Wcast-qual] · David Kalnischkies
Git-Dch: Ignore Reported-By: gcc -Wcast-qual
2013-06-20 · handle missing "Description" in apt-cache show · David Kalnischkies
Do not blindly assume that all package stanzas have a "Description:" field, either in 'apt-cache show' or in the cache creation itself. Instead, we now assume that if the stanza has a Description, it will not be the first field. We search for "\nDescription", which also catches the MD5sum and any (possibly ignored) translated Descriptions embedded in the package stanza. Closes: #712435
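A sketch of the lookup described above, with a hypothetical helper name. Searching for `"\nDescription"` requires the field to start a line other than the first one. It also matches `Description-md5` and translated `Description-*` fields, and it does not match the word when it appears inside another field's value.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: does a Packages stanza carry any Description field?
// The leading newline means the field must begin a line (and cannot be the
// very first field), while the open-ended match also accepts fields such
// as "Description-md5" or "Description-de".
inline bool HasDescriptionField(const std::string &stanza) {
   return stanza.find("\nDescription") != std::string::npos;
}
```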
2013-04-03 · share version strings between same versions (of different architectures) · David Kalnischkies
to save some space and allow quick comparisons later on
2013-04-03 · - sort group and package names in the hashtable on insert · David Kalnischkies
* apt-pkg/pkgcache.cc: - assume sorted hashtable entries for groups/packages
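A sketch of sorted bucket chains, with hypothetical names and indices standing in for cache offsets. Each insertion keeps the chain ordered by name. A lookup for a missing name can therefore stop as soon as it passes the place where the name would have to be, instead of walking the whole chain.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical chain node; `next` is an index, like an offset in the cache.
struct Node {
   std::string name;
   int next;  // index of the next node in the chain, -1 terminates it
};

struct SortedChain {
   std::vector<Node> nodes;
   int head = -1;

   // Insert while keeping the chain sorted by name.
   void Insert(const std::string &name) {
      int prev = -1, cur = head;
      while (cur != -1 && nodes[cur].name < name) {
         prev = cur;
         cur = nodes[cur].next;
      }
      nodes.push_back(Node{name, cur});
      int idx = static_cast<int>(nodes.size() - 1);
      if (prev == -1)
         head = idx;
      else
         nodes[prev].next = idx;
   }

   // Returns the node index or -1; `visited` counts the nodes inspected.
   int Find(const std::string &name, std::size_t &visited) const {
      visited = 0;
      for (int cur = head; cur != -1; cur = nodes[cur].next) {
         ++visited;
         if (nodes[cur].name == name)
            return cur;
         if (name < nodes[cur].name)
            return -1;  // sorted: we passed its position, so it is absent
      }
      return -1;
   }
};
```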
2013-04-01 · equal comparisons are used mostly in same-source relations, · David Kalnischkies
so use this to try to reuse some version strings
2013-03-13 · factor version string creation out of NewDepends, so we can easily reuse · David Kalnischkies
version strings e.g. for implicit multi-arch dependencies
2013-03-12 · handle language tags for descriptions as unique strings to be shared · David Kalnischkies
2013-03-12 · * apt-pkg/pkgcachegen.cc: · David Kalnischkies
- do not store the MD5Sum for every description language variant as it will be the same for all so it can be shared to save cache space
2012-10-15 · * apt-pkg/pkgcachegen.cc: · Michael Vogt
- Fix crash if the cache is remapped while writing a Provides version (LP: #1066445).
2012-10-13 · write the native architecture as unique string into the cache header · David Kalnischkies
as it is used for arch:all packages as a map to arch:native. Otherwise arch comparisons later will see differences (Closes: #689323)
2012-10-13 · correct "3 missing" to "2 missing" remap registrations as the Version · David Kalnischkies
handled in NewVersion is already registered
2012-09-19 · add 3 missing remap registrations causing a segfault in case · David Kalnischkies
we use the not remapped iterators after a move of the mmap again
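The following is a simplified model of why the registrations matter. The names are hypothetical, and a `std::vector` stands in for the mmap. Growing the storage can move it, so every raw pointer kept across the growth must be registered and rebased. Otherwise it dangles, and a later use may segfault. The offsets are computed before the move, because the old pointers must not be used once the storage has moved.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical growable pool: resizing may move the storage and leave raw
// pointers into the old block dangling.
class Pool {
   std::vector<int> storage;
   std::vector<int **> registered;  // pointers that must follow a move
public:
   explicit Pool(std::size_t n) : storage(n, 0) {}
   int *At(std::size_t i) { return storage.data() + i; }

   // Callers register every raw pointer they keep across a possible move.
   void Register(int **p) { registered.push_back(p); }
   void Unregister(int **p) {
      for (std::size_t i = 0; i < registered.size(); ++i)
         if (registered[i] == p) {
            registered.erase(registered.begin() + i);
            return;
         }
   }

   // Grow the pool, then rebase every registered pointer onto the new block.
   void Grow(std::size_t extra) {
      std::vector<std::ptrdiff_t> offsets;
      for (int **p : registered)
         offsets.push_back(*p - storage.data());  // compute before the move
      storage.resize(storage.size() + extra);
      for (std::size_t i = 0; i < registered.size(); ++i)
         *registered[i] = storage.data() + offsets[i];
   }
};
```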
2012-09-19 · * apt-pkg/pkgcachegen.cc: · David Kalnischkies
- ensure that dependencies for packages:none are always generated
2012-09-09 · * apt-pkg/pkgcachegen.cc: · David Kalnischkies
- do not create 'native' (or now 'none') package structures as a side effect of description translation parsing as it pollutes the cache
2012-09-09 · handle packages without a mandatory architecture (debian-policy §5.3) · David Kalnischkies
by introducing a pseudo-architecture 'none' so that the small group of users with these packages can get rid of them without introducing too much hassle for other users (Closes: #686346)
2012-06-14 · * apt-pkg/pkgcachegen.cc: · Daniel Hartwig
- always reset _error->StackCount in MakeStatusCache (Closes: #677175)
2012-05-12 · * apt-pkg/pkgcachegen.cc: · David Kalnischkies
- make IsDuplicatedDescription static so that it is really private as we don't need a symbol for it as it is not in a header
2012-05-05 · check if we work on a valid description in IsDuplicateDescription as · David Kalnischkies
we end up working on dangling pointers otherwise which segfaults on s390x and ppc64 (Closes: #669427)
2012-05-02 · * apt-pkg/pkgcachegen.cc: · David Kalnischkies
- check if NewDescription allocation has failed and error out accordingly
2011-12-15 · at least libapt should announce to itself that it is clean… · David Kalnischkies
(and be it if it tries to announce that…)
2011-10-12 · add implicit dependencies needed for Multi-Arch at the time a Version · David Kalnischkies
struct is created and not at the end of the cache generation. This makes these kinds of conflicts independent of the configured architectures, gives us natural progress reporting for free, and puts only the needed dependencies into the respective binary cache.
2011-10-12 · use one string to construct the error message instead of using multiple · David Kalnischkies
just with different debugging information at the end
2011-10-12 · a version can have only a single md5 for descriptions, so we can optimize · David Kalnischkies
the merging with this knowledge a bit and by correctly sharing the lists we only need to have a single description list for possibly many different versions. This also means that description translations are shared between different sources
2011-10-11 · share description list between "same" versions (LP: #868977) · David Kalnischkies