Age | Commit message (Collapse) | Author |
|
Before MultiArch implicits weren't a thing, so they were hidden by
default by definition. Adding them for MultiArch solved many problems,
but having no reliable way of detecting which dependency (and provides)
is implicit or not causes problems everytime we want to output
dependencies without confusing our observers with unneeded
implementation details.
The really notworthy point here is actually that we keep now a better
record of how a dependency came to be so that we can later reason about
it more easily, but that is hidden so deep down in the library internals
that change is more the problems it solves than the change itself.
|
|
Cache generation needs a way of quickly iterating over the unique potion
of the dependencies to be able to share them. By linking them together
we can reduce the speed penality (~ 80%) with only a small reduction in
saved size (~ 20%).
Git-Dch: Ignore
|
|
Having dependency data separated from the link between version/package
and the dependency allows use to work on sharing the depdency data a bit
as it turns out that many dependencies are in fact duplicates. How many
are duplicates various heavily with the sources configured, but for a
single Debian release the ballpark is 2 duplicates for each dependency
already (e.g. libc6 counts 18410 dependencies, but only 45 unique). Add
more releases and the duplicates count only rises to get ~6 for 3
releases. For each architecture a user has configured which given the
shear number of dependencies amounts to MBs of duplication.
We can cut down on this number, but pay a heavy price for it: In my
many releases(3) + architectures(3) test we have a 10% (~ 0.5 sec)
increase in cache creationtime, but also 10% less cachesize (~ 10 MB).
Further work is needed to rip the whole benefits from this through, so
this is just the start.
Git-Dch: Ignore
|
|
Doing this disables the implicit copy assignment operator (among others)
which would cause hovac if used on the classes as it would just copy the
pointer, not the data the d-pointer points to. For most of the classes
we don't need a copy assignment operator anyway and in many classes it
was broken before as many contain a pointer of some sort.
Only for our Cacheset Container interfaces we define an explicit copy
assignment operator which could later be implemented to copy the data
from one d-pointer to the other if we need it.
Git-Dch: Ignore
|
|
To have a chance to keep the ABI for a while we need all three to team
up. One of them missing and we might loose, so ensuring that they are
available is a very tedious but needed task once in a while.
Git-Dch: Ignore
|
|
This is mainly visible in the policy, so that you can now pin by b= and
let it only effect Packages files of this architecture and hence the
packages coming from it (which do not need to be from this architecture,
but very likely are in a normal repository setup).
If you should pin by architecture in this way is a different question…
Closes: 687255
|
|
Selecting targets based on the Release they belong to isn't to
unrealistic. In fact, it is assumed to be the most used case so it is
made the default especially as this allows to bundle another thing we
have to be careful with: Filenames and only showing targets we have
acquired.
Closes: 752702
|
|
We used to read the Release file for each Packages file and store the
data in the PackageFile struct even through potentially many Packages
(and Translation-*) files could use the same data. The point of the
exercise isn't the duplicated data through. Having the Release files as
first-class citizens in the Cache allows us to properly track their
state as well as allows us to use the information also for files which
aren't in the cache, but where we know to which Release file they
belong (Sources are an example for this).
This modifies the pkgCache structs, especially the PackagesFile struct
which depending on how libapt users access the data in these structs can
mean huge breakage or no visible change. As a single data point:
aptitude seems to be fine with this. Even if there is breakage it is
trivial to fix in a backportable way while avoiding breakage for
everyone would be a huge pain for us.
Note that not all PackageFile structs have a corresponding ReleaseFile.
In particular the dpkg/status file as well as *.deb files have not. As
these have only a Archive property need, the Component property takes
over this duty and the ReleaseFile remains zero. This is also the reason
why it isn't needed nor particularily recommended to change from
PackagesFile to ReleaseFile blindly. Sticking with the earlier is
usually the better option.
|
|
The change itself is no problem ABI wise, but the remove of the old
undynamic hashtables is, so we bring it back for older abis and happily
use the now available free space to backport more recent additions like
the dynamic hashtable itself.
Git-Dch: Ignore
|
|
For compatibility we use/provide and fill quiet some deprecated methods
and fields, which subsequently earns us a warning for using them. These
warnings therefore have to be disabled for these codeparts and that is
what this change does now in a slightly more elegant way.
Git-Dch: Ignore
|
|
aptitude has a define for VERSION, so to not generate a FTBFS we just
rename our enum element to a slightly less generic name.
Git-Dch: Ignore
|
|
Git-Dch: Ignore
Reported-By: cppcheck
|
|
Turns out that version numbers aren't as random as you might guess.
In my cache for example, I have:
Total package names: 69513 (1390 k)
Total package structures: 188259 (9036 k)
Total distinct versions: 186345 (13.4 M)
Total dependencies: 2052242 (57.5 M)
which amounts to 1035873 (10,1 M) strings.
Reusing version strings reduces this to 161465 (3.479 k).
This comes at a cost of course: Generation is slightly slower, but we
are still faster than what we started with and it makes room (also cache
size wise) for further changes.
|
|
Strings like Section names or architectures are needed vary often.
Instead of writing them each time we need them, we deploy sharing for
these special strings. Until now, this was done with a linked list of
strings in which we would search, which was stored in the cache.
It turns out we can do this just as well in memory as well with a bunch
of std::map's.
In memory means here that it isn't available anymore if we have a partly
invalid cache, but that isn't much of a problem in practice as the
status file is compared to the other files we parse very small and includes
mostly duplicates, so the space we would gain by storing is more or less
equal to the size of the stored linked list…
|
|
They both store the same information, so this field just takes up space
in the Package struct for no good reason. We mark it "just" as deprecated
instead of instantly removing it though as it isn't misleading like
Section was and is potentially used in the wild more often.
|
|
It still doesn't reflect the size the cache has on the disk compared to
what is given as total size (90 vs 103 MB), but by counting all structs
in we are at least a bit closer to the reality.
Git-Dch: ignore
|
|
We had a wild mixture of (unsigned) int, long and long long here without
much sense, so this commit adds a few typedefs to get some sense in the
typesystem and ensures that a ID isn't sometimes computed as int, stored
as long and compared with a long long… as this could potentially bite us
later on as the size of the archive only increases over time.
|
|
It also makes the size configureable, so it can be adapted in the future
without the need for an abi break - and even by users…
The increase was long overdue as it gives a >10% decrease in runtime of
e.g. 'apt-get check -s'. Some (useless) benchmark with 69933 groups and
187796 packages without a pre-built cache:
time apt-get check -so APT::Cache-HashTableSize=1 → 20m
time apt-get check -so APT::Cache-HashTableSize=1000 → 6,41s
time apt-get check -so APT::Cache-HashTableSize=2000 → 5,64s (old)
time apt-get check -so APT::Cache-HashTableSize=3000 → 5,30s
time apt-get check -so APT::Cache-HashTableSize=5000 → 5,08s
time apt-get check -so APT::Cache-HashTableSize=6000 → 5,05s
time apt-get check -so APT::Cache-HashTableSize=7000 → 5,02s
time apt-get check -so APT::Cache-HashTableSize=8000 → 5,00s
time apt-get check -so APT::Cache-HashTableSize=9000 → 4,98s
time apt-get check -so APT::Cache-HashTableSize=10000 → 4,96s (new)
time apt-get check -so APT::Cache-HashTableSize=15000 → 4,90s
time apt-get check -so APT::Cache-HashTableSize=20000 → 4,86s
time apt-get check -so APT::Cache-HashTableSize=30000 → 4,77s
time apt-get check -so APT::Cache-HashTableSize=40000 → 4,74s
time apt-get check -so APT::Cache-HashTableSize=50000 → 4,73s
time apt-get check -so APT::Cache-HashTableSize=60000 → 4,71s
The gap increases further for operations which have more package
lookups. Factor 5 was chosen as higher values do not provide any
really significant timing advantage anymore compared to the memory
increase in my testing and there is always the possibility to increase
it now if that changes. (also most users will not have 3 releases and
4 architectures in the cache, so theirs will be much smaller and faster).
|
|
Conflicts:
apt-pkg/acquire-item.cc
apt-pkg/acquire-item.h
apt-pkg/deb/debmetaindex.h
apt-pkg/pkgcache.cc
test/integration/test-apt-ftparchive-src-cachedb
|
|
This is a internal struct not a external interface so the actual
breakage should be small.
|
|
The cache heavily depends on the architecture(s) it is build for,
especially if you move from single- to multiarch. Adding a new
architecture to dpkg therefore has to be detected and must invalidate
the cache so that we don't operate on incorrect data.
The incorrect data will prevent us from doing otherwise sensible
actions (it doesn't allow bad things to happen) and the recovery is
simple and automatic in most cases, so this hides pretty well and is
also not as serious as it might sound at first.
Closes: 745036
|
|
It seems unlikely for now that proper archives will carry multiple
Description-* stanzas in the Packages (or Translation-*) file, but
sometimes apt eats its own output as shown by the usage of the CD team
and it would be interesting to let apt output multiple translations
e.g. in 'apt-cache show'.
|
|
In #737085 we see that apt can be confused if informations about
versions only differ slightly. This commit adds a way of at least adding
a few more data points with the next abi break to help a bit with it.
Git-Dch: Ignore
|
|
Beside being a bit cleaner it hopefully also resolves oddball problems
I have with high levels of parallel jobs.
Git-Dch: Ignore
Reported-By: iwyu (include-what-you-use)
|
|
Reported-By: gcc -Wunused-parameter
Git-Dch: Ignore
|
|
[-Wunsafe-loop-optimizations]
Git-Dch: Ignore
Reported-By: gcc -Wunsafe-loop-optimizations
|
|
Git-Dch: Ignore
Reported-By: gcc -Wcast-qual
|
|
do not blindly assume that all packages stanzas have a "Description:"
field in 'apt-cache show' as well as in the cache creation itself.
We instead assume now that if the stanza has a Description, it will not
be the first field as we look out for "\nDescription" to take care of
MD5sum as well as (maybe ignored) translated Descriptions embedded in
the package stanza.
Closes: #712435
|
|
to save some space and allow quick comparisions later on
|
|
* apt-pkg/pkgcache.cc:
- assume sorted hashtable entries for groups/packages
|
|
so use this to try to reuse some version strings
|
|
version strings e.g. for implicit multi-arch dependencies
|
|
|
|
- do not store the MD5Sum for every description language variant as
it will be the same for all so it can be shared to save cache space
|
|
- Fix crash if the cache is remapped while writing a Provides version
(LP: #1066445).
|
|
as it is used for arch:all packages as a map to arch:native.
Otherwise arch comparisons later will see differences (Closes: #689323)
|
|
handled in NewVersion is already registered
|
|
we use the not remapped iterators after a move of the mmap again
|
|
- ensure that dependencies for packages:none are always generated
|
|
- do not create 'native' (or now 'none') package structures as a side
effect of description translation parsing as it pollutes the cache
|
|
by introducing a pseudo-architecture 'none' so that the small group of
users with these packages can get right of them without introducing too
much hassle for other users (Closes: #686346)
|
|
- always reset _error->StackCount in MakeStatusCache (Closes: #677175)
|
|
- make IsDuplicatedDescription static so that it is really private
as we don't need a symbol for it as it is not in a header
|
|
we end up working on dangling pointers otherwise which segfaults on
s390x and ppc64 (Closes: #669427)
|
|
- check if NewDescription allocation has failed and error out accordingly
|
|
(and be it if it tries to announce that…)
|
|
struct is created and not at the end of the cache generation
This allows us to be independent from the configured architectures for
these kind of conflicts, we get natural progress for free and
only the needed dependencies are in th respective binary cache.
|
|
just with different debugging information at the end
|
|
the merging with this knowledge a bit and by correctly sharing the lists
we only need to have a single description list for possibly many different
versions. This also means that description translations are shared between
different sources
|
|
|