Age | Commit message (Collapse) | Author |
|
|
|
|
|
By storing the size of the string in the cache, we can make use of
it when comparing the names in the hashtable in pkgCache::FindGrp.
|
|
This improves performance of the cache generation on my
ARM platform (4x Cortex A15) by about 10% to 20% from
2.35-2.50 to 2.1 seconds.
|
|
|
|
Git-Dch: Ignore
|
|
This is defined for compatibility, warning about it is intended, but
only in places where it is actually used, rather than at the place we
declare it for compatability…
Git-Dch: Ignore
|
|
These assumptions were once true, but they aren't anymore, so what is
supposed to be a speed up is effectively a slowdown [not that it would
be noticible].
Usage of SingleArchFindPkg was nuked in a stable update already as the
included assumption was actually harmful btw, which is why we should get
right of other 'non-harmful' but still untrue assumptions while we can.
Git-Dch: Ignore
|
|
It still compiles after the change, so just merge it.
Closes: #448627
|
|
This somehow got back, we don't really know why. Emulate the
Section() method in the PkgIterator by looking at the section
of the head of the VersionList.
|
|
Gbp-Dch: ignore
|
|
Git-Dch: Ignore
|
|
Before MultiArch implicits weren't a thing, so they were hidden by
default by definition. Adding them for MultiArch solved many problems,
but having no reliable way of detecting which dependency (and provides)
is implicit or not causes problems everytime we want to output
dependencies without confusing our observers with unneeded
implementation details.
The really notworthy point here is actually that we keep now a better
record of how a dependency came to be so that we can later reason about
it more easily, but that is hidden so deep down in the library internals
that change is more the problems it solves than the change itself.
|
|
We store very few flags in the cache, so keeping storage space for 8 is
enough for all of them and still leaves a few unused bits remaining for
future extensions without wasting bytes for nothing.
Git-Dch: Ignore
|
|
We aren't and we will not be really compatible again with the previous
stable abi, so lets drop these markers (which never made it into a
released version) for good as they have outlived their intend already.
Git-Dch: Ignore
|
|
Cache generation needs a way of quickly iterating over the unique potion
of the dependencies to be able to share them. By linking them together
we can reduce the speed penality (~ 80%) with only a small reduction in
saved size (~ 20%).
Git-Dch: Ignore
|
|
Having dependency data separated from the link between version/package
and the dependency allows use to work on sharing the depdency data a bit
as it turns out that many dependencies are in fact duplicates. How many
are duplicates various heavily with the sources configured, but for a
single Debian release the ballpark is 2 duplicates for each dependency
already (e.g. libc6 counts 18410 dependencies, but only 45 unique). Add
more releases and the duplicates count only rises to get ~6 for 3
releases. For each architecture a user has configured which given the
shear number of dependencies amounts to MBs of duplication.
We can cut down on this number, but pay a heavy price for it: In my
many releases(3) + architectures(3) test we have a 10% (~ 0.5 sec)
increase in cache creationtime, but also 10% less cachesize (~ 10 MB).
Further work is needed to rip the whole benefits from this through, so
this is just the start.
Git-Dch: Ignore
|
|
Doing this disables the implicit copy assignment operator (among others)
which would cause hovac if used on the classes as it would just copy the
pointer, not the data the d-pointer points to. For most of the classes
we don't need a copy assignment operator anyway and in many classes it
was broken before as many contain a pointer of some sort.
Only for our Cacheset Container interfaces we define an explicit copy
assignment operator which could later be implemented to copy the data
from one d-pointer to the other if we need it.
Git-Dch: Ignore
|
|
To have a chance to keep the ABI for a while we need all three to team
up. One of them missing and we might loose, so ensuring that they are
available is a very tedious but needed task once in a while.
Git-Dch: Ignore
|
|
Translation-* files are internally handled as PackageFiles which isn't
super nice, but giving them their own struct is a bit overkill so let it
be for the moment. They always appeared in the policy output because of
this through and now that they are properly linked to a ReleaseFile they
even display all the pinning information on them, but they don't contain
any packages which could be pinned… No problem, but useless and
potentially confusing output.
Adding a 'NoPackages' flag which can be set on those files and be used
in applications seems like a simple way to fix this display issue.
|
|
We used to read the Release file for each Packages file and store the
data in the PackageFile struct even through potentially many Packages
(and Translation-*) files could use the same data. The point of the
exercise isn't the duplicated data through. Having the Release files as
first-class citizens in the Cache allows us to properly track their
state as well as allows us to use the information also for files which
aren't in the cache, but where we know to which Release file they
belong (Sources are an example for this).
This modifies the pkgCache structs, especially the PackagesFile struct
which depending on how libapt users access the data in these structs can
mean huge breakage or no visible change. As a single data point:
aptitude seems to be fine with this. Even if there is breakage it is
trivial to fix in a backportable way while avoiding breakage for
everyone would be a huge pain for us.
Note that not all PackageFile structs have a corresponding ReleaseFile.
In particular the dpkg/status file as well as *.deb files have not. As
these have only a Archive property need, the Component property takes
over this duty and the ReleaseFile remains zero. This is also the reason
why it isn't needed nor particularily recommended to change from
PackagesFile to ReleaseFile blindly. Sticking with the earlier is
usually the better option.
|
|
Having every item having its own code to verify the file(s) it handles
is an errorprune process and easy to break, especially if items move
through various stages (download, uncompress, patching, …). With a giant
rework we centralize (most of) the verification to have a better
enforcement rate and (hopefully) less chance for bugs, but it breaks the
ABI bigtime in exchange – and as we break it anyway, it is broken even
harder.
It shouldn't effect most frontends as they don't deal with the acquire
system at all or implement their own items, but some do and will need to
be patched (might be an opportunity to use apt on-board material).
The theory is simple: Items implement methods to decide if hashes need to
be checked (in this stage) and to return the expected hashes for this
item (in this stage). The verification itself is done in worker message
passing which has the benefit that a hashsum error is now a proper error
for the acquire system rather than a Done() which is later revised to a
Failed().
|
|
The change itself is no problem ABI wise, but the remove of the old
undynamic hashtables is, so we bring it back for older abis and happily
use the now available free space to backport more recent additions like
the dynamic hashtable itself.
Git-Dch: Ignore
|
|
Git-Dch: Ignore
|
|
We are the only possible users of private methods, so we are also the
only users who can potentially export them via using them in inline
methods. The point is: We don't need these symbols exported if we don't
do this, so marking them as hidden removes some methods from the API
without breaking anything as nobody could have used them.
Git-Dch: Ignore
|
|
Accessing the package records to acquire this information is pretty
costly, so that information wasn't used so far in many places. The most
noticeable user by far is EDSP at the moment, but there are ideas to
change that which this commit tries to enable.
|
|
Strings like Section names or architectures are needed vary often.
Instead of writing them each time we need them, we deploy sharing for
these special strings. Until now, this was done with a linked list of
strings in which we would search, which was stored in the cache.
It turns out we can do this just as well in memory as well with a bunch
of std::map's.
In memory means here that it isn't available anymore if we have a partly
invalid cache, but that isn't much of a problem in practice as the
status file is compared to the other files we parse very small and includes
mostly duplicates, so the space we would gain by storing is more or less
equal to the size of the stored linked list…
|
|
They both store the same information, so this field just takes up space
in the Package struct for no good reason. We mark it "just" as deprecated
instead of instantly removing it though as it isn't misleading like
Section was and is potentially used in the wild more often.
|
|
A version belongs to a section and has hence a section member of its
own. A package on the other hand can have multiple versions from
different sections. This was "solved" by using the section which was
parsed first as order of sources.list defines, but that is obviously a
horribly unpredictable thing.
We therefore directly remove this struct member to free some space and
mark the access method as deprecated, which is told to return the
section of the 'newest' known version, which is at least predictable,
but possible not what it returned before – but nobody knows.
Users are way better of with the Section() as returned by the version
they are dealing with. It is likely the same for all versions of a
package, but in the few cases it isn't, it is important (like packages
moving from main/* to contrib/* or into oldlibs …).
|
|
We had a wild mixture of (unsigned) int, long and long long here without
much sense, so this commit adds a few typedefs to get some sense in the
typesystem and ensures that a ID isn't sometimes computed as int, stored
as long and compared with a long long… as this could potentially bite us
later on as the size of the archive only increases over time.
|
|
It also makes the size configureable, so it can be adapted in the future
without the need for an abi break - and even by users…
The increase was long overdue as it gives a >10% decrease in runtime of
e.g. 'apt-get check -s'. Some (useless) benchmark with 69933 groups and
187796 packages without a pre-built cache:
time apt-get check -so APT::Cache-HashTableSize=1 → 20m
time apt-get check -so APT::Cache-HashTableSize=1000 → 6,41s
time apt-get check -so APT::Cache-HashTableSize=2000 → 5,64s (old)
time apt-get check -so APT::Cache-HashTableSize=3000 → 5,30s
time apt-get check -so APT::Cache-HashTableSize=5000 → 5,08s
time apt-get check -so APT::Cache-HashTableSize=6000 → 5,05s
time apt-get check -so APT::Cache-HashTableSize=7000 → 5,02s
time apt-get check -so APT::Cache-HashTableSize=8000 → 5,00s
time apt-get check -so APT::Cache-HashTableSize=9000 → 4,98s
time apt-get check -so APT::Cache-HashTableSize=10000 → 4,96s (new)
time apt-get check -so APT::Cache-HashTableSize=15000 → 4,90s
time apt-get check -so APT::Cache-HashTableSize=20000 → 4,86s
time apt-get check -so APT::Cache-HashTableSize=30000 → 4,77s
time apt-get check -so APT::Cache-HashTableSize=40000 → 4,74s
time apt-get check -so APT::Cache-HashTableSize=50000 → 4,73s
time apt-get check -so APT::Cache-HashTableSize=60000 → 4,71s
The gap increases further for operations which have more package
lookups. Factor 5 was chosen as higher values do not provide any
really significant timing advantage anymore compared to the memory
increase in my testing and there is always the possibility to increase
it now if that changes. (also most users will not have 3 releases and
4 architectures in the cache, so theirs will be much smaller and faster).
|
|
Conflicts:
apt-pkg/acquire-item.cc
apt-pkg/acquire-item.h
apt-pkg/deb/debmetaindex.h
apt-pkg/pkgcache.cc
test/integration/test-apt-ftparchive-src-cachedb
|
|
This is a internal struct not a external interface so the actual
breakage should be small.
|
|
|
|
The cache heavily depends on the architecture(s) it is build for,
especially if you move from single- to multiarch. Adding a new
architecture to dpkg therefore has to be detected and must invalidate
the cache so that we don't operate on incorrect data.
The incorrect data will prevent us from doing otherwise sensible
actions (it doesn't allow bad things to happen) and the recovery is
simple and automatic in most cases, so this hides pretty well and is
also not as serious as it might sound at first.
Closes: 745036
|
|
Conflicts:
apt-pkg/cachefilter.h
apt-pkg/contrib/fileutl.cc
apt-pkg/contrib/netrc.h
apt-pkg/deb/debsrcrecords.cc
apt-pkg/init.h
apt-pkg/pkgcache.cc
debian/apt.install.in
debian/changelog
|
|
Git-Dch: Ignore
Reported-By: gcc -Wsuggest-attribute={pure,const,noreturn}
|
|
Reported-By: gcc -Wignored-qualifiers
Git-Dch: Ignore
|
|
Git-Dch: Ignore
Reported-By: gcc -Wpedantic
|
|
Git-Dch: Ignore
|
|
Conflicts:
apt-private/private-list.cc
configure.ac
debian/apt.install.in
debian/changelog
|
|
|
|
- adjust pkgCache::State::VerPriority enum, to match reality
|
|
The breakage is just to big for now, so guard the change with
#ifndef APT_8_CLEANER_HEADERS and be nice to library users
|
|
- prefer native providers over foreigns even if the chain is foreign
The code preferred real over virtual packages and based on priorities.
This is changed in so far that a real package from any arch is preferred
over any virtual provider and if priorities doesn't help in choosing the
best provider we choose it based on architectures
|
|
|
|
|
|
- readd All{Foreign,Allowed} as suggested by Julian to
remain strictly API compatible
|
|
- clean up mess with the "all" handling in MultiArch to
fix LP: #733741 cleanly for everyone now
|
|
- use the native Architecture stored in the cache header instead of
loading it from configuration as suggested by Julian Andres Klode
|