Age | Commit message (Collapse) | Author |
|
|
|
XXH3 is faster than both our CRC32c implementation as well
as DJB hash for hash table hashing, so meh, let's switch to
it.
|
|
Instead of just using uint32_t, which would allow you to
assign e.g. a map_pointer<Version> to a map_pointer<Package>,
use our own smarter struct that has strict type checking.
We allow creating a map_pointer from a nullptr, and we allow
comparing map_pointer to nullptr, which also deals with comparisons
against 0 which are often used, as 0 will be implictly converted
to nullptr.
|
|
|
|
This is a first step to a type safe cache, adding typing
information everywhere. Next, we'll replace map_pointer<T>
implementation with a type safe one.
|
|
|
|
Remove it everywhere, except where it is still needed.
|
|
We were de-duplicating package name strings in StoreString, but also
deduplicating most of them by them being in groups, so we had extra
hash table lookups that could be avoided in NewGroup().
To continue deduplicating names across binary packages and source
packages, insert groups for source packages as well. This is also
a good first step in allowing efficient lookup of packages by source
package - we can extend Group later by a list of SourceVersion objects,
or alternatively, simply add a by-source chain into pkgCache::Version.
This change improves performance by about 10% (913 to 814 ms), while
having no significant overhead on the cache size:
--- before
+++ after
@@ -1,7 +1,7 @@
-Total package names: 109536 (2.191 k)
-Total package structures: 118689 (4.748 k)
+Total package names: 119642 (2.393 k)
+Total package structures: 118687 (4.747 k)
Normal packages: 83309
- Pure virtual packages: 3365
+ Pure virtual packages: 3363
Single virtual packages: 17811
Mixed virtual packages: 1973
Missing: 12231
@@ -10,21 +10,21 @@ Total distinct descriptions: 149291 (3.583 k)
Total dependencies: 484135/156650 (12,2 M)
Total ver/file relations: 57421 (1.378 k)
Total Desc/File relations: 18219 (437 k)
-Total Provides mappings: 29963 (719 k)
+Total Provides mappings: 29959 (719 k)
Total globbed strings: 226993 (5.332 k)
Total slack space: 26,8 k
-Total space accounted for: 38,1 M
+Total space accounted for: 38,3 M
Total buckets in PkgHashTable: 50503
- Unused: 5727
- Used: 44776
- Utilization: 88.6601%
- Average entries: 2.65073
+ Unused: 5728
+ Used: 44775
+ Utilization: 88.6581%
+ Average entries: 2.65074
Longest: 60
Shortest: 1
Total buckets in GrpHashTable: 50503
- Unused: 5727
- Used: 44776
- Utilization: 88.6601%
- Average entries: 2.44631
- Longest: 10
+ Unused: 4649
+ Used: 45854
+ Utilization: 90.7946%
+ Average entries: 2.60919
+ Longest: 11
Shortest: 1
|
|
|
|
|
|
This makes it easier to see which headers includes what.
The changes were done by running
git grep -l '#\s*include' \
| grep -E '.(cc|h)$' \
| xargs sed -i -E 's/(^\s*)#(\s*)include/\1#\2 include/'
To modify all include lines by adding a space, and then running
./git-clang-format.sh.
|
|
Including cacheiterators.h before pkgcache.h fails because
pkgcache.h depends on cacheiterators.h.
|
|
Our profile says we spend about 5% of the time transforming the
hex digits into the binary format used by HashsumValue, all for
comparing them against the other strings. That makes no sense
at all.
According to callgrind, this reduces the overall instruction
count from 5,3 billion to 5 billion in my example, which
roughly matches the 5%.
|
|
It is a bit academic to support values which aren't big enough to fit even
the hashtables without resizing, but cleaning up ensures that we do the
right thing (aka not segfaulting) even if something goes wrong in these
deep layers. You still can't have very very small values through…
Git-Dch: Ignore
|
|
Part of hidden classes, so conversion is abi-free.
Git-Dch: Ignore
|
|
These virtual methods are implemented in hidden classes, so we can drop
them without breaking the ABI.
Git-Dch: Ignore
|
|
This would mess up reference counting and should not be allowed
(it could be implemented correctly, but it would not be efficient
and we do not need it).
Gbp-Dch: ignore
|
|
This allows us to check if a value to be remapped was inside
the cache or not, which will become useful at a later point.
Gbp-Dch: ignore
|
|
Gbp-Dch: ignore
|
|
Instead of storing a string -> map_stringitem_t mapping, create
our own data type that can point to either a normal string or
a string inside the cache.
This avoids the creation of any string and improves performance
slightly (about 4%).
|
|
This removes some minor overhead.
Gbp-Dch: ignore
|
|
This improves performance of the cache generation on my
ARM platform (4x Cortex A15) by about 10% to 20% from
2.35-2.50 to 2.1 seconds.
|
|
If we already have opened a cache, there is no point in having
to open it again.
|
|
This is for public users only, which cannot use the class at all,
except for the static methods.
|
|
std::unordered_map is faster than std::map in our use case,
reducing cache generation time by about 10% in my benchmark.
|
|
Git-Dch: Ignore
|
|
|
|
The code was never active in production, it just sits there collecting
dust and given that it is never tested probably doesn't even work
anymore the way it was supposed to be (whatever that was exactly in the
first place). So just remove it before I have to "fix" it again next
time.
Git-Dch: Ignore
|
|
Trade deduplication of code for a bunch of new virtuals, so it is
actually visible how the different indexes behave cleaning up the
interface at large in the process.
Git-Dch: Ignore
|
|
Sources are usually defined in sources.list (and co) and are pretty
stable, but once in a while a frontend might want to add an additional
"source" like a local .deb file to install this package (No support for
'real' sources being added this way as this is a multistep process).
We had a hack in place to allow apt-get and apt to pull this of for a
short while now, but other frontends are either left in the cold by this
and/or the code for it looks dirty with FIXMEs plastering it and has on
top of this also some problems (like including these 'volatile' sources
in the srcpkgcache.bin file).
So the biggest part in this commit is actually the rewrite of the cache
generation as it is now potentially a three step process. The biggest
problem with adding support now through is that this makes a bunch of
previously mostly unusable by externs and therefore hidden classes
public, so a bit of further tuneing on this now public API is in order…
|
|
Now that we deal with provides in a more dynamic fashion the last
remaining problem is explicit dependencies like 'Conflicts: foo' which
have to apply to all architectures, but creating them all at the same
time requires us to know all architectures ending up in the cache which
isn't needed to be the same set as all foreign architectures.
The effect is visible already now through as this prevents the creation
of a bunch of virtual packages for arch:all packages and as such also
many dependencies, just not very visible if you don't look at the stats…
Git-Dch Ignore
|
|
Expecting the worst is easy to code, but has its disadvantages e.g.
by creating package structures which otherwise would have never
existed. By creating the provides instead at the time a package
structure is added we are well prepared for the introduction of partial
architectures, massive amounts of M-A:foreign (and :allowed) and co as
far as provides are concerned at least. We have something relatively
similar for dependencies already.
Many tests are added for both M-A states and the code cleaned to
properly support implicit provides for foreign architectures and
architectures we 'just' happen to parse.
Git-Dch: Ignore
|
|
Before MultiArch implicits weren't a thing, so they were hidden by
default by definition. Adding them for MultiArch solved many problems,
but having no reliable way of detecting which dependency (and provides)
is implicit or not causes problems everytime we want to output
dependencies without confusing our observers with unneeded
implementation details.
The really notworthy point here is actually that we keep now a better
record of how a dependency came to be so that we can later reason about
it more easily, but that is hidden so deep down in the library internals
that change is more the problems it solves than the change itself.
|
|
Doing this disables the implicit copy assignment operator (among others)
which would cause hovac if used on the classes as it would just copy the
pointer, not the data the d-pointer points to. For most of the classes
we don't need a copy assignment operator anyway and in many classes it
was broken before as many contain a pointer of some sort.
Only for our Cacheset Container interfaces we define an explicit copy
assignment operator which could later be implemented to copy the data
from one d-pointer to the other if we need it.
Git-Dch: Ignore
|
|
Some of them modify the ABI, but given that we prepare a big one
already, these few hardly count for much.
Git-Dch: Ignore
|
|
To have a chance to keep the ABI for a while we need all three to team
up. One of them missing and we might loose, so ensuring that they are
available is a very tedious but needed task once in a while.
Git-Dch: Ignore
|
|
This is mainly visible in the policy, so that you can now pin by b= and
let it only effect Packages files of this architecture and hence the
packages coming from it (which do not need to be from this architecture,
but very likely are in a normal repository setup).
If you should pin by architecture in this way is a different question…
Closes: 687255
|
|
We used to read the Release file for each Packages file and store the
data in the PackageFile struct even through potentially many Packages
(and Translation-*) files could use the same data. The point of the
exercise isn't the duplicated data through. Having the Release files as
first-class citizens in the Cache allows us to properly track their
state as well as allows us to use the information also for files which
aren't in the cache, but where we know to which Release file they
belong (Sources are an example for this).
This modifies the pkgCache structs, especially the PackagesFile struct
which depending on how libapt users access the data in these structs can
mean huge breakage or no visible change. As a single data point:
aptitude seems to be fine with this. Even if there is breakage it is
trivial to fix in a backportable way while avoiding breakage for
everyone would be a huge pain for us.
Note that not all PackageFile structs have a corresponding ReleaseFile.
In particular the dpkg/status file as well as *.deb files have not. As
these have only a Archive property need, the Component property takes
over this duty and the ReleaseFile remains zero. This is also the reason
why it isn't needed nor particularily recommended to change from
PackagesFile to ReleaseFile blindly. Sticking with the earlier is
usually the better option.
|
|
We have a bunch of classes which are of no use for the outside world,
but were still exported and so needed to preserve ABI/API. Marking them
as hidden to not export them any longer is a big API break in theory,
but in practice nobody is using them – as if they would its a bug.
|
|
Git-Dch: Ignore
|
|
aptitude has a define for VERSION, so to not generate a FTBFS we just
rename our enum element to a slightly less generic name.
Git-Dch: Ignore
|
|
Reported-By: cppcheck
Git-Dch: Ignore
|
|
Strings like Section names or architectures are needed vary often.
Instead of writing them each time we need them, we deploy sharing for
these special strings. Until now, this was done with a linked list of
strings in which we would search, which was stored in the cache.
It turns out we can do this just as well in memory as well with a bunch
of std::map's.
In memory means here that it isn't available anymore if we have a partly
invalid cache, but that isn't much of a problem in practice as the
status file is compared to the other files we parse very small and includes
mostly duplicates, so the space we would gain by storing is more or less
equal to the size of the stored linked list…
|
|
We had a wild mixture of (unsigned) int, long and long long here without
much sense, so this commit adds a few typedefs to get some sense in the
typesystem and ensures that a ID isn't sometimes computed as int, stored
as long and compared with a long long… as this could potentially bite us
later on as the size of the archive only increases over time.
|
|
It seems unlikely for now that proper archives will carry multiple
Description-* stanzas in the Packages (or Translation-*) file, but
sometimes apt eats its own output as shown by the usage of the CD team
and it would be interesting to let apt output multiple translations
e.g. in 'apt-cache show'.
|
|
This methods should not be used by anyone expect the library itself as
they are helpers for the specific class and therefore perfect candidates
for hidding.
Git-Dch: Ignore
|
|
In #737085 we see that apt can be confused if informations about
versions only differ slightly. This commit adds a way of at least adding
a few more data points with the next abi break to help a bit with it.
Git-Dch: Ignore
|
|
Git-Dch: Ignore
Reported-By: gcc -Wsuggest-attribute={pure,const,noreturn}
|
|
Beside being a bit cleaner it hopefully also resolves oddball problems
I have with high levels of parallel jobs.
Git-Dch: Ignore
Reported-By: iwyu (include-what-you-use)
|
|
Reported-By: gcc -Wunused-parameter
Git-Dch: Ignore
|