Age | Commit message (Collapse) | Author |
|
Downloading and storing are two different operations were different
compression types can be preferred. For downloading we provide the
choice via Acquire::CompressionTypes::Order as there is a choice to
be made between download size and speed – and limited by whats available
in the repository.
Storage on the other hand has all compressions currently supported by
apt available and to reduce runtime of tools accessing these files the
compression type should be a low-cost format in terms of decompression.
apt traditionally stores its indexes uncompressed on disk, but has
options to keep them compressed. Now that apt downloads additional files
we also deal with files which simply can't be stored uncompressed as
they are just too big (like Contents for apt-file). Traditionally they
are downloaded in a low-cost format (gz) as repositories do not provide
other formats, but there might be even lower-cost formats and for
download we could introduce higher-cost in the repositories.
Downloading an entire index potentially requires recompression to
another format, so an update takes potentially longer – but big files
are usually updated via pdiffs which has to de- and re-compress anyhow
and does it on the fly anyhow, so there is no extra time needed and in
general it seems to be benefitial to invest the time in update to save
time later on file access.
|
|
There is no reason to enforce that the file we start the bootstrap with
is compressed with a compressor which is available online. This allows
us to change the on-disk format as well as deals with repositories
adding/removing support for a specific compressor.
|
|
This doesn't allow all tests to run cleanly, but it at least allows to
write tests which could run successfully in such environments.
Git-Dch: Ignore
|
|
Not all tests work yet, most notable the cdrom tests, but those require
changes in libapt itself to have a proper fix and what we have fixed so
far is good enough progress for now.
Git-Dch: Ignore
|
|
This is mostly a small speedup for the testcases, but it is also handy
to document which tests actually deal with a specific hash compared to
those which 'just' need some hash which can be important while adding
new hashes.
Git-Dch: Ignore
|
|
And of course, testing obscure things ends up showing obscure 'bugs' or
better shortcomings/inconsitencies, so lets fix them with the tests.
Git-Dch: Ignore
|
|
Some additional files like 'Contents' are very big and should therefore
kept compressed on the disk, which apt-file did in the past. It also
implemented pdiff patching of these files by un- and recompressing these
files on-the-fly, with this commit we can do the same – but we can do
this in both pdiff patching styles (client and server merging) and
secured by hashes.
Hashes are in so far slightly complicated as we can't compare the hashes
of the compressed files as we might compress them differently than the
server would (different compressor versions, options, …), so we must
compare the hashes of the uncompressed content.
While this commit has changes in public headers, the classes it changes
are marked as hidden, so nobody can use them directly, which means the
ABI break is internal only.
|
|
Again, consistency is the main sellingpoint here, but this way it is now
also easier to explain that some files move through different stages and
lines are printed for them hence multiple times: That is a bit hard to
believe if the number is changing all the time, but now that it keeps
consistent.
|
|
At the moment we only have hashes for the uncompressed pdiff files, but
via the new '$HASH-Download' field in the .diff/Index hashes can be
provided for the .gz compressed pdiff file, which apt will pick up now
and use to verify the download. Now, we "just" need a buy in from the
creators of repositories…
|
|
The rred parser is very accepting regarding 'invalid' files. Given that
we can't trust the input it might be a bit too relaxed. In any case,
checking for more errors can't hurt given that we support only a very
specific subset of ed commands.
|
|
rred is responsible for unpacking and reading the patch files in one go,
but we currently only have hashes for the uncompressed patch files, so
the handler read the entire patch file before dispatching it to the
worker which would read it again – both with an implicit uncompress.
Worse, while the workers operate in parallel the handler is the central
orchestration unit, so having it busy with work means the workers do
(potentially) nothing.
This means rred is working with 'untrusted' data, which is bad. Yet,
having the unpack in the handler meant that the untrusted uncompress was
done as root which isn't better either. Now, we have it at least
contained in a binary which we can harden a bit better. In the long run,
we want hashes for the compressed patch files through to be safe.
|
|
Especially pdiff-enhanced downloads have the tendency to fail for
various reasons from which we can recover and even a successful download
used to leave the old unpatched index in partial/.
By adding a new method responsible for making the transaction of an
individual file happen we can at specialisations especially for abort
cases to deal with the cleanup.
This also helps in keeping the compressed indexes around if another
index failed instead of keeping the decompressed files, which we
wouldn't pick up in the next call.
|
|
If we get a IMSHit for the Transaction-Manager (= the InRelease file or
as its still supported fallback Release + Release.gpg combo) we can
assume that every file we would queue based on this manager, but already
have locally is current and hence would get an IMSHit, too. We therefore
save us and the server the trouble and skip the queuing in this case.
Beside speeding up repetative executions of 'apt-get update' this way we
also avoid hitting hashsum errors if the indexes are in fact already
updated, but the Release file isn't yet as it is the case on well
behaving mirrors as Release files is updated last.
The implementation is a bit harder than the theory makes it sound as we
still have to keep reverifying the Release files (e.g. to detect now expired
once to avoid an attacker being able to silently stale us) and have to
handle cases in which the Release file hits, but some indexes aren't
present (e.g. user added a new foreign architecture).
|
|
We use test{success,failure} now all over the place in the framework, so
its only consequencial to do this in the situations in which we test for
a specific output as well.
Git-Dch: Ignore
|
|
One word: "doh!" Commit f6d4ab9ad8a2cfe52737ab620dd252cf8ceec43d
disabled the check to prevent apt from downloading bigger patches
than the index it tries to patch. Happens rarly of course, but still.
Detected by scan-build complaining about a dead assignment.
To make up for the mistake a test is included as well.
|
|
feature/acq-trans
Conflicts:
apt-pkg/acquire-item.cc
|
|
The fileformat of a pdiff index stores currently only SHA1 hashes. With
this change, we look for all other hashes we support as well and take
what we get, so that we can work after the release of jessie to get
right of SHA1 if we want to.
Note that the completely patched file is and was checked against the
hashes collected from the Release file, so this transition isn't mission
critical.
|
|
|
|
It is not very extensible to have the supported Hashes hardcoded
everywhere and especially if it is part of virtual method names.
It is also possible that a method does not support the 'best' hash
(yet), so we might end up not being able to verify a file even though we
have a common subset of supported hashes. And those are just two of the
cases in which it is handy to have a more dynamic selection.
The downside is that this is a MAJOR API break, but the HashStringList
has a string constructor for compatibility, so with a bit of luck the
few frontends playing with the acquire system directly are okay.
|
|
With APT::Get::List-Cleanup disabled the ed-style patch files are
lingering in the lists/ directory otherwise. That was kinda okay in the
old none-client-merge as the filename was always the same so it was
constantly overridden, but now with different names for client-merge
quiet a few could pill up on the system and are used by the next call
as it picks them up based on the filename.
|
|
Providing the benefits of both without the downsides :)
(ABI breaks or external dependencies)
For this Anthonys rred is equipped with:
- magic-filename-pickup of patches rather than explicit messages
- use of FileFd instead of FILE* to get on-the-fly uncompress
of the gzip compressed pdiff patches
The acquire code in turn stops checking for apt-file's helper
as our own rred is now clever enough for our needs.
|
|
The idea of pdiffs is to avoid downloading the hole file by patching the
existing index. This works very well, but becomes slow if a lot of
patches needs to be applied to reconstruct an up-to-date index and in
recent years more and more dinstall (or similar) runs are executed
creating more and more pdiffs in the same amount of time, so pdiffs
became less useful.
The solution is simple: Reduce the amount of patches (which are very
small) which need to be applied on top of the index we have available
(which is usually pretty big).
This can be done in two ways: Either merge the patches on the
server-side so that the client has to download only one patch or the
patches are all downloaded and merged on the client-side.
The first needs a client who is doing one step at a time who can also
skip patches if it needs (APT supports this for a long time now).
The later is implemented by this commit, but depends on the server NOT
merging the patches and the patches being in a strict order in which no
patch is skipped.
This is traditionally the case for dak, but other repository creators
support merging – e.g. reprepro (which helpfully adds a flag indicating
that the patches are merged). To support both or even mixes a client
needs more information which isn't available for now.
This POC uses the external diffindex-rred included in apt-file to
do the heavy lifting of merging & applying all patches in one pass,
hence to test this feature apt-file needs to be installed.
|
|
Compressing files in 4 different styles eats test-time for no practical
gain if we don't test them explicitly, so default to just building 'gz'
compressed files as it is the simplest compression algorithm supported
Git-Dch: Ignore
|
|
For many commands the output isn't stable (like then dpkg is called) but
the exitcode is, so this helper enhances the common && msgpass ||
msgfail by generating automatically a msgtest and showing the output of
the command in case of failure instead of discarding it unconditionally,
the later being chronic-like behaviour
Git-Dch: Ignore
|
|
Forking only after being ready to accept clients avoids running races
with the tests which sometimes failed on the first 'apt-get update'
(or similar) with the previous background-start and hope for the best…
The commit fixes also some oversight output-order changes in regards to
Description-md5 and (I-M-S) race conditions in various tests.
Git-Dch: Ignore
|
|
|
|
|
|
- operate optional on gzip compressed pdiffs
* apt-pkg/acquire-item.cc:
- don't uncompress downloaded pdiff files before feeding it to rred
|
|
funtions (bash complains)
|
|
- don't use ReadOnlyGzip mode for PDiffs as this mode doesn't work
in combination with the AddFd methods of our hashclasses
Add also 2 testcases: one to test pdiffs in general and
one to test the handling of compressed indexes.
|