Age | Commit message (Collapse) | Author |
|
Try 10 times in a row
|
|
Individual items shouldn't concern themselves with these alternative
locations, we can deal with this more efficiently within the
infrastructure created for other alternative URIs now avoiding the need
to implement this in each item.
|
|
If we got a file but it produced a hash error, mismatched size or
similar we shouldn't fallback to alternative URIs as they likely result
in the same error. If we can we should instead use another mirror.
We used to be a lot stricter by stopping all trys for this file if we
got a non-404 (or a hash-based) failure, but that is too hard as we
really want to try other mirrors (if we have them) in the hope that they
have the expected and correct files.
|
|
The format isn't too hard to get right, but it gets funny with multiline
fields (which we don't really have yet) and its just easier to deal with
it once and for all which can be reused for more messages later.
|
|
Failing on too much data is good, but we can do better by checking for
exact filesizes as we know with hashsums how large a file should be, so
if we get a file which has a size we do not expect we can drop it
directly, regardless of if the file is larger or smaller than what we
expect which should catch most cases which would end up as hashsum
errors later now a lot sooner.
|
|
The comment says this is intended, but looking at the history reveals
that the comment comes from a different era. Nowadays we don't really
need it anymore (and even back then it was disputeable) as we haven't
used that file for our update in the end and nothing really needs this
file after the update.
Triggered is this by 188f297a2af4c15cb1d502360d1e478644b5b810 which
moves various error conditions forward including this code expecting the
file to exist – but it doesn't need to as download could have failed.
We could fix that by simple checking if the file exists and only stage
it if it does, but instead we don't stage it and instead even rename it
out of the way with our conventional FAILED name (if it exists).
That restores support for partial mirrors (= in this case mirrors which
don't ship pdiff files). Note that apt heals itself even if only such a
mirror is used as the update is successful even if that error is shown.
Closes: 869425
|
|
Moving the code responsible for parsing the Index file from ::Done into
the slightly earlier ::VerifyDone allows us to still "fail" the download
if we can't make use of the Index for whatever reason, so that the
progress log correctly displays "Ign" instead of "Get" for the file.
This also makes quiet a few debug messages proper error messages (but
those are still hidden by default for Ign lines).
|
|
Most of them in (old) code comments. The two instances of user visible
string changes the po files of the manpages are fixed up as well.
Gbp-Dch: Ignore
Reported-By: spellintian
|
|
In af81ab9030229b4ce6cbe28f0f0831d4896fda01 by-hash got implemented as a
special compression type for our usual index files like Packages.
Missing in this scheme was the special .diff/Index index file containing
the info about individual patches for this index file. Deriving from the
index file class directly we inherent the compression handling
infrastructure and in this way also by-hash nearly for free.
Closes: #824926
|
|
The https method implemented for a long while now a hardcoded fallback
to the same options in http, which, while it works, is rather inflexible
if we want to allow the methods to use another name to change their
behavior slightly, like apt-transport-tor does to https – most of the
diff being s#https#tor#g which then fails to do the full circle
fallthrough tor -> https -> http for https sources. With this config
infrastructure this could be implemented now.
|
|
If another file in the transaction fails and hence dooms the transaction
we can end in a situation in which a -patched file (= rred writes the
result of the patching to it) remains in the partial/ directory.
The next apt call will perform the rred patching again and write its
result again to the -patched file, but instead of starting with an empty
file as intended it will override the content previously in the file
which has the same result if the new content happens to be longer than
the old content, but if it isn't parts of the old content remain in the
file which will pass verification as the new content written to it
matches the hashes and if the entire transaction passes the file will be
moved the lists/ directory where it might or might not trigger errors
depending on if the old content which remained forms a valid file
together with the new content.
This has no real security implications as no untrusted data is involved:
The old content consists of a base file which passed verification and a
bunch of patches which all passed multiple verifications as well, so the
old content isn't controllable by an attacker and the new one isn't
either (as the new content alone passes verification). So the best an
attacker can do is letting the user run into the same issue as in the
report.
Closes: #831762
|
|
We have this situation in cases were parts of the transaction are
refused (e.g. in a hashsum mismatch) and rerun the update (e.g. in the
hope that we get a mirror which is synced this time).
Previously we would ask the server with an if-range and in the best case
recieve a 416 in response (less featureful server might end up giving us
the entire file again or we get the wrong file this time giving us a
hashsum mismatch…), which is a waste of time if we know already by
checking the hashsums that we got the complete and correct file.
|
|
Queues feeding workers like rred are created in a random pattern to get
a few of them to run in parallel – but if we already have an idling queue
we don't need to assign it to a (potentially new) random queue as that
saves us the (agruably small) overhead of starting up a new queue,
avoids adding jobs to an already busy queue while others idle and as
a bonus reduces the size of debug logs a bit.
We also keep starting new queues now until we reach our limit before
we assign work at random to them, which should give us a more effective
utilisation overall compared to potentially adding work to busy queues
while we haven't reached our queue limit yet.
|
|
With the previous commit we track the state of transactions, so we can
now use our knowledge to avoid processing data for a transaction which
was already closed (via an abort in this case).
This is needed as multiple independent processes are interacting in the
process, so there isn't a simple immediate full-engine stop and it would
also be bad to teach each and every item how to check if its manager
has failed subordinate and what to do in that case.
In the pdiff case, which deals (potentially) with many items during its
lifetime e.g. a hashsum mismatch in another file can abort the
transaction the file we try to patch via pdiff belongs to. This causes
some of the items (which are already done) to be aborted with it, but
items still in the process of acquisition continue in the processing and
will later try to use all the items together failing in strange ways as
cleanup already happened.
The chosen solution is to dry up the communication channels instead by
ignoring new requests for data acquisition, canceling requests which are
not assigned to a queue and not calling Done/Failed on items anymore.
This means that e.g. already started or pending (e.g. pipelined)
downloads aren't stopped and continue as normal for now, but they remain
in partial/ and aren't processed further so the next update command will
pick them up and put them to good use while the current process fails
updating (for this transaction group) in an orderly fashion.
Closes: 817240
Thanks: Barr Detwix & Vincent Lefevre for log files
|
|
The URI descibing an item can change via mirrors/redirectors which
causes the .diff/Index files to get the wrong names in storage.
Git-Dch: Ignore
|
|
Now that we ignore SHA1-only files it makes sense to require also the
provision of hashes for the compressed patches as this was introduced in
the same patchset as support for non-SHA1 hashes in the file itself in
dak and adding support in other archive creators (if they support pdiffs
at all) will likely be in the same batch.
The reason for the change itself is simple: If you are 'scared' enough
about the security of SHA1, you shouldn't uncompress a file you haven't
verified at all – after all, it could be exploiting a bug or a zip bomb.
|
|
Given that we refuse to use SHA1-only .diff/Indexes no point in shipping
and running code which pretends to check support for it which given that
all these tests are run 3 times eats a noticeable amount of time.
Git-Dch: Ignore
|
|
Ensure that .diff/Index files that only contain SHA1 values and no
SHA2 values are not used.
|
|
If a single pdiff fails, we have to fail the entire patching endeavour
and fall back to getting the complete file instead. That is easy in
serverside merged pdiffs as we get them one by one. For clientside we
get them all at once through, which means that a failure in one has to
stop the entire pipeline, which works as expected (as proven by the
bugreporters as they don't even notice it happening). The problem is
just that the first failing pdiff will do the cleanup, so another pdiff
which happens to be successfully acquired after we processed the failure
doesn't find the file it is supposed to use as a basename anymore, so
the patch is renamed to what should be the unique extension and moved
into the current working directory. Processing is then stopped as the
patch realizes that it isn't the last one which completed downloading.
On the plus side this means this is neither us using a bad temporary
location nor a security problem. It "just" overrides unconditionally
files in your current working directory (if you happen to have them
named like a pdiff patch – a bit unlikely perhaps) and so drops files
there which are never used again.
I guess this was introduced in 4e3c5633b1e74b4f58b95f339cfbbf4cbf21ab3e
for real as I made the need for the existence of the base file rather
explicit, but the potential lingers in the code for far longer.
Closes: #816837
|
|
The code already deals with compressed leftovers, but forgot the
uncompressed files. The opertunity is picked to reorder this code and
add debug messages about the actions taken as well as produce such a
leftover file in the associated testcase.
|
|
With the addition of the $HASH-Download field in the .diff/Index we got
the size of the compressed patches for 'free', so if that information is
available we can use it for a more fitting calculation of the size
requirements of the patches vs. the complete file.
Note that this predicts a too small size in the transition case in which
the information isn't available for all patches, but figuring this out
would be a lot of code for practically nothing as only one update can
ever be in such a transition phase.
|
|
Downloading and storing are two different operations were different
compression types can be preferred. For downloading we provide the
choice via Acquire::CompressionTypes::Order as there is a choice to
be made between download size and speed – and limited by whats available
in the repository.
Storage on the other hand has all compressions currently supported by
apt available and to reduce runtime of tools accessing these files the
compression type should be a low-cost format in terms of decompression.
apt traditionally stores its indexes uncompressed on disk, but has
options to keep them compressed. Now that apt downloads additional files
we also deal with files which simply can't be stored uncompressed as
they are just too big (like Contents for apt-file). Traditionally they
are downloaded in a low-cost format (gz) as repositories do not provide
other formats, but there might be even lower-cost formats and for
download we could introduce higher-cost in the repositories.
Downloading an entire index potentially requires recompression to
another format, so an update takes potentially longer – but big files
are usually updated via pdiffs which has to de- and re-compress anyhow
and does it on the fly anyhow, so there is no extra time needed and in
general it seems to be benefitial to invest the time in update to save
time later on file access.
|
|
There is no reason to enforce that the file we start the bootstrap with
is compressed with a compressor which is available online. This allows
us to change the on-disk format as well as deals with repositories
adding/removing support for a specific compressor.
|
|
This doesn't allow all tests to run cleanly, but it at least allows to
write tests which could run successfully in such environments.
Git-Dch: Ignore
|
|
Not all tests work yet, most notable the cdrom tests, but those require
changes in libapt itself to have a proper fix and what we have fixed so
far is good enough progress for now.
Git-Dch: Ignore
|
|
This is mostly a small speedup for the testcases, but it is also handy
to document which tests actually deal with a specific hash compared to
those which 'just' need some hash which can be important while adding
new hashes.
Git-Dch: Ignore
|
|
And of course, testing obscure things ends up showing obscure 'bugs' or
better shortcomings/inconsitencies, so lets fix them with the tests.
Git-Dch: Ignore
|
|
Some additional files like 'Contents' are very big and should therefore
kept compressed on the disk, which apt-file did in the past. It also
implemented pdiff patching of these files by un- and recompressing these
files on-the-fly, with this commit we can do the same – but we can do
this in both pdiff patching styles (client and server merging) and
secured by hashes.
Hashes are in so far slightly complicated as we can't compare the hashes
of the compressed files as we might compress them differently than the
server would (different compressor versions, options, …), so we must
compare the hashes of the uncompressed content.
While this commit has changes in public headers, the classes it changes
are marked as hidden, so nobody can use them directly, which means the
ABI break is internal only.
|
|
Again, consistency is the main sellingpoint here, but this way it is now
also easier to explain that some files move through different stages and
lines are printed for them hence multiple times: That is a bit hard to
believe if the number is changing all the time, but now that it keeps
consistent.
|
|
At the moment we only have hashes for the uncompressed pdiff files, but
via the new '$HASH-Download' field in the .diff/Index hashes can be
provided for the .gz compressed pdiff file, which apt will pick up now
and use to verify the download. Now, we "just" need a buy in from the
creators of repositories…
|
|
The rred parser is very accepting regarding 'invalid' files. Given that
we can't trust the input it might be a bit too relaxed. In any case,
checking for more errors can't hurt given that we support only a very
specific subset of ed commands.
|
|
rred is responsible for unpacking and reading the patch files in one go,
but we currently only have hashes for the uncompressed patch files, so
the handler read the entire patch file before dispatching it to the
worker which would read it again – both with an implicit uncompress.
Worse, while the workers operate in parallel the handler is the central
orchestration unit, so having it busy with work means the workers do
(potentially) nothing.
This means rred is working with 'untrusted' data, which is bad. Yet,
having the unpack in the handler meant that the untrusted uncompress was
done as root which isn't better either. Now, we have it at least
contained in a binary which we can harden a bit better. In the long run,
we want hashes for the compressed patch files through to be safe.
|
|
Especially pdiff-enhanced downloads have the tendency to fail for
various reasons from which we can recover and even a successful download
used to leave the old unpatched index in partial/.
By adding a new method responsible for making the transaction of an
individual file happen we can at specialisations especially for abort
cases to deal with the cleanup.
This also helps in keeping the compressed indexes around if another
index failed instead of keeping the decompressed files, which we
wouldn't pick up in the next call.
|
|
If we get a IMSHit for the Transaction-Manager (= the InRelease file or
as its still supported fallback Release + Release.gpg combo) we can
assume that every file we would queue based on this manager, but already
have locally is current and hence would get an IMSHit, too. We therefore
save us and the server the trouble and skip the queuing in this case.
Beside speeding up repetative executions of 'apt-get update' this way we
also avoid hitting hashsum errors if the indexes are in fact already
updated, but the Release file isn't yet as it is the case on well
behaving mirrors as Release files is updated last.
The implementation is a bit harder than the theory makes it sound as we
still have to keep reverifying the Release files (e.g. to detect now expired
once to avoid an attacker being able to silently stale us) and have to
handle cases in which the Release file hits, but some indexes aren't
present (e.g. user added a new foreign architecture).
|
|
We use test{success,failure} now all over the place in the framework, so
its only consequencial to do this in the situations in which we test for
a specific output as well.
Git-Dch: Ignore
|
|
One word: "doh!" Commit f6d4ab9ad8a2cfe52737ab620dd252cf8ceec43d
disabled the check to prevent apt from downloading bigger patches
than the index it tries to patch. Happens rarly of course, but still.
Detected by scan-build complaining about a dead assignment.
To make up for the mistake a test is included as well.
|
|
feature/acq-trans
Conflicts:
apt-pkg/acquire-item.cc
|
|
The fileformat of a pdiff index stores currently only SHA1 hashes. With
this change, we look for all other hashes we support as well and take
what we get, so that we can work after the release of jessie to get
right of SHA1 if we want to.
Note that the completely patched file is and was checked against the
hashes collected from the Release file, so this transition isn't mission
critical.
|
|
|
|
It is not very extensible to have the supported Hashes hardcoded
everywhere and especially if it is part of virtual method names.
It is also possible that a method does not support the 'best' hash
(yet), so we might end up not being able to verify a file even though we
have a common subset of supported hashes. And those are just two of the
cases in which it is handy to have a more dynamic selection.
The downside is that this is a MAJOR API break, but the HashStringList
has a string constructor for compatibility, so with a bit of luck the
few frontends playing with the acquire system directly are okay.
|
|
With APT::Get::List-Cleanup disabled the ed-style patch files are
lingering in the lists/ directory otherwise. That was kinda okay in the
old none-client-merge as the filename was always the same so it was
constantly overridden, but now with different names for client-merge
quiet a few could pill up on the system and are used by the next call
as it picks them up based on the filename.
|
|
Providing the benefits of both without the downsides :)
(ABI breaks or external dependencies)
For this Anthonys rred is equipped with:
- magic-filename-pickup of patches rather than explicit messages
- use of FileFd instead of FILE* to get on-the-fly uncompress
of the gzip compressed pdiff patches
The acquire code in turn stops checking for apt-file's helper
as our own rred is now clever enough for our needs.
|
|
The idea of pdiffs is to avoid downloading the hole file by patching the
existing index. This works very well, but becomes slow if a lot of
patches needs to be applied to reconstruct an up-to-date index and in
recent years more and more dinstall (or similar) runs are executed
creating more and more pdiffs in the same amount of time, so pdiffs
became less useful.
The solution is simple: Reduce the amount of patches (which are very
small) which need to be applied on top of the index we have available
(which is usually pretty big).
This can be done in two ways: Either merge the patches on the
server-side so that the client has to download only one patch or the
patches are all downloaded and merged on the client-side.
The first needs a client who is doing one step at a time who can also
skip patches if it needs (APT supports this for a long time now).
The later is implemented by this commit, but depends on the server NOT
merging the patches and the patches being in a strict order in which no
patch is skipped.
This is traditionally the case for dak, but other repository creators
support merging – e.g. reprepro (which helpfully adds a flag indicating
that the patches are merged). To support both or even mixes a client
needs more information which isn't available for now.
This POC uses the external diffindex-rred included in apt-file to
do the heavy lifting of merging & applying all patches in one pass,
hence to test this feature apt-file needs to be installed.
|
|
Compressing files in 4 different styles eats test-time for no practical
gain if we don't test them explicitly, so default to just building 'gz'
compressed files as it is the simplest compression algorithm supported
Git-Dch: Ignore
|
|
For many commands the output isn't stable (like then dpkg is called) but
the exitcode is, so this helper enhances the common && msgpass ||
msgfail by generating automatically a msgtest and showing the output of
the command in case of failure instead of discarding it unconditionally,
the later being chronic-like behaviour
Git-Dch: Ignore
|
|
Forking only after being ready to accept clients avoids running races
with the tests which sometimes failed on the first 'apt-get update'
(or similar) with the previous background-start and hope for the best…
The commit fixes also some oversight output-order changes in regards to
Description-md5 and (I-M-S) race conditions in various tests.
Git-Dch: Ignore
|
|
|
|
|
|
- operate optional on gzip compressed pdiffs
* apt-pkg/acquire-item.cc:
- don't uncompress downloaded pdiff files before feeding it to rred
|
|
funtions (bash complains)
|