Age | Commit message (Collapse) | Author |
|
We have quite a bit of metadata available for the files we acquire, but
the methods weren't told about it and got just the URI. That is indeed
fine for most, but to avoid methods trying to parse the metadata out of
the provided URIs (and fail horribly in edgecases) we can just as well
be nice and tell them stuff directly.
|
|
Moving the Retry-implementation from individual items to the worker
implementation not only gives every file retry capability instead of
just a selected few but also avoids needing to implement it in each item
(incorrectly).
|
|
This makes it easier to see which headers includes what.
The changes were done by running
git grep -l '#\s*include' \
| grep -E '.(cc|h)$' \
| xargs sed -i -E 's/(^\s*)#(\s*)include/\1#\2 include/'
To modify all include lines by adding a space, and then running
./git-clang-format.sh.
|
|
Including cacheiterators.h before pkgcache.h fails because
pkgcache.h depends on cacheiterators.h.
|
|
Moving the code responsible for parsing the Index file from ::Done into
the slightly earlier ::VerifyDone allows us to still "fail" the download
if we can't make use of the Index for whatever reason, so that the
progress log correctly displays "Ign" instead of "Get" for the file.
This also makes quiet a few debug messages proper error messages (but
those are still hidden by default for Ign lines).
|
|
Most of them in (old) code comments. The two instances of user visible
string changes the po files of the manpages are fixed up as well.
Gbp-Dch: Ignore
Reported-By: spellintian
|
|
In ad9416611ab83f7799f2dcb4bf7f3ef30e9fe6f8 we fall back to asking the
original mirror (e.g. a redirector) if we do not get the expected
result. This works for the indexes, but patches are a different beast
and much simpler. Adding this fallback code here seems like overkill as
they are usually right along their Index file, so actually forward the
relevant settings to the patch items which fixes pdiff support combined
with a redirector and partial mirrors as in such a situation the pdiff
patches would be 404 and the complete index would be downloaded.
|
|
Employ a priority queue instead of a normal queue to hold
the items; and only add items to the running pipeline if
their priority is the same or higher than the priority
of items in the queue.
The priorities are designed for a 3 stage pipeline system:
In stage 1, all Release files and .diff/Index files are fetched. This
allows us to determine what files remain to be fetched, and thus
ensures a usable progress reporting.
In stage 2, all Pdiff patches are fetched, so we can apply them
in parallel with fetching other files in stage 3.
In stage 3, all other files are fetched (complete index files
such as Contents, Packages).
Performance improvements, mainly from fetching the pdiff patches
before complete files, so they can be applied in parallel:
For the 01 Sep 2016 03:35:23 UTC -> 02 Sep 2016 09:25:37 update
of Debian unstable and testing with Contents and appstream for
amd64 and i386, update time reduced from 37 seconds to 24-28
seconds.
Previously, apt would first download new DEP11 icon tarballs
and metadata files, causing the CPU to be idle. By fetching
the diffs in stage 2, we can now patch our contents and Packages
files while we are downloading the DEP11 stuff.
|
|
In af81ab9030229b4ce6cbe28f0f0831d4896fda01 by-hash got implemented as a
special compression type for our usual index files like Packages.
Missing in this scheme was the special .diff/Index index file containing
the info about individual patches for this index file. Deriving from the
index file class directly we inherent the compression handling
infrastructure and in this way also by-hash nearly for free.
Closes: #824926
|
|
Having the detection handled in specific (http) workers means that a
redirection loop over different hostnames isn't detected. Its also not a
good idea have this implement in each method independently even if it
would work
|
|
Weak had no dedicated option before and Insecure and Downgrade were both
global options, which given the effect they all have on security is
rather bad. Setting them for individual repositories only isn't great
but at least slightly better and also more consistent with other
settings for repositories.
|
|
We don't have to initialize the Release files with a set of IndexTargets
to acquire, but instead wait for the Release file to be acquired and
only then ask which IndexTargets to get.
Git-Dch: Ignore
|
|
Progress reporting used an "upper bound" on files we might get, expect
that this wasn't correct in case pdiff entered the picture. So instead
of calculating a value which is perhaps incorrect, we just accept that
we can't tell how many files we are going to download and just keep at
0% until we know. Additionally, if we have pdiffs we wait until we got
these (sub)index files, too.
That could all be done better by downloading all Release files first and
planing with them in hand accordingly, but one step at a time.
|
|
The code naturally evolved from a TransactionManager optional to a
required setup which resulted in various places doing unneeded checks
suggesting a more complicated setup than is actually needed.
Git-Dch: Ignore
|
|
Redirection services like httpredir.debian.org tend to use a set of
mirrors from which they pick a mirror at "random" for each requested
file, which is usually benefitial for the download of debs, but for the
index files this can quickly cause problems (aka hashsum mismatches) if
the two (or more) mirrors involved are only slightly out-of-sync.
This commit "resolves" this issue by using the mirror we ended up using
to get the (signed) Release file directly to get the index files
belonging to this Release file instead of asking the redirection
service which eliminates the risk of hitting out-of-sync mirrors.
As an obvious downside the redirection service can't serve partial
mirrors anymore for indexes and the download of indexes indexed in the
same Release file can't be done in parallel (from different mirrors).
This does not effect the download of non-index files like deb-files as
out-of-sync mirrors aren't a huge problem there, so the parallel
download outweights a potentially 404 error (also because this causes no
errenous downloads while hashsum mismatches download the entire file
before finding out that it was pointless).
The rational for this is that indexes are relative to the Release file.
If we would be talking about a HTML page including images, such a
behaviour is obvious and intended – not doing it means in the best case
a bunch of "useless" requests which will all be answered with a
redirect.
|
|
Users tend to report these errors with just this error message… not very
actionable and hard to figure out if this is a temporary or 'permanent'
mirror-sync issue or even the occasional apt bug.
Showing the involved hashsums and modification times should help in
triaging these kind of bugs – and eventually we will have less of them
via by-hash.
The subheaders aren't marked for translation for now as they are
technical glibberish and probably easier to deal with if not translated.
After all, our iconic "Hash Sum mismatch" is translated at least.
These additions were proposed in #817240 by Peter Palfrader.
|
|
Calling the (non-existent) reporter multiple times for the same error
with different codes for the same error (e.g. hashsum) is a bit strange.
It also doesn't need to be a public API. Ideally that would all look and
behave slightly different, but we will worry about that at the time this
is actually (planed to be) used somewhere…
Git-Dch: Ignore
|
|
We want to keep track of the state of a transaction overall to base
future decisions on it, but as a pre-requirement we have to make sure
that a transaction isn't commited twice (which happened if the download
of InRelease failed and Release takes over).
It also happened to create empty commits after a transaction was already
aborted in cases in which the Release files were rejected.
This isn't effecting security at the moment, but to ensure this isn't
happening again and can never be bad a bunch of fatal error messages are
added to make regressions on this front visible.
|
|
pkgAcqChangelog has the default behaviour of downloading a changelog to
a temporary directory (inside /tmp, not /tmp directly), which is cleaned
up on shutdown, but this can be overridden to store the changelog more
permanently – but that caries a permission problem.
For changelog we can 'easily' solve this by always downloading to a
temporary directory and only move it out of there on done.
|
|
Downloading and storing are two different operations were different
compression types can be preferred. For downloading we provide the
choice via Acquire::CompressionTypes::Order as there is a choice to
be made between download size and speed – and limited by whats available
in the repository.
Storage on the other hand has all compressions currently supported by
apt available and to reduce runtime of tools accessing these files the
compression type should be a low-cost format in terms of decompression.
apt traditionally stores its indexes uncompressed on disk, but has
options to keep them compressed. Now that apt downloads additional files
we also deal with files which simply can't be stored uncompressed as
they are just too big (like Contents for apt-file). Traditionally they
are downloaded in a low-cost format (gz) as repositories do not provide
other formats, but there might be even lower-cost formats and for
download we could introduce higher-cost in the repositories.
Downloading an entire index potentially requires recompression to
another format, so an update takes potentially longer – but big files
are usually updated via pdiffs which has to de- and re-compress anyhow
and does it on the fly anyhow, so there is no extra time needed and in
general it seems to be benefitial to invest the time in update to save
time later on file access.
|
|
There is no reason to enforce that the file we start the bootstrap with
is compressed with a compressor which is available online. This allows
us to change the on-disk format as well as deals with repositories
adding/removing support for a specific compressor.
|
|
This should make it more obvious that CHANGEPATH is a placeholder which
apt will replace with a package specific path rather than a string
constant.
Mail-Reference: <87d1upgvaf.fsf@deep-thought.43-1.org>
Mail-Archive: https://lists.debian.org/debian-dak/2015/12/msg00005.html
|
|
Git-Dch: Ignore
|
|
Some additional files like 'Contents' are very big and should therefore
kept compressed on the disk, which apt-file did in the past. It also
implemented pdiff patching of these files by un- and recompressing these
files on-the-fly, with this commit we can do the same – but we can do
this in both pdiff patching styles (client and server merging) and
secured by hashes.
Hashes are in so far slightly complicated as we can't compare the hashes
of the compressed files as we might compress them differently than the
server would (different compressor versions, options, …), so we must
compare the hashes of the uncompressed content.
While this commit has changes in public headers, the classes it changes
are marked as hidden, so nobody can use them directly, which means the
ABI break is internal only.
|
|
Reporting errors from Done() is bad for progress reporting and such, so
factoring this out is a good idea and we start with moving the supposed-
to-be clearsigned file isn't clearsigned out first – improving the error
message in the process as we use the same message for a similar case
(NODATA) as this is what I have to look at with the venue wifi at
DebCamp and the old errormessage doesn't really say anything.
|
|
C++11 adds the 'override' specifier to mark that a method is overriding
a base class method and error out if not. We hide it in the APT_OVERRIDE
macro to ensure that we keep compiling in pre-c++11 standards.
Reported-By: clang-modernize -add-override -override-macros
Git-Dch: Ignore
|
|
History suggests that this comes from an earlier apt-secure
implementation, but never really became a thing, totally unused and
marked as deprecated for "ages" now. Especially as it did nothing even
if it would have been used (libapt itself didn't use it at all).
|
|
Limits which key(s) can be used to sign a repository. Not immensely useful
from a security perspective all by itself, but if the user has
additional measures in place to confine a repository (like pinning) an
attacker who gets the key for such a repository is limited to its
potential and can't use the key to sign its attacks for an other (maybe
less limited) repository… (yes, this is as weak as it sounds, but having
the capability might come in handy for implementing other stuff later).
|
|
indexRecords was used to parse the Release file – mostly the hashes –
while metaIndex deals with downloading the Release file, storing all
indexes coming from this release and … parsing the Release file, but
this time mostly for the other fields.
That wasn't a problem in metaIndex as this was done in the type specific
subclass, but indexRecords while allowing to override the parsing method
did expect by default a specific format.
APT isn't really supporting different types at the moment, but this is
a violation of the abstraction we have everywhere else and, which is the
actual reason for this merge: Options e.g. coming from the sources.list
come to metaIndex naturally, which needs to wrap them up and bring them
into indexRecords, so the acquire system is told about it as they don't
get to see the metaIndex, but they don't really belong in indexRecords
as this is just for storing data loaded from the Release file… the
result is a complete mess.
I am not saying it is a lot prettier after the merge, but at least
adding new options is now slightly easier and there is just one place
responsible for parsing the Release file. That can't hurt.
|
|
Various small leaks here and there. Nothing particularily big, but still
good to fix. Found by the sanitizers while running our testcases.
Reported-By: gcc -fsanitize
Git-Dch: Ignore
|
|
Doing this disables the implicit copy assignment operator (among others)
which would cause hovac if used on the classes as it would just copy the
pointer, not the data the d-pointer points to. For most of the classes
we don't need a copy assignment operator anyway and in many classes it
was broken before as many contain a pointer of some sort.
Only for our Cacheset Container interfaces we define an explicit copy
assignment operator which could later be implemented to copy the data
from one d-pointer to the other if we need it.
Git-Dch: Ignore
|
|
Some of them modify the ABI, but given that we prepare a big one
already, these few hardly count for much.
Git-Dch: Ignore
|
|
To have a chance to keep the ABI for a while we need all three to team
up. One of them missing and we might loose, so ensuring that they are
available is a very tedious but needed task once in a while.
Git-Dch: Ignore
|
|
This is an unlikely event for indexes and co, but it can happen quiet
easily e.g. for changelogs where you want to get the changelogs for
multiple binary package(version)s which happen to all be built from a
single source.
The interesting part is that the Acquire system actually detected this
already and set the item requesting the URI again to StatDone - expect
that this is hardly sufficient: an Item must be Complete=true as well
to be considered truely done and that is only the tip of the ::Done
handling iceberg. So instead of this StatDone hack we allow QItems to be
owned by multiple items and notify all owners about everything now,
so that for the point of each item they got it downloaded just for them.
|
|
Provided is a specialized acquire item which given a version can figure
out the correct URI to try by itself and if not provides an error
message alongside with static methods to get just the URI it would try
to download if it should just be displayed or similar such.
The URI is constructed as follows:
Release files can provide an URI template in the "Changelogs" field,
otherwise we lookup a configuration item based on the "Label" or
"Origin" of the Release file to get a (hopefully known) default value
for now. This template should contain the string CHANGEPATH which is
replaced with the information about the version we want the changelog
for (e.g. main/a/apt/apt_1.1). This middleway was choosen as this path
part was consistent over the three known implementations (+1 defunct),
while the rest of the URI varies widely between them.
The benefit of this construct is that it is now easy to get changelogs
for Debian packages on Ubuntu and vice versa – even at the moment where
the Changelogs field is present nowhere. Strictly better than what
apt-get had before as it would even fail to get changelogs from
security… Now it will notice that security identifies as Origin: Debian
and pick this setting (assuming again that no Changelogs field exists).
If on the other hand security would ship its changelogs in a different
location we could set it via the Label option overruling Origin.
Closes: 687147, 739854, 784027, 787190
|
|
Removes a bunch of duplicated code in the deb-specific parts. Especially
the Description part is now handled centrally by IndexTarget instead of
being duplicated to the derivations of IndexFile.
Git-Dch: Ignore
|
|
Creating and passing around a bunch of pointers of IndexTargets (and of
a vector of pointers of IndexTargets) is probably done to avoid the
'costly' copy of container, but we are really not in a timecritical
operation here and move semantics will help us even further in the
future. On the other hand we never do a proper cleanup of these
pointers, which is very dirty, even if structures aren't that big…
The changes will effecting many items only effect our own hidden class,
so we can do that without fearing breaking interfaces or anything.
Git-Dch: Ignore
|
|
We still need an API for the targets, so slowly prepare the IndexTargets
to let them take this job.
Git-Dch: Ignore
|
|
The code requires every index file we download to have a Package field,
but that doesn't hold true for all index we might want to download in
the future. Some might not even be deb822 formatted files…
The check was needed as apt used to accept unverifiable files like
Translation-*, but nowadays it requires hashes for these as well. Even
for unsigned repositories we interpret the Release file as binding now,
which means this check isn't triggerable expect for repositories which
do not have a Release file at all – something which is highly discouraged!
Git-Dch: Ignore
|
|
Its a bit unclean to create an item just to let the item decide that it
can't do anything and let it fail, so instead we let the item creator
decide in all cases if patching should be attempted.
Also pulls a small trick to get the hashes for the current file without
calculating them by looking at the 'old' Release file if we have it.
Git-Dch: Ignore
|
|
At the moment we only have hashes for the uncompressed pdiff files, but
via the new '$HASH-Download' field in the .diff/Index hashes can be
provided for the .gz compressed pdiff file, which apt will pick up now
and use to verify the download. Now, we "just" need a buy in from the
creators of repositories…
|
|
rred is responsible for unpacking and reading the patch files in one go,
but we currently only have hashes for the uncompressed patch files, so
the handler read the entire patch file before dispatching it to the
worker which would read it again – both with an implicit uncompress.
Worse, while the workers operate in parallel the handler is the central
orchestration unit, so having it busy with work means the workers do
(potentially) nothing.
This means rred is working with 'untrusted' data, which is bad. Yet,
having the unpack in the handler meant that the untrusted uncompress was
done as root which isn't better either. Now, we have it at least
contained in a binary which we can harden a bit better. In the long run,
we want hashes for the compressed patch files through to be safe.
|
|
Having every item having its own code to verify the file(s) it handles
is an errorprune process and easy to break, especially if items move
through various stages (download, uncompress, patching, …). With a giant
rework we centralize (most of) the verification to have a better
enforcement rate and (hopefully) less chance for bugs, but it breaks the
ABI bigtime in exchange – and as we break it anyway, it is broken even
harder.
It shouldn't effect most frontends as they don't deal with the acquire
system at all or implement their own items, but some do and will need to
be patched (might be an opportunity to use apt on-board material).
The theory is simple: Items implement methods to decide if hashes need to
be checked (in this stage) and to return the expected hashes for this
item (in this stage). The verification itself is done in worker message
passing which has the benefit that a hashsum error is now a proper error
for the acquire system rather than a Done() which is later revised to a
Failed().
|
|
Valid-Until protects us from long-living downgrade attacks, but not all
repositories have it and an attacker could still use older but still
valid files to downgrade us. While this makes it sounds like a security
improvement now, its a bit theoretical at best as an attacker with
capabilities to pull this off could just as well always keep us days
(but in the valid period) behind and always knows which state we have,
as we tell him with the If-Modified-Since header. This is also why this
is 'silently' ignored and treated as an IMSHit rather than screamed at
the user as this can at best be an annoyance for attackers.
An error here would 'regularily' be encountered by users by out-of-sync
mirrors serving a single run (e.g. load balancer) or in two consecutive
runs on the other hand, so it would just help teaching people ignore it.
That said, most of the code churn is caused by enforcing this additional
requirement. Crisscross from InRelease to Release.gpg is e.g. very
unlikely in practice, but if we would ignore it an attacker could
sidestep it this way.
|
|
Not all servers we are talking to support If-Modified-Since and some are
not even sending Last-Modified for us, so in an effort to detect such
hits we run a hashsum check on the 'old' compared to the 'new' file, we
got the hashes for the 'new' already for "free" from the methods anyway
and hence just need to calculated the old ones.
This allows us to detect hits even with unsupported servers, which in
turn means we benefit from all the new hit behavior also here.
|
|
Especially pdiff-enhanced downloads have the tendency to fail for
various reasons from which we can recover and even a successful download
used to leave the old unpatched index in partial/.
By adding a new method responsible for making the transaction of an
individual file happen we can at specialisations especially for abort
cases to deal with the cleanup.
This also helps in keeping the compressed indexes around if another
index failed instead of keeping the decompressed files, which we
wouldn't pick up in the next call.
|
|
If we get a IMSHit for the Transaction-Manager (= the InRelease file or
as its still supported fallback Release + Release.gpg combo) we can
assume that every file we would queue based on this manager, but already
have locally is current and hence would get an IMSHit, too. We therefore
save us and the server the trouble and skip the queuing in this case.
Beside speeding up repetative executions of 'apt-get update' this way we
also avoid hitting hashsum errors if the indexes are in fact already
updated, but the Release file isn't yet as it is the case on well
behaving mirrors as Release files is updated last.
The implementation is a bit harder than the theory makes it sound as we
still have to keep reverifying the Release files (e.g. to detect now expired
once to avoid an attacker being able to silently stale us) and have to
handle cases in which the Release file hits, but some indexes aren't
present (e.g. user added a new foreign architecture).
|
|
Calculating the final name of an item which it will have after
everything is done and verified successfully is suprisingly complicated
as while they all follow a simple pattern, the URI and where it is
stored varies between the items.
With some (abibreaking) redesign we can abstract this similar to how it
is already down for the partial file location.
Git-Dch: Ignore
|
|
Git-Dch: Ignore
|
|
We have a bunch of classes which are of no use for the outside world,
but were still exported and so needed to preserve ABI/API. Marking them
as hidden to not export them any longer is a big API break in theory,
but in practice nobody is using them – as if they would its a bug.
|