Age | Commit message (Collapse) | Author |
|
Reference mail:
https://lists.debian.org/debian-l10n-english/2015/11/msg00006.html
|
|
In 8d041b4f we made apt figure out based on the last Release file it has
if it should request a file or not given that the hashes changed or not.
So if we have a last Release file and do a request, do not sent a
Last-Modified header as we expect a change so much that a non-change
would indeed be an error. The Last-Modified header is therefore at best
ignored by the server, so sending it is just wasted effort. In the worst
case as time is a fragile thing the server decides against sending us an
update with the idea that we already have the latest content, which we
know for a fact that we haven't. Given that we sent less information to
the server our request is on its own also less identifiable as coming
from a returning or new user.
The disadvantage is that if we end up getting an old index file after
getting a new Release file from another mirror the old mirror will not
be able to tell us 'Hit', but instead sends us the complete file we
discard, but both lets us end up with the same error class in the end,
so the difference isn't big in practice.
|
|
This 'ignores' the component Release files you can find in Debian
alongside the binary-* directories, which isn't exactly a common
usecase, but it worked before, so lets support it again as this isn't
worse than a valid Release file which is unsigned.
Git-Dch: Ignore
|
|
Reported-By: cppcheck
Git-Dch: Ignore
|
|
Unlinking /dev/null is bad, we shouldn't do that. Also, we should print
at least a warning if we tried to unlink a file but didn't manage to
pull it of (ignoring the case were the file is /dev/null or doesn't
exist in the first place).
This got triggered by a relatively unlikely to cause problem in
pkgAcquire::Worker::PrepareFiles which would while temporary
uncompressed files (which are set to keep compressed) figure out that to
files are the same and prepare for sharing by deleting them. Bad move.
That also shows why not printing a warning is a bad idea as this hide
the error for in non-root test runs.
Git-Dch: Ignore
|
|
Based on a discussion with Niels Thykier who asked for Contents-all this
implements apt trying for all architecture dependent files to get a file
for the architecture all, which is treated internally now as an official
architecture which is always around (like native). This way arch:all
data can be shared instead of duplicated for each architecture requiring
the user to download the same information again and again.
There is one problem however: In Debian there is already a binary-all/
Packages file, but the binary-any files still include arch:all packages,
so that downloading this file now would be a waste of time, bandwidth
and diskspace. We therefore need a way to decide if it makes sense to
download the all file for Packages in Debian or not. The obvious answer
would be a special flag in the Release file indicating this, which would
need to default to 'no' and every reasonable repository would override
it to 'yes' in a few years time, but the flag would be there "forever".
Looking closer at a Release file we see the field "Architectures", which
doesn't include 'all' at the moment. With the idea outlined above that
'all' is a "proper" architecture now, we interpret this field as being
authoritative in declaring which architectures are supported by this
repository. If it says 'all', apt will try to get all, if not it will be
skipped. This gives us another interesting feature: If I configure a
source to download armel and mips, but it declares it supports only
armel apt will now print a notice saying as much. Previously this was a
very cryptic failure. If on the other hand the repository supports mips,
too, but for some reason doesn't ship mips packages at the moment, this
'missing' file is silently ignored (= that is the same as the repository
including an empty file).
The Architectures field isn't mandatory through, so if it isn't there,
we assume that every architecture is supported by this repository, which
skips the arch:all if not listed in the release file.
|
|
Removals in the acquire progress can be pretty important, so a failure
should be silently ignored, so we wrap our unlink call in a slightly
more forgiving wrapper checking things.
Git-Dch: Ignore
|
|
Git-Dch: Ignore
|
|
The manpage is also slightly updated to work better as a central hub to
push people from all angles into the right directions without writting a
book disguised as an error message.
|
|
Insecure (aka unsigned) repositories are bad, period. We want to get
right of them finally and as a first step we are printing scary
warnings. This is already done, this commit just changes the messages to
be more consistent and prevents them from being displayed if
authenticity is guaranteed some other way (as indicated with
trusted=yes).
The idea is to first print the pure fact like "repository isn't signed"
as a warning (and later as an error), while giving an explaination in a
immediately following notice (which is displayed only in quiet level 0:
so in interactive use, not in scripts and alike).
Closes: 796549
|
|
Commit 653ef26c70dc9c0e2cbfdd4e79117876bb63e87d broke the camels back in
sofar that everything works in terms of our internal use of copy:/, but
external use is completely destroyed. This is kinda the reverse of what
happened in "parallel" in the sid branch, where external use was mostly
fine, internal and external exploded on the GzipIndexes option.
We fix this now by rewriting our internal use by letting copy:/ only do
what the name suggests it does: Copy files and not uncompress them
on-the-fly. Then we teach copy and the uncompressors how to deal with
/dev/null and use it as destination file in case we don't want to store
the uncompressed files on disk.
Closes: 799158
|
|
We uses a small trick to implement the fallback: We make it so, that
by-hash is a special compression algorithm and apt already knows how to
deal with fallback between compression algorithms.
The drawback with implementing this fallback is that a) we are guessing
again and more importantly b) by-hash is only tried for the first
compression algorithm we want to acquire, not for all as before – but
flipping between by-hash and well-known for each compression algorithm
seems to be not really worth it as it seems unlikely that there will
actually be mirrors who only mirror a subset of compressioned files, but
have by-hash enabled.
The user-experience is the usual fallback one: You see "Ign" lines in
the apt update output. The fallback is implemented as a transition
feature, so a (potentially huge) mirror network doesn't need a flagday.
It is not meant as a "someday we might" or "we don't, but some of our
mirrors might" option – we want to cut down on the 'Ign' lines front so
that they become meaningful – if we wanted to spam everyone with them, we
could enable by-hash by default for all repositories…
sources.list and config options are better suited for this.
Closes: 798919
|
|
This changes the semantics of the option (which is renamed too) to be a
yes/no value with the special additional value "force" as this allows
by-hash to be disabled even if the repository indicates it would be
supported and is more in line with our other yes/no options like pdiff
which disable themselves if no support can be detected.
The feature wasn't documented so far and hasn't reached a (un)stable
release yet, so changing it without trying too hard to keep
compatibility seems okay.
|
|
Our error reporting is historically grown into some kind of mess.
A while ago I implemented stacking for the global error which is used in
this commit now to wrap calls to functions which do not report (all)
errors via return, so that only failures in those calls cause a failure
to propergate down the chain rather than failing if anything
(potentially totally unrelated) has failed at some point in the past.
This way we can avoid stopping the entire acquire process just because a
single source produced an error for example. It also means that after
the acquire process the cache is generated – even if the acquire
process had failures – as we still have the old good data around we can and
should generate a cache for (again).
There are probably more instances of this hiding, but all these looked
like the easiest to work with and fix with reasonable (aka net-positive)
effects.
|
|
Using libpam-tmpdir caused us to create our download tmp directory in
root's private tmp before changing to _apt, which wouldn't have access
to it.
By extending our GetTempDir method with an optional wrapper changing the
effective user, we can test if a given user can access the directory and
ignore TMPDIR if not instead of ignoring TMPDIR completely.
Closes: 797270
|
|
Some additional files like 'Contents' are very big and should therefore
kept compressed on the disk, which apt-file did in the past. It also
implemented pdiff patching of these files by un- and recompressing these
files on-the-fly, with this commit we can do the same – but we can do
this in both pdiff patching styles (client and server merging) and
secured by hashes.
Hashes are in so far slightly complicated as we can't compare the hashes
of the compressed files as we might compress them differently than the
server would (different compressor versions, options, …), so we must
compare the hashes of the uncompressed content.
While this commit has changes in public headers, the classes it changes
are marked as hidden, so nobody can use them directly, which means the
ABI break is internal only.
|
|
Disabling pdiffs can be useful occasionally, like if you have a fast
local mirror where the download doesn't matter, but still want to use it
for non-local mirrors. Also, some users might prefer it to only use it
for very big indextargets like Contents.
|
|
First of, the temporary directory we download the changelog to needs to
be owned by _apt, but that also means that we don't need to check if we
could/should drop privs as the download happens to a dedicated tempdir
and only after that it is moved to its final location by a privileged user.
|
|
Git-Dch: ignore
|
|
|
|
I never understood why there is an extra newline in those messages, so
now is as good time as any to drop them. Lets see if someone complains
with a good reason to keep it…
|
|
Reporting errors from Done() is bad for progress reporting and such, so
factoring this out is a good idea and we start with moving the supposed-
to-be clearsigned file isn't clearsigned out first – improving the error
message in the process as we use the same message for a similar case
(NODATA) as this is what I have to look at with the venue wifi at
DebCamp and the old errormessage doesn't really say anything.
|
|
Redirectors like httpredir.debian.org orchestra the download from
multiple (hopefully close) mirrors while having only a single central
sources.list entry by using redirects. This has the effect that the
progress report always shows the source it started with, not the mirror
it ends up fetching from, which is especially problematic for error
reporting as having a report for a "Hashsum mismatch" for the redirector
URI is next to useless as nobody knows which URI it was really fetched
from (regardless of it coming from a user or via the report script) from
this output alone. You would need to enable debug output and hope for
the same situation to arise again…
We hence reuse the UsedMirror field of the mirror:// method and detect
redirects which change the site and declare this new site as the
UsedMirrror (and adapt the description).
The disadvantage is that there is no obvious mapping anymore (it is
relatively easy to guess through with some experience) from progress
lines to sources.list lines, so error messages need to take care to use
the Target description (rather than current Item description) if they
want to refer to the sources.list entry.
|
|
QuereURI already skips the aquire of the real file in such a case, but
it can't detect pdiffs this way. Those already have a handling if the
file wasn't changed in between two Release files, so we just add an
other check for a Release file hit here, too.
Git-Dch: Ignore
|
|
C++11 adds the 'override' specifier to mark that a method is overriding
a base class method and error out if not. We hide it in the APT_OVERRIDE
macro to ensure that we keep compiling in pre-c++11 standards.
Reported-By: clang-modernize -add-override -override-macros
Git-Dch: Ignore
|
|
There is an option to keep all targets (Packages, Sources, …) compressed
for a while now, but the all-or-nothing approach is a bit limited for
our purposes with additional targets as some of them are very big
(Contents) and rarely used in comparison, so keeping them compressed by
default can make sense, while others are still unpacked.
Most interesting is the copy-change maybe: Copy is used by the acquire
system as an uncompressor and it is hence expected that it returns the
hashes for the "output", not the input. Now, in the case of keeping a
file compressed, the output is never written to disk, but generated in
memory and we should still validated it, so for compressed files copy is
expected to return the hashes of the uncompressed file. We used to use
the config option to enable on-the-fly decompress in the method, but in
reality copy is never used in a way where it shouldn't decompress a
compressed file to get its hashes, so we can save us the trouble of
sending this information to the method and just do it always.
|
|
Limits which key(s) can be used to sign a repository. Not immensely useful
from a security perspective all by itself, but if the user has
additional measures in place to confine a repository (like pinning) an
attacker who gets the key for such a repository is limited to its
potential and can't use the key to sign its attacks for an other (maybe
less limited) repository… (yes, this is as weak as it sounds, but having
the capability might come in handy for implementing other stuff later).
|
|
These options could be set via configuration before, but the connection
to the actual sources is so strong that they should really be set in the
sources.list instead – especially as this can be done a lot more
specific rather than e.g. disabling Valid-Until for all sources at once.
Valid-Until-* names are chosen instead of the Min/Max-ValidTime as this
seems like a better name and their use in the wild is probably low
enough that this isn't going to confuse anyone if we have to names for
the same thing in different areas.
In the longrun, the config options should be removed, but for now
documentation hinting at the new options is good enough as these are the
kind of options you set once across many systems with different apt
versions, so the new way should work everywhere first before we
deprecate the old way.
|
|
indexRecords was used to parse the Release file – mostly the hashes –
while metaIndex deals with downloading the Release file, storing all
indexes coming from this release and … parsing the Release file, but
this time mostly for the other fields.
That wasn't a problem in metaIndex as this was done in the type specific
subclass, but indexRecords while allowing to override the parsing method
did expect by default a specific format.
APT isn't really supporting different types at the moment, but this is
a violation of the abstraction we have everywhere else and, which is the
actual reason for this merge: Options e.g. coming from the sources.list
come to metaIndex naturally, which needs to wrap them up and bring them
into indexRecords, so the acquire system is told about it as they don't
get to see the metaIndex, but they don't really belong in indexRecords
as this is just for storing data loaded from the Release file… the
result is a complete mess.
I am not saying it is a lot prettier after the merge, but at least
adding new options is now slightly easier and there is just one place
responsible for parsing the Release file. That can't hurt.
|
|
Various small leaks here and there. Nothing particularily big, but still
good to fix. Found by the sanitizers while running our testcases.
Reported-By: gcc -fsanitize
Git-Dch: Ignore
|
|
Doing this disables the implicit copy assignment operator (among others)
which would cause hovac if used on the classes as it would just copy the
pointer, not the data the d-pointer points to. For most of the classes
we don't need a copy assignment operator anyway and in many classes it
was broken before as many contain a pointer of some sort.
Only for our Cacheset Container interfaces we define an explicit copy
assignment operator which could later be implemented to copy the data
from one d-pointer to the other if we need it.
Git-Dch: Ignore
|
|
Some of them modify the ABI, but given that we prepare a big one
already, these few hardly count for much.
Git-Dch: Ignore
|
|
To have a chance to keep the ABI for a while we need all three to team
up. One of them missing and we might loose, so ensuring that they are
available is a very tedious but needed task once in a while.
Git-Dch: Ignore
|
|
Again, consistency is the main sellingpoint here, but this way it is now
also easier to explain that some files move through different stages and
lines are printed for them hence multiple times: That is a bit hard to
believe if the number is changing all the time, but now that it keeps
consistent.
|
|
All other methods call it, so they should follow along even if the work
they do afterwards is hardly breathtaking and usually results in a
URIDone pretty soon, but the acquire system tells the individual item
about this via a virtual method call, so even through none of our
existing items contains any critical code in these, maybe one day they
might. Consistency at least once…
Which is also why this has a good sideeffect: file: and cdrom: requests
appear now in the 'apt-get update' output. Finally - it never made sense
to hide them for me. Okay, I guess it made before the new hit behavior,
but now that you can actually see the difference in an update it makes
sense to see if a file: repository changed or not as well.
|
|
This is an unlikely event for indexes and co, but it can happen quiet
easily e.g. for changelogs where you want to get the changelogs for
multiple binary package(version)s which happen to all be built from a
single source.
The interesting part is that the Acquire system actually detected this
already and set the item requesting the URI again to StatDone - expect
that this is hardly sufficient: an Item must be Complete=true as well
to be considered truely done and that is only the tip of the ::Done
handling iceberg. So instead of this StatDone hack we allow QItems to be
owned by multiple items and notify all owners about everything now,
so that for the point of each item they got it downloaded just for them.
|
|
Provided is a specialized acquire item which given a version can figure
out the correct URI to try by itself and if not provides an error
message alongside with static methods to get just the URI it would try
to download if it should just be displayed or similar such.
The URI is constructed as follows:
Release files can provide an URI template in the "Changelogs" field,
otherwise we lookup a configuration item based on the "Label" or
"Origin" of the Release file to get a (hopefully known) default value
for now. This template should contain the string CHANGEPATH which is
replaced with the information about the version we want the changelog
for (e.g. main/a/apt/apt_1.1). This middleway was choosen as this path
part was consistent over the three known implementations (+1 defunct),
while the rest of the URI varies widely between them.
The benefit of this construct is that it is now easy to get changelogs
for Debian packages on Ubuntu and vice versa – even at the moment where
the Changelogs field is present nowhere. Strictly better than what
apt-get had before as it would even fail to get changelogs from
security… Now it will notice that security identifies as Origin: Debian
and pick this setting (assuming again that no Changelogs field exists).
If on the other hand security would ship its changelogs in a different
location we could set it via the Label option overruling Origin.
Closes: 687147, 739854, 784027, 787190
|
|
We used to read the Release file for each Packages file and store the
data in the PackageFile struct even through potentially many Packages
(and Translation-*) files could use the same data. The point of the
exercise isn't the duplicated data through. Having the Release files as
first-class citizens in the Cache allows us to properly track their
state as well as allows us to use the information also for files which
aren't in the cache, but where we know to which Release file they
belong (Sources are an example for this).
This modifies the pkgCache structs, especially the PackagesFile struct
which depending on how libapt users access the data in these structs can
mean huge breakage or no visible change. As a single data point:
aptitude seems to be fine with this. Even if there is breakage it is
trivial to fix in a backportable way while avoiding breakage for
everyone would be a huge pain for us.
Note that not all PackageFile structs have a corresponding ReleaseFile.
In particular the dpkg/status file as well as *.deb files have not. As
these have only a Archive property need, the Component property takes
over this duty and the ReleaseFile remains zero. This is also the reason
why it isn't needed nor particularily recommended to change from
PackagesFile to ReleaseFile blindly. Sticking with the earlier is
usually the better option.
|
|
Removes a bunch of duplicated code in the deb-specific parts. Especially
the Description part is now handled centrally by IndexTarget instead of
being duplicated to the derivations of IndexFile.
Git-Dch: Ignore
|
|
Creating and passing around a bunch of pointers of IndexTargets (and of
a vector of pointers of IndexTargets) is probably done to avoid the
'costly' copy of container, but we are really not in a timecritical
operation here and move semantics will help us even further in the
future. On the other hand we never do a proper cleanup of these
pointers, which is very dirty, even if structures aren't that big…
The changes will effecting many items only effect our own hidden class,
so we can do that without fearing breaking interfaces or anything.
Git-Dch: Ignore
|
|
We still need an API for the targets, so slowly prepare the IndexTargets
to let them take this job.
Git-Dch: Ignore
|
|
The code requires every index file we download to have a Package field,
but that doesn't hold true for all index we might want to download in
the future. Some might not even be deb822 formatted files…
The check was needed as apt used to accept unverifiable files like
Translation-*, but nowadays it requires hashes for these as well. Even
for unsigned repositories we interpret the Release file as binding now,
which means this check isn't triggerable expect for repositories which
do not have a Release file at all – something which is highly discouraged!
Git-Dch: Ignore
|
|
If we have a file on disk and the hashes are the same in the new Release
file and the old one we have on disk we know that if we ask the server
for the file, we will at best get an IMS hit – at worse the server
doesn't support this and sends us the (unchanged) file and we have to
run all our checks on it again for nothing. So, we can save ourselves
(and the servers) some unneeded requests if we figure this out on our
own.
|
|
Its a bit unclean to create an item just to let the item decide that it
can't do anything and let it fail, so instead we let the item creator
decide in all cases if patching should be attempted.
Also pulls a small trick to get the hashes for the current file without
calculating them by looking at the 'old' Release file if we have it.
Git-Dch: Ignore
|
|
At the moment we only have hashes for the uncompressed pdiff files, but
via the new '$HASH-Download' field in the .diff/Index hashes can be
provided for the .gz compressed pdiff file, which apt will pick up now
and use to verify the download. Now, we "just" need a buy in from the
creators of repositories…
|
|
rred is responsible for unpacking and reading the patch files in one go,
but we currently only have hashes for the uncompressed patch files, so
the handler read the entire patch file before dispatching it to the
worker which would read it again – both with an implicit uncompress.
Worse, while the workers operate in parallel the handler is the central
orchestration unit, so having it busy with work means the workers do
(potentially) nothing.
This means rred is working with 'untrusted' data, which is bad. Yet,
having the unpack in the handler meant that the untrusted uncompress was
done as root which isn't better either. Now, we have it at least
contained in a binary which we can harden a bit better. In the long run,
we want hashes for the compressed patch files through to be safe.
|
|
Having every item having its own code to verify the file(s) it handles
is an errorprune process and easy to break, especially if items move
through various stages (download, uncompress, patching, …). With a giant
rework we centralize (most of) the verification to have a better
enforcement rate and (hopefully) less chance for bugs, but it breaks the
ABI bigtime in exchange – and as we break it anyway, it is broken even
harder.
It shouldn't effect most frontends as they don't deal with the acquire
system at all or implement their own items, but some do and will need to
be patched (might be an opportunity to use apt on-board material).
The theory is simple: Items implement methods to decide if hashes need to
be checked (in this stage) and to return the expected hashes for this
item (in this stage). The verification itself is done in worker message
passing which has the benefit that a hashsum error is now a proper error
for the acquire system rather than a Done() which is later revised to a
Failed().
|
|
If we e.g. fail on hash verification for Packages.xz its highly unlikely
that it will be any better with Packages.gz, so we just waste download
bandwidth and time. It also causes us always to fallback to the
uncompressed Packages file for which the error will finally be reported,
which in turn confuses users as the file usually doesn't exist on the
mirrors, so a bug in apt is suspected for even trying it…
|
|
Valid-Until protects us from long-living downgrade attacks, but not all
repositories have it and an attacker could still use older but still
valid files to downgrade us. While this makes it sounds like a security
improvement now, its a bit theoretical at best as an attacker with
capabilities to pull this off could just as well always keep us days
(but in the valid period) behind and always knows which state we have,
as we tell him with the If-Modified-Since header. This is also why this
is 'silently' ignored and treated as an IMSHit rather than screamed at
the user as this can at best be an annoyance for attackers.
An error here would 'regularily' be encountered by users by out-of-sync
mirrors serving a single run (e.g. load balancer) or in two consecutive
runs on the other hand, so it would just help teaching people ignore it.
That said, most of the code churn is caused by enforcing this additional
requirement. Crisscross from InRelease to Release.gpg is e.g. very
unlikely in practice, but if we would ignore it an attacker could
sidestep it this way.
|
|
Not all servers we are talking to support If-Modified-Since and some are
not even sending Last-Modified for us, so in an effort to detect such
hits we run a hashsum check on the 'old' compared to the 'new' file, we
got the hashes for the 'new' already for "free" from the methods anyway
and hence just need to calculated the old ones.
This allows us to detect hits even with unsupported servers, which in
turn means we benefit from all the new hit behavior also here.
|