Age | Commit message (Collapse) | Author |
|
It shouldn't be too common, but sometimes people have multiple mirrors
in the sources or otherwise repositories with the same content. Now that
we gracefully can handle multiple requests to the same URI, we can also
fold multiple requests with the same expected hashes into one. Note that
this isn't trying to find oppertunities for merging, but just merges if
it happens to encounter the oppertunity for it.
This is most obvious in the new testcase actually as it needs to delay
the action to give the acquire system enough time to figure out that
they can be merged.
|
|
Again, consistency is the main sellingpoint here, but this way it is now
also easier to explain that some files move through different stages and
lines are printed for them hence multiple times: That is a bit hard to
believe if the number is changing all the time, but now that it keeps
consistent.
|
|
All other methods call it, so they should follow along even if the work
they do afterwards is hardly breathtaking and usually results in a
URIDone pretty soon, but the acquire system tells the individual item
about this via a virtual method call, so even through none of our
existing items contains any critical code in these, maybe one day they
might. Consistency at least once…
Which is also why this has a good sideeffect: file: and cdrom: requests
appear now in the 'apt-get update' output. Finally - it never made sense
to hide them for me. Okay, I guess it made before the new hit behavior,
but now that you can actually see the difference in an update it makes
sense to see if a file: repository changed or not as well.
|
|
This is an unlikely event for indexes and co, but it can happen quiet
easily e.g. for changelogs where you want to get the changelogs for
multiple binary package(version)s which happen to all be built from a
single source.
The interesting part is that the Acquire system actually detected this
already and set the item requesting the URI again to StatDone - expect
that this is hardly sufficient: an Item must be Complete=true as well
to be considered truely done and that is only the tip of the ::Done
handling iceberg. So instead of this StatDone hack we allow QItems to be
owned by multiple items and notify all owners about everything now,
so that for the point of each item they got it downloaded just for them.
|
|
'file' isn't using the destination file per-se, but returns another name
via "Filename" header. It still should deal with destination files as
they could exist (pkgAcqFile e.g. creates links in that location) and
are potentially bogus.
|
|
Provided is a specialized acquire item which given a version can figure
out the correct URI to try by itself and if not provides an error
message alongside with static methods to get just the URI it would try
to download if it should just be displayed or similar such.
The URI is constructed as follows:
Release files can provide an URI template in the "Changelogs" field,
otherwise we lookup a configuration item based on the "Label" or
"Origin" of the Release file to get a (hopefully known) default value
for now. This template should contain the string CHANGEPATH which is
replaced with the information about the version we want the changelog
for (e.g. main/a/apt/apt_1.1). This middleway was choosen as this path
part was consistent over the three known implementations (+1 defunct),
while the rest of the URI varies widely between them.
The benefit of this construct is that it is now easy to get changelogs
for Debian packages on Ubuntu and vice versa – even at the moment where
the Changelogs field is present nowhere. Strictly better than what
apt-get had before as it would even fail to get changelogs from
security… Now it will notice that security identifies as Origin: Debian
and pick this setting (assuming again that no Changelogs field exists).
If on the other hand security would ship its changelogs in a different
location we could set it via the Label option overruling Origin.
Closes: 687147, 739854, 784027, 787190
|
|
Translation-* files are internally handled as PackageFiles which isn't
super nice, but giving them their own struct is a bit overkill so let it
be for the moment. They always appeared in the policy output because of
this through and now that they are properly linked to a ReleaseFile they
even display all the pinning information on them, but they don't contain
any packages which could be pinned… No problem, but useless and
potentially confusing output.
Adding a 'NoPackages' flag which can be set on those files and be used
in applications seems like a simple way to fix this display issue.
|
|
This is mainly visible in the policy, so that you can now pin by b= and
let it only effect Packages files of this architecture and hence the
packages coming from it (which do not need to be from this architecture,
but very likely are in a normal repository setup).
If you should pin by architecture in this way is a different question…
Closes: 687255
|
|
Selecting targets based on the Release they belong to isn't to
unrealistic. In fact, it is assumed to be the most used case so it is
made the default especially as this allows to bundle another thing we
have to be careful with: Filenames and only showing targets we have
acquired.
Closes: 752702
|
|
We used to read the Release file for each Packages file and store the
data in the PackageFile struct even through potentially many Packages
(and Translation-*) files could use the same data. The point of the
exercise isn't the duplicated data through. Having the Release files as
first-class citizens in the Cache allows us to properly track their
state as well as allows us to use the information also for files which
aren't in the cache, but where we know to which Release file they
belong (Sources are an example for this).
This modifies the pkgCache structs, especially the PackagesFile struct
which depending on how libapt users access the data in these structs can
mean huge breakage or no visible change. As a single data point:
aptitude seems to be fine with this. Even if there is breakage it is
trivial to fix in a backportable way while avoiding breakage for
everyone would be a huge pain for us.
Note that not all PackageFile structs have a corresponding ReleaseFile.
In particular the dpkg/status file as well as *.deb files have not. As
these have only a Archive property need, the Component property takes
over this duty and the ReleaseFile remains zero. This is also the reason
why it isn't needed nor particularily recommended to change from
PackagesFile to ReleaseFile blindly. Sticking with the earlier is
usually the better option.
|
|
Downloading additional files is only half the job. We still need a way
to allow external tools to know where the files are they requested for
download given that we don't want them to choose their own location.
'apt-get files' is our answer to this showing by default in a deb822
format information about each IndexTarget with the potential to filter
the records based on lines and an option to change the output format.
The command serves also as an example on how to get to this information
via libapt.
|
|
Strings are easy to typo and we can keep the extensibility we require
here with a simple enum we can append to without endangering ABI.
Git-Dch: Ignore
|
|
Removes a bunch of duplicated code in the deb-specific parts. Especially
the Description part is now handled centrally by IndexTarget instead of
being duplicated to the derivations of IndexFile.
Git-Dch: Ignore
|
|
It is a rather strange sight that index items use SiteOnly which strips
the Path, while e.g. deb files are downloaded with NoUserPassword which
does not. Important to note here is that for the file transport Path is
pretty important as there is no Host which would be displayed by Site,
which always resulted in "interesting" unspecific errors for "file:".
Adding a 'middle' ground between the two which does show the Path but
potentially modifies it (it strips a pending / at the end if existing)
solves this "file:" issue, syncs the output and in the end helps to
identify which file is meant exactly in progress output and co as a
single site can have multiple repositories in different paths.
|
|
We need a general way to get from a sources.list entry to IndexTargets
and with this change we can move from pkgSourceList over the list of
metaIndexes it includes to the IndexTargets each metaIndex can have.
Git-Dch: Ignore
|
|
Creating and passing around a bunch of pointers of IndexTargets (and of
a vector of pointers of IndexTargets) is probably done to avoid the
'costly' copy of container, but we are really not in a timecritical
operation here and move semantics will help us even further in the
future. On the other hand we never do a proper cleanup of these
pointers, which is very dirty, even if structures aren't that big…
The changes will effecting many items only effect our own hidden class,
so we can do that without fearing breaking interfaces or anything.
Git-Dch: Ignore
|
|
We still need an API for the targets, so slowly prepare the IndexTargets
to let them take this job.
Git-Dch: Ignore
|
|
We have two places in the code which need to iterate over targets and do
certain things with it. The first one is actually creating these targets
for download and the second instance pepares certain targets for
reading.
Git-Dch: Ignore
|
|
For some reason travis seems to be unhappy about it claiming it
is not defined. Well, lets not think to deeply about it…
Git-Dch: Ignore
|
|
First pass at making the acquire system capable of downloading files
based on configuration rather than hardcoded entries. It is now possible
to instruct 'deb' and 'deb-src' sources.list lines to download more than
just Packages/Translation-* and Sources files. Details on how to do that
can be found in the included documentation file.
|
|
The code requires every index file we download to have a Package field,
but that doesn't hold true for all index we might want to download in
the future. Some might not even be deb822 formatted files…
The check was needed as apt used to accept unverifiable files like
Translation-*, but nowadays it requires hashes for these as well. Even
for unsigned repositories we interpret the Release file as binding now,
which means this check isn't triggerable expect for repositories which
do not have a Release file at all – something which is highly discouraged!
Git-Dch: Ignore
|
|
If we have a file on disk and the hashes are the same in the new Release
file and the old one we have on disk we know that if we ask the server
for the file, we will at best get an IMS hit – at worse the server
doesn't support this and sends us the (unchanged) file and we have to
run all our checks on it again for nothing. So, we can save ourselves
(and the servers) some unneeded requests if we figure this out on our
own.
|
|
Its a bit unclean to create an item just to let the item decide that it
can't do anything and let it fail, so instead we let the item creator
decide in all cases if patching should be attempted.
Also pulls a small trick to get the hashes for the current file without
calculating them by looking at the 'old' Release file if we have it.
Git-Dch: Ignore
|
|
At the moment we only have hashes for the uncompressed pdiff files, but
via the new '$HASH-Download' field in the .diff/Index hashes can be
provided for the .gz compressed pdiff file, which apt will pick up now
and use to verify the download. Now, we "just" need a buy in from the
creators of repositories…
|
|
Git-Dch: Ignore
|
|
The rred parser is very accepting regarding 'invalid' files. Given that
we can't trust the input it might be a bit too relaxed. In any case,
checking for more errors can't hurt given that we support only a very
specific subset of ed commands.
|
|
rred is responsible for unpacking and reading the patch files in one go,
but we currently only have hashes for the uncompressed patch files, so
the handler read the entire patch file before dispatching it to the
worker which would read it again – both with an implicit uncompress.
Worse, while the workers operate in parallel the handler is the central
orchestration unit, so having it busy with work means the workers do
(potentially) nothing.
This means rred is working with 'untrusted' data, which is bad. Yet,
having the unpack in the handler meant that the untrusted uncompress was
done as root which isn't better either. Now, we have it at least
contained in a binary which we can harden a bit better. In the long run,
we want hashes for the compressed patch files through to be safe.
|
|
Having every item having its own code to verify the file(s) it handles
is an errorprune process and easy to break, especially if items move
through various stages (download, uncompress, patching, …). With a giant
rework we centralize (most of) the verification to have a better
enforcement rate and (hopefully) less chance for bugs, but it breaks the
ABI bigtime in exchange – and as we break it anyway, it is broken even
harder.
It shouldn't effect most frontends as they don't deal with the acquire
system at all or implement their own items, but some do and will need to
be patched (might be an opportunity to use apt on-board material).
The theory is simple: Items implement methods to decide if hashes need to
be checked (in this stage) and to return the expected hashes for this
item (in this stage). The verification itself is done in worker message
passing which has the benefit that a hashsum error is now a proper error
for the acquire system rather than a Done() which is later revised to a
Failed().
|
|
If we e.g. fail on hash verification for Packages.xz its highly unlikely
that it will be any better with Packages.gz, so we just waste download
bandwidth and time. It also causes us always to fallback to the
uncompressed Packages file for which the error will finally be reported,
which in turn confuses users as the file usually doesn't exist on the
mirrors, so a bug in apt is suspected for even trying it…
|
|
Conflicts:
apt-pkg/pkgcache.h
debian/changelog
methods/https.cc
methods/server.cc
test/integration/test-apt-download-progress
|
|
Git-Dch: ignore
|
|
Conflicts:
apt-pkg/deb/dpkgpm.cc
|
|
The underlying problem is that libapt-pkg does not correctly parse these
provides. Internally, it creates a version named "baz:i386" with
architecture amd64. Of course, such a package name is invalid and thus
this version is completely inaccessible. Thus, this bug should not cause
apt to accept a broken situation as valid. Nevertheless, it prevents
using architecture qualified depends.
Closes: 777071
|
|
Add a regression test that reproduced the hang of apt when a
partial file is present.
Git-Dch: ignore
|
|
The variable "Size" was misleading and caused bug #1445239. To
avoid similar issues in the future, rename it to make the meaning
more obvious.
git-dch: ignore
|
|
The apt http code parses Content-Length and Content-Range. For
both requests the variable "Size" is used and the semantic for
this Size is the total file size. However Content-Length is not
the entire file size for partital file requests. For servers that
send the Content-Range header first and then the Content-Length
header this can lead to globbing of Size so that its less than
the real file size. This may lead to a subsequent passing of a
negative number into the CircleBuf which leads to a endless
loop that writes data.
Thanks to Anton Blanchard for the analysis and initial patch.
LP: #1445239
|
|
|
|
Valid-Until protects us from long-living downgrade attacks, but not all
repositories have it and an attacker could still use older but still
valid files to downgrade us. While this makes it sounds like a security
improvement now, its a bit theoretical at best as an attacker with
capabilities to pull this off could just as well always keep us days
(but in the valid period) behind and always knows which state we have,
as we tell him with the If-Modified-Since header. This is also why this
is 'silently' ignored and treated as an IMSHit rather than screamed at
the user as this can at best be an annoyance for attackers.
An error here would 'regularily' be encountered by users by out-of-sync
mirrors serving a single run (e.g. load balancer) or in two consecutive
runs on the other hand, so it would just help teaching people ignore it.
That said, most of the code churn is caused by enforcing this additional
requirement. Crisscross from InRelease to Release.gpg is e.g. very
unlikely in practice, but if we would ignore it an attacker could
sidestep it this way.
|
|
Not all servers we are talking to support If-Modified-Since and some are
not even sending Last-Modified for us, so in an effort to detect such
hits we run a hashsum check on the 'old' compared to the 'new' file, we
got the hashes for the 'new' already for "free" from the methods anyway
and hence just need to calculated the old ones.
This allows us to detect hits even with unsupported servers, which in
turn means we benefit from all the new hit behavior also here.
|
|
It isn't used much compared to what the methodname suggests, but in the
remaining uses it can't hurt to check more than strictly necessary by
calculating and verifying with all hashes we can compare with rather
than "just" the best known hash.
|
|
If we have the expected hashes we can check with them if the file we
have in partial we got a 416 for is the expected file. We detected this
with same-size before, but not every server sends a good Content-Range
header with a 416 response.
|
|
While it is mostly busywork to rewrite all instances it actually fixes
bugs as the data storage used by the new method is std::string rather
than a char*, the later mostly created by c_str() from a std::string
which the caller has to ensure keeps in scope – something apt-ftparchive
actually didn't ensure and relied on copy-on-write behavior instead
which c++11 forbids and hence the new default gcc abi doesn't use it.
|
|
TFRewrite is okay, but it has obscure limitations (256 Tags), even more
obscure bugs (order for renames is defined by the old name) and the
interface is very c-style encouraging bad usage like we do it in
apt-ftparchive passing massive amounts of c_str() from std::string in.
The old-style is marked as deprecated accordingly. The next commit will
fix all places in the apt code to not use the old-style anymore.
|
|
In 66c3875df391b1120b43831efcbe88a78569fbfe we workaround/fixed a
problem where the code makes the assumption that the compiler uses
copy-on-write implementations for std::string. Turns out that for c++11
compatibility gcc >= 5 will stop doing this by default.
|
|
dpkg and dak know various field names and order them in their output,
while we have yet another order and have to play catch up with them as
we are sitting between chairs here and neither order is ideal for us,
too.
A little testcase is from now on supposed to help ensureing that we do
not derivate to far away from which fields dpkg knows and orders.
|
|
This rename with value is ordered by the 'old' name 'Source', but should
be ordered by the new name… by splitting the operation in a delete and a
new field we can easily fix this problem locally for now.
|
|
The helper expects to be told if it should generate messages, not where
these messages should be printed – as it isn't printing such messages,
but puts them in _error. apt-get uses in other methods a helper
specialisation which does also print stuff to a stream through, so this
is likely a copy&paste error.
Git-Dch: Ignore
|
|
Git-Dch: Ignore
|
|
Slightly rewriting the code to ensure we only use two sources for the
versions as it could otherwise be confusing to look at.
|
|
dpkg -l < 1.16.2 loads the available file and hence sees a package which
later versions do not see, leading to failures on travis-ci.
The different versions also have slightly different messages.
Git-Dch: Ignore
|