Age | Commit message (Collapse) | Author |
|
Curl requires URLs to be urlencoded. We are however giving it
undecoded URLs. This causes it go completely nuts if there is
a space in the URI, producing requests like:
GET /a file HTTP/1.1
which the servers then interpret as a GET request for "/a" with
HTTP version "file" or some other non-sense.
This works around the issue by encoding the path component of
the URL. I'm not sure if we should encode other parts of the URL
as well, this one seems to do the trick for the actual issue at
hand.
A more correct fix is to avoid the dequoting and (re-)quoting
of URLs when a redirect occurs / a new request is sent. That's
been on the radar for probably a year or two now, but nobody
bothered implementing that yet.
LP: #1651923
(cherry picked from commit 994515e689dcc5f963f5fed58284831750a5da03)
|
|
If the server told us in a previous request that it isn't supporting
Ranges with bytes via an Accept-Ranges header missing bytes, we don't
try to formulate requests using Ranges.
|
|
We keep various information bits about the server around, some only
effecting the currently handled file (like sizes) while others
should be persistent (like pipeline detections). http used to reset all
file-related manually, which is a bit silly if we already have a Reset()
method – which does reset all through –, so extending it with a
parameter for reuse and calling it from https too (as this was
previously resetting by just creating a new state struct – it uses no
value of the persistent state-keeping yet as it supports no pipelining).
Gbp-Dch: Ignore
|
|
With apts http transport supporting socks5h proxies and all the work
in terms of configuration of methods based on the name it is called with
it becomes surprisingly easy to implement Tor support equally (and
perhaps even a bit exceeding) what is available currently in
apt-transport-tor.
How this will turn out to be handled packaging wise we will see in
https://lists.debian.org/deity/2016/08/msg00012.html , but until this is
resolved we can add the needed support without actively enabling it for
now, so that this can be tested better.
|
|
The https method implemented for a long while now a hardcoded fallback
to the same options in http, which, while it works, is rather inflexible
if we want to allow the methods to use another name to change their
behavior slightly, like apt-transport-tor does to https – most of the
diff being s#https#tor#g which then fails to do the full circle
fallthrough tor -> https -> http for https sources. With this config
infrastructure this could be implemented now.
|
|
cURL which backs our https implementation can handle redirects on its
own, but by dealing with them on our own we gain finer control over which
redirections will be performed (we don't like https → http) and by whom
so that redirections to other hosts correctly spawn a new https method
dealing with these instead of letting the current one deal with it.
|
|
Closes: #623443
|
|
|
|
If we don't give a specific error to report up it is likely that all
error currently in the error stack are equally important, so reporting
just one could turn out to be confusing e.g. if name resolution failed
in a SRV record list.
|
|
All apt versions support numeric as well as 3-character timezones just
fine and its actually hard to write code which doesn't "accidently"
accepts it. So why change? Documenting the Date/Valid-Until fields in
the Release file is easy to do in terms of referencing the
datetime format used e.g. in the Debian changelogs (policy §4.4). This
format specifies only the numeric timezones through, not the nowadays
obsolete 3-character ones, so in the interest of least surprise we should
use the same format even through it carries a small risk of regression
in other clients (which encounter repositories created with
apt-ftparchive).
In case it is really regressing in practice, the hidden option
-o APT::FTPArchive::Release::NumericTimezone=0
can be used to go back to good old UTC as timezone.
The EDSP and EIPP protocols use this 'new' format, the text interface
used to communicate with the acquire methods does not for compatibility
reasons even if none of our methods would be effected and I doubt any
other would (in these instances the timezone is 'GMT' as that is what
HTTP/1.1 requires). Note that this is only true for apt talking to
methods, (libapt-based) methods talking to apt will respond with the
'new' format. It is therefore strongly adviced to support both also in
method input.
|
|
We use a wild mixture of C and C++ ways of generating output, so having
a consistent world-view in both styles sounds like a good idea and
should help in preventing regressions.
|
|
when using the https transport mechanism, $no_proxy is ignored if apt is
getting it's proxy information from $https_proxy (as opposed to
Acquire::https::Proxy somewhere in apt config). if the source of proxy
information is Acquire::https::Proxy set in apt.conf (or apt.conf.d),
then $no_proxy is honored.
|
|
This converts all callers that read machine-generated data,
callers that might work with user input are not converted.
|
|
Reported-By: cppcheck
Git-Dch: Ignore
|
|
Unlinking /dev/null is bad, we shouldn't do that. Also, we should print
at least a warning if we tried to unlink a file but didn't manage to
pull it of (ignoring the case were the file is /dev/null or doesn't
exist in the first place).
This got triggered by a relatively unlikely to cause problem in
pkgAcquire::Worker::PrepareFiles which would while temporary
uncompressed files (which are set to keep compressed) figure out that to
files are the same and prepare for sharing by deleting them. Bad move.
That also shows why not printing a warning is a bad idea as this hide
the error for in non-root test runs.
Git-Dch: Ignore
|
|
Detecting network errors has some benefits in the acquire system as if
we can't connect to a host trying it for a million files is pointless.
http and co which use connect.cc deal with this, but https which uses
curl had connection failures as "normal" errors which could potentially
be worked around (like trying Release instead of the failed InRelease).
Git-Dch: Ignore
|
|
Reported-By: gcc -fsanitize=address -fno-sanitize=vptr
Git-Dch: Ignore
|
|
Conflicts:
apt-pkg/pkgcache.h
debian/changelog
methods/https.cc
methods/server.cc
test/integration/test-apt-download-progress
|
|
Git-Dch: ignore
|
|
Conflicts:
apt-pkg/deb/dpkgpm.cc
|
|
The variable "Size" was misleading and caused bug #1445239. To
avoid similar issues in the future, rename it to make the meaning
more obvious.
git-dch: ignore
|
|
Not all servers we are talking to support If-Modified-Since and some are
not even sending Last-Modified for us, so in an effort to detect such
hits we run a hashsum check on the 'old' compared to the 'new' file, we
got the hashes for the 'new' already for "free" from the methods anyway
and hence just need to calculated the old ones.
This allows us to detect hits even with unsupported servers, which in
turn means we benefit from all the new hit behavior also here.
|
|
If we have the expected hashes we can check with them if the file we
have in partial we got a 416 for is the expected file. We detected this
with same-size before, but not every server sends a good Content-Range
header with a 416 response.
|
|
We do this in HTTP already to give the CPU some exercise while the disk
is heavily spinning (or flashing?) to store the data avoiding the need
to reread the entire file again later on to calculate the hashes – which
happens outside of the eyes of progress reporting, so you might ended up
with a bunch of https workers 'stuck' at 100% while they were busy
calculating hashes.
This is a bummer for everyone using apt as a connection speedtest as the
https method works slower now (not really, it just isn't reporting done
too early anymore).
|
|
Methods get told which hashes are expected by the acquire system, which
means we can use this list to restrict what we calculate in the methods
as any extra we are calculating is wasted effort as we can't compare it
with anything anyway.
Adding support for a new hash algorithm is therefore 'free' now and if a
algorithm is no longer provided in a repository for a file, we
automatically stop calculating it.
In practice this results in a speed-up in Debian as we don't have SHA512
here (so far), so we practically stop calculating it.
|
|
The worker expects that the methods tell him when they start or finish
downloading a file. Various information pieces are passed along in this
report including the (expected) filesize. https was using a "global"
struct for reporting which made it 'reuse' incorrect values in some
cases like a non-existent InRelease fallbacking to Release{,.gpg}
resulting in a size-mismatch warning. Reducing the scope and redesigning
the setting of the values we can fix this and related issues.
Closes: 777565, 781509
Thanks: Robert Edmonds and Anders Kaseorg for initial patchs
|
|
It might be quite interesting which file (content) made curl freak out
and other methods keep the file around as well.
Git-Dch: Ignore
|
|
to 404"
This reverts commit 1296bc7c466181a7978c313c40a041b34ce3eaeb.
|
|
The worker expects that the methods tell him when they start or finish
downloading a file. Various information pieces are passed along in this report
including the (expected) filesize. https is using a "global" struct for
reporting which made it 'reuse' incorrect values in some cases like a
non-existent InRelease fallbacking to Release{,.gpg} resulting in an incorrect
size-mismatch warning scaring and desensitizing users as well as being subject
to a race between the write_data and progress callbacks generating incorrect
progress reporting and potentially the same error message.
Other branches as well as the bugreports contain 'better' fixes making the
struct local and other sensible changes, but are larger as a result, so in
this version we opted for short diff with minimal effect above else instead.
Closes: 777565, 781509
Thanks: Robert Edmonds and Anders Kaseorg for initial patchs
|
|
|
|
Bug #778375 uncovered that https wasn't properly integrated in the class
family tree of http as it was supposed to be leading to a NULL pointer
dereference. Fixing this 'properly' was deemed to much diff for
practically no gain that late in the release, so commit
0c2dc43d4fe1d026650b5e2920a021557f9534a6 just fixed the synptom, while
this commit here is fixing the cause plus adding a test.
|
|
|
|
Add a explicit ReceivedData to HttpsMethod that indicates when
we got data from the connection so that we can send URISTart()
to the parent.
This is needed because URIStart got moved in f9b4f12d from
the progress_callback to write_data() and it only checks for
Res.Size. In the old code if progress_callback is called by
libcurl (and sets Res.Size) before write_data is called then
URIStart() is never send. Making this a explicit ReceivedData
variable fixes this issue.
|
|
Real webservers (like apache) actually send an error page with a 416
response, but our client didn't expect it leaving the page on the socket
to be parsed as response for the next request (http) or as file content
(https), which isn't what we want at all… Symptom is a "Bad header line"
as html usually doesn't parse that well to an http-header.
This manifests itself e.g. if we have a complete file (or larger) in
partial/ which isn't discarded by If-Range as the server doesn't support
it (or it is just newer, think: mirror rotation).
It is a sort-of regression of 78c72d0ce22e00b194251445aae306df357d5c1a,
which removed the filesize - 1 trick, but this had its own problems…
To properly test this our webserver gains the ability to reply with
transfer-encoding: chunked as most real webservers will use it to send
the dynamically generated error pages.
(The tests and their binary helpers had to be slightly modified to
apply, but the patch to fix the issue itself is unchanged.)
Closes: 768797
|
|
Real webservers (like apache) actually send an error page with a 416
response, but our client didn't expect it leaving the page on the socket
to be parsed as response for the next request (http) or as file content
(https), which isn't what we want at all… Symptom is a "Bad header line"
as html usually doesn't parse that well to an http-header.
This manifests itself e.g. if we have a complete file (or larger) in
partial/ which isn't discarded by If-Range as the server doesn't support
it (or it is just newer, think: mirror rotation).
It is a sort-of regression of 78c72d0ce22e00b194251445aae306df357d5c1a,
which removed the filesize - 1 trick, but this had its own problems…
To properly test this our webserver gains the ability to reply with
transfer-encoding: chunked as most real webservers will use it to send
the dynamically generated error pages.
Closes: 768797
|
|
Do not drop privileges in the methods when using a older version of
libapt that does not support the chown magic in partial/ yet. To
do this DropPrivileges() now will ignore a empty Apt::Sandbox::User.
Cleanup all hardcoded _apt along the way.
|
|
Communicate the fail reason from the methods to the parent
and Rename() failed files.
|
|
|
|
|
|
|
|
|
|
|
|
Add a new "Debian-apt" user that owns the /var/lib/apt/lists
and /var/cache/apt/archive directories. The methods
http, https, ftp, gpgv, gzip switch to this user when they
start.
Thanks to Julian and "ioerror" and tors "switch_id()" code.
|
|
When doing Acquire::http{,s}::Proxy-Auto-Detect, run the auto-detect
command for each host instead of only once. This should make using
"proxy" from libproxy-tools feasible which can then be used for PAC
style or other proxy configurations.
Closes: #759264
|
|
|
|
|
|
Beside being a bit cleaner it hopefully also resolves oddball problems
I have with high levels of parallel jobs.
Git-Dch: Ignore
Reported-By: iwyu (include-what-you-use)
|
|
Reported-By: gcc -Wunused-parameter
Git-Dch: Ignore
|
|
Git-Dch: Ignore
Reported-By: gcc -Wpedantic
|
|
|