Age | Commit message (Collapse) | Author |
|
This 'method' is the abstract base for http and https and should as such
be called out like this rather using an easily confused name.
Gbp-Dch: Ignore
|
|
Having a Reset(bool) method to partially reset certain variables like
the download size always were strange, so this commit splits the
ServerState into an additional RequestState living on the stack for as
long as we deal with this request causing an automatic "reset".
There is much to do still to make this code look better, but this is a
good first step which compiles cleanly and passes all tests, so keeping
it as history might be beneficial and due to avoiding explicit memory
allocations it ends up fixing a small memory leak in https, too.
Closes: #440057
|
|
[Comment from commiter:] I have the feeling that the issue itself is
fixed for a while already as nowadays we have testcases involving a
webserver closing the connection on error (look for "closeOnError") and
no even remotely recent reports about it, but moving the content
clearance above the failure report is a valid change and shouldn't hurt.
Closes: #465572
|
|
If a server closes a connection after sending us a file that tends to
mean that its a type of server who always closes the connection – it is
therefore relatively pointless to try pipelining with it even if it
isn't a problem by itself: apt is just restarting the pipeline each
time after it got served one file and the connection is closed.
The problem starts if one or more proxies are between the server and apt
and they disagree about how the connection should be as in the
bugreporters case where the responses apt gets contain both Keep-Alive
and Proxy-Connection headers (which apt both ignores) indicating a
proxy is trying to keep a connection open while the response also
contains "Connection: close" indicating the opposite which apt
understands and respects as it is required to do.
We avoid stepping into this abyss by not performing pipelining anymore
if we got a respond with the indication to close connection if the
response was otherwise a success – error messages are sent by some
servers via this method as their pages tend to be created dynamically
and hence their size isn't known a priori to them.
Closes: #832113
|
|
If the server told us in a previous request that it isn't supporting
Ranges with bytes via an Accept-Ranges header missing bytes, we don't
try to formulate requests using Ranges.
|
|
We keep various information bits about the server around, some only
effecting the currently handled file (like sizes) while others
should be persistent (like pipeline detections). http used to reset all
file-related manually, which is a bit silly if we already have a Reset()
method – which does reset all through –, so extending it with a
parameter for reuse and calling it from https too (as this was
previously resetting by just creating a new state struct – it uses no
value of the persistent state-keeping yet as it supports no pipelining).
Gbp-Dch: Ignore
|
|
It seems completely pointless from a server-POV to sent empty header
fields, so most of them don't do it (simply proven by this limitation
existing since day one) – but it is technically allowed by the RFC as
the surounding whitespaces are optional and Github seems to like sending
"X-Geo-Block-List:\r\n" since recently (bug reports in other http
clients indicate July) at least sometimes as the reporter claims to have
seen it on https only even through it can happen with both.
Closes: 834048
|
|
With apts http transport supporting socks5h proxies and all the work
in terms of configuration of methods based on the name it is called with
it becomes surprisingly easy to implement Tor support equally (and
perhaps even a bit exceeding) what is available currently in
apt-transport-tor.
How this will turn out to be handled packaging wise we will see in
https://lists.debian.org/deity/2016/08/msg00012.html , but until this is
resolved we can add the needed support without actively enabling it for
now, so that this can be tested better.
|
|
The https method implemented for a long while now a hardcoded fallback
to the same options in http, which, while it works, is rather inflexible
if we want to allow the methods to use another name to change their
behavior slightly, like apt-transport-tor does to https – most of the
diff being s#https#tor#g which then fails to do the full circle
fallthrough tor -> https -> http for https sources. With this config
infrastructure this could be implemented now.
|
|
cURL which backs our https implementation can handle redirects on its
own, but by dealing with them on our own we gain finer control over which
redirections will be performed (we don't like https → http) and by whom
so that redirections to other hosts correctly spawn a new https method
dealing with these instead of letting the current one deal with it.
|
|
Having the detection handled in specific (http) workers means that a
redirection loop over different hostnames isn't detected. Its also not a
good idea have this implement in each method independently even if it
would work
|
|
The rewrite in 742f67eaede80d2f9b3631d8697ebd63b8f95427 is based on the
assumption that the pipeline will always be at least one item short each
time it is called, but the logs in #832113 suggest that this isn't
always the case. I fail to see how at the moment, but the old
implementation had this behavior, so restoring it can't really hurt, can
it?
|
|
Also fixes message itself to mention the correct option name as noticed
in #832113.
|
|
Seen in #828011 if we fail to parse a header field like Last-Modified we
end up interpreting the data as response header for coming requests in
case we don't rotate to a new server in DNS rotation.
|
|
We use a wild mixture of C and C++ ways of generating output, so having
a consistent world-view in both styles sounds like a good idea and
should help in preventing regressions.
|
|
Setting the C++ locale via std::locale::global(std::locale("")); which
would otherwise default to the default C locale (aka: unaffected by
setlocale) effects the formatting of numeric types in IO streams, which
for output for humans is perfectly sensible, but breaks our many text
interfaces used and parsed by us and others without expecting the
numbers to be formatted.
Closes: #825396
|
|
We have this situation in cases were parts of the transaction are
refused (e.g. in a hashsum mismatch) and rerun the update (e.g. in the
hope that we get a mirror which is synced this time).
Previously we would ask the server with an if-range and in the best case
recieve a 416 in response (less featureful server might end up giving us
the entire file again or we get the wrong file this time giving us a
hashsum mismatch…), which is a waste of time if we know already by
checking the hashsums that we got the complete and correct file.
|
|
Reported-By: cppcheck
Git-Dch: Ignore
|
|
Just enabling it for anyone breaks with HTTP/1.0 servers and
proxies sometimes.
Closes: #810796
|
|
This converts all callers that read machine-generated data,
callers that might work with user input are not converted.
|
|
Allows users who know what they are getting themselves into with this
trick to e.g. disable privilege dropping for e.g. file:// until they can
fix up the permissions on those repositories. It helps also the test
framework and people with a similar setup (= me) to run in less modified
environments.
|
|
Unlinking /dev/null is bad, we shouldn't do that. Also, we should print
at least a warning if we tried to unlink a file but didn't manage to
pull it of (ignoring the case were the file is /dev/null or doesn't
exist in the first place).
This got triggered by a relatively unlikely to cause problem in
pkgAcquire::Worker::PrepareFiles which would while temporary
uncompressed files (which are set to keep compressed) figure out that to
files are the same and prepare for sharing by deleting them. Bad move.
That also shows why not printing a warning is a bad idea as this hide
the error for in non-root test runs.
Git-Dch: Ignore
|
|
Reported-By: gcc -fsanitize=address -fno-sanitize=vptr
Git-Dch: Ignore
|
|
Reported-By: codespell
|
|
Conflicts:
apt-pkg/pkgcache.h
debian/changelog
methods/https.cc
methods/server.cc
test/integration/test-apt-download-progress
|
|
The variable "Size" was misleading and caused bug #1445239. To
avoid similar issues in the future, rename it to make the meaning
more obvious.
git-dch: ignore
|
|
The apt http code parses Content-Length and Content-Range. For
both requests the variable "Size" is used and the semantic for
this Size is the total file size. However Content-Length is not
the entire file size for partital file requests. For servers that
send the Content-Range header first and then the Content-Length
header this can lead to globbing of Size so that its less than
the real file size. This may lead to a subsequent passing of a
negative number into the CircleBuf which leads to a endless
loop that writes data.
Thanks to Anton Blanchard for the analysis and initial patch.
LP: #1445239
|
|
If we have the expected hashes we can check with them if the file we
have in partial we got a 416 for is the expected file. We detected this
with same-size before, but not every server sends a good Content-Range
header with a 416 response.
|
|
We do this in HTTP already to give the CPU some exercise while the disk
is heavily spinning (or flashing?) to store the data avoiding the need
to reread the entire file again later on to calculate the hashes – which
happens outside of the eyes of progress reporting, so you might ended up
with a bunch of https workers 'stuck' at 100% while they were busy
calculating hashes.
This is a bummer for everyone using apt as a connection speedtest as the
https method works slower now (not really, it just isn't reporting done
too early anymore).
|
|
Methods get told which hashes are expected by the acquire system, which
means we can use this list to restrict what we calculate in the methods
as any extra we are calculating is wasted effort as we can't compare it
with anything anyway.
Adding support for a new hash algorithm is therefore 'free' now and if a
algorithm is no longer provided in a repository for a file, we
automatically stop calculating it.
In practice this results in a speed-up in Debian as we don't have SHA512
here (so far), so we practically stop calculating it.
|
|
Bug #778375 uncovered that https wasn't properly integrated in the class
family tree of http as it was supposed to be leading to a NULL pointer
dereference. Fixing this 'properly' was deemed to much diff for
practically no gain that late in the release, so commit
0c2dc43d4fe1d026650b5e2920a021557f9534a6 just fixed the synptom, while
this commit here is fixing the cause plus adding a test.
|
|
|
|
Do not crash in ServerState::HeaderLine if there is no Owner.
Closes: #778375
|
|
Real webservers (like apache) actually send an error page with a 416
response, but our client didn't expect it leaving the page on the socket
to be parsed as response for the next request (http) or as file content
(https), which isn't what we want at all… Symptom is a "Bad header line"
as html usually doesn't parse that well to an http-header.
This manifests itself e.g. if we have a complete file (or larger) in
partial/ which isn't discarded by If-Range as the server doesn't support
it (or it is just newer, think: mirror rotation).
It is a sort-of regression of 78c72d0ce22e00b194251445aae306df357d5c1a,
which removed the filesize - 1 trick, but this had its own problems…
To properly test this our webserver gains the ability to reply with
transfer-encoding: chunked as most real webservers will use it to send
the dynamically generated error pages.
(The tests and their binary helpers had to be slightly modified to
apply, but the patch to fix the issue itself is unchanged.)
Closes: 768797
|
|
Real webservers (like apache) actually send an error page with a 416
response, but our client didn't expect it leaving the page on the socket
to be parsed as response for the next request (http) or as file content
(https), which isn't what we want at all… Symptom is a "Bad header line"
as html usually doesn't parse that well to an http-header.
This manifests itself e.g. if we have a complete file (or larger) in
partial/ which isn't discarded by If-Range as the server doesn't support
it (or it is just newer, think: mirror rotation).
It is a sort-of regression of 78c72d0ce22e00b194251445aae306df357d5c1a,
which removed the filesize - 1 trick, but this had its own problems…
To properly test this our webserver gains the ability to reply with
transfer-encoding: chunked as most real webservers will use it to send
the dynamically generated error pages.
Closes: 768797
|
|
Git-Dch: ignore
|
|
The Maximum-Size protection breaks the http pipeline reorder code
because it relies on that the object got fetched entirely so that
it can compare the hash of the downloaded data. So instead of
stopping when the Maximum-Size of the expected item is reached we
only stop when the maximum size of the biggest item in the queue
is reached. This way the pipeline reoder code keeps working.
|
|
|
|
|
|
'unsigned int'
Git-Dch: Ignore
Reported-By: cppcheck
|
|
Conflicts:
apt-pkg/acquire-item.cc
apt-pkg/acquire-item.h
apt-pkg/cachefilter.h
configure.ac
debian/changelog
|
|
Prefix all answers with the URL that the answer is for. This
helps when debugging and pipeline is enabled.
|
|
This ensures that we can stop downloading if the server send
too much data by accident (or by a malicious attempt)
|
|
Now that methods have the expected hashes available they can check if
the response from the server is what they expected. Pipelining is one of
those areas in which servers can mess up by not supporting it properly,
which forced us to disable it for the time being. Now, we check if
we got a response out of order, which we can not only use to disable
pipelining automatically for the next requests, but we can fix it up
just like the server responded in proper order for the current requests.
To ensure that this little trick works pipelining is only attempt if we
have hashsums for all the files in the chain which in theory reduces the
use of pipelining usage even on the many servers which work properly,
but in practice only the InRelease file (or similar such) will be
requested without a hashsum – and as it is the only file requested in
that stage it can't be pipelined even if we wanted to.
Some minor annoyances remain: The display of the progress we have
doesn't reflect this change, so it looks like the same package gets
downloaded multiple times while others aren't at all. Further more,
partial files are not supported in this recovery as the received data
was appended to the wrong file, so the hashsum doesn't match.
Both seem to be minor enough to reenable pipelining by default until
further notice through to test if it really solves the problem.
This therefore reverts commit 8221431757c775ee875a061b184b5f6f2330f928.
|
|
Beside being a bit cleaner it hopefully also resolves oddball problems
I have with high levels of parallel jobs.
Git-Dch: Ignore
Reported-By: iwyu (include-what-you-use)
|
|
Git-Dch: Ignore
Reported-By: gcc -Wpedantic
|
|
|
|
switch protocols at random is a bad idea if e.g. http can switch to
file, so we limit the possibilities to http to http and http to https.
As very few people (less than 1% according to popcon) have https
installed this likely changes nothing in terms of failure. The commit is
adding a friendly hint which package needs to be installed though.
|
|
cppcheck complains about the obsolete utime as it was removed in
POSIX1.2008 and recommends usage of utimensat/futimens instead
as those are in POSIX and so commit 9ce3cfc9 switched to them.
It is just that they aren't as portable as the standard suggests:
At least our kFreeBSD and Hurd ports stumble over it at runtime.
So to make both, the ports and cppcheck happy, we use utimes instead.
Closes: 738567
|
|
The most "visible" change is from utime to utimensat/futimens
as the first one isn't part of POSIX anymore.
Reported-By: cppcheck
Git-Dch: Ignore
|