Age | Commit message (Collapse) | Author |
|
Failing on too much data is good, but we can do better by checking for
exact filesizes as we know with hashsums how large a file should be, so
if we get a file which has a size we do not expect we can drop it
directly, regardless of if the file is larger or smaller than what we
expect which should catch most cases which would end up as hashsum
errors later now a lot sooner.
|
|
We tend to operate on rather large static files, which means we usually
get Content-Length information from the server. If we combine this
information with the filesize we are expecting (factoring in pipelining)
we can avoid reading a bunch of data we are ending up rejecting anyhow
by just closing the connection saving bandwidth and time both for the
server as well as the client.
|
|
This makes it easier to see which headers includes what.
The changes were done by running
git grep -l '#\s*include' \
| grep -E '.(cc|h)$' \
| xargs sed -i -E 's/(^\s*)#(\s*)include/\1#\2 include/'
To modify all include lines by adding a space, and then running
./git-clang-format.sh.
|
|
That's just ridiculous these days.
Gbp-Dch: ignore
|
|
HTTPS proxies just require unwrapping the TLS layer at the proxy
connection, that's easy, and of course sending proxy-specific
headers that are sent on "http" proxies.
|
|
Proxying HTTPS traffic requires the proxy providing the
CONNECT method. This implements the client side of it,
although it is a bit hacky.
HTTP connect is a normal HTTP CONNECT request, followed
by a normal HTTP response, just that the body of the
response is the TCP stream of the target host.
We use a special wrapper in case there are data bytes
in the header packets - in that case, the bytes are
stored in a buffer and the buffer will be drained first,
afterwards the connection continues directly with the
TCP stream (with one more vcall).
Also: Do not send full URI to https destinations when proxying,
as we are directly interfacing with the destination data stream.
|
|
The apt-transport-tor package operates via simple symlinks which can
result in 'http' being called as 'tor+https', so it must pick up the
right configuration pieces and trigger https support also in plus names.
|
|
GnuTLS can already have data pending in its buffers, we need
to to drain that first otherwise select() might block
indefinitely.
Gbp-Dch: ignore
|
|
The http method will eventually replace the curl-based
https method, but for now, this is an opt-in experiment
that can be enabled by setting Dir::Bin::Methods::https
to "http".
Known issues:
- We do not support HTTPS proxies yet
- We do not support proxying HTTPS connections yet (CONNECT)
- IssuerCert and SslForceVersion are unsupported
Gbp-Dch: Full
|
|
Use std::unique_ptr<MethodFd> everywhere we used an
integer-based file descriptor before. This allows us
to implement stuff like TLS support easily.
Gbp-Dch: ignore
|
|
This 'method' is the abstract base for http and https and should as such
be called out like this rather using an easily confused name.
Gbp-Dch: Ignore
|
|
Having a Reset(bool) method to partially reset certain variables like
the download size always were strange, so this commit splits the
ServerState into an additional RequestState living on the stack for as
long as we deal with this request causing an automatic "reset".
There is much to do still to make this code look better, but this is a
good first step which compiles cleanly and passes all tests, so keeping
it as history might be beneficial and due to avoiding explicit memory
allocations it ends up fixing a small memory leak in https, too.
Closes: #440057
|
|
Suggested in #529794
|
|
The 0.0.0.0:0 tor reports is pretty useless by itself, but even if an IP
would be reported it is better to show the user the hostname we wanted
the proxy to connect to in the same error message. We improve upon it
further by looking for this bind address in particular and remap error
messages slightly to give users a better chance of figuring out what
went wrong. Upstream Tor can't do that as it is technically wrong.
|
|
If the server told us in a previous request that it isn't supporting
Ranges with bytes via an Accept-Ranges header missing bytes, we don't
try to formulate requests using Ranges.
|
|
We keep various information bits about the server around, some only
effecting the currently handled file (like sizes) while others
should be persistent (like pipeline detections). http used to reset all
file-related manually, which is a bit silly if we already have a Reset()
method – which does reset all through –, so extending it with a
parameter for reuse and calling it from https too (as this was
previously resetting by just creating a new state struct – it uses no
value of the persistent state-keeping yet as it supports no pipelining).
Gbp-Dch: Ignore
|
|
With apts http transport supporting socks5h proxies and all the work
in terms of configuration of methods based on the name it is called with
it becomes surprisingly easy to implement Tor support equally (and
perhaps even a bit exceeding) what is available currently in
apt-transport-tor.
How this will turn out to be handled packaging wise we will see in
https://lists.debian.org/deity/2016/08/msg00012.html , but until this is
resolved we can add the needed support without actively enabling it for
now, so that this can be tested better.
|
|
Socks support is a requested feature in sofar that the internet is
actually believing Acquire::socks::Proxy would exist. It doesn't and
this commit isn't adding it as that isn't how our configuration works,
but it allows Acquire::http::Proxy="socks5h://…". The HTTPS method was
changed already to support socks proxies (all versions) via curl. This
commit implements only SOCKS5 (RFC1928) with no auth or pass&user auth
(RFC1929), but not GSSAPI which is required by the RFC. The 'h' in the
protocol name further indicates that DNS resolution is delegated to the
socks proxy rather than performed locally.
The implementation works and was tested with Tor as socks proxy for
which implementing socks5h only can actually be considered a feature.
Closes: 744934
|
|
The https method implemented for a long while now a hardcoded fallback
to the same options in http, which, while it works, is rather inflexible
if we want to allow the methods to use another name to change their
behavior slightly, like apt-transport-tor does to https – most of the
diff being s#https#tor#g which then fails to do the full circle
fallthrough tor -> https -> http for https sources. With this config
infrastructure this could be implemented now.
|
|
cURL which backs our https implementation can handle redirects on its
own, but by dealing with them on our own we gain finer control over which
redirections will be performed (we don't like https → http) and by whom
so that redirections to other hosts correctly spawn a new https method
dealing with these instead of letting the current one deal with it.
|
|
Having the detection handled in specific (http) workers means that a
redirection loop over different hostnames isn't detected. Its also not a
good idea have this implement in each method independently even if it
would work
|
|
Closes: #623443
|
|
Followup of b58e2c7c56b1416a343e81f9f80cb1f02c128e25.
Still a regression of sorts of 8b79c94af7f7cf2e5e5342294bc6e5a908cacabf.
Closes: 832044
|
|
methods/http.cc:640:13: runtime error: reference binding to null pointer
of type 'struct FileFd'
This reference is never used in the cases it has a nullptr, so the
practical difference is non-existent, but its a bug still.
Reported-By: gcc -fsanitize=undefined
|
|
All apt versions support numeric as well as 3-character timezones just
fine and its actually hard to write code which doesn't "accidently"
accepts it. So why change? Documenting the Date/Valid-Until fields in
the Release file is easy to do in terms of referencing the
datetime format used e.g. in the Debian changelogs (policy §4.4). This
format specifies only the numeric timezones through, not the nowadays
obsolete 3-character ones, so in the interest of least surprise we should
use the same format even through it carries a small risk of regression
in other clients (which encounter repositories created with
apt-ftparchive).
In case it is really regressing in practice, the hidden option
-o APT::FTPArchive::Release::NumericTimezone=0
can be used to go back to good old UTC as timezone.
The EDSP and EIPP protocols use this 'new' format, the text interface
used to communicate with the acquire methods does not for compatibility
reasons even if none of our methods would be effected and I doubt any
other would (in these instances the timezone is 'GMT' as that is what
HTTP/1.1 requires). Note that this is only true for apt talking to
methods, (libapt-based) methods talking to apt will respond with the
'new' format. It is therefore strongly adviced to support both also in
method input.
|
|
Most servers who close the connection do not send a content-length as
this is redundant information usually, but some might and while testing
with our server and with 'aptwebserver::response-header::Connection' set
to 'close' I noticed that http hangs after a redirect in such cases, so
if we have the information, just use it instead of discarding it.
|
|
We use a wild mixture of C and C++ ways of generating output, so having
a consistent world-view in both styles sounds like a good idea and
should help in preventing regressions.
|
|
Setting the C++ locale via std::locale::global(std::locale("")); which
would otherwise default to the default C locale (aka: unaffected by
setlocale) effects the formatting of numeric types in IO streams, which
for output for humans is perfectly sensible, but breaks our many text
interfaces used and parsed by us and others without expecting the
numbers to be formatted.
Closes: #825396
|
|
Reported-By: gcc -fsanitize=address -fno-sanitize=vptr
Git-Dch: Ignore
|
|
Conflicts:
apt-pkg/pkgcache.h
debian/changelog
methods/https.cc
methods/server.cc
test/integration/test-apt-download-progress
|
|
The apt http code parses Content-Length and Content-Range. For
both requests the variable "Size" is used and the semantic for
this Size is the total file size. However Content-Length is not
the entire file size for partital file requests. For servers that
send the Content-Range header first and then the Content-Length
header this can lead to globbing of Size so that its less than
the real file size. This may lead to a subsequent passing of a
negative number into the CircleBuf which leads to a endless
loop that writes data.
Thanks to Anton Blanchard for the analysis and initial patch.
LP: #1445239
|
|
We do this in HTTP already to give the CPU some exercise while the disk
is heavily spinning (or flashing?) to store the data avoiding the need
to reread the entire file again later on to calculate the hashes – which
happens outside of the eyes of progress reporting, so you might ended up
with a bunch of https workers 'stuck' at 100% while they were busy
calculating hashes.
This is a bummer for everyone using apt as a connection speedtest as the
https method works slower now (not really, it just isn't reporting done
too early anymore).
|
|
Methods get told which hashes are expected by the acquire system, which
means we can use this list to restrict what we calculate in the methods
as any extra we are calculating is wasted effort as we can't compare it
with anything anyway.
Adding support for a new hash algorithm is therefore 'free' now and if a
algorithm is no longer provided in a repository for a file, we
automatically stop calculating it.
In practice this results in a speed-up in Debian as we don't have SHA512
here (so far), so we practically stop calculating it.
|
|
Servers who advertise that they close the connection get the 'Closes'
encoding flag, but this conflicts with servers who response with a
transfer-encoding (e.g. encoding) as it is saved in the same flag.
We have a better flag for the keep-alive (or not) of the connection
anyway, so we check this instead of the encoding.
This is in practice not much of a problem as real servers we talk to are
HTTP1.1 servers (with keep-alive) and there isn't much point in doing
chunked encoding if you are going to close anyway, but our simple
testserver stumbles over this if pressed and its a bit cleaner, too.
Git-Dch: Ignore
|
|
Bug #778375 uncovered that https wasn't properly integrated in the class
family tree of http as it was supposed to be leading to a NULL pointer
dereference. Fixing this 'properly' was deemed to much diff for
practically no gain that late in the release, so commit
0c2dc43d4fe1d026650b5e2920a021557f9534a6 just fixed the synptom, while
this commit here is fixing the cause plus adding a test.
|
|
Real webservers (like apache) actually send an error page with a 416
response, but our client didn't expect it leaving the page on the socket
to be parsed as response for the next request (http) or as file content
(https), which isn't what we want at all… Symptom is a "Bad header line"
as html usually doesn't parse that well to an http-header.
This manifests itself e.g. if we have a complete file (or larger) in
partial/ which isn't discarded by If-Range as the server doesn't support
it (or it is just newer, think: mirror rotation).
It is a sort-of regression of 78c72d0ce22e00b194251445aae306df357d5c1a,
which removed the filesize - 1 trick, but this had its own problems…
To properly test this our webserver gains the ability to reply with
transfer-encoding: chunked as most real webservers will use it to send
the dynamically generated error pages.
(The tests and their binary helpers had to be slightly modified to
apply, but the patch to fix the issue itself is unchanged.)
Closes: 768797
|
|
Real webservers (like apache) actually send an error page with a 416
response, but our client didn't expect it leaving the page on the socket
to be parsed as response for the next request (http) or as file content
(https), which isn't what we want at all… Symptom is a "Bad header line"
as html usually doesn't parse that well to an http-header.
This manifests itself e.g. if we have a complete file (or larger) in
partial/ which isn't discarded by If-Range as the server doesn't support
it (or it is just newer, think: mirror rotation).
It is a sort-of regression of 78c72d0ce22e00b194251445aae306df357d5c1a,
which removed the filesize - 1 trick, but this had its own problems…
To properly test this our webserver gains the ability to reply with
transfer-encoding: chunked as most real webservers will use it to send
the dynamically generated error pages.
Closes: 768797
|
|
Do not drop privileges in the methods when using a older version of
libapt that does not support the chown magic in partial/ yet. To
do this DropPrivileges() now will ignore a empty Apt::Sandbox::User.
Cleanup all hardcoded _apt along the way.
|
|
Communicate the fail reason from the methods to the parent
and Rename() failed files.
|
|
|
|
|
|
When doing Acquire::http{,s}::Proxy-Auto-Detect, run the auto-detect
command for each host instead of only once. This should make using
"proxy" from libproxy-tools feasible which can then be used for PAC
style or other proxy configurations.
Closes: #759264
|
|
This ensures that we can stop downloading if the server send
too much data by accident (or by a malicious attempt)
|
|
|
|
beside reducing code a bit, it avoids oddball problems while building
the string and doesn't trigger static analyse warnings.
|
|
Git-Dch: Ignore
Reported-By: gcc -Wsuggest-attribute={pure,const,noreturn}
|
|
Beside being a bit cleaner it hopefully also resolves oddball problems
I have with high levels of parallel jobs.
Git-Dch: Ignore
Reported-By: iwyu (include-what-you-use)
|
|
server.cc: In member function ‘bool ServerState::HeaderLine(std::string)’:
server.cc:198:72: warning: format ‘%llu’ expects argument of type ‘long long unsigned int*’, but argument 3 has type ‘long long int*’ [-Wformat=]
else if (sscanf(Val.c_str(),"bytes %llu-%*u/%llu",&StartPos,&Size) != 2)
Git-Dch: Ignore
Reported-By: gcc -Wpedantic
|
|
Git-Dch: Ignore
Reported-By: gcc -Wpedantic
|
|
|