summaryrefslogtreecommitdiff
path: root/methods/http.cc
AgeCommit message (Collapse)Author
2020-12-18Implement encoded URI handling in all methodsDavid Kalnischkies
Every method opts in to getting the encoded URI passed along while keeping compat in case we are operated by an older acquire system. Effectively this is just a change for the http-based methods as the others just decode the URI as they work with files directly.
2020-08-11Rewrite HttpServerState::Die()Julian Andres Klode
The old code was fairly confusing, and contradictory. Notably, the second `if` also only applied to the Data state, whereas we already terminated the Data state earlier. This was bad. The else fallback applied in three cases: (1) We reached our limit (2) We are Persistent (3) We are headers Now, it always failed as a transient error if it had nothing left in the buffer. BUT: Nothing left in the buffer is the correct thing to happen if we were fetching content. Checking all combinations for the flags, we can compare the results of Die() between 2.1.7 - the last "known-acceptable-ish" version and this version: 2.1.7 this Data !Persist !Space !Limit OK (A) OK Data !Persist !Space Limit OK (A) OK Data !Persist Space !Limit OK (C) OK Data !Persist Space Limit OK OK Data Persist !Space !Limit ERR ERR * Data Persist !Space Limit OK (B) OK Data Persist Space !Limit ERR ERR Data Persist Space Limit OK OK => Data connections are OK if they have not reached their limit, or are persistent (in which case they'll probably be chunked) Header !Persist !Space !Limit ERR ERR Header !Persist !Space Limit ERR ERR Header !Persist Space !Limit OK OK Header !Persist Space Limit OK OK Header Persist !Space !Limit ERR ERR Header Persist !Space Limit ERR ERR Header Persist Space !Limit OK OK Header Persist Space Limit OK OK => Common scheme here is that header connections are fine if they have read something into the input buffer (Space). The rest does not matter. (A) Non-persistent connections with !space always enter the else clause, hence success (B) no Space means we enter the if/else, we go with else because IsLimit(), and we succeed because we don't have space (C) Having space we do enter the while (WriteSpace()) loop, but we never reach IsLimit(), hence we fall through. Given that our connection is not persistent, we fall through to the else case, and there we win because we have data left to write.
2020-08-11http: Fully flush local file both before/after server readJulian Andres Klode
We do not want to end up in a code path while reading content from the server where we have local data left to write, which can happen if a previous read included both headers and content. Restructure Flush() to accept a new argument to allow incomplete flushs (which do not match our limit), so that it can flush as far as possible, and modify Go() and use that before and after reading from the server.
2020-08-11http: Do not use non-blocking local I/OJulian Andres Klode
This causes some more issues, really.
2020-08-11http: Restore successful exits from Die()Julian Andres Klode
We have successfully finished reading data if our buffer is empty, so we don't need to do any further checks.
2020-08-04http: Always write to the file if there's something to writeJulian Andres Klode
We only add the file to the select() call if we have data to write to it prior to the select() call. This is problematic: Assuming we enter Go() with no data to write to the file, but we read some from the server as well as an EOF, we end up not writing it to the file because we did not add the file to the select. We can't always add the file to the select(), because it's basically always ready and we don't want to wake up if we don't have anything to read or write. So for a solution, let's just always write data to the file if there's data to write to it. If some gets leftover, or if some was already present when we started Go(), it will still be added to the select() call and unblock it. Closes: #959518
2020-07-24http: Redesign reading of pending dataJulian Andres Klode
Instead of reading the data early, disable the timeout for the select() call and read the data later. Also, change Read() to call only once to drain the buffer in such instances. We could optimize this to call read() multiple times if there is also pending stuff on the socket, but that it slightly more complex and should not provide any benefits.
2020-07-24http: On select timeout, error out directly, do not call Die()Julian Andres Klode
The error handling in Die() that's supposed to add useful error messages is not super useful here.
2020-07-24http: Finish copying data from server to file before sending stuff to serverJulian Andres Klode
This avoids a case where we read data, then write to the server and only then realize the connection was closed. It is somewhat slower, though.
2020-07-24http: Die(): Do not flush the buffer, error out insteadJulian Andres Klode
By changing the buffer implementation to return true if it read or wrote something, even on EOF, we should not have a need to flush the buffer in Die() anymore - we should only be calling Die() if the buffer is empty now.
2020-07-24http: Only return false for EOF if we actually did not read anythingJulian Andres Klode
This should avoid the need to Flush the buffer in Die(), because if we read anything, we are returning true, and not entering Die() at that point. Also Write() does not have a concept of EOF, so get rid of code handling that there. Was that copied from Read()?
2020-07-24http: Die(): Merge flushing code from Flush()Julian Andres Klode
Die() needs its own Copy() of Flush() because it needs to return success or failure based on some states, but those are not precisely the same as Flush(), as Flush() will always return false at the end, for example, but we want to fall through to our error handling.
2020-07-24http: Always Close() the connection in Die()Julian Andres Klode
If we reached Die() there was an issue with the server connection, so we should always explicitly close it.
2020-07-02Reorder config check before checking systemd for non-interactive httpDavid Kalnischkies
If this option is disabled (which it is by default in Debian), we don't have to make the call and the checks around it. Not that it really matters that much as if it would we would be better checking only once.
2020-06-23Replace some magic 64*1024 with APT_BUFFER_SIZEJulian Andres Klode
2020-04-09ubuntu: http: Add non-interactive to user agent if run by systemdJulian Andres Klode
Include that apt is being run from a service in the user agent, so traffic can be analysed for interactive vs non-interactive use, and prioritised accordingly. It looks like this now: User-Agent: Debian APT-HTTP/1.3 (2.0.1) non-interactive A previous version included the full service names, but this raised some privacy concerns. LP: #1825000
2019-07-10Fix typos reported by codespell in code commentsDavid Kalnischkies
Also in old changelogs, but nothing really user visible like error messages or alike so barely noteworthy. Reported-By: codespell Gbp-Dch: Ignore
2019-07-08Apply various suggestions by cppcheckDavid Kalnischkies
Reported-By: cppcheck
2019-06-11http: Fix Host header in proxied https connectionsSimon Körner
Currently CONNECT requests use the name of the proxy as Host value, instead of the origin server's name. According to RFC 2616 "The Host field value MUST represent the naming authority of the origin server or gateway given by the original URL." The current implementation causes problems with some proxy vendors. This commit fixes this. [jak: Adding a test case] See merge request apt-team/apt!66
2019-04-30apt-pkg: URI: Add 'explicit' to single argument constructorJulian Andres Klode
This needs a fair amount of changes elsewhere in the code, hence this is separate from the previous commits.
2018-11-25Allow setting Referer header for http methodDavid Kalnischkies
Not needed for common interactions, but for some download-file interactions it could be useful to set a specific referer as some servers do not serve requested files otherwise.
2018-11-13Revert "http: Fix handling of server connection closure"Julian Andres Klode
This reverts commit fb3f36593563d09a8d1727cc7c6deb0b49823ca2. It caused downloads to hang on long-lived connections on certain servers. Gbp-Dch: full
2018-11-12http: Fix handling of server connection closureJulian Andres Klode
If the server closed the connection while we're reading data, and we end up not having any data left to write; that is, for example, we received 0 bytes, then we did not exit before, as we only returned success if there was data to write. This is wrong: Obviously, if we have reached our limit, we are done anyway. It's a bit unclear if we actually ever reached this part, but it does make some sense wrt the bug below. LP: #1801338
2018-05-29Use steady clock source for bandwidth limitationDavid Kalnischkies
Using the time of day for this is slightly wrong just like it is for progress, just less visible.
2018-05-28Remove unused time-tracking from http methodDavid Kalnischkies
The Stats method isn't called anywhere, was partly commented out before, but we keep updating the time for it – lets avoid this pointless busywork. Gbp-Dch: Ignore
2018-05-24Lower default timeout from 120s to 30sJulian Andres Klode
120s is an insanely high default time out, lower it to 30s to make things a bit nicer.
2018-05-07Remove obsolete RCS keywordsGuillem Jover
Prompted-by: Jakub Wilk <jwilk@debian.org>
2018-01-03reimplement and simplify mirror:// methodDavid Kalnischkies
Embedding an entire acquire stack and HTTP logic in the mirror method made it rather heavy weight and fragile. This reimplement goes the other way by doing only the bare minimum in the method itself and instead redirect the actual download of files to their proper methods. The reimplementation drops the (in the real world) unused query-string feature as it isn't really implementable in the new architecture.
2017-12-13report transient errors as transient errorsDavid Kalnischkies
The Fail method for acquire methods has a boolean parameter indicating the transient-nature of a reported error. The problem with this is that Fail is called very late at a point where it is no longer easily identifiable if an error is indeed transient or not, so some calls were and some weren't and the acquire system would later mostly ignore the transient flag and guess by using the FailReason instead. Introducing a tri-state enum we can pass the information about fatal or transient errors through the callstack to generate the correct fails.
2017-11-19Also look at https_proxy for https URLsJulian Andres Klode
We accidentally regressed here in 1.5 when replacing the https method.
2017-10-22Sandbox methods with seccomp-BPF; except cdrom, gpgv, rshJulian Andres Klode
This reduces the number of syscalls to about 140 from about 350 or so, significantly reducing security risks. Also change prepare-release to ignore the architecture lists in the build dependencies when generating the build-depends package for travis. We might want to clean up things a bit more and/or move it somewhere else.
2017-10-22Run Proxy-Auto-Detect script from main processJulian Andres Klode
This avoids running the Proxy-Auto-Detect script inside the untrusted (well, less trusted for now) sandbox. This will allow us to restrict the http method from fork()ing or exec()ing via seccomp.
2017-07-26allow the auth.conf to be root:root ownedDavid Kalnischkies
Opening the file before we drop privileges in the methods allows us to avoid chowning in the acquire main process which can apply to the wrong file (imagine Binary scoped settings) and surprises users as their permission setup is overridden. There are no security benefits as the file is open, so an evil method could as before read the contents of the file, but it isn't worse than before and we avoid permission problems in this setup.
2017-07-26reimplement and document auth.confDavid Kalnischkies
We have support for an netrc-like auth.conf file since 0.7.25 (closing 518473), but it was never documented in apt that it even exists and netrc seems to have fallen out of usage as a manpage for it no longer exists making the feature even more arcane. On top of that the code was a bit of a mess (as it is written in c-style) and as a result the matching of machine tokens to URIs also a bit strange by checking for less specific matches (= without path) first. We now do a single pass over the stanzas. In practice early adopters of the undocumented implementation will not really notice the differences and the 'new' behaviour is simpler to document and more usual for an apt user. Closes: #811181
2017-07-26fail early in http if server answer is too small as wellDavid Kalnischkies
Failing on too much data is good, but we can do better by checking for exact filesizes as we know with hashsums how large a file should be, so if we get a file which has a size we do not expect we can drop it directly, regardless of if the file is larger or smaller than what we expect which should catch most cases which would end up as hashsum errors later now a lot sooner.
2017-07-26fail earlier if server answers with too much dataDavid Kalnischkies
We tend to operate on rather large static files, which means we usually get Content-Length information from the server. If we combine this information with the filesize we are expecting (factoring in pipelining) we can avoid reading a bunch of data we are ending up rejecting anyhow by just closing the connection saving bandwidth and time both for the server as well as the client.
2017-07-12Reformat and sort all includes with clang-formatJulian Andres Klode
This makes it easier to see which headers includes what. The changes were done by running git grep -l '#\s*include' \ | grep -E '.(cc|h)$' \ | xargs sed -i -E 's/(^\s*)#(\s*)include/\1#\2 include/' To modify all include lines by adding a space, and then running ./git-clang-format.sh.
2017-07-03Stop bragging about old speeds in http.cc commentsJulian Andres Klode
That's just ridiculous these days. Gbp-Dch: ignore
2017-06-30http: Add support for https:// proxiesJulian Andres Klode
HTTPS proxies just require unwrapping the TLS layer at the proxy connection, that's easy, and of course sending proxy-specific headers that are sent on "http" proxies.
2017-06-30http: Add support for CONNECT proxying to HTTPS locationsJulian Andres Klode
Proxying HTTPS traffic requires the proxy providing the CONNECT method. This implements the client side of it, although it is a bit hacky. HTTP connect is a normal HTTP CONNECT request, followed by a normal HTTP response, just that the body of the response is the TCP stream of the target host. We use a special wrapper in case there are data bytes in the header packets - in that case, the bytes are stored in a buffer and the buffer will be drained first, afterwards the connection continues directly with the TCP stream (with one more vcall). Also: Do not send full URI to https destinations when proxying, as we are directly interfacing with the destination data stream.
2017-06-28support tor+https being handled by httpDavid Kalnischkies
The apt-transport-tor package operates via simple symlinks which can result in 'http' being called as 'tor+https', so it must pick up the right configuration pieces and trigger https support also in plus names.
2017-06-28methods: http: Drain pending data before selectingJulian Andres Klode
GnuTLS can already have data pending in its buffers, we need to to drain that first otherwise select() might block indefinitely. Gbp-Dch: ignore
2017-06-28methods: Add HTTPS support to http method, using GnuTLSJulian Andres Klode
The http method will eventually replace the curl-based https method, but for now, this is an opt-in experiment that can be enabled by setting Dir::Bin::Methods::https to "http". Known issues: - We do not support HTTPS proxies yet - We do not support proxying HTTPS connections yet (CONNECT) - IssuerCert and SslForceVersion are unsupported Gbp-Dch: Full
2017-06-28methods: connect: Switch from int fds to new MethodFdJulian Andres Klode
Use std::unique_ptr<MethodFd> everywhere we used an integer-based file descriptor before. This allows us to implement stuff like TLS support easily. Gbp-Dch: ignore
2016-12-31rename ServerMethod to BaseHttpMethodDavid Kalnischkies
This 'method' is the abstract base for http and https and should as such be called out like this rather using an easily confused name. Gbp-Dch: Ignore
2016-12-31separating state variables regarding server/requestDavid Kalnischkies
Having a Reset(bool) method to partially reset certain variables like the download size always were strange, so this commit splits the ServerState into an additional RequestState living on the stack for as long as we deal with this request causing an automatic "reset". There is much to do still to make this code look better, but this is a good first step which compiles cleanly and passes all tests, so keeping it as history might be beneficial and due to avoiding explicit memory allocations it ends up fixing a small memory leak in https, too. Closes: #440057
2016-11-11http: skip connection cleanup if we close it anyhowDavid Kalnischkies
Suggested in #529794
2016-11-10improve SOCKS error messages for http slightlyDavid Kalnischkies
The 0.0.0.0:0 tor reports is pretty useless by itself, but even if an IP would be reported it is better to show the user the hostname we wanted the proxy to connect to in the same error message. We improve upon it further by looking for this bind address in particular and remap error messages slightly to give users a better chance of figuring out what went wrong. Upstream Tor can't do that as it is technically wrong.
2016-08-16don't sent Range requests if we know its not acceptedDavid Kalnischkies
If the server told us in a previous request that it isn't supporting Ranges with bytes via an Accept-Ranges header missing bytes, we don't try to formulate requests using Ranges.
2016-08-16reorganize server-states resetting in http/httpsDavid Kalnischkies
We keep various information bits about the server around, some only effecting the currently handled file (like sizes) while others should be persistent (like pipeline detections). http used to reset all file-related manually, which is a bit silly if we already have a Reset() method – which does reset all through –, so extending it with a parameter for reuse and calling it from https too (as this was previously resetting by just creating a new state struct – it uses no value of the persistent state-keeping yet as it supports no pipelining). Gbp-Dch: Ignore