summaryrefslogtreecommitdiff
path: root/methods/https.cc
AgeCommit message (Collapse)Author
2017-01-17https: Quote path in URL before passing it to curlJulian Andres Klode
Curl requires URLs to be urlencoded. We are however giving it undecoded URLs. This causes it go completely nuts if there is a space in the URI, producing requests like: GET /a file HTTP/1.1 which the servers then interpret as a GET request for "/a" with HTTP version "file" or some other non-sense. This works around the issue by encoding the path component of the URL. I'm not sure if we should encode other parts of the URL as well, this one seems to do the trick for the actual issue at hand. A more correct fix is to avoid the dequoting and (re-)quoting of URLs when a redirect occurs / a new request is sent. That's been on the radar for probably a year or two now, but nobody bothered implementing that yet. LP: #1651923
2016-12-31rename ServerMethod to BaseHttpMethodDavid Kalnischkies
This 'method' is the abstract base for http and https and should as such be called out like this rather using an easily confused name. Gbp-Dch: Ignore
2016-12-31separating state variables regarding server/requestDavid Kalnischkies
Having a Reset(bool) method to partially reset certain variables like the download size always were strange, so this commit splits the ServerState into an additional RequestState living on the stack for as long as we deal with this request causing an automatic "reset". There is much to do still to make this code look better, but this is a good first step which compiles cleanly and passes all tests, so keeping it as history might be beneficial and due to avoiding explicit memory allocations it ends up fixing a small memory leak in https, too. Closes: #440057
2016-12-08Honour Acquire::ForceIPv4/6 in the https transportLukasz Kawczynski
2016-08-16don't sent Range requests if we know its not acceptedDavid Kalnischkies
If the server told us in a previous request that it isn't supporting Ranges with bytes via an Accept-Ranges header missing bytes, we don't try to formulate requests using Ranges.
2016-08-16reorganize server-states resetting in http/httpsDavid Kalnischkies
We keep various information bits about the server around, some only effecting the currently handled file (like sizes) while others should be persistent (like pipeline detections). http used to reset all file-related manually, which is a bit silly if we already have a Reset() method – which does reset all through –, so extending it with a parameter for reuse and calling it from https too (as this was previously resetting by just creating a new state struct – it uses no value of the persistent state-keeping yet as it supports no pipelining). Gbp-Dch: Ignore
2016-08-11http: auto-configure for local Tor proxy if called as 'tor'David Kalnischkies
With apts http transport supporting socks5h proxies and all the work in terms of configuration of methods based on the name it is called with it becomes surprisingly easy to implement Tor support equally (and perhaps even a bit exceeding) what is available currently in apt-transport-tor. How this will turn out to be handled packaging wise we will see in https://lists.debian.org/deity/2016/08/msg00012.html , but until this is resolved we can add the needed support without actively enabling it for now, so that this can be tested better.
2016-08-10implement generic config fallback for methodsDavid Kalnischkies
The https method implemented for a long while now a hardcoded fallback to the same options in http, which, while it works, is rather inflexible if we want to allow the methods to use another name to change their behavior slightly, like apt-transport-tor does to https – most of the diff being s#https#tor#g which then fails to do the full circle fallthrough tor -> https -> http for https sources. With this config infrastructure this could be implemented now.
2016-08-10use the same redirection handling for http and httpsDavid Kalnischkies
cURL which backs our https implementation can handle redirects on its own, but by dealing with them on our own we gain finer control over which redirections will be performed (we don't like https → http) and by whom so that redirections to other hosts correctly spawn a new https method dealing with these instead of letting the current one deal with it.
2016-08-10fail on unsupported http/https proxy settingsDavid Kalnischkies
Closes: #623443
2016-08-10support all socks-proxy known to curl in https methodDavid Kalnischkies
2016-07-06report all instead of first error up the acquire chainDavid Kalnischkies
If we don't give a specific error to report up it is likely that all error currently in the error stack are equally important, so reporting just one could turn out to be confusing e.g. if name resolution failed in a SRV record list.
2016-07-02use +0000 instead of UTC by default as timezone in outputDavid Kalnischkies
All apt versions support numeric as well as 3-character timezones just fine and its actually hard to write code which doesn't "accidently" accepts it. So why change? Documenting the Date/Valid-Until fields in the Release file is easy to do in terms of referencing the datetime format used e.g. in the Debian changelogs (policy §4.4). This format specifies only the numeric timezones through, not the nowadays obsolete 3-character ones, so in the interest of least surprise we should use the same format even through it carries a small risk of regression in other clients (which encounter repositories created with apt-ftparchive). In case it is really regressing in practice, the hidden option -o APT::FTPArchive::Release::NumericTimezone=0 can be used to go back to good old UTC as timezone. The EDSP and EIPP protocols use this 'new' format, the text interface used to communicate with the acquire methods does not for compatibility reasons even if none of our methods would be effected and I doubt any other would (in these instances the timezone is 'GMT' as that is what HTTP/1.1 requires). Note that this is only true for apt talking to methods, (libapt-based) methods talking to apt will respond with the 'new' format. It is therefore strongly adviced to support both also in method input.
2016-05-28use std::locale::global instead of setlocaleDavid Kalnischkies
We use a wild mixture of C and C++ ways of generating output, so having a consistent world-view in both styles sounds like a good idea and should help in preventing regressions.
2016-04-27refactored no_proxy code to work regardless of where https proxy is setPatrick Cable
when using the https transport mechanism, $no_proxy is ignored if apt is getting it's proxy information from $https_proxy (as opposed to Acquire::https::Proxy somewhere in apt config). if the source of proxy information is Acquire::https::Proxy set in apt.conf (or apt.conf.d), then $no_proxy is honored.
2015-12-27Convert most callers of isspace() to isspace_ascii()Julian Andres Klode
This converts all callers that read machine-generated data, callers that might work with user input are not converted.
2015-11-05apply various suggestions made by cppcheckDavid Kalnischkies
Reported-By: cppcheck Git-Dch: Ignore
2015-11-04wrap every unlink call to check for != /dev/nullDavid Kalnischkies
Unlinking /dev/null is bad, we shouldn't do that. Also, we should print at least a warning if we tried to unlink a file but didn't manage to pull it of (ignoring the case were the file is /dev/null or doesn't exist in the first place). This got triggered by a relatively unlikely to cause problem in pkgAcquire::Worker::PrepareFiles which would while temporary uncompressed files (which are set to keep compressed) figure out that to files are the same and prepare for sharing by deleting them. Bad move. That also shows why not printing a warning is a bad idea as this hide the error for in non-root test runs. Git-Dch: Ignore
2015-11-04set failreasons similar to connect.cc based on curl errorsDavid Kalnischkies
Detecting network errors has some benefits in the acquire system as if we can't connect to a host trying it for a million files is pointless. http and co which use connect.cc deal with this, but https which uses curl had connection failures as "normal" errors which could potentially be worked around (like trying Release instead of the failed InRelease). Git-Dch: Ignore
2015-09-14fix two memory leaks reported by gccDavid Kalnischkies
Reported-By: gcc -fsanitize=address -fno-sanitize=vptr Git-Dch: Ignore
2015-05-22Merge branch 'debian/sid' into debian/experimentalMichael Vogt
Conflicts: apt-pkg/pkgcache.h debian/changelog methods/https.cc methods/server.cc test/integration/test-apt-download-progress
2015-05-22Update methods/https.cc now that ServerState::Size is renamedMichael Vogt
Git-Dch: ignore
2015-05-22Merge remote-tracking branch 'upstream/debian/jessie' into debian/sidMichael Vogt
Conflicts: apt-pkg/deb/dpkgpm.cc
2015-05-22Rename "Size" in ServerState to TotalFileSizeMichael Vogt
The variable "Size" was misleading and caused bug #1445239. To avoid similar issues in the future, rename it to make the meaning more obvious. git-dch: ignore
2015-05-13detect Releasefile IMS hits even if the server doesn'tDavid Kalnischkies
Not all servers we are talking to support If-Modified-Since and some are not even sending Last-Modified for us, so in an effort to detect such hits we run a hashsum check on the 'old' compared to the 'new' file, we got the hashes for the 'new' already for "free" from the methods anyway and hence just need to calculated the old ones. This allows us to detect hits even with unsupported servers, which in turn means we benefit from all the new hit behavior also here.
2015-05-12detect 416 complete file in partial by expected hashDavid Kalnischkies
If we have the expected hashes we can check with them if the file we have in partial we got a 416 for is the expected file. We detected this with same-size before, but not every server sends a good Content-Range header with a 416 response.
2015-04-19calculate hashes while downloading in httpsDavid Kalnischkies
We do this in HTTP already to give the CPU some exercise while the disk is heavily spinning (or flashing?) to store the data avoiding the need to reread the entire file again later on to calculate the hashes – which happens outside of the eyes of progress reporting, so you might ended up with a bunch of https workers 'stuck' at 100% while they were busy calculating hashes. This is a bummer for everyone using apt as a connection speedtest as the https method works slower now (not really, it just isn't reporting done too early anymore).
2015-04-19calculate only expected hashes in methodsDavid Kalnischkies
Methods get told which hashes are expected by the acquire system, which means we can use this list to restrict what we calculate in the methods as any extra we are calculating is wasted effort as we can't compare it with anything anyway. Adding support for a new hash algorithm is therefore 'free' now and if a algorithm is no longer provided in a repository for a file, we automatically stop calculating it. In practice this results in a speed-up in Debian as we don't have SHA512 here (so far), so we practically stop calculating it.
2015-04-19improve https method queue progress reportingDavid Kalnischkies
The worker expects that the methods tell him when they start or finish downloading a file. Various information pieces are passed along in this report including the (expected) filesize. https was using a "global" struct for reporting which made it 'reuse' incorrect values in some cases like a non-existent InRelease fallbacking to Release{,.gpg} resulting in a size-mismatch warning. Reducing the scope and redesigning the setting of the values we can fix this and related issues. Closes: 777565, 781509 Thanks: Robert Edmonds and Anders Kaseorg for initial patchs
2015-04-19do not unlink https file on general errorDavid Kalnischkies
It might be quite interesting which file (content) made curl freak out and other methods keep the file around as well. Git-Dch: Ignore
2015-04-13Revert "HttpsMethod::Fetch(): Zero the FetchResult object when leaving due ↵Michael Vogt
to 404" This reverts commit 1296bc7c466181a7978c313c40a041b34ce3eaeb.
2015-04-07properly handle expected filesize in httpsDavid Kalnischkies
The worker expects that the methods tell him when they start or finish downloading a file. Various information pieces are passed along in this report including the (expected) filesize. https is using a "global" struct for reporting which made it 'reuse' incorrect values in some cases like a non-existent InRelease fallbacking to Release{,.gpg} resulting in an incorrect size-mismatch warning scaring and desensitizing users as well as being subject to a race between the write_data and progress callbacks generating incorrect progress reporting and potentially the same error message. Other branches as well as the bugreports contain 'better' fixes making the struct local and other sensible changes, but are larger as a result, so in this version we opted for short diff with minimal effect above else instead. Closes: 777565, 781509 Thanks: Robert Edmonds and Anders Kaseorg for initial patchs
2015-04-07HttpsMethod::Fetch(): Zero the FetchResult object when leaving due to 404Robert Edmonds
2015-03-16derive more of https from http methodDavid Kalnischkies
Bug #778375 uncovered that https wasn't properly integrated in the class family tree of http as it was supposed to be leading to a NULL pointer dereference. Fixing this 'properly' was deemed to much diff for practically no gain that late in the release, so commit 0c2dc43d4fe1d026650b5e2920a021557f9534a6 just fixed the synptom, while this commit here is fixing the cause plus adding a test.
2015-03-16merge debian/sid into debian/experimentalDavid Kalnischkies
2015-01-05Fix missing URIStart() for https downloadsMichael Vogt
Add a explicit ReceivedData to HttpsMethod that indicates when we got data from the connection so that we can send URISTart() to the parent. This is needed because URIStart got moved in f9b4f12d from the progress_callback to write_data() and it only checks for Res.Size. In the old code if progress_callback is called by libcurl (and sets Res.Size) before write_data is called then URIStart() is never send. Making this a explicit ReceivedData variable fixes this issue.
2014-12-22dispose http(s) 416 error page as non-contentDavid Kalnischkies
Real webservers (like apache) actually send an error page with a 416 response, but our client didn't expect it leaving the page on the socket to be parsed as response for the next request (http) or as file content (https), which isn't what we want at all… Symptom is a "Bad header line" as html usually doesn't parse that well to an http-header. This manifests itself e.g. if we have a complete file (or larger) in partial/ which isn't discarded by If-Range as the server doesn't support it (or it is just newer, think: mirror rotation). It is a sort-of regression of 78c72d0ce22e00b194251445aae306df357d5c1a, which removed the filesize - 1 trick, but this had its own problems… To properly test this our webserver gains the ability to reply with transfer-encoding: chunked as most real webservers will use it to send the dynamically generated error pages. (The tests and their binary helpers had to be slightly modified to apply, but the patch to fix the issue itself is unchanged.) Closes: 768797
2014-12-09dispose http(s) 416 error page as non-contentDavid Kalnischkies
Real webservers (like apache) actually send an error page with a 416 response, but our client didn't expect it leaving the page on the socket to be parsed as response for the next request (http) or as file content (https), which isn't what we want at all… Symptom is a "Bad header line" as html usually doesn't parse that well to an http-header. This manifests itself e.g. if we have a complete file (or larger) in partial/ which isn't discarded by If-Range as the server doesn't support it (or it is just newer, think: mirror rotation). It is a sort-of regression of 78c72d0ce22e00b194251445aae306df357d5c1a, which removed the filesize - 1 trick, but this had its own problems… To properly test this our webserver gains the ability to reply with transfer-encoding: chunked as most real webservers will use it to send the dynamically generated error pages. Closes: 768797
2014-10-13Fix backward compatiblity of the new pkgAcquireMethod::DropPrivsOrDie()Michael Vogt
Do not drop privileges in the methods when using a older version of libapt that does not support the chown magic in partial/ yet. To do this DropPrivileges() now will ignore a empty Apt::Sandbox::User. Cleanup all hardcoded _apt along the way.
2014-10-07Send "Fail-Reason: MaximumSizeExceeded" from the methodMichael Vogt
Communicate the fail reason from the methods to the parent and Rename() failed files.
2014-10-07make expected-size a maximum-size check as this is what we want at this pointMichael Vogt
2014-10-07add ftp expected size checkMichael Vogt
2014-10-07methods/https.cc: use File->Tell() here tooMichael Vogt
2014-10-06make http size check workMichael Vogt
2014-09-24methods: Fail if we cannot drop privilegesJulian Andres Klode
2014-09-24Drop Privileges to "Debian-apt" in most acquire methodsMichael Vogt
Add a new "Debian-apt" user that owns the /var/lib/apt/lists and /var/cache/apt/archive directories. The methods http, https, ftp, gpgv, gzip switch to this user when they start. Thanks to Julian and "ioerror" and tors "switch_id()" code.
2014-09-02Make Proxy-Auto-Detect check for each hostMichael Vogt
When doing Acquire::http{,s}::Proxy-Auto-Detect, run the auto-detect command for each host instead of only once. This should make using "proxy" from libproxy-tools feasible which can then be used for PAC style or other proxy configurations. Closes: #759264
2014-08-26make https honor ExpectedSize as wellMichael Vogt
2014-04-26enforce LFS for partial files in https range requestsDavid Kalnischkies
2014-03-13cleanup headers and especially #includes everywhereDavid Kalnischkies
Beside being a bit cleaner it hopefully also resolves oddball problems I have with high levels of parallel jobs. Git-Dch: Ignore Reported-By: iwyu (include-what-you-use)