path: root/apt-pkg/acquire-item.cc
Age | Commit message | Author
2016-11-14 | imbue .diff/Index parsing with C.UTF-8 as well | David Kalnischkies
In 3bdff17c894d0c3d0f813d358fc45d7a263f3552 we did it for the datetime parsing, but we use the same style in the parsing for pdiff (where the size of the file is in the middle of the three fields), so imbuing here as well is a good idea. (cherry picked from commit 1136a707b7792394ea4b1d039dda4f321fec9da4)
2016-11-14 | prevent C++ locale number formatting in text APIs (try 2) | David Kalnischkies
Followup of b58e2c7c56b1416a343e81f9f80cb1f02c128e25. Still a regression of sorts of 8b79c94af7f7cf2e5e5342294bc6e5a908cacabf. Closes: 832044 (cherry picked from commit 7303e11ff28f920a6277c159aa46f80c007350bb)
2016-10-05 | changelog: Respect Dir setting for local changelog getting | Julian Andres Klode
This fixes issues with chroots, but the goal here was to get the test suite working on systems without dpkg. (cherry picked from commit 2ed62ba6abcad809d1898a40950f86217af73812)
2016-10-05 | set the correct item FileSize in by-hash case | David Kalnischkies
In af81ab9030229b4ce6cbe28f0f0831d4896fda01 we implemented by-hash as a special compression type, which breaks this filesize setting as the code is then looking for a foobar.by-hash file. Dealing with this slightly differently gets us the intended value. Note that this has no direct effect as this value will be set in other ways, too, and could only affect progress reporting. Gbp-Dch: Ignore (cherry picked from commit 3084ef2292642d43e533654354a4929abe55d91b)
2016-08-31 | rred: truncate result file before writing to it | David Kalnischkies
If another file in the transaction fails and hence dooms the transaction, we can end up in a situation in which a -patched file (= rred writes the result of the patching to it) remains in the partial/ directory. The next apt call will perform the rred patching again and write its result to the -patched file again, but instead of starting with an empty file as intended it will overwrite the content previously in the file. This has the same result if the new content happens to be at least as long as the old content, but if it isn't, parts of the old content remain in the file. The file will still pass verification, as the new content written to it matches the hashes, and if the entire transaction passes, the file will be moved to the lists/ directory, where it might or might not trigger errors depending on whether the old content which remained forms a valid file together with the new content. This has no real security implications as no untrusted data is involved: the old content consists of a base file which passed verification and a bunch of patches which all passed multiple verifications as well, so the old content isn't controllable by an attacker and the new one isn't either (as the new content alone passes verification). So the best an attacker can do is let the user run into the same issue as in the report. Closes: #831762 (cherry picked from commit 0e071dfe205ad21d8b929b4bb8164b008dc7c474)
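The failure mode described in this message can be reproduced in a few lines. The following is a minimal sketch in Python, not apt's actual C++ code; the file name `Packages-patched` and the helper names are hypothetical and chosen only for illustration.

```python
import os
import tempfile


def write_result(path, data, truncate):
    """Write data to path, optionally truncating existing content first."""
    flags = os.O_WRONLY | os.O_CREAT
    if truncate:
        flags |= os.O_TRUNC
    fd = os.open(path, flags, 0o644)
    try:
        os.write(fd, data)
    finally:
        os.close(fd)


def read_all(path):
    """Return the complete content of path as bytes."""
    with open(path, "rb") as handle:
        return handle.read()


def demo():
    """Return (content written without truncation, content written with it)."""
    with tempfile.TemporaryDirectory() as partial:
        patched = os.path.join(partial, "Packages-patched")
        # Leftover from a doomed transaction: longer than the next result.
        write_result(patched, b"old and longer content\n", truncate=True)
        # Second run without truncation: the tail of the old content survives.
        write_result(patched, b"new content\n", truncate=False)
        stale = read_all(patched)
        # Second run with truncation: only the new content remains.
        write_result(patched, b"new content\n", truncate=True)
        clean = read_all(patched)
    return stale, clean
```

Calling `demo()` shows the leftover tail of the old content in the first value and the clean result in the second.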
2016-08-31 | verify hash of input file in rred | David Kalnischkies
We read the entire input file we want to patch anyhow, so we can also calculate the hash for that file and compare it with what we had expected it to be. Note that this isn't really a security improvement as a) the file we patch is trusted & b) if the input is incorrect, the result will hardly be matching, so this is just for failing slightly earlier with a more relevant error message (although, in terms of rred, it is ignored and a complete download is attempted instead). (cherry picked from commit 6e71ec6fcdcaa926c98fa58cd4af38e42556df15)
2016-05-10 | don't show NO_PUBKEY warning if repo is signed by another key | David Kalnischkies
Daniel Kahn Gillmor highlights in the bugreport that security isn't improving by having the user import additional keys – especially as importing keys securely is hard. The bugreport was initially about dropping the warning to a notice, but given the previously mentioned observation and the fact that we weren't printing a warning (or a notice) for expired or revoked keys providing a signature, we drop it completely, as the code to display a message if this was the only key is in another path – and is considered critical. Closes: 618445 (Backported from commit fb7b11ebb852fa255053ecab605bc9cfe9de0603)
2016-04-14 | ensure outdated files are dropped without lists-cleanup | David Kalnischkies
Tested via (newly) empty index files, but this also affects files dropped from the repository or an otherwise changed repository config.
2016-04-14 | silently skip acquire of empty index files | David Kalnischkies
There is just no point in taking the time to acquire empty files – especially as it will be tiny non-empty compressed files usually.
2016-04-14 | fix Alt-Filename handling of file method | David Kalnischkies
A silly off-by-one error in the stripping of the extension to check for the uncompressed filename, which was broken in an attempt to support all compressions in commit a09f6eb8fc67cd2d836019f448f18580396185e5. Fixing this also highlights mistakes in the handling of the Alt-Filename in libapt which would cause apt to remove the file from the repository (if root has the needed rights – aka the disk isn't readonly or similar)
2016-04-07 | stop handling items in doomed transactions | David Kalnischkies
With the previous commit we track the state of transactions, so we can now use our knowledge to avoid processing data for a transaction which was already closed (via an abort in this case). This is needed as multiple independent processes are interacting in the process, so there isn't a simple immediate full-engine stop and it would also be bad to teach each and every item how to check if its manager has failed subordinate and what to do in that case. In the pdiff case, which deals (potentially) with many items during its lifetime e.g. a hashsum mismatch in another file can abort the transaction the file we try to patch via pdiff belongs to. This causes some of the items (which are already done) to be aborted with it, but items still in the process of acquisition continue in the processing and will later try to use all the items together failing in strange ways as cleanup already happened. The chosen solution is to dry up the communication channels instead by ignoring new requests for data acquisition, canceling requests which are not assigned to a queue and not calling Done/Failed on items anymore. This means that e.g. already started or pending (e.g. pipelined) downloads aren't stopped and continue as normal for now, but they remain in partial/ and aren't processed further so the next update command will pick them up and put them to good use while the current process fails updating (for this transaction group) in an orderly fashion. Closes: 817240 Thanks: Barr Detwix & Vincent Lefevre for log files
2016-04-07 | ensure transaction states are changed only once | David Kalnischkies
We want to keep track of the state of a transaction overall to base future decisions on it, but as a prerequisite we have to make sure that a transaction isn't committed twice (which happened if the download of InRelease failed and Release takes over). It also happened to create empty commits after a transaction was already aborted in cases in which the Release files were rejected. This isn't affecting security at the moment, but to ensure this isn't happening again and can never be bad, a bunch of fatal error messages are added to make regressions on this front visible.
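The "change state only once" rule can be pictured as a small state machine. This is a sketch under stated assumptions: the class, state names and exception below are hypothetical and do not mirror apt's actual C++ classes.

```python
class TransactionError(Exception):
    """Raised when a transaction is asked to change its state a second time."""


class Transaction:
    """Hypothetical transaction that may leave the 'open' state exactly once."""

    OPEN = "open"
    COMMITTED = "committed"
    ABORTED = "aborted"

    def __init__(self):
        self.state = self.OPEN

    def _change(self, new_state):
        # Any second transition is treated as a fatal error so that
        # regressions become visible instead of passing silently.
        if self.state != self.OPEN:
            raise TransactionError(
                f"transaction is already {self.state}, cannot become {new_state}"
            )
        self.state = new_state

    def commit(self):
        self._change(self.COMMITTED)

    def abort(self):
        self._change(self.ABORTED)
```

With this shape, a second `commit()` (for example after Release takes over from a failed InRelease) or a commit after an abort raises instead of silently doing the work twice.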
2016-03-19 | refactor loading of previous release file | David Kalnischkies
There is really no need to have the same code three times. Git-Dch: Ignore
2016-03-16 | Get accurate progress reporting in apt update again | Michael Vogt
For the non-pdiff case, we can have accurate progress reporting because after fetching the {,In}Release files we know how many IndexFiles will be fetched and what size they have. Therefore init the filesize early (in pkgAcqIndex::Init) and ensure that Acquire::Pulse() looks at already downloaded bits when calculating the progress. Also improve debug output of Debug::acquire::progress
2016-03-14 | don't use Desc.URI to calculate .diff/Index filenames | David Kalnischkies
The URI describing an item can change via mirrors/redirectors which causes the .diff/Index files to get the wrong names in storage. Git-Dch: Ignore
2016-03-14 | require $(HASH)-Download field in .diff/Index files | David Kalnischkies
Now that we ignore SHA1-only files it makes sense to require also the provision of hashes for the compressed patches as this was introduced in the same patchset as support for non-SHA1 hashes in the file itself in dak and adding support in other archive creators (if they support pdiffs at all) will likely be in the same batch. The reason for the change itself is simple: If you are 'scared' enough about the security of SHA1, you shouldn't uncompress a file you haven't verified at all – after all, it could be exploiting a bug or a zip bomb.
2016-03-07 | Fix several typos | Veres Lajos
This effectively merges branch 'typofixes-vlajos-20150807' of github.com:vlajos/apt with the following commit: commit 13cacb3e2e2352ba701e769fc889e3344fabbf7e Author: Veres Lajos <vlajos@gmail.com> Date: Sun Aug 9 00:12:53 2015 +0100 typofix - https://github.com/vlajos/misspell_fixer It has been rebased for a better commit message.
2016-03-06 | do not move not-failed pdiff-patches into CWD on failure | David Kalnischkies
If a single pdiff fails, we have to fail the entire patching endeavour and fall back to getting the complete file instead. That is easy in serverside merged pdiffs as we get them one by one. For clientside we get them all at once though, which means that a failure in one has to stop the entire pipeline, which works as expected (as proven by the bugreporters as they don't even notice it happening). The problem is just that the first failing pdiff will do the cleanup, so another pdiff which happens to be successfully acquired after we processed the failure doesn't find the file it is supposed to use as a basename anymore, so the patch is renamed to what should be the unique extension and moved into the current working directory. Processing is then stopped as the patch realizes that it isn't the last one which completed downloading. On the plus side this means this is neither us using a bad temporary location nor a security problem. It "just" unconditionally overwrites files in your current working directory (if you happen to have them named like a pdiff patch – a bit unlikely perhaps) and so drops files there which are never used again. I guess this was introduced in 4e3c5633b1e74b4f58b95f339cfbbf4cbf21ab3e for real as I made the need for the existence of the base file rather explicit, but the potential lingers in the code for far longer. Closes: #816837
2016-03-06 | deal with partially downloaded changelogs | David Kalnischkies
Changelogs are relatively small and we have no hashes for them, but we had partial support for them before, so let's stick to it. This also deletes the (partial) file before moving the downloaded file into its place – rename(2) should be doing this by itself, but testing on semaphoreci suggests that this isn't always the case (error is "Stale file handle") and we don't need an atomic replace here, so be explicit. Git-Dch: Ignore
2016-02-26 | Add missing numeric includes in files using std::accumulate() | Julian Andres Klode
Reported-By: Helmut Grohne on IRC
2016-02-11 | always download changelogs into /tmp first | David Kalnischkies
pkgAcqChangelog has the default behaviour of downloading a changelog to a temporary directory (inside /tmp, not /tmp directly), which is cleaned up on shutdown, but this can be overridden to store the changelog more permanently – but that carries a permission problem. For changelogs we can 'easily' solve this by always downloading to a temporary directory and only moving it out of there on done.
2016-02-11 | use local changelog from /usr/share/doc if possible | David Kalnischkies
If pkgAcqChangelog is told to acquire the changelog for a version it will check first if this version is installed on the disk and if so will use the local changelog in /usr/share/doc (possibly/likely gz compressed) instead of downloading the file from the web. An option is provided to disable this, which is enabled by default for the Ubuntu vendor as they truncate the local changelogs – and for apt's --print-uris action.
2016-01-08 | remove uncompressed leftover partial file before pdiff bootstrap | David Kalnischkies
The code already deals with compressed leftovers, but forgot the uncompressed files. The opportunity is taken to reorder this code and add debug messages about the actions taken as well as produce such a leftover file in the associated testcase.
2016-01-08 | use filesize of compressed pdiffs for the limit if possible | David Kalnischkies
With the addition of the $HASH-Download field in the .diff/Index we got the size of the compressed patches for 'free', so if that information is available we can use it for a more fitting calculation of the size requirements of the patches vs. the complete file. Note that this predicts a too small size in the transition case in which the information isn't available for all patches, but figuring this out would be a lot of code for practically nothing as only one update can ever be in such a transition phase.
2016-01-08 | keep compressed indexes in a low-cost format | David Kalnischkies
Downloading and storing are two different operations where different compression types can be preferred. For downloading we provide the choice via Acquire::CompressionTypes::Order as there is a choice to be made between download size and speed – and limited by what's available in the repository. Storage on the other hand has all compressions currently supported by apt available and to reduce runtime of tools accessing these files the compression type should be a low-cost format in terms of decompression. apt traditionally stores its indexes uncompressed on disk, but has options to keep them compressed. Now that apt downloads additional files we also deal with files which simply can't be stored uncompressed as they are just too big (like Contents for apt-file). Traditionally they are downloaded in a low-cost format (gz) as repositories do not provide other formats, but there might be even lower-cost formats and for download we could introduce higher-cost formats in the repositories. Downloading an entire index potentially requires recompression to another format, so an update potentially takes longer – but big files are usually updated via pdiffs, which have to de- and re-compress anyhow and do it on the fly, so there is no extra time needed and in general it seems to be beneficial to invest the time in update to save time later on file access.
2016-01-08 | allow pdiff bootstrap from all supported compressors | David Kalnischkies
There is no reason to enforce that the file we start the bootstrap with is compressed with a compressor which is available online. This allows us to change the on-disk format as well as deals with repositories adding/removing support for a specific compressor.
2016-01-08 | ensure compression cleanup even without lists-cleanup | David Kalnischkies
If we store files compressed in lists/ and the file switched compression formats we happened to retain the "old" format, but by default the cleanup process caught this oversight and removed the file. [The initial situation described doesn't arise as we store no files compressed by default and even with apt-file configuring Contents files, we don't really have that problem as there are just .gz files for those.] We solve this by just removing any uncompressed as well as compressed (we support) file just before we move the 'new' version of the file in.
2016-01-08 | use one 'store' method to rule all (de)compressors | David Kalnischkies
Adding a new compressor method meant adding a new method as well – even if that boiled down to just linking to our generalized decompressor with a new name. That is unneeded busywork if we can instead just call the generalized decompressor and let it figure out which compressor to use based on the filenames rather than by program name. For compatibility we still ship 'gzip', 'bzip2' and co, but they are just links to our "new" 'store' method.
2015-12-27 | allow repositories to forbid arch:all for specific index targets | David Kalnischkies
Debian has a Packages file for arch:all already, but the arch:any files contain arch:all packages as well, so downloading it would be a total waste of resources. Getting this solved is on the list of things to do, but it is also the hardest part – for index targets like Contents the situation is much easier and fewer server/client implementations are involved so we might not want to stall them. A repository can now declare via: No-Support-for-Architecture-all: Packages that even if an arch:all Packages exists, it shouldn't be downloaded, so that support for Contents files can be added now. See also 1dd20368486820efb6ef4476ad739e967174bec4 for the implementation of downloading arch:all index targets, which this is limiting. The field uses the name of the target from the apt configuration for simplicity and is negative by design as this field is intended to be supported/needed only for a "short" time (one or two Debian releases). While this commit theoretically supports any target, it's expected to only see "Packages" as a value in reality.
2015-12-14 | show a more descriptive error for weak Release files | David Kalnischkies
If we can't work with the hashes we parsed from the Release file we now display an error message if the Release file includes only weak hashes instead of downloading the indexes and failing to verify them with "Hash Sum mismatch" even though the hashes didn't mismatch (they were just weak). If for some (unlikely) reason we have got weak hashes only for individual targets we will show a warning to this effect (again, before downloading and failing the index itself). Closes: 806459
2015-12-13 | parse .diff/Index hashes in reverse order | David Kalnischkies
Reversing the parsing order ensures that we parse weaker hashes (like SHA1) before we touch newer/stronger hashes (like SHA256) as the weaker ones will usually be there for a longer time already with data already present, which we would discard if we start with the strong one first. The discarding is visible in the debug logs: File X wasn't in the list for the first parsed hash! (history) File X wasn't in the list for the first parsed hash! (patches) which, if file X is part of the patch-path, means apt will not find a path and will needlessly fall back to acquiring the whole file instead. If file X isn't part of the patch-path that is no problem, so this affects only the update-call which updates with patches coming from before and after the addition of a new hash.
2015-12-02 | use @CHANGEPATH@ as placeholder in changelog URI templates | David Kalnischkies
This should make it more obvious that CHANGEPATH is a placeholder which apt will replace with a package specific path rather than a string constant. Mail-Reference: <87d1upgvaf.fsf@deep-thought.43-1.org> Mail-Archive: https://lists.debian.org/debian-dak/2015/12/msg00005.html
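The substitution itself can be sketched as follows. This is an illustration in Python, not apt's implementation; the layout of the package-specific path (component, a one-letter or "lib" plus one-letter prefix, the source name, then `source_version`) reflects my understanding of the apt.conf documentation and should be treated as an assumption, as should the `example.org` template and the function names.

```python
PLACEHOLDER = "@CHANGEPATH@"


def change_path(component, source, version):
    """Build the package-specific path (assumed layout, see lead-in)."""
    prefix = source[:4] if source.startswith("lib") else source[:1]
    parts = [prefix, source, f"{source}_{version}"]
    if component:
        # Packages without a component simply omit the first part.
        parts.insert(0, component)
    return "/".join(parts)


def expand_template(template, path):
    """Replace the placeholder in a changelog URI template."""
    if PLACEHOLDER not in template:
        raise ValueError(f"template lacks {PLACEHOLDER}: {template}")
    return template.replace(PLACEHOLDER, path)
```

For example, `expand_template("https://example.org/changelogs/@CHANGEPATH@/changelog", change_path("main", "apt", "1.1"))` fills in `main/a/apt/apt_1.1`. The `@…@` delimiters make it hard to mistake the placeholder for a literal path segment.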
2015-11-25 | slightly rephrase notice shown for insecure repositories | Justin B Rye
Git-Dch: Ignore
2015-11-21 | review of new/changed translatable program strings | Justin B Rye
Reference mail: https://lists.debian.org/debian-l10n-english/2015/11/msg00006.html
2015-11-21 | do not sent Last-Modified if we expect a changed file | David Kalnischkies
In 8d041b4f we made apt figure out, based on the last Release file it has, if it should request a file or not given that the hashes changed or not. So if we have a last Release file and do a request, do not send a Last-Modified header as we expect a change so much that a non-change would indeed be an error. The Last-Modified header is therefore at best ignored by the server, so sending it is just wasted effort. In the worst case, as time is a fragile thing, the server decides against sending us an update with the idea that we already have the latest content, which we know for a fact that we haven't. Given that we send less information to the server, our request is on its own also less identifiable as coming from a returning or new user. The disadvantage is that if we end up getting an old index file after getting a new Release file from another mirror, the old mirror will not be able to tell us 'Hit', but instead sends us the complete file, which we discard; both cases let us end up with the same error class in the end, so the difference isn't big in practice.
2015-11-05 | "support" unsigned Release files without hashes again | David Kalnischkies
This 'ignores' the component Release files you can find in Debian alongside the binary-* directories, which isn't exactly a common usecase, but it worked before, so let's support it again as this isn't worse than a valid Release file which is unsigned. Git-Dch: Ignore
2015-11-05 | apply various suggestions made by cppcheck | David Kalnischkies
Reported-By: cppcheck Git-Dch: Ignore
2015-11-04 | wrap every unlink call to check for != /dev/null | David Kalnischkies
Unlinking /dev/null is bad, we shouldn't do that. Also, we should print at least a warning if we tried to unlink a file but didn't manage to pull it off (ignoring the case where the file is /dev/null or doesn't exist in the first place). This got triggered by a relatively unlikely to cause problem in pkgAcquire::Worker::PrepareFiles which would, while temporarily uncompressing files (which are set to be kept compressed), figure out that two files are the same and prepare for sharing by deleting them. Bad move. That also shows why not printing a warning is a bad idea, as this hides the error in non-root test runs. Git-Dch: Ignore
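A wrapper with the behaviour described here might look like the following. It is a minimal Python sketch, not the C++ helper from this commit; the function name, the treatment of an empty path and the warning text are assumptions.

```python
import os
import sys


def remove_file(caller, path):
    """Unlink path, but never /dev/null; warn when a real removal fails.

    Returns True if the file is gone (or was never meant to be removed),
    False if removing it failed.
    """
    if not path or path == "/dev/null":
        return True
    try:
        os.unlink(path)
    except FileNotFoundError:
        # A file that doesn't exist in the first place is not an error.
        return True
    except OSError as error:
        print(f"W: {caller}: problem unlinking {path}: {error}", file=sys.stderr)
        return False
    return True
```

Routing every removal through one function like this keeps the /dev/null check and the warning in a single place instead of at each call site.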
2015-11-04 | support arch:all data e.g. in separate Packages file | David Kalnischkies
Based on a discussion with Niels Thykier who asked for Contents-all this implements apt trying for all architecture dependent files to get a file for the architecture all, which is treated internally now as an official architecture which is always around (like native). This way arch:all data can be shared instead of duplicated for each architecture requiring the user to download the same information again and again. There is one problem however: In Debian there is already a binary-all/ Packages file, but the binary-any files still include arch:all packages, so that downloading this file now would be a waste of time, bandwidth and diskspace. We therefore need a way to decide if it makes sense to download the all file for Packages in Debian or not. The obvious answer would be a special flag in the Release file indicating this, which would need to default to 'no' and every reasonable repository would override it to 'yes' in a few years time, but the flag would be there "forever". Looking closer at a Release file we see the field "Architectures", which doesn't include 'all' at the moment. With the idea outlined above that 'all' is a "proper" architecture now, we interpret this field as being authoritative in declaring which architectures are supported by this repository. If it says 'all', apt will try to get all, if not it will be skipped. This gives us another interesting feature: If I configure a source to download armel and mips, but it declares it supports only armel, apt will now print a notice saying as much. Previously this was a very cryptic failure. If on the other hand the repository supports mips, too, but for some reason doesn't ship mips packages at the moment, this 'missing' file is silently ignored (= that is the same as the repository including an empty file). The Architectures field isn't mandatory though, so if it isn't there, we assume that every architecture is supported by this repository, which skips the arch:all if not listed in the release file.
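The decision rule in this message can be summarised in a few lines. This is a sketch of my reading of the text, not apt's code: the function name is hypothetical, and treating a missing Architectures field as "every architecture except all" is an interpretation of the final sentence.

```python
def should_acquire(arch, release_architectures):
    """Decide whether index files for arch should be requested.

    release_architectures is the parsed Architectures field of the Release
    file as a list of strings, or None if the field is absent.
    """
    if release_architectures is None:
        # Field missing: assume every real architecture is supported,
        # but 'all' was never declared, so it is skipped.
        return arch != "all"
    # Field present: it is authoritative, including for 'all'.
    return arch in release_architectures
```

A source configured for `armel` and `mips` against a repository declaring only `armel` would thus get a clear "not supported" decision for `mips` rather than a failed download.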
2015-11-04 | centralize unlink checks in acquire-item | David Kalnischkies
Removals in the acquire process can be pretty important, so a failure shouldn't be silently ignored, so we wrap our unlink call in a slightly more forgiving wrapper checking things. Git-Dch: Ignore
2015-11-04 | do not cleanup .diff/Index files on Hit | David Kalnischkies
Git-Dch: Ignore
2015-11-04 | refer to apt-secure(8) in unsecure repositories warning | David Kalnischkies
The manpage is also slightly updated to work better as a central hub to push people from all angles into the right directions without writing a book disguised as an error message.
2015-11-04 | rework errors and warnings around insecure repositories | David Kalnischkies
Insecure (aka unsigned) repositories are bad, period. We want to finally get rid of them and as a first step we are printing scary warnings. This is already done, this commit just changes the messages to be more consistent and prevents them from being displayed if authenticity is guaranteed some other way (as indicated with trusted=yes). The idea is to first print the pure fact like "repository isn't signed" as a warning (and later as an error), while giving an explanation in an immediately following notice (which is displayed only in quiet level 0: so in interactive use, not in scripts and alike). Closes: 796549
2015-11-04 | unbreak the copy-method claiming hashsum mismatch since ~exp9 | David Kalnischkies
Commit 653ef26c70dc9c0e2cbfdd4e79117876bb63e87d broke the camel's back insofar as everything works in terms of our internal use of copy:/, but external use is completely destroyed. This is kinda the reverse of what happened in "parallel" in the sid branch, where external use was mostly fine, internal and external exploded on the GzipIndexes option. We fix this now by rewriting our internal use by letting copy:/ only do what the name suggests it does: Copy files and not uncompress them on-the-fly. Then we teach copy and the uncompressors how to deal with /dev/null and use it as destination file in case we don't want to store the uncompressed files on disk. Closes: 799158
2015-09-14 | fallback to well-known URI if by-hash fails | David Kalnischkies
We use a small trick to implement the fallback: We make it so that by-hash is a special compression algorithm, and apt already knows how to deal with fallback between compression algorithms. The drawback with implementing this fallback is that a) we are guessing again and more importantly b) by-hash is only tried for the first compression algorithm we want to acquire, not for all as before – but flipping between by-hash and well-known for each compression algorithm seems to be not really worth it as it seems unlikely that there will actually be mirrors who only mirror a subset of compressed files, but have by-hash enabled. The user-experience is the usual fallback one: You see "Ign" lines in the apt update output. The fallback is implemented as a transition feature, so a (potentially huge) mirror network doesn't need a flagday. It is not meant as a "someday we might" or "we don't, but some of our mirrors might" option – we want to cut down on the 'Ign' lines front so that they become meaningful – if we wanted to spam everyone with them, we could enable by-hash by default for all repositories… sources.list and config options are better suited for this. Closes: 798919
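The resulting order of attempts can be sketched like this. It is an illustrative Python model, not apt's code: the function names and the `digests` mapping are hypothetical, while the `by-hash/<algorithm>/<digest>` directory layout next to the index file follows the by-hash convention used by Debian-style repositories.

```python
import posixpath


def by_hash_path(index_path, algorithm, digest):
    """Return the by-hash location that sits next to a well-known index path."""
    directory = posixpath.dirname(index_path)
    return posixpath.join(directory, "by-hash", algorithm, digest)


def acquisition_order(target, compressions, algorithm, digests):
    """List the paths to try, in order, for one index target.

    by-hash is only attempted for the first compression type; after that
    the well-known names are tried, one per compression type.
    """
    order = []
    for position, extension in enumerate(compressions):
        well_known = f"{target}.{extension}"
        if position == 0 and well_known in digests:
            order.append(by_hash_path(well_known, algorithm, digests[well_known]))
        order.append(well_known)
    return order
```

Each entry that fails before a later one succeeds would correspond to one "Ign" line in the update output.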
2015-09-14 | add by-hash sources.list option and document all of by-hash | David Kalnischkies
This changes the semantics of the option (which is renamed too) to be a yes/no value with the special additional value "force" as this allows by-hash to be disabled even if the repository indicates it would be supported and is more in line with our other yes/no options like pdiff which disable themselves if no support can be detected. The feature wasn't documented so far and hasn't reached a (un)stable release yet, so changing it without trying too hard to keep compatibility seems okay.
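The three-valued semantics can be written down as a small truth function. This is a sketch only: the function name is hypothetical, and any spellings apt may accept beyond `yes`, `no` and `force` are not modelled.

```python
def use_by_hash(option, repository_supports):
    """Decide whether by-hash URIs are used for a source.

    option is the configured value; repository_supports says whether the
    repository indicates by-hash support.
    """
    value = option.strip().lower()
    if value == "force":
        # Use by-hash even if the repository does not advertise it.
        return True
    if value == "yes":
        # Enabled, but disables itself if no support is detected.
        return repository_supports
    if value == "no":
        # Disabled even if the repository would support it.
        return False
    raise ValueError(f"unsupported by-hash value: {option!r}")
```

The `yes` branch mirrors how options like pdiff behave: on by request, off automatically when the repository offers nothing to use.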
2015-09-14 | avoid using global PendingError to avoid failing too often too soon | David Kalnischkies
Our error reporting has historically grown into some kind of mess. A while ago I implemented stacking for the global error which is used in this commit now to wrap calls to functions which do not report (all) errors via return, so that only failures in those calls cause a failure to propagate down the chain rather than failing if anything (potentially totally unrelated) has failed at some point in the past. This way we can avoid stopping the entire acquire process just because a single source produced an error for example. It also means that after the acquire process the cache is generated – even if the acquire process had failures – as we still have the old good data around we can and should generate a cache for (again). There are probably more instances of this hiding, but all these looked like the easiest to work with and fix with reasonable (aka net-positive) effects.
2015-08-31 | ignore for _apt inaccessible TMPDIR in pkgAcqChangelog | David Kalnischkies
Using libpam-tmpdir caused us to create our download tmp directory in root's private tmp before changing to _apt, which wouldn't have access to it. By extending our GetTempDir method with an optional wrapper changing the effective user, we can test if a given user can access the directory and ignore TMPDIR if not instead of ignoring TMPDIR completely. Closes: 797270
2015-08-28 | implement PDiff patching for compressed files | David Kalnischkies
Some additional files like 'Contents' are very big and should therefore be kept compressed on the disk, which apt-file did in the past. It also implemented pdiff patching of these files by un- and recompressing these files on-the-fly; with this commit we can do the same – but we can do this in both pdiff patching styles (client and server merging) and secured by hashes. Hashes are slightly complicated insofar as we can't compare the hashes of the compressed files as we might compress them differently than the server would (different compressor versions, options, …), so we must compare the hashes of the uncompressed content. While this commit has changes in public headers, the classes it changes are marked as hidden, so nobody can use them directly, which means the ABI break is internal only.
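Why the comparison has to happen on uncompressed content can be shown with the standard library alone. A minimal Python sketch, using gzip and SHA256 purely as examples; the helper name is hypothetical and this is not how apt's C++ code is structured.

```python
import gzip
import hashlib


def uncompressed_sha256(compressed):
    """Hash the content of a gzip stream rather than the stream itself."""
    return hashlib.sha256(gzip.decompress(compressed)).hexdigest()


def same_content(local_compressed, expected_sha256):
    """Check a locally compressed file against the hash of its plain content."""
    return uncompressed_sha256(local_compressed) == expected_sha256
```

Two gzip streams produced with different compression levels (or at different times, since the header can carry a timestamp) are not guaranteed to be byte-identical, yet both decompress to the same content, so only the hash of that content is a stable reference.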
2015-08-27 | sources.list and indextargets option for pdiffs | David Kalnischkies
Disabling pdiffs can be useful occasionally, like if you have a fast local mirror where the download doesn't matter, but still want to use it for non-local mirrors. Also, some users might prefer it to only use it for very big indextargets like Contents.