summaryrefslogtreecommitdiff
path: root/apt-pkg
AgeCommit message (Collapse)Author
2016-01-02add optional support for comments in pkgTagFileDavid Kalnischkies
APT usually deals with perfectly formatted files generated automatically be other programs – and as it has to parse multiple MBs of such files it tries to be fast rather than forgiving. This was always a problem if we reused this parser for files with a deb822 syntax which are mostly written by hand however, like apt_preferences or the deb822-style sources as these can include stray newlines and more importantly comments all over the place. As a stopgap we had pkgUserTagSection which deals at least with comments before and after a given stanza, but comments in between weren't really supported and now that we support parsing debian/control for e.g. build-dep we face the full comment problem e.g. with comments inbetween multi-line fields (like Build-Depends). We can't easily deal with this on the pkgTagSection level as the interface gives access to 'raw' char-pointers for performance reasons so we would need to optionally add a buffer here on which we could remove comments to hand out pointers into this buffer instead. The interface is quite large already and supports writing stanzas as well, which does not support comments at all either. So while in future it might make sense to have a parser setup which deals with and keeps comments in this commit we opt for the simpler solution for now: We officially declare that pkgTagSection does not support comments and instead expect the caller to deal with them, which in our case is pkgTagFile: pkgTagFile is extended with an additional mode which can deal with comments by dropping them from the buffer which will later form the input of pkgTagSection. The actual implementation is slightly more complex than this sentence suggests at first on one hand to have good performance and on the other to allow jumping directly to stanzas with offsets collected in a previous run (like our cache generation does it for example).
2015-12-29Do not sync the cache fileJulian Andres Klode
Integrity is taken care of by the checksum now.
2015-12-29Add support for calculating hashes over the entire cacheJulian Andres Klode
2015-12-29pkgCacheGenerator: Allow passing down an already created cacheJulian Andres Klode
If we already have opened a cache, there is no point in having to open it again.
2015-12-29pkgTagSection::Scan: Fix read of uninitialized valueJulian Andres Klode
We ignored the boundary of the buffer we were reading in while scanning for spaces.
2015-12-29strutl.cc: Add declarations for the compat _ascii() functionsJulian Andres Klode
This shuts up gcc Gbp-Dch: ignore
2015-12-29Turn tolower_ascii() and isspace_ascii() into inline functionsJulian Andres Klode
To preserve compatibility, the new inline functions have _inline as a suffix, and a macro defines the old names to refer to the inline variants. The old functions are still preserved for binary compatibility. Also simplify the implementation of both functions.
2015-12-29Switch to DJB hashing and use prime number as table sizeJulian Andres Klode
On my testing system, consisting of unstable and experimental, this reduces the average chain from 6.5 to 4.5, and the longest chain from 17 to 15.
2015-12-28BufferedFileFdPrivate: Make InternalFlush() save against errorsJulian Andres Klode
Previously, if flush errored inside the loop, data could have already been written to the wrapped descriptor without having been removed from the buffer. Also try to work around EINTR here. A better solution might be to have the individual privates detect an interrupt and return 0 in such a case, instead of relying on errno being untouched in between the syscall and the return from InternalWrite.
2015-12-28aptconfiguration: Set default compression level to 6Julian Andres Klode
Since commit 7a68effcb904b4424b54a30e448b6f2560cd1078, the xz and lzma compressors read the level of compression they shall use. A default of -9 is too much for them, this will use 674 MB, according to the xz manual page. Level -6 on the other hand only needs 94 MB memory for compression. This causes autopkgtest failures in the test-compressed-indexes test, as not enough memory exists to proceed. Change the other compression levels to 6 as well: The gzip and bzip2 FileFd backends do not read them, and use their code's default level which is 6, so do the same for external methods.
2015-12-28BufferedWriter: flushing: Check for written < size instead of <=Julian Andres Klode
This avoids some issues with InternalWrite returning 0 because it just cannot write stuff at the moment.
2015-12-27deal with empty values properly in deb822 parserDavid Kalnischkies
Regression introduced in 8710a36a01c0cb1648926792c2ad05185535558e, but such fields are unlikely in practice as it is just as simple to not have a field at all with the same result of not having a value. Closes: 808102
2015-12-27allow repositories to forbid arch:all for specific index targetsDavid Kalnischkies
Debian has a Packages file for arch:all already, but the arch:any files contain arch:all packages as well, so downloading it would be a total waste of resources. Getting this solved is on the list of things to do, but it is also the hardest part – for index targets like Contents the situation is much easier and less server/client implementations are involved so we might not want to stall them. A repository can now declare via: No-Support-for-Architecture-all: Packages that even if an arch:all Packages exists, it shouldn't be downloaded, so that support for Contents files can be added now. See also 1dd20368486820efb6ef4476ad739e967174bec4 for the implementation of downloading arch:all index targets, which this is limiting. The field uses the name of the target from the apt configuration for simplicity and is negative by design as this field is intended to be supported/needed only for a "short" time (one or two Debian releases). While this commit theoretically supports any target, its expected to only see "Packages" as a value in reality.
2015-12-27pkgcachegen.h: Hack around unordered_map not existing before C++11Julian Andres Klode
This is for public users only, which cannot use the class at all, except for the static methods.
2015-12-27FileFd: Add a buffered writing modeJulian Andres Klode
This is somewhat experimental right now, and might not work for everyone, so it is on an opt-in basis.
2015-12-27FildFd: Introduce a Flush() function and call it from Close()Julian Andres Klode
The flush function can be used for buffered writers.
2015-12-27FileFdPrivate: Add getter and setter for fieldsJulian Andres Klode
We will soon implement a buffered writing decorator and we will need to forward attribute changes to those.
2015-12-27fileutl: simple_buffer: Add write() and full() methodsJulian Andres Klode
These can be used to implement write buffering Gbp-Dch: ignore
2015-12-27fileutl: simple_buffer: Mark accessors as constJulian Andres Klode
Suggested by David. Gbp-Dch: ignore
2015-12-27FileFdPrivate: Extract SimpleBuffer and mark it as hiddenJulian Andres Klode
Gbp-Dch: ignore
2015-12-27ParseDepends: Mark branches for build-dep parsing as unlikelyJulian Andres Klode
We do not see those branches at all during normal mode of operation (that is, during cache generation), so tell the compiler about it.
2015-12-27debListParser: Do not validate Description-md5 for correctness twiceJulian Andres Klode
The Set() method returns false if the input is no hex number, so simply use that.
2015-12-27Hex2Digit: Do not use isxdigit()Niels Thykier
We directly check if we are a hex digit in HexDigit, so use that information. [jak@debian.org: Commit message wording]
2015-12-27debListParser: ParseDepends: Only query native arch if neededJulian Andres Klode
This makes the code parsing architecture lists slower, but on the other hand, improves the more generic case of reading dependencies from Packages files.
2015-12-27pkgcachegen: Use std::unordered_map instead of std::mapJulian Andres Klode
std::unordered_map is faster than std::map in our use case, reducing cache generation time by about 10% in my benchmark.
2015-12-27Convert most callers of isspace() to isspace_ascii()Julian Andres Klode
This converts all callers that read machine-generated data, callers that might work with user input are not converted.
2015-12-27Introduce isspace_ascii() for use by parsersJulian Andres Klode
This is like isspace(), but ignores the current locale.
2015-12-26Refactor InternalReadLine to not unroll Size == 0 caseJulian Andres Klode
There is not much point and this is more readable. Gbp-Dch: ignore
2015-12-26Change InternalReadLine to always use buffer.read() return valueJulian Andres Klode
This is mostly a documentation issue, as the size we want to read is always less than or equal to the size of the buffer, so the return value will be the same as the size argument. Nonetheless, people wondered about it, and it seems clearer to just always use the return value.
2015-12-26Get rid of memmove() in our read bufferingJulian Andres Klode
This further improves our performance, and rred on uncompressed files now spents 78% of its time in writing. Which means that we should really look at buffering those.
2015-12-26Use a hardcoded buffer size of 4096 to fix performanceJulian Andres Klode
The code uses memmove() to move parts of the buffer to the front when the buffer is only partially read. By simply reading one page at a time, the maximum size of bytes that must be moved has a hard limit, and performance improves: In one test case, consisting of a 430 MB Contents file, and a 75K PDiff, applying the PDiff previously took about 48 seconds and now completes in 2 seconds. Further speed up can be achieved by buffering writes, they account for about 60% of the run-time now.
2015-12-24Mark all FileFdPrivate classes as hidden1.1.6Julian Andres Klode
Gbp-Dch: ignore
2015-12-23fix new[] vs delete mismatch introduced by b3db9d81David Kalnischkies
And as we are at it lets fix the 'style' issue I introduced with the filefd changes as well. Reported-By: gcc -fsanitize's & cppcheck Git-Dch: Ignore
2015-12-23use a dynamic buffer for ReadLineDavid Kalnischkies
We don't need the buffer that often - only for ReadLine - as it is only occasionally used, so it is actually more efficient to allocate it if needed instead of statically by default. It also allows the caller to influence the buffer size instead of hardcoding it. Git-Dch: Ignore
2015-12-23implement a buffer system for FileFd::ReadLineDavid Kalnischkies
The default implementation of ReadLine was very naive by just reading each character one-by-one. That is kinda okay for libraries implementing compression as they have internal buffers (but still not great), but while working with files directly or via a pipe as there is no buffer there so all those reads are in fact system calls. This commit introduces an internal buffer in the FileFd implementation which is only used by ReadLine. The more low-level Read and all other actions remain unbuffered – they just changed to deal with potential "left-overs" in the buffer correctly. Closes: 808579
2015-12-22parse xz-compression level from configurationDavid Kalnischkies
If we use the library to compress xz, still try to understand and pick up the arguments we would have used to call xz to figure out which level the user wants us to use instead of defaulting to level 6 (which is the default level of xz).
2015-12-22follow dpkg and xz and use CRC64 for xz compressionDavid Kalnischkies
dpkg switched from CRC32 to CRC64 in 777915108d9d36d022dc4fc4151a615fc95e5032 with the message: | This is the default CRC used by the xz command-line tool, align with | it and switch from CRC32 to CRC64. It should provide slightly better | detection against damaged data, at a negligible speed difference.
2015-12-22shuffle compressor-specific code into private subclassesDavid Kalnischkies
This isn't implementing any new features, it is "just" moving code around from FileFd methods which decided on each call how to handle the request by including all logic for all possible compressor backends in the method body to a model in which backend-specifics are implemented in a FileFdPrivate subclass. This avoids a big chunk of #ifdef's and should make it a tiny bit more obvious which backend uses which code. The execution of the idea is slightly uglified by the need to preserve ABI and API which causes liberal befriending. Git-Dch: Ignore
2015-12-19Do not try to read in FileFd::Read() if Size is 0Julian Andres Klode
There's no point trying to read 0 bytes, so let's just not do this and switch to a while loop like in Write(). Gbp-Dch: ignore
2015-12-19Do nothing in FileFd::Write() if Size is 0Julian Andres Klode
Turn the do-while loop into while loops, so it simply does nothing if the Size is already 0. This reverts commit c0b271edc2f6d9e5dea5ac82fbc911f0e3adfa7a which introduced a fix for a specific instance of the issue in the CopyFile() function. Closes: #808381
2015-12-19CopyFile: avoid failing on EOF on some systemsPino Toscano
On EOF, ToRead will be 0, which might trigger on some systems (e.g. on the Hurd) an error due to the invalid byte count passed to write(). The whole loop already checks for ToRead != 0, so perform the writing step only when there was actual data read. Closes: #808381
2015-12-19CopyFile: fix BufSize to a sane valuePino Toscano
Commit e977b8b9234ac5db32f2f0ad7e183139b988340d tries to make BufSize calculated based on the size of the buffer; the problem is that std::unique_ptr::size() returns a pointer to the data, so sizeof() equals to the size of a pointer (later divided by sizeof(char), which is 1). The result is that the CopyFile copies in chunks of 8 bytes, which is not exactly ideal... As solution, declare BufSize in advance, and use its value to allocate the Buf array. Closes: #808381
2015-12-14pkgcache: Make hash arch-independent using fixed size integerJulian Andres Klode
This helps writing test cases. Also adapt the test case that expected 64-bit. Nothing changes performance wise, the distribution of the hash values remains intact.
2015-12-14tagfile: Hardcode error message for out of range integer valuesJulian Andres Klode
This makes the test suite work on 32 bit-long platforms. Gbp-Dch: ignore
2015-12-14non-existing directories don't need to be cleanedDavid Kalnischkies
Trying to clean up directories which do not exist seems rather silly if you think about it, so let apt think about it and stop it. Depends a bit on the caller if this is fixing anything for them as they might try to acquire a lock or doing other clever things as apt does. Closes: 807477
2015-12-14support regex and co in 'apt-cache policy $pkg' againDavid Kalnischkies
Regression of 1e064088bf7b3e29cd36d30760fb3e4143a1a49a (1.1~exp4) which moved code around and renamed methods heavily ending up calling the wrong method matching packagenames only instead of calling the full array. Most commands work with versions, so this managed to fly under the radar for quite a while. Closes: 807870
2015-12-14show a more descriptive error for weak Release filesDavid Kalnischkies
If we can't work with the hashes we parsed from the Release file we display now an error message if the Release file includes only weak hashes instead of downloading the indexes and failing to verify them with "Hash Sum mismatch" even through the hashes didn't mismatch (they were just weak). If for some (unlikely) reason we have got weak hashes only for individual targets we will show a warning to this effect (again, befor downloading and failing the index itself). Closes: 806459
2015-12-13parse .diff/Index hashes in reverse orderDavid Kalnischkies
Reversing the parsing order ensures that we parse weaker hashes (like SHA1) before we touch newer/stronger hashes (like SHA256) as the weaker ones will usually be there for a longer time already with data already present, which we would discard if we start with the strong one first. The discarding is visible in the debug logs: File X wasn't in the list for the first parsed hash! (history) File X wasn't in the list for the first parsed hash! (patches) which if file X is part of the patch-path means apt will not find a path and fallback to acquire the whole file instead needlessly. If file X isn't part of the patch-path that is no problem, so that effects only the update-call which updates with patches coming from before and after the addition of a new hash.
2015-12-13fix typos and docs in GlobalError documentationDavid Kalnischkies
Reported-By: Manuel A. Fernandez Montecelo <mafm@debian.org> Git-Dch: Ignore
2015-12-11mmap: Define _DEFAULT_SOURCE instead of _BSD_SOURCEJulian Andres Klode
Fixes a warning reported by gcc. Gbp-Dch: ignore