summaryrefslogtreecommitdiff
path: root/apt-pkg/deb/deblistparser.cc
AgeCommit message (Collapse)Author
2017-12-13convert various c-style casts to C++-styleDavid Kalnischkies
gcc was warning about ignored type qualifiers for all of them due to the last 'const', so dropping that and converting to static_cast in the process removes the here harmless warning to avoid hidden real issues in them later on. Reported-By: gcc Gbp-Dch: Ignore
2017-07-12Reformat and sort all includes with clang-formatJulian Andres Klode
This makes it easier to see which headers includes what. The changes were done by running git grep -l '#\s*include' \ | grep -E '.(cc|h)$' \ | xargs sed -i -E 's/(^\s*)#(\s*)include/\1#\2 include/' To modify all include lines by adding a space, and then running ./git-clang-format.sh.
2017-07-12Drop cacheiterators.h includeJulian Andres Klode
Including cacheiterators.h before pkgcache.h fails because pkgcache.h depends on cacheiterators.h.
2017-06-28Strip 0: epochs from the version hashJulian Andres Klode
This should fix some issues with dpkg normalizing such values. Suprisingly enough apt treats the Version: field the same, even with epoch vs without, but not when searching, and does not strip the 0: from the output.
2017-02-10Do not package names representing .dsc/.deb/... filesJulian Andres Klode
In the case of build-dep and other commands where a file can be passed we must make sure not to normalize the path name as that can have odd side effects, or well, cause the operation to do nothing. Test for build-dep-file is adjusted to perform the vcard check once as "vcard" and once as "VCard", thus testing that this solves the reported bug. We inline the std::transform() and optimize it a bit to not write anything in the common case (package names are defined to be lowercase, the whole transformation is just for names that should not exist...) to counter the performance hit of the added find() call (it's about 0.15% more instructions than with the existing transform, but we save about 0.67% in writes...). Closes: #854794
2017-01-02ParseDepends: Support passing the desired architectureNiels Thykier
This is useful for e.g. Britney, where the Build-Depends would have to be parsed for multiple architectures. With this change, the call can choose the architecture without having to mess with the config. Signed-off-by: Niels Thykier <niels@thykier.net> Closes: #845969 (jak@d.o: made the code compile)
2016-11-22Do not use MD5SumValue for Description_md5()Julian Andres Klode
Our profile says we spend about 5% of the time transforming the hex digits into the binary format used by HashsumValue, all for comparing them against the other strings. That makes no sense at all. According to callgrind, this reduces the overall instruction count from 5,3 billion to 5 billion in my example, which roughly matches the 5%.
2016-11-22debListParser: Micro-optimize AvailableDescriptionLanguages()Julian Andres Klode
Generating a string for each version we see is somewhat inefficient. The problem here is that the Description tag names are longer than 15 byte, and thus require an allocation on the heap, which we should avoid. It seems reasonable that 20 characters works for all languages codes used for archive descriptions, but if not, there's a warning, so we'll catch that. This should improve performance by about 2%.
2016-11-22Optimize VersionHash() to not need temporary copy of inputJulian Andres Klode
Stop copying stuff, and just parse the bytes one by-one to the newly created AddCRC16Byte. This improves the instruction count for an update run from 720,850,121 to 455,801,749 according to callgrind.
2016-11-22Introduce tolower_ascii_unsafe() and use it for hashingJulian Andres Klode
This one has some obvious collisions for non-alphabetical characters, like some control characters also hashing to numbers, but we don't really have those, and these are hash functions which are not collision free to begin with.
2016-11-22debListParser: Convert to use pkgTagSection::Key-based lookupJulian Andres Klode
This basically gets rid of 40-50% of the hash table lookups, making things a bit faster that way, and the profiles look far cleaner.
2016-11-11add hidden config to set packages as Essential/ImportantDavid Kalnischkies
You can pretty much achieve the same with a local dummy package if you want to, but libapt has an inbuilt setting for essential: "apt" which can be overridden with this option as well – it could be helpful in quick tests and what not so adding this alternative shouldn't really hurt much. We aren't going to document them much through as care must be taken in regards to the binary caches as they aren't invalidated by config options alone, so the effects of old settings could still be in them, similar to the other already existing pkgCacheGen option(s). Closes: 767891 Thanks: Anthony Towns for initial patch
2016-09-18VersionHash: Do not skip too long dependency linesJulian Andres Klode
If the dependency line does not contain spaces in the repository but does in the dpkg status file (because dpkg normalized the dependency list), the dpkg line might be longer than the line in the repository. If it now happens to be longer than 1024 characters, it would be skipped, causing the hashes to be out of date. Note that we have to bump the minor cache version again as this changes the format slightly, and we might get mismatches with an older src cache otherwise. Fixes Debian/apt#23
2016-07-01reinstalling local deb file is no downgradeDavid Kalnischkies
If we have a (e.g. locally built) deb file installed and do try to install it again apt complained about this being a downgrade, but it wasn't as it is the very same version… it was just confused into not merging the versions together which looks like a downgrade then. The same size assumption is usually good, but given that volatile files are parsed last (even after the status file) the base assumption no longer holds, but is easy to adept without actually changing anything in practice.
2016-06-28Fix buffer overflow in debListParser::VersionHash()Julian Andres Klode
If a package file is formatted in a way that that no space follows a deprecated "<", we would reformat it to "<=" and increase the length of the output by 1, which can break. Under normal circumstances with "<=" this should not be an issue. Closes: #828812
2016-03-06get group again after potential remap in Source: parseDavid Kalnischkies
Mysteriously segfaults only on i386 for me, but at least one reporter had the same behavior and it makes sense that this is the problem as the parsing of Source: was fixed in 1.2.2 – before the not remapped group was not used. We don't use our usual Dynamic<> trick here as we don't have it in the parser. Its a bit of a layer violation to do this parsing here, but its how it is always was… Until next time with this lovely kind of problem. Closes: 812251 Thanks: Francesco Poli and Marc Haber for testdata.
2016-01-26convert Version() and Architecture() to APT::StringViewDavid Kalnischkies
Part of hidden classes, so conversion is abi-free. Git-Dch: Ignore
2016-01-26remove unused Description methods in listparsersDavid Kalnischkies
These virtual methods are implemented in hidden classes, so we can drop them without breaking the ABI. Git-Dch: Ignore
2016-01-26parse version correctly from binary Source fieldDavid Kalnischkies
In commit a221efc331693f8905da870141756c892911c433 I promoted the source package name and version to the binary cache for faster access by e.g. EDSP, but due to changing the interpretation length to soon we always ignored the version part of the Source field, so that packages ended up having the binary version as source version – which while usually just fine it is wrong for binary rebuilds. Closes: 812492
2016-01-25treat an empty dependency field just like it doesn't existDavid Kalnischkies
Git-Dch: Ignore
2016-01-15use APT::StringView for GrabWordDavid Kalnischkies
Git-Dch: Ignore
2016-01-14fix M-A:foreign provides creation for unknown archsDavid Kalnischkies
Architectures for packages which do not belong to the native nor a foreign architecture (dubbed barbarian for now) which are marked M-A:foreign still provide in their own architecture even if not for others. Also, other M-A:foreign (and allowed) packages provide in these barbarian architectures.
2016-01-08debListParser: Convert another ParseDepends to StringViewJulian Andres Klode
I overlooked this Gbp-Dch: ignore
2016-01-08AvailableDescriptionLanguages: Use one string for all iterationsJulian Andres Klode
Do not create strings within the loop, that creates one string per language and does more work than needed. Instead, reserve enough space at the beginning and assign the prefix, and then resize and append inside the loop. Also call exists with the string itself instead of the c_str(), this means that the lookup uses the size information in the string now and does not have to call strlen() on it.
2016-01-08Replace compare() == 0 checks with this == other checksJulian Andres Klode
This improves performance, as we now can ignore unequal strings based on their length already. Gbp-Dch: ignore
2016-01-07Switch performance critical code to use APT::StringViewJulian Andres Klode
This improves performance of the cache generation on my ARM platform (4x Cortex A15) by about 10% to 20% from 2.35-2.50 to 2.1 seconds.
2015-12-27ParseDepends: Mark branches for build-dep parsing as unlikelyJulian Andres Klode
We do not see those branches at all during normal mode of operation (that is, during cache generation), so tell the compiler about it.
2015-12-27debListParser: Do not validate Description-md5 for correctness twiceJulian Andres Klode
The Set() method returns false if the input is no hex number, so simply use that.
2015-12-27debListParser: ParseDepends: Only query native arch if neededJulian Andres Klode
This makes the code parsing architecture lists slower, but on the other hand, improves the more generic case of reading dependencies from Packages files.
2015-12-27Convert most callers of isspace() to isspace_ascii()Julian Andres Klode
This converts all callers that read machine-generated data, callers that might work with user input are not converted.
2015-12-11Convert package names from Packages files to lower caseJulian Andres Klode
dpkg does that when reading package files, so we should do the same. This only deals with parsing names from binary package paragraphs, it does not look at source package names and/or the list of binaries in a dsc file. Closes: #807012
2015-10-23deblistparser: Make PrioList constJulian Andres Klode
More safety, less writeable memory.
2015-09-14implement dpkgs vision of interpreting pkg:<arch> dependenciesDavid Kalnischkies
How the Multi-Arch field and pkg:<arch> dependencies interact was discussed at DebConf15 in the "MultiArch BoF". dpkg and apt (among other tools like dose) had a different interpretation in certain scenarios which we resolved by agreeing on dpkg view – and this commit realizes this agreement in code. As was the case so far libapt sticks to the idea of trying to hide MultiArch as much as possible from individual frontends and instead translates it to good old SingleArch. There are certainly situations which can be improved in frontends if they know that MultiArch is upon them, but these are improvements – not necessary changes needed to unbreak a frontend. The implementation idea is simple: If we parse a dependency on foo:amd64 the dependency is formed on a package 'foo:amd64' of arch 'any'. This package is provided by package 'foo' of arch 'amd64', but not by 'foo' of arch 'i386'. Both of those foo packages provide each other through (assuming foo is M-A:foreign) to allow a dependency on 'foo' to be satisfied by either foo of amd64 or i386. Packages can also declare to provide 'foo:amd64' which is translated to providing 'foo:amd64:any' as well. This indirection over provides was chosen as the alternative would be to teach dependency resolvers how to deal with architecture specific dependencies – which violates the design idea of avoiding resolver changes, especially as architecture-specific dependencies are a cornercase with quite a few subtil rules. Handling it all over versioned provides as we already did for M-A in general seems much simpler as it just works for them. This switch to :any has actually a "surprising" benefit as well: Even frontends showing a package name via .Name() [which doesn't show the architecture] will display the "architecture" for dependencies in which it was explicitely requested, while we will not show the 'strange' :any arch in FullName(true) [= pretty-print] either. Before you had to specialcase these and by default you wouldn't get these details shown. The only identifiable disadvantage is that this complicates error reporting and handling. apt-get's ShowBroken has existing problems with virtual packages [it just shows the name without any reason], so that has to be worked on eventually. The other case is that detecting if a package is completely unknown or if it was at least referenced somewhere needs to acount for this "split" – not that it makes a practical difference which error is shown… but its one of the improvements possible.
2015-09-14M-A: allowed pkgs of unconfigured archs do not statisfy :anyDavid Kalnischkies
We parse all architectures we encounter recently, which means we also parse packages from architectures which are neither native nor foreign, but still came onto the system somehow (usually via heavy force).
2015-09-14store ':any' pseudo-packages with 'any' as architectureDavid Kalnischkies
Previously we had python:any:amd64, python:any:i386, … in the cache and the dependencies of an amd64 package would be on python:any:amd64, of an i386 on python:any:i386 and so on. That seems like a relatively pointless endeavor given that they will all be provided by the same packages and therefore also a waste of space. Git-Dch: Ignore
2015-08-31fix some unused parameter/variable warningsDavid Kalnischkies
Reported-By: gcc Git-Dch: Ignore
2015-08-27Do not parse Status fields from remote sourcesJulian Andres Klode
This could allow an attacker to mark a package as installed in a remote package index, as long as the package was not listed in the dpkg status file. This way, an attacker could force the installation of a package during a dist-upgrade, by providing two packages in an index, an older marked as installed, and a newer - apt would "upgrade" to the newer version.
2015-08-17Cleanup includes after running iwyuMichael Vogt
2015-08-10no value for MultiArch field is 'no', not 'none'David Kalnischkies
Git-Dch: Ignore
2015-08-10drop obsolete explicit :none handling in pkgCacheGenDavid Kalnischkies
We archieve the same without the special handling now, so drop this code. Makes supporting this abdomination a little longer bearable as well. Git-Dch: Ignore
2015-08-10parse packages from all architectures into the cacheDavid Kalnischkies
Now that we can dynamically create dependencies and provides as needed rather than requiring to know with which architectures we will deal before running we can allow the listparser to parse all records rather than skipping records of "unknown" architectures. This can e.g. happen if a user has foreign architecture packages in his status file without dpkg knowing about this architecture (or apt configured in this way). A sideeffect is that now arch:all packages are (correctly) recorded as available from any Packages file, not just from the native one – which has its downsides for the resolver as mixed-arch source packages can appear in different architectures at different times, but that is the problem of the resolver and dealing with it in the parser is at best a hack (and also depends on a helpful repository). Another sideeffect is that his allows :none packages to appear in Packages files again as we don't do any kind of checks now, but given that they aren't really supported (anymore) by anyone we can live with that.
2015-08-10elimate duplicated code in pkgIndexFile subclassesDavid Kalnischkies
Trade deduplication of code for a bunch of new virtuals, so it is actually visible how the different indexes behave cleaning up the interface at large in the process. Git-Dch: Ignore
2015-08-10add volatile sources support in libapt-pkgDavid Kalnischkies
Sources are usually defined in sources.list (and co) and are pretty stable, but once in a while a frontend might want to add an additional "source" like a local .deb file to install this package (No support for 'real' sources being added this way as this is a multistep process). We had a hack in place to allow apt-get and apt to pull this of for a short while now, but other frontends are either left in the cold by this and/or the code for it looks dirty with FIXMEs plastering it and has on top of this also some problems (like including these 'volatile' sources in the srcpkgcache.bin file). So the biggest part in this commit is actually the rewrite of the cache generation as it is now potentially a three step process. The biggest problem with adding support now through is that this makes a bunch of previously mostly unusable by externs and therefore hidden classes public, so a bit of further tuneing on this now public API is in order…
2015-08-10just-in-time creation for (explicit) negative depsDavid Kalnischkies
Now that we deal with provides in a more dynamic fashion the last remaining problem is explicit dependencies like 'Conflicts: foo' which have to apply to all architectures, but creating them all at the same time requires us to know all architectures ending up in the cache which isn't needed to be the same set as all foreign architectures. The effect is visible already now through as this prevents the creation of a bunch of virtual packages for arch:all packages and as such also many dependencies, just not very visible if you don't look at the stats… Git-Dch Ignore
2015-08-10just-in-time creation for (implicit) ProvidesDavid Kalnischkies
Expecting the worst is easy to code, but has its disadvantages e.g. by creating package structures which otherwise would have never existed. By creating the provides instead at the time a package structure is added we are well prepared for the introduction of partial architectures, massive amounts of M-A:foreign (and :allowed) and co as far as provides are concerned at least. We have something relatively similar for dependencies already. Many tests are added for both M-A states and the code cleaned to properly support implicit provides for foreign architectures and architectures we 'just' happen to parse. Git-Dch: Ignore
2015-08-10hide implicit deps in apt-cache again by defaultDavid Kalnischkies
Before MultiArch implicits weren't a thing, so they were hidden by default by definition. Adding them for MultiArch solved many problems, but having no reliable way of detecting which dependency (and provides) is implicit or not causes problems everytime we want to output dependencies without confusing our observers with unneeded implementation details. The really notworthy point here is actually that we keep now a better record of how a dependency came to be so that we can later reason about it more easily, but that is hidden so deep down in the library internals that change is more the problems it solves than the change itself.
2015-08-10remove the compatibility markers for 4.13 abiDavid Kalnischkies
We aren't and we will not be really compatible again with the previous stable abi, so lets drop these markers (which never made it into a released version) for good as they have outlived their intend already. Git-Dch: Ignore
2015-08-10bunch of micro-optimizations for depcacheDavid Kalnischkies
DepCache functions are called a lot, so if we can squeeze some drops out of them for free we should do so. Takes also the opportunity to remove some whitespace errors from these functions. Git-Dch: Ignore
2015-08-10make all d-pointer * const pointersDavid Kalnischkies
Doing this disables the implicit copy assignment operator (among others) which would cause hovac if used on the classes as it would just copy the pointer, not the data the d-pointer points to. For most of the classes we don't need a copy assignment operator anyway and in many classes it was broken before as many contain a pointer of some sort. Only for our Cacheset Container interfaces we define an explicit copy assignment operator which could later be implemented to copy the data from one d-pointer to the other if we need it. Git-Dch: Ignore
2015-06-12store Release files data in the CacheDavid Kalnischkies
We used to read the Release file for each Packages file and store the data in the PackageFile struct even through potentially many Packages (and Translation-*) files could use the same data. The point of the exercise isn't the duplicated data through. Having the Release files as first-class citizens in the Cache allows us to properly track their state as well as allows us to use the information also for files which aren't in the cache, but where we know to which Release file they belong (Sources are an example for this). This modifies the pkgCache structs, especially the PackagesFile struct which depending on how libapt users access the data in these structs can mean huge breakage or no visible change. As a single data point: aptitude seems to be fine with this. Even if there is breakage it is trivial to fix in a backportable way while avoiding breakage for everyone would be a huge pain for us. Note that not all PackageFile structs have a corresponding ReleaseFile. In particular the dpkg/status file as well as *.deb files have not. As these have only a Archive property need, the Component property takes over this duty and the ReleaseFile remains zero. This is also the reason why it isn't needed nor particularily recommended to change from PackagesFile to ReleaseFile blindly. Sticking with the earlier is usually the better option.