composer / composer

Dependency Manager for PHP

Home Page: https://getcomposer.org/

Work Towards Reproducible Builds (Research Needed?)

sarciszewski opened this issue

This is best illustrated by example:

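#!/usr/bin/env bash
# Run from the root of a clone of the composer repository.

# Download the officially hosted build of the tagged release
wget -O hosted.phar https://getcomposer.org/download/1.0.0-alpha9/composer.phar

# Build the same tag from source; bin/compile writes composer.phar
git checkout 1.0.0-alpha9
bin/compile

# Hex-dump both archives and diff them; an empty diff means byte-identical builds
xxd composer.phar > ver1.hex
xxd hosted.phar > ver2.hex
diff ver1.hex ver2.hex > shouldbezero.diff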

This script produces an 8.9 MB diff file (shouldbezero.diff).

What can be done to make composer.phar builds reproducible?

commented

What is the actual issue here?

The answer to your question was a brief Google search away.

I was attempting to verify that the available .phar is equivalent to one built from source, and there were a lot of differences. Ideally, downloading the .phar and building the .phar from source should result in identical files. But they were not: building from source does not give you the same file you get when you download the .phar from the Composer website.

The issue here is that, unless builds are deterministic, a compromise of getcomposer.org that trojans the deliverables will be very hard to detect. If builds were reproducible from source, one would simply have to check out the release tag, build the .phar, and compare it to what's being served.
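If the builds were reproducible, that final comparison would not need hex dumps at all: a byte comparison plus a checksum to publish is enough. A minimal sketch, assuming cmp and GNU coreutils' sha256sum are available; the helper name compare_artifacts is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare a downloaded phar with one built locally
# from the same tag. Requires cmp and GNU coreutils' sha256sum.

compare_artifacts() {
  local served="$1" built="$2"
  # cmp -s is silent and exits 0 only when the files are byte-identical
  if cmp -s "$served" "$built"; then
    # Print a checksum that independent verifiers can publish and compare
    echo "identical: $(sha256sum "$served" | cut -d ' ' -f 1)"
  else
    echo "DIFFERENT: served and built artifacts do not match" >&2
    return 1
  fi
}
```

For example, after running the script from the opening post: compare_artifacts hosted.phar composer.phar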

As a safeguard for the entire PHP developer community, I believe it is worth investigating how to make composer.phar build byte-for-byte identically from source, no matter who builds it.

As a follow-up, this is an open-ended issue for the community.

Unlike the cryptographic signature issue I opened last year, which @padraic has been advocating for, I don't know whether a solution exists here.

That said, if only one project can tackle deterministic builds and the goal is securing PHP developers the world over, the best candidate is Composer.

commented

The phar can sometimes lag a few minutes behind the master branch (or even more; to be honest, I'm not sure). It's still an alpha product, constantly changing. Why would you expect it to result in identical builds? Also, there are bound to be differences since there are artifacts in the code that relate to when the build was run (for example, a timestamp and possibly also a commit hash). So they will never be 100% identical. See Compiler.php for specifics.

The phar can sometimes lag a few minutes behind the master branch (or even more; to be honest, I'm not sure). It's still an alpha product, constantly changing.

Quoting my first post:

wget -O hosted.phar https://getcomposer.org/download/1.0.0-alpha9/composer.phar
git checkout 1.0.0-alpha9

I wasn't checking the master branch.

commented

And you also didn't read my full reply. Maybe take the time to do so before replying wastefully?

The build timestamp in the version output that alcohol mentioned is the big one I can think of. The signature is the second. Then take into consideration what a PHAR is: an archive of (optionally compressed) files. Each file has a creation time, modified time, access time, uid, gid, permissions, etc. I don't know how much of this metadata is persisted when the PHAR is built, but I know some of it is. Your best bet is to extract the PHAR, set the release date constant in src/Composer/Composer.php back to its placeholder, and diff your directories.
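That extract-and-diff approach can be sketched in shell for two trees that have already been extracted (for example with PHP's Phar::extractTo(), or with Pharaoh). This is a minimal sketch, not Pharaoh's implementation: the helper name is hypothetical, and it assumes the build-time constants in src/Composer/Composer.php are single-quoted strings of the form const RELEASE_DATE = '...'; and that the placeholder names used below match the source tree. It rewrites those values in temporary copies and then runs diff -r.

```shell
#!/usr/bin/env bash
# Hypothetical helper: diff two already-extracted phar trees after resetting
# build-time constants to placeholders. Assumes Composer.php contains lines like
#   const VERSION = '...';   and   const RELEASE_DATE = '...';
# The placeholder names below are assumptions; check them against the source.

normalize_and_diff() {
  local left="$1" right="$2" work side file status
  work=$(mktemp -d)
  # Work on copies so the extracted trees themselves are left untouched
  cp -R "$left" "$work/left"
  cp -R "$right" "$work/right"
  for side in left right; do
    file="$work/$side/src/Composer/Composer.php"
    if [ -f "$file" ]; then
      # Replace the quoted constant values with their placeholders
      sed -E \
        -e "s/(const VERSION = ')[^']*'/\1@package_version@'/" \
        -e "s/(const RELEASE_DATE = ')[^']*'/\1@release_date@'/" \
        "$file" > "$file.tmp" && mv "$file.tmp" "$file"
    fi
  done
  # Recursive diff; exit status 0 means the trees match after normalization
  diff -r "$work/left" "$work/right"
  status=$?
  rm -rf "$work"
  return "$status"
}
```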

commented

We tested 1.0.0-alpha9 with our new Pharaoh auditing utility, comparing the .phar downloaded from https://getcomposer.org against the one built from source.

The results are available here: https://gist.github.com/paragonie-scott/ccb86b34ff0577d229bc

@alcohol

Why would you expect it to result in identical builds?

It's not expected behaviour; it's requested behaviour. Deterministic builds are a highly desirable property for preventing targeted malware attacks. If a skilled analyst can audit the source code and then verify that the deliverable is identical to what they get when building from source, there is reasonable assurance that the .phar deliverables have not been tampered with, even in the absence of GPG or OpenSSL signatures.

Also, there are bound to be differences since there are artifacts in the code that relate to when the build was run (for example, a timestamp and possibly also a commit hash).

Pharaoh extracts the one you provide and the one we build and compares them with the git diff utility, thereby making the timestamps and commit hashes moot.

The artifacts in the code are precisely what can be addressed in a stable release to make the builds deterministic.

@slbmeh

The signature is the second.

To the best of my knowledge, Composer doesn't actually employ an asymmetric cryptographic signature (e.g. OpenSSL) in the PHAR building process, so building from source ought to produce the same signature (because hash functions are deterministic) if the underlying code is identical.
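Because such a signature is just a hash of the archive, anyone can recompute it. A minimal sketch, assuming the trailer layout described in the PHP manual's phar file format section (signature bytes, a 4-byte little-endian flags word where 0x0002 means SHA1, then the magic GBMB, with the hash covering every byte before the signature) and a SHA1-signed archive; newer PHP releases may default to another algorithm, which would need a different signature length. The helper name is hypothetical. Note that this only proves internal consistency: a tampered phar that was re-signed still verifies, which is why comparing against an independent build matters.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check the SHA1 signature trailer of a phar file.
# Assumed trailer: [20-byte SHA1][02 00 00 00]["GBMB"] (28 bytes total),
# where the SHA1 covers everything before the signature. Needs GNU coreutils.

verify_phar_sha1() {
  local phar="$1" size flags stored computed
  size=$(wc -c < "$phar")
  if [ "$size" -lt 28 ] || [ "$(tail -c 4 "$phar")" != "GBMB" ]; then
    echo "no signature trailer"
    return 2
  fi
  # The 4 bytes before the magic hold the signature flags
  flags=$(tail -c 8 "$phar" | head -c 4 | od -An -tx1 | tr -d ' \n')
  if [ "$flags" != "02000000" ]; then
    echo "not a SHA1 signature (flags: $flags)"
    return 2
  fi
  # Stored signature as hex, and a fresh SHA1 of the signed region
  stored=$(tail -c 28 "$phar" | head -c 20 | od -An -tx1 | tr -d ' \n')
  computed=$(head -c "$((size - 28))" "$phar" | sha1sum | cut -d ' ' -f 1)
  if [ "$stored" = "$computed" ]; then
    echo "signature OK: $computed"
  else
    echo "signature mismatch"
    return 1
  fi
}
```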

Your best bet is to extract the PHAR ... diff your directories.

This excerpt is precisely what v0.1.1 of Pharaoh does. 👍

On Topic

EDIT: Actually, I misread the diff. There are changes outside the autoloader.

We still believe that small tweaks to the code to make builds deterministic would allow independent third parties to detect and prevent threats automatically, which is essentially the entire goal on our end. Therefore, I'd like to request that this thread stay open for future discussion along these lines.

commented

I understand you tested the alpha release specifically, but my point is that most users simply download the 'latest' snapshot. This snapshot tracks the master branch and may lag behind by a commit (or several), depending on when you download it. The 'version' is stored inside the phar archive, and for a non-tagged build it is the SHA of the commit. So there are several factors you would always have to take into account.

I think this issue will probably stay irrelevant until Composer heads towards a more stable release cycle. I wouldn't count on that in the very near future, though. Just my thoughts.

@paragonie-scott you're right, the normal signature for a phar is deterministic. I mentioned it because it makes the hex dumps differ a good bit more: even the slightest difference in the contents changes the signature bytes as well.

Looking at the diff, I think the only difference is that they were installed with two different versions of Composer. I initially thought your build was phr_6bTGzo... but on a second pass I believe it is more likely to be phr_471Znr... The build in phr_6bTGzo was generated with a version of Composer that predates the .hh extension updates to the autoloader, a version much older than 1.0.0-alpha9.

Sooo... there were a bunch of various issues making the phar file vary at every build, which I fixed in the commits linked above!

Now, the most fun of the issues is that the phar extension stores a Unix timestamp for each file in its manifest. After reading the whole spec and then diving into the source, it turns out that timestamp isn't related to the file at all, nor is it configurable: it's just time() for every file. As time() isn't quite reproducible given the time-space continuum and all that, the only option seemed to be patching the phar after creation and then updating its signature, which I did in a tiny new library: https://github.com/Seldaek/phar-utils/blob/master/src/Timestamps.php#L33
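The "update its signature" half of that fix can be illustrated in shell. This is not the code of Seldaek/phar-utils (a PHP library); it is a hypothetical helper that assumes a SHA1-signed phar whose body bytes have already been patched, recomputes the SHA1 over everything before the signature, and rewrites the 28-byte trailer. Patching the per-file manifest timestamps themselves is left out.

```shell
#!/usr/bin/env bash
# Hypothetical helper: re-sign a patched phar that ends with a SHA1 trailer
# [20-byte SHA1][02 00 00 00]["GBMB"]. Requires bash and GNU coreutils.

resign_phar_sha1() {
  local in="$1" out="$2" size flags body_len sig_hex
  size=$(wc -c < "$in")
  if [ "$size" -lt 28 ] || [ "$(tail -c 4 "$in")" != "GBMB" ]; then
    echo "no signature trailer in $in" >&2
    return 2
  fi
  flags=$(tail -c 8 "$in" | head -c 4 | od -An -tx1 | tr -d ' \n')
  if [ "$flags" != "02000000" ]; then
    echo "expected a SHA1 trailer, got flags $flags" >&2
    return 2
  fi
  body_len=$((size - 28))
  # SHA1 over everything that precedes the signature bytes
  sig_hex=$(head -c "$body_len" "$in" | sha1sum | cut -d ' ' -f 1)
  {
    head -c "$body_len" "$in"
    # Convert the 40 hex digits back into 20 raw bytes
    printf '%b' "$(printf '%s' "$sig_hex" | sed 's/../\\x&/g')"
    # Flags 0x0002 (SHA1) as little-endian bytes, then the magic
    printf '\x02\x00\x00\x00GBMB'
  } > "$out"
}
```

For example: resign_phar_sha1 patched.phar composer.phar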

I now get the same phar output on Linux and Windows running bin/compile on both from the same git commit, but if you want to check it out and confirm, you're most welcome to do so.

Obviously this doesn't help the previous releases, but upcoming ones should hopefully be reproducible, and so are the dev snapshots, by the way.

Sooo... there were a bunch of various issues making the phar file vary at every build, which I fixed in the commits linked above!

👍 Awesome!

I wrote and published a tool called Pharaoh if you'd like to use it to perform the meaningful comparisons. (Timestamps are benign; I was interested in the phar stub and the file contents.)

I'll check this out this weekend and publish my findings :)