Reproducible Multi-Project Gradle Builds
Lately I’ve been working on a way to create a reproducible build of my application source using Gradle. And by reproducible, I mean, when I run the md5 or sha256 checksum on the resulting files, I get the same results.
This is a bit more complicate then I originally thought. The issue is that inherently, a Java Jar file includes the timestamp information for the files contained within. So, if you build a Jar and checksum it and then clean and rebuilt the Jar, even though the files are semantically identical, the resulting checksum’s will be different.
This is a problem in Continuous Integration/Deployment environments where you may be building from a clean VCS checkout every time but want to know only about the things that changed.
Working around this in Gradle is a bit of mind twister. After thinking about this for a couple days, I’ve come up with this list of items that need to addressed to successfully accomplish what I’m trying to do.
- Calculate a single timestamp for each artifact that Gradle produces
- Timestamp calculation needs to be sensitive to the Gradle project that produces the artifact.
- Timestamp calculation needs to be sensitive to any Gradle project dependencies for the project producing the artifact.
- Modify the resulting artifact to set the timestamp information for contents uniformly.
After a quick Twitter conversation with Luke Daley, item 4 was knocked out by using Java’s ZipFile
API to iterate over the jar’s contents and set the lastModified
timestamp to a value, but that leaves the more critical timestamp calculations.
Let’s start with a mind experiment to figure out what we need. Let’s assume we have the following Gradle project structure:
This is a simple Gradle multi-project build where we have the application as one project and a utility library as another.
We check this code out and run a ./gradlew build
and calculate an MD5 checksum on utils.jar
and app.jar
:
File | Checksum |
---|---|
utils.jar | A |
app.jar | B |
If I make a change to the app
project and do a ./gradlew clean build
, we want the following to happen:
File | Checksum |
---|---|
utils.jar | A |
app.jar | B’ |
In this scenario, only the app.jar
checksum would change because the utils
project is functionally equivalent.
However, if we start from our initial commit and change the utils
project and do a ./gradlew clean build
, we want the following to happen:
File | Checksum |
---|---|
utils.jar | A’ |
app.jar | B’ |
In this case with only a change to utils
, we want both checksums to change because the app
project is dependent upon the changed code in the utils library.
My initial thought was to use my VCS to get a timestamp for the latest change to the Gradle project being built. This would couple my build to the VCS being used, but that’s an okay dependency in my opinion. Alternatively, the build could scan over the source contents and get the latest lastModified
timestamp of any file contained within.
Neither of these solutions would solve the issue with upstream libraries however. Somehow, we need to propagate the upstream project’s information.
I’m not sure there is a nice way to do this, but my current thinking is to simulate this meta information by enforcing a commit to downstream projects when an upstream is modified. In my particular case, I would likely change the project’s version of the upstream library by modifying the build.gradle
file in that project. I could add a task that verifies that the downstream project has a version that is at least equivalent to any upstream projects’ versions. This could run as part of the standard build process. This would force a change to the downstream’s build.gradle
thus indirectly updating the timestamp the VCS would report for that project.
All of this is theoretical at this point, I’ve only started working on an implementation. I’ll update this post as I learn more.