I’ve been thinking about this subject for a while now, and I wanted to put together some of the ideas that are in my head. These contradict my thoughts above, so I rescind my previous comment.
First and foremost: software versions are meant to communicate intent, but they fail at it. If you pulled someone off the street who knew nothing about technology and showed them the version numbers for Red Hat 5 and Red Hat 7, what would those numbers tell them? Nothing useful. A software engineer could perhaps tell you the differences in base kernel versions, or the choice of systemd over SysV init, and so on, but then what are the minor and patch versions for? You’d have to go read the CHANGELOG, or better, actually read the code.
I think the best way to see why using software versions to communicate intent fails is to go read semver.org. For something meant to be so simple, it’s detailed and complicated. Here is how it lays out the scheme:
Given a version number MAJOR.MINOR.PATCH, increment the:
- MAJOR version when you make incompatible API changes,
- MINOR version when you add functionality in a backwards-compatible manner, and
- PATCH version when you make backwards-compatible bug fixes.
Additional labels for pre-release and build metadata are available as extensions to the MAJOR.MINOR.PATCH format.
Under this scheme, version numbers and the way they change convey meaning about the underlying code and what has been modified from one version to the next.
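The rules themselves are mechanical, and a minimal Python sketch of them is short. The `bump` helper is hypothetical and ignores pre-release and build labels. Notice that its only input about the nature of the change is the developer’s own claim:

```python
def bump(version: str, change: str) -> str:
    """Return the next MAJOR.MINOR.PATCH version for a declared change type.

    `change` is "major", "minor", or "patch". Pre-release and build
    metadata labels are not handled in this sketch.
    """
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "major":
        # Incompatible API change: bump MAJOR, reset MINOR and PATCH.
        return f"{major + 1}.0.0"
    if change == "minor":
        # Backwards-compatible functionality: bump MINOR, reset PATCH.
        return f"{major}.{minor + 1}.0"
    if change == "patch":
        # Backwards-compatible bug fix: bump PATCH only.
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")


print(bump("1.4.2", "patch"))  # 1.4.3
print(bump("1.4.2", "minor"))  # 1.5.0
print(bump("1.4.2", "major"))  # 2.0.0
```

Nothing in the function checks that a change declared as "patch" really is a backwards-compatible fix; it simply trusts its caller.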
But there’s no actual system for verifying any of this! So in reality, SemVer is just hopes and dreams. You HOPE the developer has done rigorous testing to maintain compatibility, and you DREAM that their patch versions really are backwards-compatible bug fixes, but none of it can be verified. And even IF the developer has done super rigorous testing, path dependency (behavior that depends on state left behind by earlier versions, such as existing configuration) can still lead to runtime failures. So, to me at least, trusting a version number to communicate intent is negligent.
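To make that concrete: about the best a tool can do without running the code is compare public names between two releases. A minimal Python sketch, assuming each release’s public API can be reduced to a set of symbol names (the `minimum_bump` helper and the example names are hypothetical):

```python
def minimum_bump(old_api: set, new_api: set) -> str:
    """Smallest SemVer bump implied by a change in the set of public names."""
    if old_api - new_api:
        # A public name disappeared, so existing callers may break.
        return "major"
    if new_api - old_api:
        # Only additions: compatible, as far as names go.
        return "minor"
    # Same names on both sides; any change in behavior is invisible here.
    return "patch"


v1 = {"connect", "query", "close"}
v2 = {"connect", "query", "close"}  # imagine query() now returns rows in a different order
print(minimum_bump(v1, v2))  # patch
```

A "patch" verdict from a check like this still rests entirely on the assumption that behavior did not change, which is exactly the part nobody verifies.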
The answer to this problem of communication, at least in my mind, is two things:
- Read the code, all of it.
- Use the version number as a way to establish your understanding of the code.
So once I have read all of the code, the point of the version number is that it’s a human-readable mapping to a git commit hash. Later, I can diff the code between two versions, which makes the task of reading all the new code that much easier.
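A minimal sketch of that workflow, assuming you keep your own record of which commit each audited version points to. The `RELEASES` table, its placeholder hashes (not real commits), and the `review_command` helper are hypothetical:

```python
# Hypothetical record of which commit each version points to.
# The hashes are placeholders, not real commits.
RELEASES = {
    "1.0.0": "a" * 40,
    "1.1.0": "b" * 40,
}


def review_command(audited: str, candidate: str) -> list:
    """Build a `git diff` between the version you have read and the new one."""
    return ["git", "diff", RELEASES[audited], RELEASES[candidate]]


print(" ".join(review_command("1.0.0", "1.1.0")))
```

Inside a clone of the upstream repository you could run the resulting command with `subprocess.run(cmd, check=True)`; the version numbers are only there so a human can pick the two endpoints.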
I already have a git commit hash, which marks a snapshot of the code at a point in time. The commit hash, and a copy of the code at that commit, is what I actually care about, because that’s where the intent lives. It’s total nonsense to expect to understand the intent of a new version of the software by trusting a version number!
So, to my mind, the whole point of a version number is just to make it easy for developers to write dependency management software using a human-understandable integer.
Now that I’ve talked your ear off about software versions, we can talk about how dependency management software should consume them.
As I said above, I only care about the version number because it gives me a baseline for knowing which git commit is in play. If build systems never interacted with humans, we could just identify all upstream software by git commit hash.
Version numbers exist because they are easy for humans to understand and to use in our dependency management systems. A single incrementing integer is much better than squinting at two 40-character hex strings.
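The integer also carries something a hash does not: order. A small illustration using placeholder hashes (not real commits) and a hypothetical build counter:

```python
# Placeholder data: two builds of the same package, oldest first.
builds = [
    {"commit": "c91e6a7d2b4f0e83a5d17c62f9b04e1a8d3c5b72", "build": 41},
    {"commit": "1a07f5e2c3d94b68e0f21a7c5b9d3e40f6a82c1d", "build": 42},
]
old, new = builds

# The integer answers "which one is newer?" at a glance.
print(new["build"] > old["build"])  # True

# Hash order is arbitrary: string comparison says nothing about time.
print(new["commit"] > old["commit"])  # False, even though it is the newer build

# Picking the latest build only works with the integer.
latest = max(builds, key=lambda b: b["build"])
```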
We humans also care about when the build process produced an artifact from the code, because that makes it easy to reason about when the thing was made.
Of course we also care about what the thing is, and who produced it.
So what should a “version number” be, really? I think it should be a composite of all of these things:
who made the thing / what the thing is / the version of the thing (git commit mapped to the version) / the single incrementing integer that represents when the thing was produced
or, if you haven’t already guessed:
Now I want to step back for a second and look at this more closely. The thing we humans really care about most in the build system is that very last number:
20181221225447. This signifies the point in time when the build system produced an artifact (2018-12-21 22:54:47, written as a single integer), and it is the human-readable integer that we increment. It’s nice because just by using the date we accomplish both goals at once. A new value of that number is the signal for you to go diff the build manifest (the plan.sh, hooks, and config files) and see what changed.
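A minimal Python sketch of that composite identifier and of stamping a release from the build time. The four fields mirror the parts listed above; `PackageIdent`, `new_release`, and the `example/mytool` identifier are hypothetical, and using UTC for the timestamp is an assumption of this sketch:

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class PackageIdent:
    origin: str   # who made the thing
    name: str     # what the thing is
    version: str  # upstream version, which maps to a git commit
    release: str  # when the thing was built, as YYYYMMDDhhmmss

    @classmethod
    def parse(cls, ident: str) -> "PackageIdent":
        # Expects exactly four slash-separated parts; raises ValueError otherwise.
        origin, name, version, release = ident.split("/")
        return cls(origin, name, version, release)

    def __str__(self) -> str:
        return f"{self.origin}/{self.name}/{self.version}/{self.release}"


def new_release(now: Optional[datetime] = None) -> str:
    """Stamp a build with a 14-digit timestamp (UTC in this sketch)."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now.strftime("%Y%m%d%H%M%S")


# Rebuilding the same upstream version only moves the release forward.
ident = PackageIdent.parse("example/mytool/1.2.3/20181221225447")
rebuilt = replace(ident, release=new_release(datetime(2019, 1, 4, 9, 30, tzinfo=timezone.utc)))
print(rebuilt)                                    # example/mytool/1.2.3/20190104093000
print(int(rebuilt.release) > int(ident.release))  # True
```

Because the release is a fixed-width date, comparing it as an integer gives build order for free.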
But doing this all the time is really hard for humans to keep up with. So I have to audit every piece of code, every dependency, and every transitive dependency, and I have to do it ALL the time because the world is always moving forward? But I just wanted to do a new build of MY code with a good set of dependencies that work. Keeping up CONTINUOUSLY is exhausting, and it distracts from the value I’m trying to build for my business.
So I think @smacfarlane’s idea above hits the nail on the head. What we really need is a regular cadence for moving the world forward. We need a non-exhausting way of publishing these artifacts, so that people can reasonably audit the code, or pin to the versions of the dependencies they care about.
In this case, while it’s helpful to know when the artifact was produced, what I really want is to be able to say something like:
because then I can pin ALL my dependencies to the point in time where the world moved forward instead of worrying about the upstream version.
What this really amounts to is a bundle of the release numbers for every package, which will all differ even though the packages were built with the intent of working together:
So I think there should also be a package manifest that goes along with these kinds of tags: a single document listing every package built for that tag along with its pkg_ident. I think producing a manifest like this, and doing rebuilds at a regular cadence, is a good, non-exhausting way of moving the world forward.
The idea, then, is that you could always move forward faster than this if you were willing to incur the cost of auditing your code more frequently.
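A minimal Python sketch of how a consumer might use such a manifest, assuming it is simply a list of full idents per tag. The tag name, the package idents, the `MANIFESTS` table, and the `resolve` helper are all made up for illustration; the optional overrides model moving one dependency forward faster than the cadence:

```python
from typing import Dict, Optional

# Hypothetical manifest: for each tag, every package built for it, by full ident.
MANIFESTS = {
    "2018-Q4": [
        "example/libfoo/1.4.0/20181221225447",
        "example/libbar/2.1.3/20181221231012",
        "example/mytool/1.2.3/20181221233950",
    ],
}


def resolve(tag: str, overrides: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Pin every package to the ident recorded in `tag`'s manifest.

    `overrides` maps "origin/name" to a newer ident you have audited yourself.
    """
    pinned = {}
    for ident in MANIFESTS[tag]:
        origin, name, _version, _release = ident.split("/")
        pinned[f"{origin}/{name}"] = ident
    for key, ident in (overrides or {}).items():
        if key not in pinned:
            raise KeyError(f"{key} is not part of tag {tag}")
        pinned[key] = ident
    return pinned


deps = resolve("2018-Q4", overrides={"example/libfoo": "example/libfoo/1.4.1/20190110120000"})
print(deps["example/libfoo"])  # example/libfoo/1.4.1/20190110120000
print(deps["example/libbar"])  # example/libbar/2.1.3/20181221231012
```

In this sketch overrides may only replace packages already in the tag, so the manifest stays the single record of what that snapshot of the world contains.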
So then, as @chris mentioned, this gives us a place, at a tag, where we could introduce something like an upgrade/transform/deprecation hook to take care of the path-dependency problem. Such a hook could modify existing configuration, or prevent the service from starting and throw an error. This way you don’t get beaten over the head when the world moves forward, and we can create a safe way to deprecate pieces of existing plans for the consumers of those plans.
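A minimal Python sketch of what such a hook might do, assuming it receives the service’s existing configuration as a dict plus the tag being upgraded to. The `upgrade_hook` function, the `DEPRECATIONS` table, the tag name, and the setting names are all hypothetical; this illustrates the proposal, not an existing hook API:

```python
class UpgradeError(Exception):
    """Raised when existing state can't be carried forward safely."""


# Hypothetical deprecations introduced at a tag:
# old setting -> new setting name, or None if the setting was removed outright.
DEPRECATIONS = {
    "2019-Q1": {"listen_port": "port", "legacy_mode": None},
}


def upgrade_hook(config: dict, to_tag: str) -> dict:
    """Return `config` migrated to `to_tag`'s conventions, or raise UpgradeError."""
    migrated = dict(config)  # work on a copy; leave the caller's config untouched
    for old_key, new_key in DEPRECATIONS.get(to_tag, {}).items():
        if old_key not in migrated:
            continue
        if new_key is None:
            # No safe translation: refuse to start rather than guess.
            raise UpgradeError(f"'{old_key}' was removed in {to_tag}; remove it before upgrading")
        # Renamed setting: move the value, unless the new name is already set.
        value = migrated.pop(old_key)
        migrated.setdefault(new_key, value)
    return migrated


print(upgrade_hook({"listen_port": 6379}, "2019-Q1"))  # {'port': 6379}
```

Transforming what can be translated and failing loudly on what cannot keeps the path-dependency problem at the tag boundary instead of surfacing later at runtime.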