A curiosity journal of math, physics, programming, astronomy, and more.

Annotating changes in diff files

Besides troubleshooting branches and splitting branches, I've also used diff files to annotate changes in a branch. That's a less common analytical task and not a regular part of my workflow, but since I was already using diff files for other purposes, using one to summarize some repetitive changes felt natural.

In this particular case, I introduced a new vertical slice of functionality, which required adding a lot of new classes. I became curious how many classes, methods, and imports had to be added, to get a better idea of how much similar work future vertical slices might require. I tried going through the changes and counting, but the counts were difficult to track. Instead, I added comments to the diff.

The unified diff format output by Git doesn't natively support comments, but since I wasn't using the diff afterward, it didn't matter. I used # in the first column to denote a comment, then added comments for any changes of interest: class added, method changed, import added, external interface changed, unit test added, and so on. Afterward, it was simple to tally the changes with a Bash command:

grep '^#' annotated-changes.diff | sort | uniq -c | sort -rn

That finds all the comment lines, sorts them, counts how many times each distinct line occurs, then sorts by count, with the most common changes listed first. With those results, I got a quick bird's-eye view of the vertical slice. It took some manual tagging within the diff, but I didn't want to write even a rudimentary parser for a one-off task I could do the hard way in a few minutes. That annotated diff may also be useful in the future. For example, if a stakeholder asks why there's so much friction for a new feature, I can provide specifics: a dozen new classes were needed (for transporting data between different layers), and three times as many imports were needed to make those classes available in the expected places.
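To make the tagging and tallying concrete, here is a minimal sketch that runs end to end. The sample diff, its file and class names, and the tally helper are all invented for illustration. The only convention taken from above is a # in the first column. The sketch sorts with sort -rn so that the numeric and reverse flags both apply to the whole line.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical annotated diff: file names, class names, and tag wording are
# made up for this example. Only the '#' in column one is significant.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
# class added
diff --git a/src/OrderDto.java b/src/OrderDto.java
new file mode 100644
--- /dev/null
+++ b/src/OrderDto.java
@@ -0,0 +1,3 @@
+public class OrderDto {
+    private String id;
+}
# import added
# import added
# method changed
diff --git a/src/OrderService.java b/src/OrderService.java
--- a/src/OrderService.java
+++ b/src/OrderService.java
@@ -1,3 +1,5 @@
+import app.dto.OrderDto;
+import app.dto.OrderMapper;
 public class OrderService {
-    void save(Order o) {}
+    void save(OrderDto o) {}
 }
# import added
# class added
EOF

# tally FILE: count each distinct comment line, most frequent first.
# (Hypothetical helper name; it just wraps the one-line pipeline.)
tally() {
  grep '^#' "$1" | sort | uniq -c | sort -rn
}

result="$(tally "$sample")"
printf '%s\n' "$result"
rm -f "$sample"
```

On this sample, "# import added" comes out on top with a count of 3, followed by "# class added" with 2 and "# method changed" with 1. Matching on '^#' is safe for this purpose because every real line inside a hunk starts with a space, +, or -, so source lines that themselves begin with # never land in the first column.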

If you have other suggestions for using diff files as part of your workflow, please email me.