Author: Jud

Technologist interested in building both systems and organizations that are secure, scalable, cost-effective, and most of all, good for humanity.
Off To The Races

In my previous post I mentioned an issue I had when building a CDK construct. As promised, today I’ll go through the problem I found: a dreaded race condition, which anyone who’s spent much time debugging software knows is a pernicious type of situation where behavior varies depending on the order in which various parts of the system execute, causing intermittent failures.

For background, part of the power of CDK is that it provides a framework for executing raw AWS API calls as part of a larger deployment. This is useful in numerous circumstances. For my construct, it enabled me to output several Managed Blockchain parameters that are available via API call but not from CloudFormation.

Under the hood these API calls are executed in a Lambda function that is created just for this purpose. This function has an IAM role, to which various permission policies are applied. For efficiency, it is only created once during a deployment, and then shared across all the API calls in the stack.

In order to keep my code well-organized, I’ve broken out the API calls in several places: one to gather data on the network member, and another to gather data for each peer node. And as a security best practice I want the permissions to be scoped as narrowly as possible. That means at each point in my construct where I call the function, I attach a policy that allows access only to the specific member or node being queried, via an explicit identifier.
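To make the narrow scoping concrete, here is a minimal sketch of the kind of policy document described above, built as plain Python dictionaries. The identifiers are placeholders, the helper name `scoped_statement` is hypothetical, and the ARN layout is an assumption that should be checked against the Managed Blockchain documentation.

```python
import json


def scoped_statement(action: str, resource_arn: str) -> dict:
    """Build one IAM statement allowing a single action on a single resource."""
    return {
        "Effect": "Allow",
        "Action": [action],
        "Resource": [resource_arn],
    }


# Placeholder identifiers, for illustration only.
REGION = "us-east-1"
ACCOUNT = "123456789012"
MEMBER_ID = "m-EXAMPLEMEMBER"
NODE_IDS = ["nd-EXAMPLENODE1", "nd-EXAMPLENODE2"]

# Assumed ARN layout; verify before relying on it.
member_arn = f"arn:aws:managedblockchain:{REGION}:{ACCOUNT}:members/{MEMBER_ID}"
node_arns = [
    f"arn:aws:managedblockchain:{REGION}:{ACCOUNT}:nodes/{node_id}"
    for node_id in NODE_IDS
]

# One statement per resource, never a wildcard.
statements = [scoped_statement("managedblockchain:GetMember", member_arn)]
statements += [scoped_statement("managedblockchain:GetNode", arn) for arn in node_arns]

policy_document = {"Version": "2012-10-17", "Statement": statements}
print(json.dumps(policy_document, indent=2))
```

Each statement names exactly one member or node, which is the property that matters here: a new policy attachment is needed every time another resource is queried.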

Here’s the problem: IAM is an eventually consistent service, and thus policy updates are not immediately effective. Typical propagation time is only a few seconds, but it can take longer in certain circumstances. For the first custom API call in a CDK stack this is not an issue. The policy and role are created, and then the Lambda is created, the latter taking over a minute to be fully instantiated because it upgrades its dependencies at launch. However, on subsequent calls the Lambda is already warmed up, so it runs immediately after the preceding policy update. About half the time that policy is not yet effective, and the function fails with a permission error.

It’s the “sometimes it works, sometimes it doesn’t” nature of race conditions that makes them so difficult to track down. Thankfully I was able to identify and document my experience and pass it along to the CDK team. Anyone want to take a crack at a solution? I described several possible approaches in my write-up, with the “simple retry logic” approach likely being the best.
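For a rough idea of what “simple retry logic” could look like, here is a minimal sketch. It is not the CDK implementation or the proposal from the write-up; `AccessDenied`, `call_with_retry`, and the simulated `flaky_api_call` are hypothetical names used only to show the shape of the idea: retry on a permission error, backing off a little longer each time to give IAM a chance to catch up.

```python
import time


class AccessDenied(Exception):
    """Hypothetical stand-in for the permission error raised by the API call."""


def call_with_retry(call, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Run call(); on AccessDenied, wait with exponential backoff and retry."""
    for attempt in range(attempts):
        try:
            return call()
        except AccessDenied:
            if attempt == attempts - 1:
                raise  # out of retries, surface the original error
            sleep(base_delay * 2 ** attempt)


# Simulate a policy that only becomes effective by the third call.
calls = {"count": 0}


def flaky_api_call():
    calls["count"] += 1
    if calls["count"] < 3:
        raise AccessDenied("policy not yet effective")
    return "ok"


# Record the delays instead of actually sleeping, to keep the demo fast.
delays = []
result = call_with_retry(flaky_api_call, sleep=delays.append)
print(result, delays)
```

Retrying only on the permission error is deliberate: any other failure is probably a real bug and should fail fast rather than be masked by the backoff loop.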

Put A Bow On It

Despite writing a bit about CDK nearly two years ago, it’s taken me some time to get a chance to really lean into it. Having now built out a couple of real projects, I can confidently say that it has its rough edges, like any technology, but overall it’s both powerful and fun.

If you’re so inclined, feel free to check out my most recent creation, a construct for deploying a Hyperledger Fabric network on Amazon Managed Blockchain.

# Easy as pie!
HyperledgerFabricNetwork(
    self, 'MyNetwork',
    network_name='MyNetwork',
    member_name='MyMember',
)

For my next post I’ll go through one of the aforementioned rough edges I discovered, and how it can hopefully be fixed.

Resolute Comprehension

I really like New Year’s resolutions. As a lover of habit, the beginning of a year is the perfect time to calibrate a new routine. This year I have two resolutions:

  1. Post on this blog at least once per month
  2. Learn a new programming language

The latter was inspired by this article, which is stupidly long but thoroughly enjoyable. As a non-fan of OOP I found myself nodding along quite frequently. He advocates pretty hard for functional languages; while I’m familiar with the paradigm having used it in Python, I haven’t done much with purer forms. In 2022 I intend to change that, probably by learning Clojure.

Erlang and Go are also on my to-learn shortlist, the former for its first-class support for concurrency, the latter because it’s the new hotness for performant APIs.

In other news, I’m working on publishing my first CDK construct, which I’ll share here when it’s ready. I do wish I didn’t have to write it in TypeScript, but sadly I’m at the mercy of the JSII compiler. Why TS doesn’t have first-class support for comprehensions boggles my mind. This is the closest I could get:

Array.from(nodeProps.entries()).map(e => new HyperledgerFabricNode(scope, `Node${e[0]}`, e[1]));

Compare that with a Python equivalent:

[HyperledgerFabricNode(scope, f'Node{i}', p) for i, p in enumerate(nodeProps)]

For shame, TypeScript. For shame.

A Number Of Numbers

Back in my math major days in college, I was introduced to the On-Line Encyclopedia of Integer Sequences. It’s exactly what it says on the tin. As part of a class we were encouraged to contribute, which I did.

A few weeks ago I had another idea for a submission, and to my surprise no one else had added it, so once again I had the opportunity to contribute a little piece of Internet history.

Here’s a complete list of the sequences I’ve authored over the years:

If the above isn’t enough online notoriety, check out my only published mathematical work, A Probabilistic View of Certain Weighted Fibonacci Sums. I was only a mild contributor, but still got an authorial credit, which is pretty cool.

Putting Pen To Paper

I spent some time today chatting with an early-career colleague who’s looking towards a future career in software development. Being the kind of person that enjoys the sound of my own voice, I enjoy these opportunities to pontificate. One piece of advice I routinely give is that good engineers write well, and developing this skill will pay itself back with copious dividends.

However, that shouldn’t be read to mean that quantity trumps quality. Far from it; keeping things brief is usually more difficult than not (I know I’m bad about rambling, kinda like I am now). Which is why I found this little article on the value of the humble readme such valuable advice.

The other day I published a blockchain solution on GitHub, and while I’m pretty proud of the code, the readme is in bad shape (as of today at least). For my next project (a refactoring of the core of this solution into a reusable CDK construct) I think I’m going to write the readme first, as the above article suggests. We’ll see how it goes!

Hodgepodge Advice

I’m a sucker for lists that contain pithy nuggets of truth. Here are two great ones I found this week:

Some of my favorite statements, in no particular order:

If you don’t have a good grasp of the universe of what’s possible, you can’t design a good system

Every system eventually sucks, get over it

Software engineers should write regularly

Always strive to build a smaller system

KISS, don’t be afraid, and boring > cool

The bottleneck is almost always the database

Doomed To Repeat It

Technologists are particularly susceptible to recency bias. It’s one reason why I try to read older computer science literature from time to time (especially work from the 60s and 70s). The Mythical Man-Month is my canonical example; it should be required reading for everyone who works with technology. The Psychology of Computer Programming contains timeless truths of what it takes to lead a team of software engineers. Donald Knuth’s The Art of Computer Programming is a dense, three-volume work, but much treasure lies within. I’ve only finished the first book, but I came away with tremendous respect for the geniuses who paved the way for us fortunate souls who have IDEs, fast compilers, and gigabytes of RAM.

Today I read On the Criteria To Be Used in Decomposing Systems into Modules, a research paper by D.L. Parnas of Carnegie-Mellon University, published in 1972. While the details of the middle section weren’t terribly interesting, it’s the bookends of introduction and conclusion that impressed me. The benefits of two-pizza teams were clearly understood fifty years ago, for example (“separate groups would work on each module with little need for communication”) and the paper lays out a novel approach to decomposition (to me, at least):

“We propose instead that one begins with a list of difficult design decisions or design decisions which are likely to change. Each module is then designed to hide such a decision from the others.”

The above resonates with prior posts I’ve written on abstractions, especially Out of Sight, Out of Mind. If the goal of abstraction is to hide difficult detail, we ought to modularize with that goal front-and-center.
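To illustrate the quoted criterion, here is a small sketch loosely modeled on the paper’s KWIC index example. The class and function names are illustrative rather than taken from the paper. The decision being hidden is how lines are held in memory; the code that produces circular shifts touches only the public methods, so it would not need to change if the representation did.

```python
class LineStorage:
    """Hides one decision likely to change: how lines of text are stored."""

    def __init__(self):
        # Today: a list of word lists. Callers never see this structure.
        self._lines = []

    def add_line(self, text: str) -> int:
        """Store a line and return its index."""
        self._lines.append(text.split())
        return len(self._lines) - 1

    def word_count(self, line: int) -> int:
        return len(self._lines[line])

    def word(self, line: int, position: int) -> str:
        return self._lines[line][position]


def circular_shifts(storage: LineStorage, line: int) -> list:
    """Relies only on the public interface, not on the internal layout."""
    n = storage.word_count(line)
    words = [storage.word(line, i) for i in range(n)]
    return [" ".join(words[i:] + words[:i]) for i in range(n)]


storage = LineStorage()
index = storage.add_line("design decisions change")
shifts = circular_shifts(storage, index)
print(shifts)
```

Swapping the list of lists for, say, one packed string with offsets would touch only `LineStorage`, which is the point of decomposing around the decision instead of around the processing steps.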

The Butterfly Effect

I was privileged to have access to computers from an early age, from the humble TI-99/4A of my early elementary years to a snappy Pentium in high school (can’t remember the exact model, but it was pretty expensive; perhaps the 133MHz version?) The influence this access had on my life cannot be overstated.

[Photo: Young Jud on a TI-99/4A. Caption: “Train up a child in the way they should go”]

Now that I’m firmly in middle age, and on a career path where I’m regularly evaluating technical talent, I’m reminded of that privilege, and how so many didn’t have it then, and some still don’t have it now. How much untapped potential there must be within these groups!

If we’re going to overcome the lack of diversity in tech, it starts with access; early access, when life-long perceptions are formed. As the saying goes: the best time to plant a tree is 25 years ago, but the second best time is today. Gotta get planting!

Takin’ Care Of Business

Love them or hate them, ticketing systems like Jira or Asana are an essential part of modern software engineering. Misuse is rampant, but wielded well, ticketing systems align teams around common goals, unburden them from pointless status meetings, and unlock accelerated development with better accountability and fewer defects.

What are some ticketing best practices, you ask? I have thoughts:

  • Work of any significance should be captured in a ticket. My rule of thumb is anything that takes longer than a few hours.
  • Tickets are not a substitute for good requirements and design documentation. Instead they should capture specific tasks to be accomplished, including links to other docs as needed.
  • Writing tickets is not only the responsibility of the team lead or product manager; a team should own its ticket board. If the current set of tickets does not match the reality of what is being worked on, make it so by rewriting, breaking up, and deleting tickets as appropriate.
  • Being assigned a ticket is a form of promise. It says to the team “I will accomplish this task in this amount of time.” Each engineer should thoroughly understand all of the tickets assigned to them, proactively seeking out guidance if they don’t, and making clarifying edits as needed.
  • Know how metrics such as velocity and completion dates are computed. Their value is only as good as the ticket data used to calculate them (garbage in, garbage out), so ensure values like story points are accurately tracked.
  • Avoid Bricklin’s Law by embracing healthy use of a ticket system, and others won’t feel the need to add more status tracking. Done perfectly, it renders other status checkpoints redundant, maybe even those daily standups you dread.
  • Finally, tickets are a means, not an end. Don’t lose sight of your ultimate goals. If your ticketing process isn’t helping you achieve those goals, change it.
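As a small illustration of the metrics point above, here is one common way velocity and a projected completion are computed. Tools differ in the details, so treat this as a sketch under simple assumptions (velocity as the mean of completed story points per sprint), with made-up numbers and hypothetical function names.

```python
from math import ceil
from statistics import mean


def velocity(completed_points_per_sprint):
    """Average story points completed per sprint."""
    return mean(completed_points_per_sprint)


def sprints_remaining(backlog_points, completed_points_per_sprint):
    """Project how many more sprints the backlog needs at the current velocity."""
    return ceil(backlog_points / velocity(completed_points_per_sprint))


backlog = 100  # made-up remaining story points

accurate_history = [21, 18, 24]  # points actually completed
inflated_history = [21, 18, 40]  # one sprint with badly tracked points

honest_estimate = sprints_remaining(backlog, accurate_history)
skewed_estimate = sprints_remaining(backlog, inflated_history)
print(honest_estimate, skewed_estimate)
```

A single sprint of sloppy story points is enough to move the projection by a whole sprint, which is the garbage in, garbage out problem in miniature.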

The above list is far from exhaustive. Any other suggestions out there?

There’s Gold In Them Thar Hills

In my conversations with fellow engineers, git comes up quite a bit. I find myself regularly giving advice both tactical and strategic on its effective use. Learning it in detail is a force multiplier, but few people do. Part of the problem is that training materials are all over the map.

Which is why I was so pleased to discover Git from the inside out. Without question the best introduction to git I’ve come across. It perfectly balances teaching basic commands while also explaining what’s actually happening. Despite having used git at a fairly advanced level for 10 years, I still learned some new things, for example that each git add creates an immutable blob object that is retained for a while even if you git add the same file again, and even if you never commit it. Also that it’s pretty easy to decode raw git objects should you ever need to; here’s a script I wrote to do just that, if you’re curious.
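For a sense of how little is involved in decoding an object, here is a minimal sketch (not the script mentioned above). It assumes a repository using the default SHA-1 object format and handles only loose objects, which are zlib-compressed bytes of the form "type size", a NUL byte, then the content. The function names are hypothetical.

```python
import hashlib
import zlib


def encode_object(obj_type: str, content: bytes) -> bytes:
    """Build the raw bytes git hashes: '<type> <size>', a NUL byte, the content."""
    header = f"{obj_type} {len(content)}".encode() + b"\x00"
    return header + content


def object_id(obj_type: str, content: bytes) -> str:
    """SHA-1 of the uncompressed object, which is the name git stores it under."""
    return hashlib.sha1(encode_object(obj_type, content)).hexdigest()


def decode_object(compressed: bytes):
    """Inflate a loose object and split the header from the body."""
    raw = zlib.decompress(compressed)
    header, _, body = raw.partition(b"\x00")
    obj_type, size = header.decode().split(" ")
    if int(size) != len(body):
        raise ValueError("size in header does not match body length")
    return obj_type, body


# Round-trip a blob the way `git add` would store it.
blob = b"hello world\n"
on_disk = zlib.compress(encode_object("blob", blob))
print(object_id("blob", blob))
print(decode_object(on_disk))
```

To try it on a real repository, read the bytes of a file under .git/objects/ (the first two hex characters of the object ID are the directory name, the rest is the file name) and pass them to `decode_object`.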

I’ve said before that abstractions are valuable, but they’re not excuses to avoid learning internals, because critical information lies beneath the surface. At the risk of pretentiously quoting myself:

When things go wrong, the engineer must descend into the particulars, and an inability to minimally reason about, if not fully grasp, what lies beneath an abstraction can prove fatal to the debugging process.

I didn’t write the above with version control in mind, but I surely could have. Engineering organizations are full of developers who get stuck the moment a git command fails. You don’t have to be that developer!