Month: February 2020

CloudFormation Derriere Kicker

Are you using the AWS Cloud Development Kit yet? If your job ever involves creating or maintaining infrastructure (on AWS, natch), you absolutely should be. Take the power of CloudFormation (minus the annoying parts) and mix it with the familiarity of your favorite programming language (Python, am I right? Yes I am. But TypeScript, Java, and .NET are also choices), and you get a killer option in the burgeoning Infrastructure as Code toolbox.

Instead of spending a bunch of time telling you why the CDK is so great, here are some resources to help you discover that for yourself:

Happy building!

OOPs I Did It Again

Have you ever looked at a heavily object-oriented codebase (Java, I’m looking at you) and immediately felt dumber? Did the design patterns touted as the ultimate in software craftsmanship sound amazing when you studied them in school, but now every time you see them in real projects they result in layer upon layer of confusion?

You are not alone.

There is no one approach to rule them all when it comes to software development, and while I wouldn’t be quite as hard on object-oriented programming as the articles linked above (yes, there are multiple ones), it’s not the only, best, or even appropriate pattern to use in all circumstances.

What The Devil’s In

AWS provides a number of fantastic managed services that make building applications quick and easy. At least for the most part. But there are plenty of interesting gotchas, and instances where the underlying details matter.

This past week I was working on an app that used the Simple Queue Service (SQS) to exchange messages between components, and I had implemented long polling to reduce the cost of repeated API calls. I’d also set a long visibility timeout because the processor took a significant amount of time to handle each message.

During the course of testing I was finding that messages were getting stuck in an “in-flight” state; given the long visibility timeout, this was causing delays in processing because the handler had to wait for the timeout to expire for these stuck messages. But I couldn’t initially figure out why the messages were getting stuck in the first place. I only had one handler thread; why were messages getting pulled in flight, but not getting processed and eventually removed?

It turns out the reason was that in the course of testing I was regularly killing off the handler with Ctrl+C and restarting it. And that interrupt signal was cutting short the long poll API call into SQS. Why did that matter? Because a long poll call fires off a process on the AWS servers that waits for messages to show up on the queue so it can return them. That process continues to run even if the client that initiated it dies. Thus if a message shows up on the queue after the client goes away, but before the long poll time expires, it’s taken off the queue as “in flight”, but sits there until the visibility timeout hits because there’s nothing to subsequently process and delete it.

I was unable to figure out the above until I learned more about what actually happens within AWS during an SQS long poll. Finding this thread about the Node.js client helped too (I was writing my client in Python but the behavior is common across all SDK implementations). If I’d only been able to reason at the level of the queue abstraction, I’m not sure I could have solved the problem. Once again, descending into the particulars was the path to a solution.
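
For what it’s worth, here is a minimal sketch of one way to sidestep the problem during testing: instead of letting Ctrl+C kill the process mid-poll, record the request and exit only after the current long poll has returned and its messages have been handled and deleted. The `GracefulPoller` class, `run_until_interrupted` function, and the timeout defaults are my own illustrative names and values, not anything from the SDK; the client is assumed to behave like a boto3 SQS client (`receive_message` and `delete_message`). It also assumes Python resumes the blocking network read after the signal handler runs, which is my understanding of how interrupted system calls are retried, so treat it as a starting point rather than a guarantee.

```python
import signal


class GracefulPoller:
    """Long-polls a queue and defers shutdown until the current poll completes."""

    def __init__(self, sqs_client, queue_url, handler,
                 wait_seconds=20, visibility_timeout=300):
        self.sqs = sqs_client            # assumed to look like boto3.client("sqs")
        self.queue_url = queue_url
        self.handler = handler           # hypothetical callable taking the message body
        self.wait_seconds = wait_seconds
        self.visibility_timeout = visibility_timeout
        self.stop_requested = False

    def request_stop(self, signum=None, frame=None):
        # Usable as a signal handler: only sets a flag, never raises,
        # so the in-progress receive_message call is not abandoned.
        self.stop_requested = True

    def run(self):
        while not self.stop_requested:
            response = self.sqs.receive_message(
                QueueUrl=self.queue_url,
                MaxNumberOfMessages=1,
                WaitTimeSeconds=self.wait_seconds,        # long poll
                VisibilityTimeout=self.visibility_timeout,
            )
            # Anything the poll returned is processed and deleted before the
            # stop flag is checked again, so nothing is left "in flight".
            for message in response.get("Messages", []):
                self.handler(message["Body"])
                self.sqs.delete_message(
                    QueueUrl=self.queue_url,
                    ReceiptHandle=message["ReceiptHandle"],
                )


def run_until_interrupted(poller):
    # Replace the default Ctrl+C behavior (KeyboardInterrupt) with the flag setter.
    signal.signal(signal.SIGINT, poller.request_stop)
    poller.run()
```

With boto3 installed, wiring it up would look something like `run_until_interrupted(GracefulPoller(boto3.client("sqs"), queue_url, handler))`, where `queue_url` and `handler` are your own. The tradeoff is that Ctrl+C can take up to the full long poll wait to have an effect, which seemed a fair price for not stranding messages.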

Out Of Sight, Out Of Mind

Today I came across this statement from Alfred North Whitehead, and instantly loved it as an extension of my previous post on abstractions.

“Civilization advances by extending the number of important operations which we can perform without thinking about them.”

That to me is the essence of abstractions. Not that one will never need to dig down into the implementation details, but that the layer on top of those details enables them to be ignored to an increasing degree.

Incidentally, this is the second Alfred North Whitehead reference I’ve come across recently, the first being a mention of his book Science and the Modern World in one of my favorite podcasts. Something tells me I need to dive deeper.

Turtles All The Way Down

One could make an argument that computer science is the study of effective abstractions. It is no small challenge to build interfaces on lower-level details in a way that enables higher-level capabilities. But once in place, the higher-level constructs become the next layer’s low-level details, and exponentially growing design power is unlocked.

Nowhere is this more apparent than in the explosion of cloud computing, where hardware itself has been abstracted away, where “serverless architectures” and “managed services” have enabled a form of “pure thought stuff” that Fred Brooks could only dream about.

At least in theory. In reality, there is no perfect abstraction in which the lower-level details become completely irrelevant. We do a disservice to software developers when we pretend that, because high-level abstractions like AWS Lambda exist, their underlying implementations never need to be understood. When things go wrong, the engineer must descend into the particulars, and an inability to minimally reason about, if not fully grasp, what lies beneath an abstraction can prove fatal to the debugging process.

Consider my previous post. Node’s package management system has enabled an explosion of abstractions that power some of the web’s best tools, but too often developers are not trained on what it’s doing or how to fix problems. Package documentation makes it sound so simple (“just run npm i and you’re golden!”). But if you want to use npm, you need to grok the details, or you’ll never be productive.

As another example, last week I was troubleshooting a deployment to Lambda, and the issue ended up being file permissions inside the zipped code package. One might be inclined to believe that, since Lambda is “serverless”, the upload simply floats into the clouds and magically does its work. But of course that’s untrue: there is a server (with its myriad hardware abstractions), there is an operating system and corresponding system user, there is a disk to which those files are written, and there are file permissions on said disk. And if the files are not readable by the system user (e.g. if they were created on a machine with a restrictive umask), the Lambda cannot function. What seems a minor detail proves critical.

Is there a way to hide that detail from the user? Maybe? I don’t claim to understand the complex domain of cloud function implementations (if one had to do so to use them, few could), but I’m glad I had sufficient knowledge to know what to consider when I experienced trouble.
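
To make the Lambda permissions detail above a bit more concrete, here is a rough sketch using only Python’s standard `zipfile` module. It writes entries with explicit Unix permission bits (so the build machine’s umask stops mattering) and scans an existing package for entries that lack the “other” read bit. The helper names `add_file` and `unreadable_entries` are mine, and the idea that the world-readable bit is the one that matters is an assumption based on my own troubleshooting, not a statement of how Lambda is implemented.

```python
import zipfile


def add_file(archive, arcname, data, mode=0o644):
    """Write one entry with explicit Unix permissions, ignoring the local umask."""
    info = zipfile.ZipInfo(arcname)
    info.create_system = 3                      # 3 = Unix, so the mode bits are honored
    info.compress_type = zipfile.ZIP_DEFLATED
    # The upper 16 bits of external_attr carry the Unix mode;
    # 0o100000 marks the entry as a regular file.
    info.external_attr = (0o100000 | mode) << 16
    archive.writestr(info, data)


def unreadable_entries(zip_path):
    """Return the names of entries whose stored mode lacks the 'other' read bit."""
    bad = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            mode = (info.external_attr >> 16) & 0o777
            if not mode & 0o004:
                bad.append(info.filename)
    return bad
```

Running something like `unreadable_entries("package.zip")` before uploading (the file name is just a placeholder) would have saved me a fair bit of head-scratching.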