Tag: Invent And Simplify

What The Devil’s In

What The Devil’s In

AWS provides a number of fantastic managed services that make building applications quick and easy. At least for the most part. But there are plenty of interesting gotchas, and instances where the underlying details matter.

This past week I was working on an app that used the Simple Queue Service (SQS) to exchange messages between components, and I had implemented long polling to reduce the cost of repeated API calls. I’d also set a long visibility timeout because the processor took a significant amount of time to handle each message.

During the course of testing I was finding that messages were getting stuck in an “in-flight” state; given the long visibility timeout, this was causing delays in processing because the handler had to wait for the timeout to expire for these stuck messages. But I couldn’t initially figure out why the messages were getting stuck in the first place. I only had one handler thread; why were messages getting pulled in flight, but not getting processed and eventually removed?

It turns out the reason was that in the course of testing I was regularly killing off the handler with Ctrl+C and restarting it. And that terminate signal was cutting short the long poll API call into SQS. Why did that matter? Because a long poll call fires off a process on the AWS servers that is waiting for messages to show up on the queue so it can return them. That process continues to run even if the client that initiated it dies. Thus if a message shows up on the queue after the client goes away, but before the long poll time expires, it’s taken off the queue as “in flight”, but sits there until the visibility timeout hits because there’s nothing to subsequently process and delete it.

I was unable to figure out the above until I learned more about what actually happens within AWS during an SQS long poll. Finding this thread about the Node.js client helped too (I was writing my client in Python but the behavior is common across all SDK implementations). If I’d only been able to reason at the level of the queue abstraction, I’m not sure I could have solved the problem. Once again, descending into the particulars was the path to a solution.

Turtles All The Way Down

Turtles All The Way Down

One could make an argument that computer science is the study of effective abstractions. It is no small challenge to build interfaces on lower-level details in a way that enables higher-level capabilities. But once in place, the higher-level constructs become the next layer’s low-level details, and exponentially-growing design power is unlocked.

Nowhere is this more apparent than in the explosion of cloud computing, where hardware itself has been abstracted away, where “serverless architectures” and “managed services” have enabled a form of “pure thought stuff” that Fred Brooks could only dream about.

At least in theory. In reality, there is no perfect abstraction in which the lower-level details become completely irrelevant. We do a disservice to software developers when we pretend that because high-level abstractions like AWS Lambdas exist that their underlying implementations never need to be understood. When things go wrong, the engineer must descend into the particulars, and an inability to minimally reason about, if not fully grasp, what lies beneath an abstraction can prove fatal to the debugging process.

Consider my previous post. Node’s package management system has enabled an explosion of abstractions that power some of the web’s best tools, but too often developers are not trained on what it’s doing or how to fix problems. Package documentation makes it sound so simple (“just run npm i and you’re golden!”) But if you want to use npm, you need to grok the details, or you’ll never be productive.

As another example, last week I was troubleshooting a deployment to Lambda, and the issue ended up being file permissions inside the zipped code package. One might be inclined to believe that since Lambda is “serverless” that the upload simply floats into the clouds and magically does its work. But of course that’s untrue: there is a server (with its myriad hardware abstractions), there is an operating system and corresponding system user, there is a disk to which those files are written, and there are file permissions on said disk. And if the files are not readable by the system user (e.g. if they were created on a machine with a restrictive umask) the Lambda cannot function. What seems a minor detail proves critical.

Is there a way to hide that detail from the user? Maybe? I don’t claim to understand the complex domain of cloud function implementations (if one had to do so to use them, few could), but I’m glad I had sufficient knowledge to know what to consider when I experienced trouble.

JaPythoScriptML

JaPythoScriptML

More and more I’m discovering that I consume media on my laptop instead of traditional “living room TV”. To some degree it’s a resource contention issue with the family, but even when the TV is available I find watching on a personal device simpler.

An unexpected side effect of that is that I find myself taking screenshots and closely inspecting examples of “source code” that appear onscreen (my genre of choice is sci-fi, so there are plenty of opportunities). The following is from an episode of Westworld:

Looks like some odd mixture of HTML, JavaScript, and Python. Clearly some design work went into this display, even though it was only shown for about second. I can’t help but wonder whose job it is to build out these designs. Must be fun.

Duplicity

Duplicity

The problem I described in my last post is relevant to more than just useless code comments. Check out these helpful and totally not redundant download instructions I found on a website today:

I sure am glad the extra text is there to clear up any confusion on which version I needed.

The Bible Of Software Engineering

The Bible Of Software Engineering

One of my all-time favorite passages on software development.

The programmer, like the poet, works only slightly removed from pure thought-stuff. He builds his castles in the air, from air, creating by exertion of the imagination. Few media of creation are so flexible, so easy to polish and rework, so readily capable of realizing grand conceptual structures.

Yet the program construct, unlike the poet’s words, is real in the sense that it moves and works, producing visible outputs separate from the construct itself. It prints results, draws pictures, produces sounds, moves arms. The magic of myth and legend has come true in our time. One types the correct incantation on a keyboard, and a display screen comes to life, showing things that never were nor could be.

Frederick P. Brooks, The Mythical Man-Month

Be A Polyglot!

Be A Polyglot!

A couple of days ago a colleague of mine was reading out loud some code he was debugging, and I could have sworn he was direct quoting from Execution In The Kingdom Of Nouns.

That story is one of my favorite warnings against the notion that software development is a “one size fits all” proposition. There’s great value in being fluent in a number of tools and approaches, and using the right tool for the job. Can this be taken to an unhealthy extreme? Of course, but in general the benefits outweigh the drawbacks.

High-Powered Foot Assault Rifle

High-Powered Foot Assault Rifle

Giving a user too much flexibility can be a dangerous thing. Better to do the hard work of separating actual requirements from “desirements” (as an old government customer of mine used to say).

Yesterday’s lesson: the Perl HTML::Tidy library doesn’t handle certain UTF-8 encoded characters properly. In particular, it truncates a non-breaking space (code c2 a0) into a bare control byte (c2), which at best gives a mangled character, and at worst an invalid encoding sequence. Pass that result to another utility, and you can expect failure.

The above incident was particularly difficult to debug because the app in question does not keep edit history. I was reminded how valuable it can be to track changes over time, not just in source code, but about the data on which an application operates as well. Made me think that git might not be a half-bad back-end storage mechanism in many situations. Anyone have experience with that?

Slimming Down

Slimming Down

Yesterday I migrated a bunch of microservices from the baseline Node 6 Docker image to an Alpine Linux version that I custom-built to be as small as possible. Average image size went from 800MB to 80MB across the 6 services, for a total savings of over 4GB per deployment.

Not a bad way to spend a day.

Covfefe

Covfefe

Even the smartest and most experienced developers make bone-headed mistakes, the kind that require hours of painful debugging only to find a comma was misplaced, or a one should have been a zero. While a good amount of effort should go into avoiding these errors, the energy required to prevent them completely would be astronomical. A better approach is to balance prevention and detection, hence the importance of good peer review and solid automated testing.