Author: Jud

Technologist interested in building both systems and organizations that are secure, scalable, cost-effective, and most of all, good for humanity.
Turtles All The Way Down

One could make an argument that computer science is the study of effective abstractions. It is no small challenge to build interfaces on lower-level details in a way that enables higher-level capabilities. But once in place, the higher-level constructs become the next layer’s low-level details, and exponentially growing design power is unlocked.

Nowhere is this more apparent than in the explosion of cloud computing, where hardware itself has been abstracted away, where “serverless architectures” and “managed services” have enabled a form of “pure thought stuff” that Fred Brooks could only dream about.

At least in theory. In reality, there is no perfect abstraction in which the lower-level details become completely irrelevant. We do a disservice to software developers when we pretend that, because high-level abstractions like AWS Lambda exist, their underlying implementations never need to be understood. When things go wrong, the engineer must descend into the particulars, and an inability to minimally reason about, if not fully grasp, what lies beneath an abstraction can prove fatal to the debugging process.

Consider my previous post. Node’s package management system has enabled an explosion of abstractions that power some of the web’s best tools, but too often developers are not trained on what it’s doing or how to fix problems. Package documentation makes it sound so simple (“just run npm i and you’re golden!”). But if you want to use npm, you need to grok the details, or you’ll never be productive.

As another example, last week I was troubleshooting a deployment to Lambda, and the issue ended up being file permissions inside the zipped code package. One might be inclined to believe that since Lambda is “serverless”, the upload simply floats into the clouds and magically does its work. But of course that’s untrue: there is a server (with its myriad hardware abstractions), there is an operating system and corresponding system user, there is a disk to which those files are written, and there are file permissions on said disk. And if the files are not readable by the system user (e.g. if they were created on a machine with a restrictive umask), the Lambda cannot function. What seems a minor detail proves critical.
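For the curious, here’s a minimal sketch of one way to sidestep the umask problem: build the zip in Python and stamp every entry with world-readable permissions, regardless of what the files look like on disk. This is not what I did at the time, and it assumes the package is assembled by a script rather than the zip command. The function name and the 644/755 modes are my own choices for illustration.

```python
import os
import stat
import zipfile


def zip_with_readable_permissions(source_dir: str, zip_path: str) -> None:
    """Zip source_dir, forcing modes that any system user can read.

    Hypothetical helper for illustration; zip_path should live outside
    source_dir so the archive does not try to include itself.
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(source_dir):
            for name in files:
                full_path = os.path.join(root, name)
                arcname = os.path.relpath(full_path, source_dir)
                info = zipfile.ZipInfo.from_file(full_path, arcname)

                # Keep executables executable, but always grant read access
                # to everyone, ignoring whatever the local umask produced.
                is_executable = os.stat(full_path).st_mode & stat.S_IXUSR
                mode = 0o755 if is_executable else 0o644

                # The upper 16 bits of external_attr hold the Unix file mode.
                info.external_attr = (stat.S_IFREG | mode) << 16
                info.compress_type = zipfile.ZIP_DEFLATED

                with open(full_path, "rb") as handle:
                    archive.writestr(info, handle.read())
```

Calling zip_with_readable_permissions("build", "package.zip") on a directory full of 600-mode files should yield an archive whose entries unpack as 644, which is the kind of detail the “serverless” label encourages you to forget.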

Is there a way to hide that detail from the user? Maybe? I don’t claim to understand the complex domain of cloud function implementations (if one had to do so to use them, few could), but I’m glad I had sufficient knowledge to know what to consider when I experienced trouble.

To Alcohol and WiFi!

The causes of, and solutions to, all of life’s problems.

It isn’t often that an intermittent network connection is a benefit, but in this case a connection hiccup actually tipped me off to a useful workaround.

When you’re an engineering manager, you’re “important”, which means you have to go to a lot of meetings. And because you’re so very “important”, you can’t be troubled to close your laptop when walking across the office to said meetings, because you might miss someone’s giphy on Slack. Pretty sure I looked like an idiot, but that’s the price you pay for being in charge. Or something.

Anyhow, I’d been fighting an npm issue all morning (natch), where a particular module (bcrypt) was core dumping on my Mac. Not cool, bcrypt, not cool. Couldn’t figure out what was going on, but as is typical, “have you tried erasing your node_modules folder and re-running npm install?” Actually I had, but I was getting desperate, so thought I’d give it one more go. While simultaneously picking up my laptop to head to a meeting (keeping it open as I walked, because “important things” happening on it).

I arrived at the meeting (no idea what it was about, also pretty typical), and when the npm install had finally finished, I tried the program again, and lo and behold, it worked! I think at this moment I audibly exclaimed my excitement, despite the outburst not fitting the context of the meeting; that’s how happy I was. But I was also a bit befuddled. What had changed?

So I pored over the logs, both from the install that didn’t work, and the one that had (God bless anyone who ever has to review an npm log; it’s a special kind of hell). Check out extracts from the install that failed to run, and the one that worked. Do you see the difference?

Please go look. I’ll wait.

Figure it out?

Did you notice that the binary of bcrypt failed to download in the second log, and npm fell back to compiling from source? That was the secret! Something must’ve been wrong with the prebuilt version for Mac. Now, I never solved what caused the crash in that build, but it was easy enough to work around it with npm install --build-from-source.

But the real serendipity was the likely cause of the download failure. The only explanation I can think of is that our office’s crummy WiFi happened to flake out briefly as I was carrying my open laptop across the hall, just at the moment when the bcrypt binary was being downloaded, causing it to fail. But the network was back by the time the source tarball was downloaded, and the rest of the process finished normally.

Even as I write, it sounds preposterous. What are the odds? Maybe it was something else, I don’t have any proof. But you’ll never convince me.

A Tale As Old As 2001

For the next week or two I’m going to go back through my old drafts and finish them up. That means the stories are at least a year or two old. For this one, I’m curious if Edge finally changed the behavior. Anyone want to try it out?

When you’re debugging a pernicious issue, there’s no greater feeling than Google search auto-completing your first couple search terms and matching a page that describes your problem to a T. The challenge of course is figuring out those magic couple of words.

The team was recently trying to figure out an IE11-only problem (ugh) where our authentication mechanism was failing, but only for a subset of customers, with no obvious commonality. The server would return a Set-Cookie header, but the browser completely ignored it. WTF, Microsoft!

We’d spent an entire day trying to come up with a solution, until finally stumbling into the root cause: underscores in the subdomain. Chrome and Firefox are cool with them, but IE silently refuses to store cookies when they’re present. The details are a fascinating combination of unexpected side effects from a bug fix, misinterpreted web standards, and lingering backwards compatibility. This post captures the story nicely.

My product manager had never been thrilled with the way we’d been handling domain names. While I couldn’t have anticipated our design would lead to this misadventure (and a simple s/_/-/ solved the problem), I probably should have given his critique a closer listen.
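In hindsight, a tiny check at the point where subdomains are created would have flagged the problem long before IE did. Here’s a rough sketch, assuming subdomains are generated from customer names. The function names and example hostnames are hypothetical, and the rule encoded is just the traditional hostname convention (letters, digits, and hyphens, with no leading or trailing hyphen).

```python
import re
from typing import List

# One DNS label under the traditional hostname rules: letters, digits, and
# hyphens only, 1 to 63 characters, not starting or ending with a hyphen.
_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def invalid_labels(hostname: str) -> List[str]:
    """Return the labels of hostname that break the hostname rules."""
    return [label for label in hostname.split(".") if not _LABEL.fullmatch(label)]


def sanitize_subdomain(label: str) -> str:
    """The Python spelling of s/_/-/g for a single subdomain label."""
    return label.replace("_", "-")


# Hypothetical customer hostname, for illustration only.
print(invalid_labels("acme_corp.example.com"))
print(sanitize_subdomain("acme_corp"))
```

The first call reports acme_corp as the offending label, and the second rewrites it to acme-corp. The check is deliberately strict: it will also reject other characters that happen to work in some browsers, which seems like the safer default given how this story went.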

It’s Been Awhile

Howdy friend. It’s been quiet here for some time now, but as is typical around a new year, I’m renewing my efforts to stay active on this blog (especially since I’ve mostly stopped using social media). This is due in no small part to my now working for AWS Professional Services as a Senior Consultant in the public sector, a role for which improving my writing will be particularly valuable.

My silence should not be interpreted as inactivity, because a heck of a lot has gone down since I last posted:

  • Got promoted to the Director of Engineering for a 20+ person team (this actually happened in late 2017 but I’ve never mentioned it here)
  • Led that team through a painful acquisition process that required reducing the team by about a third
  • Experienced the joy of having a paycheck delayed by two full weeks during the holiday spending season
  • Celebrated my 40th birthday with a trip to Germany and Ireland
  • Was laid off when my employer ran out of money, without warning and with no final paycheck (about this much more could be said, but going to keep it short for now)
  • Dipped my toes into independent consulting for a few months while searching for a new job
  • Was hired by Amazon as a Senior SDE to work on their Last Mile team (the folks that get packages from delivery stations to your doorstep)
  • Transferred to AWS as I mentioned above

Pretty bonkers 18 months, but things are starting to settle down, and I’m eagerly anticipating the new normal of 2020. More to come!

JaPythoScriptML

More and more I’m discovering that I consume media on my laptop instead of traditional “living room TV”. To some degree it’s a resource contention issue with the family, but even when the TV is available I find watching on a personal device simpler.

An unexpected side effect of that is that I find myself taking screenshots and closely inspecting examples of “source code” that appear onscreen (my genre of choice is sci-fi, so there are plenty of opportunities). The following is from an episode of Westworld:

Looks like some odd mixture of HTML, JavaScript, and Python. Clearly some design work went into this display, even though it was only shown for about a second. I can’t help but wonder whose job it is to build out these designs. Must be fun.

Truth And Consequences

Yesterday I was in an all-day meeting preparing for a large customer demonstration. Ran into a bug that turned out to be a misunderstanding of how JavaScript handles truthiness. Consider the following code:

if (person.address) {
  console.log('Address is ' + person.address);
}
else {
  console.log('No address.');
}

Seems clean enough, right? Not so fast. If person.address is null, no problem. However, if person.address is an empty object, that evaluates to true, and the code fails to do the right thing. To me at least, this is non-intuitive behavior.

I started this blog with a discussion of why I love Python, and once again it behaves more intuitively. Empty dictionary, empty list, None, and empty string all evaluate to False. So the code works in a broader variety of cases:

if person['address']:
    logging.info('Address is ' + person['address'])
else:
    logging.info('No address.')

Isn’t that nice and clean?
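If you’d like to see the falsy values for yourself, here’s a small self-contained sketch. The describe_address helper is hypothetical, and it swaps person['address'] for person.get('address') so that a missing key is treated like an empty one rather than raising KeyError.

```python
def describe_address(person: dict) -> str:
    """Return a log-friendly description of a person's address."""
    # .get() returns None for a missing key instead of raising KeyError.
    address = person.get("address")
    # None, "", [], and {} are all falsy, so one check covers every
    # "empty" case.
    if address:
        return f"Address is {address}"
    return "No address."


# Each empty value takes the "No address." branch.
for empty in (None, "", [], {}):
    print(repr(empty), "->", describe_address({"address": empty}))

# A real value takes the other branch.
print(describe_address({"address": "1 Main St"}))
```

Whether treating a missing key and an empty value the same way is right depends on the data, of course, but for a demo the day before a customer visit it would have saved some grief.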

As an aside, most developers have become so accustomed to bracketing punctuation (e.g. braces, semicolons) that they assume there’s no other way. Personally I’ve come to love the lack of noise in Python syntax, and I think you will too.

Duplicity

The problem I described in my last post is relevant to more than just useless code comments. Check out these helpful and totally not redundant download instructions I found on a website today:

I sure am glad the extra text is there to clear up any confusion on which version I needed.

This Is A Post

A friend of mine suggested to me a few days ago that the recent Apple vulnerability might have been avoided if the (supposed) offending code had been commented. Perhaps, but perhaps not.

Code comments are a tricky business. Everyone knows they’re a “good thing” but that doesn’t mean every comment is a good one. Blindly adding them in quantity can actually make code legibility worse. I don’t go as far as Uncle Bob, however, who considers every comment “a failure to make the code self-explanatory.”

For me, a great rule of thumb is that code itself should be expressive enough to communicate the “what”, and comments should be used to explain the “why”. An example is instructive:

# If user is root and there is no root password, don't do the thing
if user == 'root' and password is None:
    dontDoTheThing()
else:
    doTheThing()

See how the comment doesn’t provide any information beyond what the code says? Pretty unhelpful. What a developer who’s asked to maintain this code needs is context, like the following:

# By default an installation of MacOS does not set a root password, thus root
# should never be used as a privileged account unless a password has been set
if user == 'root' and password is None:
    dontDoTheThing()
else:
    doTheThing()

Much better. A comment like that, and maybe Apple doesn’t end up in the headlines.

Even If It’s Broke, Don’t Fix It

I doubt we’ll ever learn the root cause of a particularly nasty security vulnerability recently revealed in MacOS High Sierra. But it’s fun to theorize. I’d wager it went down about like this:

  1. In the earliest days of the Darwin kernel, it was decided not to set a default root password, because it could cause usability issues for non-technical people buying Macs
  2. As a consequence, any dialogs or other interfaces that ask for login information had to have special cases built into them to not allow root as an option, or at least not to allow it if no password had been set
  3. Over the years, the number of such widgets grew, requiring more and more little special case checks
  4. The reasoning behind the decision at step one was forgotten, probably to the point where entire teams of developers didn’t even know MacOS has no default root password
  5. While building the High Sierra release, a developer (probably new and/or junior-ish) noticed an odd bit of special-case code in a library that was getting in the way of a new feature being added; a few clicks later and the seemingly unnecessary check was removed, and now the feature worked
  6. QA regression tests the release and finds no problems, because no one thought to try using the unprotected root account in various places (see #4)

Good times, Apple. I feel your pain.

An Apple A Day

Anyone want to speculate on how many .DS_Store files and other MacOS cruft have been inadvertently uploaded to Google Drive, Dropbox, etc.? Every time I see these things it’s a reminder that when designing a system, one should never assume it’s always going to be used within whatever comfortable little ecosystem the engineer envisions for it.