Tag: Are Right A Lot

After My Own Heart

Went back in my draft backlog and found this gem from 2020: Unit Testing is Overrated.

In the age of AI-generated code, it feels even more applicable. When a model writes unit tests (especially when it does it in view of the code it’s testing) they’re at risk of being overfitted to the functions under test. They may indeed prove software executes as it’s written, but that has little to do with proving the software meets requirements (for example, Kiro had created hundreds of perfectly passing tests for this project).
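To make the overfitting risk concrete, here's a minimal sketch. The function `apply_discount` and its "never return a negative price" requirement are hypothetical, invented purely for illustration. The first test mirrors the implementation, so it passes no matter what the code does; the second checks the requirement and catches the bug.

```python
def apply_discount(price: float, percent: float) -> float:
    """Hypothetical function under test.

    Assumed requirement: the result must never be negative.
    This implementation ignores that requirement on purpose.
    """
    return price - price * percent / 100


def overfitted_test() -> bool:
    # Restates the implementation's own formula, so it proves only that
    # the code executes as written.
    price, percent = 50.0, 150.0
    return apply_discount(price, percent) == price - price * percent / 100


def requirement_test() -> bool:
    # Checks the stated requirement rather than the implementation.
    return apply_discount(50.0, 150.0) >= 0


if __name__ == "__main__":
    print("overfitted test passes:", overfitted_test())
    print("requirement test passes:", requirement_test())
```

A test written without the implementation in view is more likely to look like the second one.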

The key takeaways from the article are all worth sticking in your coding agent instructions, because without explicit directives, LLMs are probably biased to do the opposite of these recommendations given the weight of training data pushing so-called “best practices.”

In particular, I can see value in using a separate (perhaps even adversarial) agent/model to write the tests. It’ll be less biased by the context used to write the code, and it can be instructed to “aim at the highest level of integration while maintaining reasonable speed and cost.”

Not All Rainbows and Unicorns

As much as I love being a generalist and believe it’s the better end of the breadth/depth spectrum in the current tech environment, it doesn’t always feel like a great way to operate, for several reasons:

First, it’s often difficult for a generalist to describe their job, not just to folks like friends and family who are unfamiliar with their domain, but also to managers in 1-on-1s, performance reviews, and promotion documents (I’ve had firsthand experience with the latter). A term related to generalist, factotum, can even have negative connotations in certain settings, despite a “glue person” being an essential role within pretty much any organization. Specialists, on the other hand, are easier to understand, as their work is often simply “what it says on the tin” (i.e. their job title).

Second, while a generalist has proximate knowledge of many things, when they spend time with legitimate experts in a topic (which is often, because they’re so curious), they are quick to realize how much they don’t know. And since this happens so much, the experience of “I don’t know as much as that other person” compounds into a malaise of “eh, I don’t know much about anything.” Inferiority complex ensues.

When interests and aptitudes are varied, it’s also tough to determine where to focus, paradox of choice and all that. Similarly, while there’s value in doing extra at times, it’s easy to get distracted and fail to deliver the most important thing, losing the proper balance between work work and non-work work. This leads to a feeling that one is “mildly disappointing everyone all the time.” When I have that feeling, burnout lurks not far behind.

I don’t have a tidy conclusion to give here; it’s just stuff I’ve been sitting with.

On The Other Hand

Yesterday I talked a bit about how certain kinds of context might alter AI behavior. However, this research argues that maybe context isn’t that important after all, at least with certain kinds of tasks: Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents?

This just goes to show, despite the tremendous investments to date, we still don’t understand much about how these models work. That’s equal parts fascinating and terrifying.

Know What You Know

For the past couple years I’ve been doing work in the ecosystem of verifiable digital credentials, a space that perhaps is finally gaining some national traction given the introduction of the MATCH Act by Congressman Burgess Owens.

What are verifiable credentials? So glad you asked! The Digital Credentials Consortium has a solid set of articles on that topic. Here are several of my favorites:

My own work has been varied. For example, I’ve participated in a handful of standards working groups. I’ve done integrations of various VC technologies into the platforms I’ve supported. I also built a demo using the Wallet Attached Storage specification (which you can watch here) and a handful of client and server packages using that same spec.

I’m also in the process of creating a broader set of tools in my favorite programming language. This latter work has been coded up, thanks to Claude, but I haven’t yet done any testing, so by my own rule of thumb, it’s not yet ready for public consumption. But perhaps soon!

Imagining Dragons

Editor’s Note: I wrote the first draft of this post back in December, before I’d truly discovered Claude Code. Not sure it’d play out this same way now, several months later. I really ought to get back to it and find out.

I used Amazon Kiro to build a thing that I hope to publish eventually. But in the meantime, I’ll share an anecdote from my experience with it.

The spec-driven development model makes a lot of sense to me. In a few minutes with Kiro, I thought I had a solid description of what I wanted to build. Kicked off the tasks, let things cook for a while, and after a bit, I was told things were ready to test.

Not quite sure where to begin, I asked for a full end-to-end walkthrough in the README. The model wrote a great one with detailed, step-by-step command line instructions. I was excited to try it out. Opened up my terminal, Ctrl-C Ctrl-V-ed the first command, and… error: option not supported.

Tried another one, same thing. Weird.

Did a bit more investigation and came to a shocking realization: Kiro had hallucinated the entire walkthrough.

At first I was upset, but in truth, it was okay! Because I just told Kiro to read the README in detail, and turn the walkthrough into reality by building all the stuff it had invented, and retroactively put it in the spec.

Legitimate approach? Perhaps. But next time, maybe I’ll have it build the experience first, and then the code? Work backwards from the customer, anyone?

Coming Up For Air

Believe it or not, I’m not going to say anything about Claude today.

I wrote a post a couple years ago about statistics I tracked while doing daily crossword puzzles. I took a couple years off, but last year I was back at it, this time using a calendar from the New York Times.

The NYT crosswords are supposed to get harder as the week goes on, with Monday being easiest, and weekend ones being the most difficult. I wanted to prove that out, so I noted my solve time (capped at 30 minutes) on every puzzle, and then computed an average solve time for each day of the week. The results are below:

Lo and behold, my experience aligns perfectly! I thought that was cool.
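For anyone who wants to run the same experiment, here's a rough sketch of the math: cap each solve time at 30 minutes, group by day of the week, and average. The function name and the sample entries are made up for illustration; they are not my actual data.

```python
from collections import defaultdict
from datetime import date

CAP_MINUTES = 30.0  # solve times are capped at 30 minutes
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def average_by_weekday(solves):
    """Return {weekday name: average capped solve time in minutes}.

    `solves` is an iterable of (date, minutes) pairs.
    """
    times = defaultdict(list)
    for day, minutes in solves:
        # date.weekday() returns 0 for Monday through 6 for Sunday
        times[WEEKDAYS[day.weekday()]].append(min(minutes, CAP_MINUTES))
    return {name: sum(vals) / len(vals) for name, vals in times.items()}


if __name__ == "__main__":
    # Illustrative entries only
    sample = [
        (date(2024, 1, 1), 6.0),   # a Monday
        (date(2024, 1, 8), 8.0),   # another Monday
        (date(2024, 1, 6), 45.0),  # a Saturday, capped to 30
    ]
    for name, avg in average_by_weekday(sample).items():
        print(f"{name}: {avg:.1f} min")
```

One thing to keep in mind: the cap pulls down the averages for the hardest days, so the real gap between Monday and the weekend is probably wider than the numbers suggest.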

No Seriously, Don’t Stop

I’m starting to feel a compulsion to keep as many Claude Code terminals running as I possibly can. Ready for lunch? Try to kick off a large implementation. Bathroom break needed? Run a research project in parallel. Bedtime? Don’t you dare until you have your swarm of agent teams configured with CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS and everything allowed thanks to --dangerously-skip-permissions.

Time to git add --all && git commit -m "yolo" && git push -f up in this business!

And is it time to graduate to Gas Town? I’m already using beads to good effect, and I’ve now reached Stage 6 on Steve’s Evolution of the Developer chart.

Gift, What Gift?

It’s Christmas today, yay! In that spirit, I have two applications to share with the world. The first one I’ll talk about today, the other later this week.

My family loves to play games of all varieties, especially on holidays. An old favorite is Pinochle, which I first learned from my grandparents in Michigan (pretty sure card playing is the only thing to do in the Midwest in winter). Almost 3 years ago I first spoke about creating an online score tracking tool for Pinochle, and I released it in an initial form last year. Today it’s finally usable. Check it out at onlinescoresheet.net, scoresheet.info, scoresheet.mobi, or scoresheet.space (I do like lots of domain names). You can also find the source code on GitHub (completely AI-written).

This is a very bad Pinochle hand

What got it over the hump from “fiddly prototype” to “ready for prime time” wasn’t the choice of development tool or a eureka moment on my part. It was actual usage by real users other than myself. Putting it out there, and then convincing my family members (across a couple generations and device types) to try it. Got enough feedback to make a handful of critical improvements, and while it could certainly be better, it’s perfectly usable and doesn’t have any glaring functional bugs.

Usage is a gift. Seek it out, and don’t take it for granted.

Winning Business

My team just wrapped up a big proposal. I love the feeling when a month of writing and design work comes together. And it’s not just completion of the response itself that’s satisfying; it’s the culmination of all the groundwork that comes before, often a year or more of it.

Doing sales work requires a certain amount of brazen optimism that doesn’t come naturally to many technologists, as we tend to be pessimists (ahem, realists). To win customers’ trust (and their pocketbooks), you have to believe deeply that you are the best option for success. Deeply enough that it shows as genuine, because this kind of belief can’t be easily faked.

No, I’m not advocating for recklessly abandoning reality or straight-up lying. And yes, things are going to go wrong. We all know that. But the challenge of delivering a solution to a problem can’t solve itself. You can’t work it if you can’t win it. So yeah, you gotta first figure out a way to get into the ring, and leave future problems for your future self. Trust that you’re smart enough to address them, or at least be brave enough to risk failure.

I’m reminded of a chapter from The Geek Leader’s Handbook that talked about various approaches to proving truth in the workplace. While I wasn’t thrilled with the way the book broadly framed “geeks” vs “non-geeks” as fixed categories, it’s certainly the case with many engineers I’ve worked with (myself included) that we tend to disbelieve a statement until it’s proven true, and we especially don’t want to claim a fact without ample evidence. Whereas sales folks can operate more on hunch and gut feeling, believing something is achievable even when the outcome is unsure.

Nowhere does this show up more than in the process of scoping and then selling projects; business folks need a cost (i.e. people × time) but engineers say giving such an a priori estimate is impossible. While the latter is true in the literal sense, it has to be done regardless. Rigid thinkers like myself have to get over themselves and do the best they can.

What I’ve found to work best is to find colleagues who sit at the other end of the spectrum that runs from “it must be proven before we call it true” to “it feels true so it is true,” and partner together on sales efforts. Is that not what we partly mean when we say there’s power in diverse thinking?

It’s even more powerful when some of these colleagues have been buyers of what you’re selling, because ultimately they’re the type of person you have to convince.