Tag: Dive Deep

That Kind Of Day

You know when you’re trying to clean up a bunch of old AWS accounts but there’s no way to bulk close them so you have to click them one at a time and then click close and then copy and paste the account number to confirm and then there’s also a rate limit so you have to wait a minute between closures and then you hit a “10 account closures per 30 days” limit and no you can’t increase the quota says the documentation but you talk to support anyways and then they try to increase the quota but they can’t either so they suggest you log into each of them as root one at a time to close and you say “fine” but then the root email passwords are missing in your repository of credentials so you try to go through the forgot password flow but the root email address has two plus signs in it and no Exchange configuration you can think of to try seems to be able to accept such an email address and so you’re just outta luck…

ARGH!!!!!!!!!!!

UPDATE: So you finally get a shared mailbox set up with a carefully crafted alias that will receive email from the offending email address at least from your personal Gmail account so in theory it’s working but then you retry the forgot password flow on the AWS login page and there’s a CAPTCHA and the first 4 times you try to solve it you get it wrong and then finally you get it right and the page claims it’s sent you password reset instructions via email but you’ve waited 15 minutes and nothing has come through and yes you’ve checked the junk folder and what the hell am I doing with my life this is not what I dreamed a career in technology was going to be like and why can’t it involve more of the Property Brothers??

Failure To Communicate

I’m a sucker for travel point programs (and gamification more broadly). I’ve achieved maximum status on American Airlines, and am almost there with Marriott. When you get to the upper tiers there are crossover benefits with other programs, but they require activation. Today I tried to take advantage of one by linking my Marriott status to Hertz, but the website threw a delightful stack trace:

Error processing capillary request: Error converting value {null} to type 'System.DateTime'. Path 'profiles[0].fields.marriottstatusmatchd', line 1, position 792. -
   at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.EnsureType(JsonReader reader, Object value, CultureInfo culture, JsonContract contract, Type targetType)
   at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.SetPropertyValue(JsonProperty property, JsonConverter propertyConverter, JsonContainerContract containerContract, JsonProperty containerProperty, JsonReader reader, Object target)
   at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.PopulateObject(Object newObject, JsonReader reader, JsonObjectContract contract, JsonProperty member, String id)
   at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.CreateObject(JsonReader reader, Type objectType, JsonContract contract, JsonProperty member, JsonContainerContract containerContract, JsonProperty containerMember, Object existingValue)
   at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.SetPropertyValue(JsonProperty property, JsonConverter propertyConverter, JsonContainerContract containerContract, JsonProperty containerProperty, JsonReader reader, Object target)
   at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.PopulateObject(Object newObject, JsonReader reader, JsonObjectContract contract, JsonProperty member, String id)
   at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.CreateObject(JsonReader reader, Type objectType, JsonContract contract, JsonProperty member, JsonContainerContract containerContract, JsonProperty containerMember, Object existingValue)
   at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.PopulateList(IList list, JsonReader reader, JsonArrayContract contract, JsonProperty containerProperty, String id)
   at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.CreateList(JsonReader reader, Type objectType, JsonContract contract, JsonProperty member, Object existingValue, String id)
   at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.SetPropertyValue(JsonProperty property, JsonConverter propertyConverter, JsonContainerContract containerContract, JsonProperty containerProperty, JsonReader reader, Object target)
   at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.PopulateObject(Object newObject, JsonReader reader, JsonObjectContract contract, JsonProperty member, String id)
   at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.CreateObject(JsonReader reader, Type objectType, JsonContract contract, JsonProperty member, JsonContainerContract containerContract, JsonProperty containerMember, Object existingValue)
   at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.Deserialize(JsonReader reader, Type objectType, Boolean checkAdditionalContent)
   at Newtonsoft.Json.JsonSerializer.DeserializeInternal(JsonReader reader, Type objectType)
   at Newtonsoft.Json.JsonConvert.DeserializeObject(String value, Type type, JsonSerializerSettings settings)
   at Newtonsoft.Json.JsonConvert.DeserializeObject[T](String value, JsonSerializerSettings settings)
   at Brierley.HertzModules.Custom.CapillaryIntegration.CapillaryManager.d__8.MoveNext() in C:\devroot\htz-loyalty\code\Portals\CustomModules\CapillaryIntegration\CapillaryManager.cs:line 66
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Brierley.HertzModules.Custom.MemberStat.d__8.MoveNext() in C:\devroot\htz-loyalty\code\Portals\CustomModules\MemberStat.ascx.cs:line 305

Probably not a great thing to leak, because one can learn a lot from a stack trace. For example, I can see their webserver runs on Windows, that it’s based on C# and uses ASP.NET web user control (.ascx) files, and that it uses the Newtonsoft Json.NET library. If I were nefarious, and hunting for vulnerabilities, that’s a treasure trove.

Also interesting is the namespace of the module: Brierley. A quick Google search tells me that The Brierley Group is “a recognized innovator in the design of Customer Loyalty programs” and is behind a number of the ones I use every time I travel. Who knew there were companies who specialize in this sort of thing? Learn something new every day I guess.
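Back to the leak itself: the usual alternative is to keep the details in a server-side log and hand the browser nothing but a generic message and a reference ID. Here’s a minimal Python sketch of that idea (not Hertz’s code, which is evidently C#). The field name comes from the error above; the function names and the payload shape are hypothetical.

```python
import json
import logging
import uuid
from datetime import datetime
from typing import Optional

logger = logging.getLogger("loyalty")  # hypothetical logger name


def parse_optional_date(value: Optional[str]) -> Optional[datetime]:
    """Treat a JSON null as 'no date' instead of failing the conversion."""
    if value is None:
        return None
    return datetime.fromisoformat(value)


def handle_status_match(raw_json: str) -> dict:
    """Return a response dict that is safe to show to the end user."""
    try:
        profile = json.loads(raw_json)["profiles"][0]
        matched_on = parse_optional_date(profile["fields"].get("marriottstatusmatchd"))
        return {"ok": True, "matched_on": matched_on}
    except Exception:
        # The full traceback goes to the server log only, keyed by a reference ID.
        reference = uuid.uuid4().hex
        logger.exception("status match failed (ref %s)", reference)
        return {"ok": False, "message": f"Something went wrong. Reference: {reference}"}
```

The user still gets something to quote to support, and the file paths and library names stay out of the browser.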

All Will Love Me And Despair

A joy of my life is hacking around on APIs so I can automate things that would otherwise be manual. I’ve done it with Ticketmaster, American Airlines, WordPress (i.e. this blog), the AWS Product API, a payment page, a bunch of internal Amazon tools, and now (I’m happy to say) Tableau.

Deep in the land of San Diego, in the fires of his MacBook, the Developer Jud forged a Python script, and into this script he poured his creativity, his manipulations, and his will to dominate all APIs.

One script to rule them all.

One by one, the free websites of the Internet fell to the power of the script. There were some who resisted… but the power of Chrome Dev Tools could not be undone.

Played The Fool

I’m not into pranks, giving or receiving. Maybe it’s just because my years are limited, but I don’t generally appreciate being inconvenienced in ways that waste my time for no reason other than humor. It’s a bit like an individualized corollary of the broken window fallacy.

Because of the above I get somewhat hypersensitive around April 1. I feel I’m generally good at sniffing out the BS, but I got taken pretty hard this year, cleverly enough that I have to tip my hat.

So a website I visit regularly posts periodic brain teasers. The one on April 1 sounded innocuous enough. The gist:

Start with a number. If it’s even, divide by 2. If odd, multiply by 3 and add 1. Repeat enough times, and you’ll end up with 1. Prove why that’s the case for any starting number.

I’m a sucker for that sort of mathy puzzle, and I spent a decent amount of time throughout the day noodling on it. Well, here’s the deal. Known as the Collatz conjecture, this convergence to one is famously unsolved, described on the Wikipedia page as “an extraordinarily difficult problem, completely out of reach of present day mathematics.” Lovely, so you’re saying this Ph.D. dropout is unlikely to solve it?
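The rule itself is trivial to code up, which is part of the trap. A minimal Python sketch (the function name is just illustrative) that counts how many steps a starting number takes to reach 1:

```python
def collatz_steps(n: int) -> int:
    """Count the steps needed for n to reach 1 under the Collatz rule."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    steps = 0
    while n != 1:
        # Even: halve it. Odd: multiply by 3 and add 1.
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps


# Every starting value tried here reaches 1, which is evidence, not a proof.
longest = max(range(1, 1000), key=collatz_steps)
print(longest, collatz_steps(longest))
```

Checking a range of starting values like this is easy; showing that the loop terminates for every positive integer is the part nobody has managed.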

To be fair, I should have known. Numeric conjectures that intermingle addition and multiplication are notoriously complex, despite their apparent simplicity. I used to joke that a life goal was to solve Goldbach’s conjecture, which states that every even natural number greater than 2 is the sum of two prime numbers. Apparently I said this enough at my first job that when I left, they gave me this fill-in-the-blank certificate as a gift:

It’s a good reminder that it’s the “easy” stuff you have to worry about most. It’s never five minutes.

Leap Day

The world is a complex place. Time is hard, as evidenced by the plethora of things going wrong today. Naming is hard. Designing architectures is hard. Getting GenAI right is hard when the answers really matter.

And as it turns out, color is hard too! Did you know there are “imaginary colors”? I didn’t. How cool!

This is not an argument to run away from technology, but to say that we who do this work must be vigilant and realistic. The answer to “how long” is never “five minutes”. And we must engage across a broad set of disciplines, because our own perspectives are limited.

When confronted with complexity, the wrong answer is to retreat to comfortable simplicity. Read. Listen. Have an open mind and broaden your view of the world.

Not All Who Wander

There’s a danger in over-indexing on successful outcomes when evaluating a decision. As a LeBron fan I respect making the right play even if the shot doesn’t go down. When watching football (I hear there’s a game today?) I shake my head at coaches who punt when the data says taking a bigger risk is worth it. The same is true when making business decisions and evaluating technical tradeoffs.

Simple math makes the above obvious in certain cases. Whether a decision has a 90%, 60%, or even 51% probability of success, it is the right decision to make, even if it doesn’t work out (presuming the cost of failure is equal no matter what decision is made).
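To make the arithmetic concrete, here’s a small Python sketch. It assumes a binary choice with hypothetical payoffs: either option wins or loses the same amount, and if the favored option succeeds with probability p, the alternative succeeds with probability 1 - p.

```python
def expected_value(p_success: float, gain: float, loss: float) -> float:
    """Average payoff of a choice that wins `gain` with probability
    p_success and otherwise costs `loss`."""
    return p_success * gain - (1 - p_success) * loss


# Hypothetical payoffs: both choices win or lose the same amount,
# so only the probability of success differs.
GAIN, LOSS = 1.0, 1.0

for p in (0.90, 0.60, 0.51):
    favored = expected_value(p, GAIN, LOSS)
    alternative = expected_value(1 - p, GAIN, LOSS)
    print(f"p={p:.2f}: favored {favored:+.2f} vs alternative {alternative:+.2f}")
```

Under those assumptions the favored option comes out ahead on average at every one of those probabilities, even at 51%, where it still fails almost half the time.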

Of course a nice probability cannot be known in most real-world situations. It’s in those moments when it’s especially important not to focus too much on the outcome. Because a failed result doesn’t tell us anything certain about the original likelihood of success, as even 95% certainty fails 5% of the time.

I don’t say any of this to mean that a pattern of failed outcomes should be ignored, but that full context should be used in any process that attempts to evaluate the road that led to certain results.

Buckle Up

There’s nothing like an effort to make sure all my years of accumulated data is backed up to kick up some nostalgia (not to mention an impending birthday). I doubt anyone else much cares, but this is my website and I’ll fill it up with digital relics from my past if I want to. Consider this fair warning.

We’ll get things started with this beauty, which I wrote September 24, 1992, if the file’s timestamp can be believed. Over 31 years old, it’s the oldest digital document I can find that I wrote myself.

I do not like to go to school. All the teachers do is teach you things you already were taught in 5th grade. That is, except for math and computer class. In math, we learn all about neat things, like 3y2+4(2x3+4). Mr. Farley is a great teacher, and the other teachers should teach like he does.

In computer class we learn about computers, such as this one, and about different computer programs. That is really neat for me because I enjoy working with computers, although some kids are really dumb when it comes to computers. But it is not like English, which is the same every single year. BORING!!!!!

I suppose that Science is O.K. Mr. Freese is pretty cool, and we learn some new stuff, and some old stuff. Like the scientific method. We learned it in 7th grade, and we learn it again now. It doesn’t make any sense.

This is my story about school. I hope that someday teachers will be able to read this and learn from it. Although they won’t listen to the small ideas from a thirteen year old boy, maybe they might get ideas anyway.

For the tech nerds, the file was in WordPerfect format (which definitely squares with the technology I was using in 8th grade), and opened perfectly on my Mac using LibreOffice.

More to come!

Resolution Recap

Relaxing on a much-needed holiday has given me time to wrap up a couple books, bringing this year’s reading to a close (I’ve also finally started Alexander Hamilton, but no way I’m finishing it on my return flight; it’s good but long).

Per my meta-resolution, I aimed to read 44 books this year. I’m finishing at 48, though a few only barely qualify. Here’s this year’s 5-star selections:

How did I do in my objective to read more non-male, non-white authors? The goal was 32 books, and I finished with 14 non-male, 15 non-white, and 4 both, for a total of 33. Mission accomplished? Quantitatively yes, but qualitatively, the mission of broadening horizons is never done; this will continue to be a focus area.

What will I aim for next year (besides the obligatory quantity)? For one, I intend to read more history and biographies. Given my job, I also am going to do more reading on politics and government. Should be fun!

Know Thyself

It’s inevitable that over time I’m going to repeat myself here (including post titles). When I’m aware of potential similarities, I try to embed links back to those prior posts. A while back I noted an idea of building a thematic map of all my posts, but I wasn’t sure how to go about doing so. Now that I’ve learned some about embeddings, it was time to try my hand at it.

You can find the code I wrote to accomplish all of this on GitHub. I was inspired by the clustering section of the OpenAI cookbook, but took considerable liberties rewriting the code there, as I’m not a huge fan of typical data science code examples (they’re suitable for notebooks, perhaps, but rarely include meaningful names or breakdown into logical functions).

First, I had to actually fetch all the post content. I briefly toyed with the WordPress REST API, but couldn’t figure out how to enable it. No worries, though, RSS to the rescue! Unfortunately it’s XML, and I fiddled a bit with using lxml to parse it, but then stumbled upon feedparser, which abstracted the details. Awesome!
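For a sense of what feedparser abstracts away, here’s a rough sketch of pulling posts out of an RSS 2.0 feed with only the standard library. This is a stand-in, not the code from the repo: the sample feed, the function name, and the dictionary keys are all made up, and it assumes the full post body lives in the content:encoded element, as it typically does in WordPress feeds.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Namespace used by the RSS "content" module for full post bodies.
CONTENT_NS = {"content": "http://purl.org/rss/1.0/modules/content/"}

# Made-up feed for illustration only.
SAMPLE_FEED = """<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>Leap Day</title>
      <link>https://example.com/leap-day</link>
      <content:encoded><![CDATA[<p>Time is hard.</p>]]></content:encoded>
    </item>
  </channel>
</rss>"""


def parse_posts(feed_xml: str) -> List[Dict[str, str]]:
    """Extract title, link, and body from each <item> in an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    posts = []
    for item in root.iterfind("./channel/item"):
        posts.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "content": item.findtext("content:encoded", default="", namespaces=CONTENT_NS),
        })
    return posts


print(parse_posts(SAMPLE_FEED))
```

Real feeds add pagination, Atom variants, encodings, and malformed markup, which is exactly the mess a dedicated parser is for.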

Since it’s the de facto standard for Python data science, I loaded the posts into a pandas DataFrame. I’m still working on my fluency with pandas, numpy, scikit-learn, and matplotlib, amongst other common tools, and I’m grateful for any opportunity to get their power under my fingers.

To compute embeddings for each post, I used the OpenAI API with the text-embedding-ada-002 model. It’s not good to store API keys in code; for local scripts I store all mine in the macOS Keychain using keyring. Nice and easy.

Since OpenAI usage costs money, I don’t want to repeatedly call the API with identical inputs if I don’t have to. That’s where cachier comes in (a library I help maintain) so results can be transparently saved to disk for subsequent use.
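The idea is simple enough to sketch with the standard library. To be clear, this is not how cachier works internally and not the code from the repo, just a stand-in showing the pattern: key the cache on a hash of the input, and only call the expensive function on a miss. The directory name and function names are hypothetical.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, List

CACHE_DIR = Path(".embedding_cache")  # hypothetical location


def cached_embedding(
    text: str,
    compute: Callable[[str], List[float]],
    cache_dir: Path = CACHE_DIR,
) -> List[float]:
    """Return the embedding for `text`, calling `compute` only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Identical input text always hashes to the same file name.
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    path = cache_dir / f"{key}.json"
    if path.exists():
        return json.loads(path.read_text())
    embedding = compute(text)  # the expensive (paid) call
    path.write_text(json.dumps(embedding))
    return embedding
```

Here `compute` would wrap the actual API call; a decorator-based library gives you the same effect without threading the cache through by hand.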

Once I had the embeddings, I used K-means clustering to group posts into common themes, and then t-SNE to reduce the dimensionality and produce a visualization of the clusters. To produce a summary of the theme of each cluster I took a sample of posts from each and shoved them into GPT-4.
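If K-means is unfamiliar, the core loop is short: assign each point to its nearest centroid, move each centroid to the mean of its members, repeat. Below is a toy pure-Python version to illustrate that loop. It is not the code from the repo, and for real embeddings with over a thousand dimensions a library implementation is the sensible choice. The function names, the fixed iteration count, and the seed are all arbitrary choices for the sketch.

```python
import math
import random
from typing import List, Sequence

Vector = Sequence[float]


def nearest(point: Vector, centroids: List[List[float]]) -> int:
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def mean(points: List[Vector]) -> List[float]:
    """Component-wise average of a non-empty list of points."""
    return [sum(component) / len(points) for component in zip(*points)]


def kmeans(points: Sequence[Vector], k: int, iterations: int = 20, seed: int = 0) -> List[int]:
    """Return a cluster label (0..k-1) for each point."""
    rng = random.Random(seed)
    # Start from k distinct points chosen at random.
    centroids = [list(p) for p in rng.sample(list(points), k)]
    for _ in range(iterations):
        labels = [nearest(p, centroids) for p in points]
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:  # keep the old centroid if a cluster empties out
                centroids[i] = mean(members)
    # Final assignment against the last set of centroids.
    return [nearest(p, centroids) for p in points]


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(kmeans(points, k=2))
```

The random starting centroids are one source of the run-to-run variation in results, since different starts can settle into different groupings.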

To start I tried using 2 clusters, which produced the following distribution:

Pretty interesting that there’s a natural grouping going on. Here’s the themes and sample posts:

Blue Posts

The theme of these posts is the author’s personal and professional experiences with technology, education, open-source contributions, ethical considerations, and the impact of travel and diversity on personal growth and the tech industry.

Orange Posts

The theme of these posts revolves around the reflections, experiences, and insights of a software developer navigating the challenges and nuances of the tech industry.

Of course I had to try with a variety of different numbers of clusters, so I reran with 3, 5, and 8 clusters as well (anyone see a pattern there?)

Of those graphs, to my eye the 5 cluster one seemed the best balance between having enough distinct themes without starting to look too arbitrary. Here’s the summarizations for it:

Blue Posts

The theme of these posts is the author’s personal and professional experiences, challenges, and insights related to technology, software development, and working within the tech industry.

Orange Posts

The theme of these posts revolves around the challenges, insights, and anecdotes from the world of software development and engineering management.

Green Posts

The theme of these posts is the multifaceted nature of software development, encompassing the importance of maintaining code quality, the broad skill set required for effective development, and the challenges and responsibilities that come with the profession.

Red Posts

The theme of these posts is the reflection on and sharing of personal experiences, insights, and best practices related to software development, including contributing to communities, understanding abstractions, effective communication, and professional growth within the tech industry.

Purple Posts

The theme of these posts is the author’s personal reflections on their experiences, interests, and philosophies related to their career, hobbies, and life choices.

What’s next? I’d like a quantitative way to evaluate the quality of the theme clustering and summaries produced. There’s a lot of non-determinism in the functions used here, and with some twiddling I bet I can produce improved results. I’ve got some ideas, but will save them for a future post.

Keep It Secret, Keep It Safe

AWS recently announced that blocking public access to published AMIs will be enabled by default. This is good news, as a public AMI is an easy way to accidentally leak sensitive data. When I first started using GovCloud (2015 maybe?) I remember stumbling into a set of AMIs that, based on their names alone, clearly weren’t intended to be shared. Thankfully a quick note to AWS support and the offending party squared things away post haste, though I’ll never know if damage had already been done.

Horror stories are easy to find online about the easiest way to make this kind of mistake: turning on public access to an S3 bucket. Thankfully AWS has made taking this step difficult; in fact, in our internal accounts, creating a bucket with public access without prior approval would get you a Sev-2 page in about 15 minutes. Unfun.

Which is why I found it so surprising to discover that in GCP, the only way I can tell to host a static website behind a CDN is to make the backing cloud storage bucket public. I mean, I recognize by definition it’s okay for the data to be Internet-accessible, but it meant turning off the “don’t allow public cloud storage” block project-wide, which seems a bad idea. Bad enough that the moment I hit that button I got a security warning via email. Am I missing something here? Would love to know if there’s a better way.

In any case, it’s going to be an adventure learning all these subtle differences as I broaden my cloud experience. Passing certifications is nice, but it’s no substitute for kicking the tires.

(Editor’s Note: I’m chuckling to myself as I add Amazon LP tags to a post that’s partly about GCP. Those things are burned into my brain forever).