Tag: Learn And Be Curious

That’s Good Advice

Part of the CTO job is being conversant in a broad set of technical domains. I’ve never been a data engineer, but a current project needs one, so I’ve been getting up to speed.

Spent some time on a flight this morning reading Amazon Redshift documentation, and found this beauty:

How helpful, Amazon: a best practice for loading data is to first learn how to load said data? Wouldn’t have guessed that. I wonder what other wonders of wisdom await me…

Go With It

Last week I finished The Principles of Product Development Flow, based on a recommendation from the article I wrote about in Fightin’ Words. To say it was relevant to my current line of work is an understatement; I immediately mailed a copy to our de facto development process person.

Luckily for us all, though, there’s an official TL;DR version online. It’s a plethora of pithy principles packed onto one page. Go check it out!

Failure To Communicate

I’m a sucker for travel point programs (and gamification more broadly). I’ve achieved maximum status on American Airlines, and am almost there with Marriott. When you get to the upper tiers there are crossover benefits with other programs, but they require activation. Today I tried to take advantage by linking my Marriott status to Hertz, but the website threw a delightful stack trace:

Error processing capillary request: Error converting value {null} to type 'System.DateTime'. Path 'profiles[0].fields.marriottstatusmatchd', line 1, position 792. - at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.EnsureType(JsonReader reader, Object value, CultureInfo culture, JsonContract contract, Type targetType) at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.SetPropertyValue(JsonProperty property, JsonConverter propertyConverter, JsonContainerContract containerContract, JsonProperty containerProperty, JsonReader reader, Object target) at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.PopulateObject(Object newObject, JsonReader reader, JsonObjectContract contract, JsonProperty member, String id) at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.CreateObject(JsonReader reader, Type objectType, JsonContract contract, JsonProperty member, JsonContainerContract containerContract, JsonProperty containerMember, Object existingValue) at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.SetPropertyValue(JsonProperty property, JsonConverter propertyConverter, JsonContainerContract containerContract, JsonProperty containerProperty, JsonReader reader, Object target) at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.PopulateObject(Object newObject, JsonReader reader, JsonObjectContract contract, JsonProperty member, String id) at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.CreateObject(JsonReader reader, Type objectType, JsonContract contract, JsonProperty member, JsonContainerContract containerContract, JsonProperty containerMember, Object existingValue) at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.PopulateList(IList list, JsonReader reader, JsonArrayContract contract, JsonProperty containerProperty, String id) at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.CreateList(JsonReader reader, Type objectType, JsonContract contract, JsonProperty member, Object 
existingValue, String id) at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.SetPropertyValue(JsonProperty property, JsonConverter propertyConverter, JsonContainerContract containerContract, JsonProperty containerProperty, JsonReader reader, Object target) at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.PopulateObject(Object newObject, JsonReader reader, JsonObjectContract contract, JsonProperty member, String id) at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.CreateObject(JsonReader reader, Type objectType, JsonContract contract, JsonProperty member, JsonContainerContract containerContract, JsonProperty containerMember, Object existingValue) at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.Deserialize(JsonReader reader, Type objectType, Boolean checkAdditionalContent) at Newtonsoft.Json.JsonSerializer.DeserializeInternal(JsonReader reader, Type objectType) at Newtonsoft.Json.JsonConvert.DeserializeObject(String value, Type type, JsonSerializerSettings settings) at Newtonsoft.Json.JsonConvert.DeserializeObject[T](String value, JsonSerializerSettings settings) at Brierley.HertzModules.Custom.CapillaryIntegration.CapillaryManager.d__8.MoveNext() in C:\devroot\htz-loyalty\code\Portals\CustomModules\CapillaryIntegration\CapillaryManager.cs:line 66 --- End of stack trace from previous location where exception was thrown --- at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at Brierley.HertzModules.Custom.MemberStat.d__8.MoveNext() in C:\devroot\htz-loyalty\code\Portals\CustomModules\MemberStat.ascx.cs:line 305

Probably not a great thing to leak, because one can learn a lot from a stack trace. For example, I can see their webserver runs on Windows, that it’s based on C# and uses ASP.NET web user control files, and that it uses the Newtonsoft JSON framework. If I were nefarious, and hunting for vulnerabilities, that’s a treasure trove.
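For contrast, here is a minimal Python sketch of the usual alternative: log the full trace on the server and hand the user only a reference number. The Hertz site is C#, so this is an illustration of the idea rather than their code; `safe_call`, `link_status`, and the `status_match_date` field are hypothetical names invented for the example.

```python
import logging
import uuid

logger = logging.getLogger("portal")


def safe_call(handler, *args, **kwargs):
    """Run a request handler without ever returning exception details."""
    try:
        return {"ok": True, "result": handler(*args, **kwargs)}
    except Exception:
        # The full traceback goes to the server-side log only, tagged with an
        # ID the user can quote to support.
        incident_id = uuid.uuid4().hex[:8]
        logger.exception("Unhandled error, incident %s", incident_id)
        return {
            "ok": False,
            "message": f"Something went wrong. Reference: {incident_id}",
        }


def link_status(profile):
    # Hypothetical handler that chokes on a missing date, loosely mirroring
    # the null-to-DateTime conversion failure in the trace above.
    return profile["status_match_date"].isoformat()
```

Calling `safe_call(link_status, {"status_match_date": None})` returns a short apology with a reference ID, while the file names, line numbers, and library names stay in the log.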

Also interesting is the namespace of the module: Brierley. A quick Google search tells me that The Brierley Group is “a recognized innovator in the design of Customer Loyalty programs” and is behind a number of the ones I use every time I travel. Who knew there were companies who specialize in this sort of thing? Learn something new every day I guess.

All Will Love Me And Despair

A joy of my life is hacking around on APIs so I can automate things that would otherwise be manual. I’ve done it with Ticketmaster, American Airlines, WordPress (i.e. this blog), the AWS Product API, a payment page, a bunch of internal Amazon tools, and now (I’m happy to say) Tableau.

Deep in the land of San Diego, in the fires of his MacBook, the Developer Jud forged a Python script, and into this script he poured his creativity, his manipulations, and his will to dominate all APIs.

One script to rule them all.

One by one, the free websites of the Internet fell to the power of the script. There were some who resisted… but the power of Chrome Dev Tools could not be undone.

Dig In

Getting to know your professional colleagues at a personal level is risky. I regularly read advice to avoid it. That’s a reasonable strategy to avoid some of the lows of gainful employment, but it also hamstrings the chance to achieve truly beautiful successes, not to mention it forfeits a potent antidote to loneliness.

So yeah, not only am I going to ignore that advice, I’m doubling down on getting better at being a student of other people. To that end, last week I started reading How to Know a Person, from which I extracted the following list of conversation starters:

  • Which of your five senses is strongest?
  • What are you most self-confident about?
  • What’s working really well in your life?
  • What is the “no” you keep postponing?
  • What have you said “yes” to that you no longer really believe in?
  • What forgiveness are you withholding?
  • Tell me about a time you adapted to change.
  • Have you ever been solitary without feeling lonely?
  • Can you be yourself where you are and still fit in?
  • What crossroads are you at?
  • What would you do if you weren’t afraid?
  • If we meet a year from now, what will we be celebrating?
  • If the next 5 years is a chapter in your life, what is that chapter about?
  • What has become clearer to you as you have aged?
  • What is the best way to grow old?
  • If you died tonight, what would you regret not doing?

Full credit to David Brooks here; I’m just repeating his excellent ideas. Keep learning, friends!

Remote Learning

Ohio in the early 90s had few educational options for a middle schooler interested in computers. But when there’s a will (and willing parents, thank you) there’s a way. Somehow I got signed up for a correspondence course in Pascal in 8th grade. Yes, an actual class where I never met in person (and only rarely spoke to the teacher on the phone). Where the majority of exchanges were via the good old-fashioned United States Postal Service. Where code had to be printed out, mailed, marked up, and mailed back (how’s that for slowing down rapid iteration!)

Despite it seeming painful to modern ideas of remote learning, the material was quite useful in my overall development. Up until then I was completely self-taught; reasonably good in BASIC and some rudimentary C. Learning Pascal, however, really opened up a new world. And luckily for you all, I still have a number of my Pascal programs, which I recently uploaded to GitHub for your browsing pleasure. Here’s the good stuff that awaits you:

  • MARKET.PAS – This one’s special for two reasons. First, it’s the oldest of all these files, with a last modified date of Dec 6, 1992, making it the earliest example of code I wrote that I still have in digital form (the absolute oldest being this handwritten BASIC program from 1987). And second, it was my attempt to implement the Stock Market Game, a board game from the 1970s that my mom and I played together when I was a kid. No one else in the family ever wanted to join; it was kinda “our thing” (as was Scrabble).
  • GRADE.PAS – A simple gradebook app for teachers. I believe this was the final project for my correspondence course.
  • CYBER.PAS & CYBORG.PAS – Today you couldn’t pay me enough to get into video game development, but as a youngling I had a thing for trying to build them. This code is a tiny step towards what looks like a side-scrolling shooter involving robots and lasers.
  • KARATE.PAS & KGRAPHIC.PAS – Another game effort, this one a fighter like Mortal Kombat, but with stick figures, because I am terrible at visual art. Pretty sure I got it to a reasonably playable state, though the mechanics were terrible and it required two people because there was no AI to speak of.
  • JDNCRYPT.PAS – Built this encryption tool to protect DIARY.TXT, which I still have (but no, I’m not gonna share it). Basically I reinvented a simple rotation cipher using an insecurely predictable pseudo-random number generator, with an easily bypassed magic parameter kill-switch on the executable. How cute. Rule one of cryptography: never ever write your own.
  • GAME133.PAS – In college a mathy friend of mine and I got really into the Number Jumbler. I wrote this solver to do research into combinations that had no solutions. Two years later when I started my first real job, I was tasked to learn Ada, and as part of that effort I ported this solver.
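To show why the JDNCRYPT.PAS approach is so weak, here is a small Python sketch of a rotation cipher driven by a seeded pseudo-random number generator. It is a guess at the general shape of the scheme, not a port of the Pascal original; the seed value and the `rotate` and `brute_force` helpers are made up for illustration.

```python
import random


def rotate(data: bytes, seed: int, decrypt: bool = False) -> bytes:
    """Shift each byte by the next value from a PRNG seeded with `seed`."""
    rng = random.Random(seed)  # same seed -> same keystream, every time
    sign = -1 if decrypt else 1
    return bytes((b + sign * rng.randrange(256)) % 256 for b in data)


def brute_force(ciphertext: bytes, known_word: bytes, max_seed: int = 65536):
    """Try every seed in a small space until a known word shows up."""
    for seed in range(max_seed):
        if known_word in rotate(ciphertext, seed, decrypt=True):
            return seed
    return None


secret = rotate(b"Dear diary, nobody will ever read this.", seed=1992)
recovered_seed = brute_force(secret, b"diary")
```

Because the whole keystream follows from one small seed, anyone who can guess a single word of the plaintext can simply try every seed. Hence rule one.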

FYI, in upcoming posts I intend to expand on my personal tech history, including a visual history of my computer setups. Will it be of interest? Maybe! But I’m going to do it regardless.

Resolution Recap

Relaxing on a much-needed holiday has given me time to wrap up a couple books, bringing this year’s reading to a close (I’ve also finally started Alexander Hamilton, but no way I’m finishing it on my return flight; it’s good but long).

Per my meta-resolution, I aimed to read 44 books this year. I’m finishing at 48, though a few only barely qualify. Here’s this year’s 5-star selections:

How did I do in my objective to read more non-male, non-white authors? The goal was 32 books, and I finished with 14 non-male, 15 non-white, and 4 both, for a total of 33. Mission accomplished? Quantitatively yes, but qualitatively, the mission of broadening horizons is never done; this will continue to be a focus area.

What will I aim for next year (besides the obligatory quantity)? For one, I intend to read more history and biographies. Given my job, I also am going to do more reading on politics and government. Should be fun!

Evolution

(Editor’s note: the past two posts, Mother Of Invention and Edge Case, together with this one form a trilogy of sorts, all related to a particular project I’ve been digging into).

When I first needed a way to get access to AWS from a non-cloud-based computer, I implemented 3 options: hard-coded IAM user credentials (generally bad), user-based Cognito (okay but not super scalable), and X.509 via IoT (good technology, but cumbersome to set up).

This week I had a similar authentication need within an on-premises cluster, and was happy for the chance to learn the most up-to-date approach: IAM Roles Anywhere. I really appreciate the authors of these two blog posts who captured the step-by-step quite a bit better than the official documentation:

I used my own certificate authority because AWS Private CA is too dang expensive; $400 a month doesn’t grow on trees, ya know? Here’s the bash script to create the root CA:

mkdir -p root-ca/certs    # New Certificates issued are stored here
mkdir -p root-ca/db       # OpenSSL-managed database
mkdir -p root-ca/private  # Private key dir for the CA

chmod 700 root-ca/private
touch root-ca/db/index

# Give our root-ca a unique identifier
openssl rand -hex 16 > root-ca/db/serial

# Create the certificate signing request
openssl req -new -config root-ca.conf -out root-ca.csr -keyout root-ca/private/root-ca.key

# Sign our request
openssl ca -selfsign -config root-ca.conf -in root-ca.csr -out root-ca.crt -extensions ca_ext

# Print out information about the created cert
openssl x509 -in root-ca.crt -text -noout

The output from the above is what’s used to create the Trust Anchor. Then here’s a script to create a certificate for the process that will be authenticating:

# Provide a name for the output files as a parameter (abort if it is missing)
entity_name=${1:?usage: $0 <entity-name>}

# Make your private key specific to your end entity
openssl genpkey -out "${entity_name}.key" -algorithm RSA -pkeyopt rsa_keygen_bits:2048

# Using your newly generated private key make a certificate signing request
openssl req -new -key "${entity_name}.key" -out "${entity_name}.csr"

# Print out information about the created request
openssl req -text -noout -verify -in "${entity_name}.csr"

# Sign the above request
openssl ca -config root-ca.conf -in "${entity_name}.csr" -out "${entity_name}.crt" -extensions client_ext

# Print out information about the created cert
openssl x509 -in "${entity_name}.crt" -text -noout
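Registering the root CA as a Trust Anchor (the step mentioned between the two scripts) can also be scripted. The sketch below is one way it might look with boto3’s `rolesanywhere` client; the helper names and the anchor name are my own, and it assumes `root-ca.crt` may contain a human-readable dump ahead of the PEM block, which `openssl ca` can emit, so only the PEM portion is sent.

```python
from pathlib import Path

PEM_BEGIN = "-----BEGIN CERTIFICATE-----"
PEM_END = "-----END CERTIFICATE-----"


def extract_pem(text: str) -> str:
    """Return only the PEM block, dropping any text dump around it."""
    start = text.index(PEM_BEGIN)
    end = text.index(PEM_END) + len(PEM_END)
    return text[start:end] + "\n"


def trust_anchor_request(name: str, pem_certificate: str) -> dict:
    """Build keyword arguments for the Roles Anywhere create_trust_anchor call."""
    return {
        "name": name,
        "enabled": True,
        "source": {
            "sourceType": "CERTIFICATE_BUNDLE",  # i.e. bring your own CA
            "sourceData": {"x509CertificateData": pem_certificate},
        },
    }


def create_trust_anchor(name: str, certificate_path: str) -> str:
    """Register a CA certificate and return the new trust anchor's ARN."""
    import boto3  # imported lazily so the helpers above work without boto3

    client = boto3.client("rolesanywhere")
    pem = extract_pem(Path(certificate_path).read_text())
    response = client.create_trust_anchor(**trust_anchor_request(name, pem))
    return response["trustAnchor"]["trustAnchorArn"]
```

Something like `create_trust_anchor("on-prem-root-ca", "root-ca.crt")` would then yield the ARN to pass along as `trust_anchor_arn`.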

Special thanks also to the creator of iam-rolesanywhere-session, a Python package that makes it easy to create a refreshable boto3 Session with IAM Roles Anywhere. Seriously, could it be easier?

from iam_rolesanywhere_session import IAMRolesAnywhereSession

roles_anywhere_session = IAMRolesAnywhereSession(
    trust_anchor_arn=my_trust_anchor_arn,
    profile_arn=my_profile_arn,
    role_arn=my_role_arn,
    certificate='my_certificate.crt',
    private_key='my_certificate.key',
)

boto3_session = roles_anywhere_session.get_session()
s3_client = boto3_session.client('s3')
print(s3_client.list_buckets())

This was a good reminder that technology marches ever onward, and what made sense yesterday might not be the best approach today. It was also a reminder that, like DNS, TLS and PKI are some of those things that every technologist ought to know (I’ve queued up this book in my Goodreads for a deeper dive). This isn’t the first time I’ve had to write code to create certificates, but it’s now the last, because I’ll have this reference post plus its associated code repository. And so will you.

Edge Case

I was today years old when I learned that an object key in S3 can end with a slash. Why might someone use such a strange key, you ask? Well, I was working today on a static website served by CloudFront that needs to serve a particular JSON document at /foo/bar/ (note the trailing slash). One option was to create the corresponding object at /foo/bar and then use a CloudFront function to remove the trailing slash. But that adds complexity, cost, and a tiny bit of latency. Could there be a better way?

Indeed there was! Create the object with a key of foo/bar/ (trailing slash included) and Bob’s your uncle. Admittedly it’s a bit tricky to create an object with such a key. The console won’t do it, and neither will the aws CLI (at least not without getting fiddly with encoding, and no one’s got time for that). But boto3 to the rescue: it’ll happily do it.
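For the curious, the boto3 side might look roughly like the sketch below. The bucket name and document are placeholders, and the helper functions are mine; the only part that matters is that `put_object` accepts a `Key` ending in a slash.

```python
import json


def trailing_slash_put_args(bucket: str, key: str, document: dict) -> dict:
    """Build put_object arguments for a JSON document whose key ends in '/'."""
    if not key.endswith("/"):
        raise ValueError("this sketch is specifically about keys ending in '/'")
    return {
        "Bucket": bucket,
        # Assumption: the site is served from the bucket root, so the URL path
        # /foo/bar/ corresponds to the key foo/bar/ with no leading slash.
        "Key": key.lstrip("/"),
        "Body": json.dumps(document).encode("utf-8"),
        "ContentType": "application/json",
    }


def upload(bucket: str, key: str, document: dict) -> None:
    """Write the document to S3 under the trailing-slash key."""
    import boto3  # imported lazily so the helper above works without boto3

    boto3.client("s3").put_object(**trailing_slash_put_args(bucket, key, document))
```

With that in place, `upload("my-static-site", "/foo/bar/", {"hello": "world"})` creates the object, and setting the content type means it is served as JSON rather than a generic binary blob.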

Obligatory bit of additional knowledge: know your slashes.

Know Thyself

It’s inevitable that over time I’m going to repeat myself here (including post titles). When I’m aware of potential similarities, I try to embed links back to those prior posts. A while back I noted an idea of building a thematic map of all my posts, but I wasn’t sure how to go about doing so. Now that I’ve learned some about embeddings, it was time to try my hand at it.

You can find the code I wrote to accomplish all of this on GitHub. I was inspired by the clustering section of the OpenAI cookbook, but took considerable liberties rewriting the code there, as I’m not a huge fan of typical data science code examples (they’re suitable for notebooks, perhaps, but rarely include meaningful names or a breakdown into logical functions).

First, I had to actually fetch all the post content. I briefly toyed with the WordPress REST API, but couldn’t figure out how to enable it. No worries, though, RSS to the rescue! Unfortunately it’s XML, and I fiddled a bit with using lxml to parse it, but then stumbled upon feedparser, which abstracted the details. Awesome!
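To give a feel for what a feed parser is abstracting away, here is a standard-library sketch that pulls posts out of RSS 2.0 text with `xml.etree.ElementTree`. It is a stand-in, not the feedparser code from my repository, and the field names in the returned dictionaries are my own choice; it assumes a WordPress-style feed that puts the full post body in `content:encoded`.

```python
import xml.etree.ElementTree as ET

# Namespace WordPress feeds use for the full post body (content:encoded).
CONTENT_NS = "{http://purl.org/rss/1.0/modules/content/}"


def parse_rss(xml_text: str) -> list[dict]:
    """Extract title, link, date, and body from each <item> in an RSS feed."""
    root = ET.fromstring(xml_text)
    posts = []
    for item in root.iter("item"):
        # Prefer the full body; fall back to the short description.
        body = item.findtext(f"{CONTENT_NS}encoded") or item.findtext(
            "description", default=""
        )
        posts.append(
            {
                "title": item.findtext("title", default=""),
                "link": item.findtext("link", default=""),
                "published": item.findtext("pubDate", default=""),
                "content": body,
            }
        )
    return posts
```

The resulting list of dictionaries drops straight into `pandas.DataFrame(posts)`.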

Since it’s the de facto standard for Python data science, I loaded the posts into a pandas DataFrame. I’m still working on my fluency with pandas, numpy, scikit-learn, and matplotlib, amongst other common tools, and I’m grateful for any opportunity to get their power under my fingers.

To compute embeddings for each post, I used the OpenAI API with the text-embedding-ada-002 model. It’s not good to store API keys in code; for local scripts I store all mine in the macOS keychain using keyring. Nice and easy.

Since OpenAI usage costs money, I don’t want to repeatedly call the API with identical inputs if I don’t have to. That’s where cachier comes in (a library I help maintain) so results can be transparently saved to disk for subsequent use.
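With cachier this boils down to putting `@cachier()` on the function that calls the API. To illustrate the underlying idea (and only the idea; this is not how cachier is implemented), here is a standard-library sketch of a disk-backed cache keyed by a hash of the arguments. The `disk_cache` decorator is hypothetical.

```python
import functools
import hashlib
import json
import pickle
from pathlib import Path


def disk_cache(cache_dir: str):
    """Decorator that stores each result on disk, keyed by the call arguments."""
    directory = Path(cache_dir)
    directory.mkdir(parents=True, exist_ok=True)

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Hash the function name and arguments into a stable file name.
            raw_key = json.dumps(
                [func.__name__, args, kwargs], sort_keys=True, default=repr
            )
            digest = hashlib.sha256(raw_key.encode("utf-8")).hexdigest()
            path = directory / f"{digest}.pkl"
            if path.exists():
                return pickle.loads(path.read_bytes())  # cache hit: no API call
            result = func(*args, **kwargs)
            path.write_bytes(pickle.dumps(result))
            return result

        return wrapper

    return decorator
```

Wrap the embedding function once, and a second run over the same posts reads from disk instead of spending money.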

Once I had the embeddings, I used K-means clustering to group posts into common themes, and then t-SNE to reduce the dimensionality and produce a visualization of the clusters. To produce a summary of the theme of each cluster I took a sample of posts from each and shoved them into GPT-4.
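In practice the clustering is a couple of lines with scikit-learn, but K-means itself is simple enough to sketch in plain Python. The version below is a bare-bones Lloyd’s algorithm for illustration, not the code I ran; the function name, defaults, and toy points are all made up, and real embeddings would have many more than two dimensions.

```python
import math
import random


def kmeans(points, k, iterations=50, seed=0):
    """Lloyd's algorithm: returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return labels


toy_points = [(0, 0), (0, 1), (1, 0), (100, 100), (100, 101), (101, 100)]
toy_labels = kmeans(toy_points, k=2)
```

The random starting centroids are one source of the non-determinism in this kind of pipeline, which is why the sketch takes an explicit seed.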

To start I tried using 2 clusters, which produced the following distribution:

Pretty interesting that there’s a natural grouping going on. Here’s the themes and sample posts:

Blue Posts

The theme of these posts is the author’s personal and professional experiences with technology, education, open-source contributions, ethical considerations, and the impact of travel and diversity on personal growth and the tech industry.

Orange Posts

The theme of these posts revolves around the reflections, experiences, and insights of a software developer navigating the challenges and nuances of the tech industry.

Of course I had to try with a variety of different numbers of clusters, so I reran with 3, 5, and 8 clusters as well (anyone see a pattern there?)

Of those graphs, to my eye the 5 cluster one seemed the best balance between having enough distinct themes without starting to look too arbitrary. Here’s the summarizations for it:

Blue Posts

The theme of these posts is the author’s personal and professional experiences, challenges, and insights related to technology, software development, and working within the tech industry.

Orange Posts

The theme of these posts revolves around the challenges, insights, and anecdotes from the world of software development and engineering management.

Green Posts

The theme of these posts is the multifaceted nature of software development, encompassing the importance of maintaining code quality, the broad skill set required for effective development, and the challenges and responsibilities that come with the profession.

Red Posts

The theme of these posts is the reflection on and sharing of personal experiences, insights, and best practices related to software development, including contributing to communities, understanding abstractions, effective communication, and professional growth within the tech industry.

Purple Posts

The theme of these posts is the author’s personal reflections on their experiences, interests, and philosophies related to their career, hobbies, and life choices.

What’s next? I’d like a quantitative way to evaluate the quality of the theme clustering and summaries produced. There’s a lot of non-determinism in the functions used here, and with some twiddling I bet I can produce improved results. I’ve got some ideas, but will save them for a future post.