Tag: Learn And Be Curious

Resolution Recap


Relaxing on a much-needed holiday has given me time to wrap up a couple books, bringing this year’s reading to a close (I’ve also finally started Alexander Hamilton, but no way I’m finishing it on my return flight; it’s good but long).

Per my meta-resolution, I aimed to read 44 books this year. I’m finishing at 48, though a few only barely qualify. Here are this year’s 5-star selections:

How did I do in my objective to read more non-male, non-white authors? The goal was 32 books, and I finished with 14 non-male, 15 non-white, and 4 both, for a total of 33. Mission accomplished? Quantitatively yes, but qualitatively, the mission of broadening horizons is never done; this will continue to be a focus area.

What will I aim for next year (besides the obligatory quantity)? For one, I intend to read more history and biographies. Given my job, I’m also going to do more reading on politics and government. Should be fun!

Evolution


(Editor’s note: the past two posts, Mother Of Invention and Edge Case, together with this one form a trilogy of sorts, all related to a particular project I’ve been digging into.)

When I first needed a way to get access to AWS from a non-cloud-based computer, I implemented 3 options: hard-coded IAM user credentials (generally bad), user-based Cognito (okay but not super scalable), and X.509 via IoT (good technology, but cumbersome to set up).

This week I had a similar authentication need within an on-premises cluster, and was happy for the chance to learn the most up-to-date approach: IAM Roles Anywhere. I really appreciate the authors of these two blog posts who captured the step-by-step quite a bit better than the official documentation:

I used my own certificate authority because AWS Private CA is too dang expensive; $400 a month doesn’t grow on trees, ya know? Here’s the bash script to create the root CA:

mkdir -p root-ca/certs    # New Certificates issued are stored here
mkdir -p root-ca/db       # Openssl managed database
mkdir -p root-ca/private  # Private key dir for the CA

chmod 700 root-ca/private
touch root-ca/db/index

# Give our root-ca a unique identifier
openssl rand -hex 16 > root-ca/db/serial

# Create the certificate signing request
openssl req -new -config root-ca.conf -out root-ca.csr -keyout root-ca/private/root-ca.key

# Sign our request
openssl ca -selfsign -config root-ca.conf -in root-ca.csr -out root-ca.crt -extensions ca_ext

# Print out information about the created cert
openssl x509 -in root-ca.crt -text -noout

The output from the above is what’s used to create the Trust Anchor. Then here’s a script to create a certificate for the process that will be authenticating:

# Provide a name for the output files as a parameter
entity_name="$1"

# Make a private key specific to your end entity
openssl genpkey -out "$entity_name.key" -algorithm RSA -pkeyopt rsa_keygen_bits:2048

# Using your newly generated private key, make a certificate signing request
openssl req -new -key "$entity_name.key" -out "$entity_name.csr"

# Print out information about the created request
openssl req -text -noout -verify -in "$entity_name.csr"

# Sign the above request
openssl ca -config root-ca.conf -in "$entity_name.csr" -out "$entity_name.crt" -extensions client_ext

# Print out information about the created cert
openssl x509 -in "$entity_name.crt" -text -noout
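As an aside, the trust anchor itself can also be registered programmatically rather than through the console. Here’s a sketch using boto3 (assuming the rolesanywhere client’s CreateTrustAnchor request shape; the names and file path are hypothetical):

```python
def trust_anchor_request(name, root_ca_pem):
    """Build a CreateTrustAnchor request for an external (non-Private-CA) root."""
    return {
        "name": name,
        "enabled": True,
        "source": {
            "sourceType": "CERTIFICATE_BUNDLE",
            "sourceData": {"x509CertificateData": root_ca_pem},
        },
    }

def create_trust_anchor(name, root_ca_path="root-ca.crt"):
    import boto3  # lazy import so the builder above works without boto3 installed
    with open(root_ca_path) as f:
        request = trust_anchor_request(name, f.read())
    return boto3.client("rolesanywhere").create_trust_anchor(**request)
```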

Special thanks also to the creator of iam-rolesanywhere-session, a Python package that makes it easy to create a refreshable boto3 Session with IAM Roles Anywhere. Seriously, could it be easier?

from iam_rolesanywhere_session import IAMRolesAnywhereSession

roles_anywhere_session = IAMRolesAnywhereSession(
    trust_anchor_arn=my_trust_anchor_arn,
    profile_arn=my_profile_arn,
    role_arn=my_role_arn,
    certificate='my_certificate.crt',
    private_key='my_certificate.key',
)

boto3_session = roles_anywhere_session.get_session()
s3_client = boto3_session.client('s3')
print(s3_client.list_buckets())

This was a good reminder that technology marches ever onward, and what made sense yesterday might not be the best approach today. It was also a reminder that, like DNS, TLS and PKI are some of those things that every technologist ought to know (I’ve queued up this book in my Goodreads for a deeper dive). This isn’t the first time I’ve had to write code to create certificates, but it’s now the last, because I’ll have this reference post plus its associated code repository. And so will you.

Edge Case


I was today years old when I learned that an object key in S3 can end with a slash. Why might someone use such a strange key, you ask? Well, I was working today on a static website served by CloudFront that needs to serve a particular JSON document at /foo/bar/ (note the trailing slash). One option was to create the corresponding object at /foo/bar and then use a CloudFront function to remove the trailing slash. But that adds complexity, cost, and a tiny bit of latency. Could there be a better way?

Indeed there was! Create the object with the key foo/bar/ (trailing slash and all) and Bob’s your uncle. Admittedly it’s a bit tricky to create an object with such a key. The console won’t do it, and neither will the aws CLI (at least not without getting fiddly with encoding, and no one’s got time for that). But boto3 to the rescue: it’ll happily do it.
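For illustration, here’s roughly what that looks like (the bucket and path are hypothetical; note that the S3 key drops the leading slash that appears in the CloudFront path):

```python
import json

def trailing_slash_key(path):
    """Map a CloudFront path like '/foo/bar/' to the S3 key 'foo/bar/'.
    S3 keys don't include the leading slash, but a trailing one is legal."""
    return path.lstrip("/")

def put_json_at_path(bucket, path, payload):
    import boto3  # lazy import; an actual call needs AWS credentials
    boto3.client("s3").put_object(
        Bucket=bucket,
        Key=trailing_slash_key(path),  # boto3 happily accepts the trailing slash
        Body=json.dumps(payload).encode(),
        ContentType="application/json",
    )
```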

Obligatory bit of additional knowledge: know your slashes.

Know Thyself


It’s inevitable that over time I’m going to repeat myself here (including post titles). When I’m aware of potential similarities, I try to embed links back to those prior posts. A while back I noted an idea of building a thematic map of all my posts, but I wasn’t sure how to go about doing so. Now that I’ve learned some about embeddings, it was time to try my hand at it.

You can find the code I wrote to accomplish all of this on GitHub. I was inspired by the clustering section of the OpenAI cookbook, but took considerable liberties rewriting the code there, as I’m not a huge fan of typical data science code examples (they’re suitable for notebooks, perhaps, but rarely include meaningful names or breakdown into logical functions).

First, I had to actually fetch all the post content. I briefly toyed with the WordPress REST API, but couldn’t figure out how to enable it. No worries, though: RSS to the rescue! Unfortunately it’s XML, and I fiddled a bit with using lxml to parse it, but then stumbled upon feedparser, which abstracted away the details. Awesome!
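If you’re curious, the fetch step might look something like this sketch (the entry fields are assumptions based on typical RSS feeds, and the tag-stripping helper is my own illustration, not from the original code):

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect just the text nodes from an entry's HTML."""
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def text_of(html_fragment):
    extractor = TextExtractor()
    extractor.feed(html_fragment)
    return "".join(extractor.parts)

def fetch_posts(feed_url):
    """Pull (title, plain-text body) pairs from an RSS feed."""
    import feedparser  # lazy import; requires the feedparser package
    feed = feedparser.parse(feed_url)
    return [(entry.title, text_of(entry.summary)) for entry in feed.entries]
```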

Since it’s the de facto standard for Python data science, I loaded the posts into a pandas DataFrame. I’m still working on my fluency with pandas, numpy, scikit-learn, and matplotlib, amongst other common tools, and I’m grateful for any opportunity to get their power under my fingers.

To compute embeddings for each post, I used the OpenAI API with the text-embedding-ada-002 model. It’s not good to store API keys in code; for local scripts I store all mine in the MacOS keychain using keyring. Nice and easy.
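As a sketch of that pattern (the service and account names here are hypothetical, and I’ve added an environment-variable fallback for portability):

```python
import os

def get_secret(service="openai", account="api_key"):
    """Fetch a secret from the environment, falling back to the OS keychain
    (the macOS Keychain on a Mac) via the keyring package."""
    env_var = f"{service}_{account}".upper()
    if env_var in os.environ:
        return os.environ[env_var]
    import keyring  # lazy import; requires the keyring package
    return keyring.get_password(service, account)
```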

Since OpenAI usage costs money, I don’t want to repeatedly call the API with identical inputs if I don’t have to. That’s where cachier (a library I help maintain) comes in, transparently saving results to disk for subsequent use.
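To show the idea, here’s a minimal stdlib sketch of the kind of transparent disk caching cachier provides (cachier itself adds staleness handling, pluggable backends, and more):

```python
import functools
import hashlib
import json
import os
import pickle
import tempfile

def disk_cache(func):
    """Transparent on-disk memoization, keyed by the function's arguments."""
    cache_dir = os.path.join(tempfile.gettempdir(), f"cache_{func.__name__}")
    os.makedirs(cache_dir, exist_ok=True)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        key = hashlib.sha256(
            json.dumps([args, kwargs], sort_keys=True).encode()
        ).hexdigest()
        path = os.path.join(cache_dir, key)
        if os.path.exists(path):  # cache hit: skip the expensive call
            with open(path, "rb") as f:
                return pickle.load(f)
        result = func(*args, **kwargs)
        with open(path, "wb") as f:
            pickle.dump(result, f)
        return result

    return wrapper

calls = []

@disk_cache
def fake_embedding(text):
    calls.append(text)           # stands in for a paid OpenAI API call
    return [float(len(text))]    # toy "embedding" for illustration
```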

Once I had the embeddings, I used K-means clustering to group posts into common themes, and then t-SNE to reduce the dimensionality and produce a visualization of the clusters. To produce a summary of the theme of each cluster, I took a sample of posts from each and shoved them into GPT-4.
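For intuition about what K-means is doing, here’s a toy version on 1-D points (in practice you’d use scikit-learn’s KMeans on the 1,536-dimensional ada-002 embedding vectors):

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Toy K-means on 1-D points; real embeddings are high-dimensional."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to its cluster's mean
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters
```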

To start I tried using 2 clusters, which produced the following distribution:

Pretty interesting that there’s a natural grouping going on. Here are the themes and sample posts:

Blue Posts

The theme of these posts is the author’s personal and professional experiences with technology, education, open-source contributions, ethical considerations, and the impact of travel and diversity on personal growth and the tech industry.

Orange Posts

The theme of these posts revolves around the reflections, experiences, and insights of a software developer navigating the challenges and nuances of the tech industry.

Of course I had to try with a variety of different numbers of clusters, so I reran with 3, 5, and 8 clusters as well (anyone see a pattern there?)

Of those graphs, to my eye the 5-cluster one struck the best balance between having enough distinct themes and not starting to look arbitrary. Here are the summaries for it:

Blue Posts

The theme of these posts is the author’s personal and professional experiences, challenges, and insights related to technology, software development, and working within the tech industry.

Orange Posts

The theme of these posts revolves around the challenges, insights, and anecdotes from the world of software development and engineering management.

Green Posts

The theme of these posts is the multifaceted nature of software development, encompassing the importance of maintaining code quality, the broad skill set required for effective development, and the challenges and responsibilities that come with the profession.

Red Posts

The theme of these posts is the reflection on and sharing of personal experiences, insights, and best practices related to software development, including contributing to communities, understanding abstractions, effective communication, and professional growth within the tech industry.

Purple Posts

The theme of these posts is the author’s personal reflections on their experiences, interests, and philosophies related to their career, hobbies, and life choices.

What’s next? I’d like a quantitative way to evaluate the quality of the theme clustering and summaries produced. There’s a lot of non-determinism in the functions used here, and with some twiddling I bet I can produce improved results. I’ve got some ideas, but will save them for a future post.

School’s In Session


Tonight I kick off a class from Stanford called Ethics, Technology, and Public Policy for Practitioners. It’s been a hot minute since I’ve been involved with formal education (about 10 years actually), but I’m pretty excited. Not just for the learning, but for the people I’ll meet along the way, who appear to be a fantastically variegated bunch based on what I’ve seen on Slack so far.

Here’s the course description from the syllabus:

Our goal is to explore the ethical and social impacts of technological innovation. We will integrate perspectives from computer science, philosophy, and social science to provide learning experiences that robustly and holistically examine the impact of technology on humans and societies.

Basically it’s Jud catnip. If it sounds interesting to you, I think it’s offered periodically. Here’s a link for future reference.

Fix-It-Up Chappie


Over the weekend my daughter’s Chromebook stopped turning on. We’ve gotten our money’s worth, having bought it right before the pandemic (fortunate timing, that), but I suspected the issue was only with the battery, having experienced a similar failure mode with other Chromebooks. Fifty bucks, overnight shipping, and fifteen minutes at the kitchen table, and it’s back up and running. Yay!

I usually enjoy trying to repair things. I don’t always succeed (especially if it’s car-related; I leave that to the professionals after a disastrous attempt to patch a radiator leak in an ’87 Honda Civic back in the summer of 2005), but stuff like electronics or minor carpentry I can usually figure out (not to mention identifying and working around website bugs). There’s something eminently satisfying about learning something new and immediately applying it to bring a tiny bit of order to the entropy.

You may think you don’t know how. And you may be right. But finish the sentence: you don’t know how yet. More than ever before there’s a wealth of knowledge at your fingertips. Engage your curiosity and give it a try. The risk is (usually) minimal and the rewards many.

Half The Battle


Some things are just good to know, for example:

  • git
  • SQL
  • networking protocols (UDP, TCP, HTTP)
  • the relative speeds of various storage media (L1, L2 caches, RAM, and disk)
  • the airspeed velocity of an unladen European swallow

Add to that list OAuth2, the lingua franca of authorization on the web (and, via OpenID Connect, of authentication too). Get yourself acquainted with this helpful two-part series:

Truth At The Intersection


Earlier this year I pledged to read 32 out of my 44 books by authors who are either non-white or non-male, 73% of my total. Juneteenth seems like an excellent day to see how I’m doing, given both its significance to my objective, as well as it being near the middle of the year.

As of today I’ve completed 24 books, ahead of my required pace of 22 by this date. Of those, 4 were written by white women, 2 by non-white women, and 10 by non-white men. That’s 16 in total, or 67%, which means I need to pick up my pace a bit to hit my goal. Of my current 3 books in flight, 2 are by women and 1 was written by a consortium of indigenous folks, so that’ll help things out. And I’ve plenty more qualifying books in my queue.

If you’re curious, you can see what I’m reading any time on my Goodreads page.

Show Me The Data


For Christmas back in 2020 my daughter got me a daily crossword puzzle calendar. That started a streak of completing a puzzle every day for two years. I tracked data on how fast I was able to complete each one, curious if I would get measurably better over time.

Today I finally sat down, put all the data into Excel, and crunched some numbers. Here are a couple of interesting results (interesting to me, at least):

Normally I aimed to complete a puzzle in about 15 minutes. For the most part my results clustered around that value, though my overall average across 623 puzzles was 15.9 minutes thanks to the occasional outliers in the 30-ish minute range. Fastest solve time was 6 minutes, which I accomplished 7 times.

This graph of rolling average (with a 30 day window) pretty clearly shows I got better over time as I suspected, going from a 20-ish minute average at start all the way down to 13-ish minute average towards the end. I’m happy with those results!
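For reference, the rolling average itself is simple to compute (a sketch; my actual crunching happened in Excel):

```python
def rolling_average(values, window=30):
    """Trailing rolling mean; the window shrinks at the start of the series."""
    averages = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        averages.append(sum(chunk) / len(chunk))
    return averages
```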

In total, I spent 165 hours working crosswords in 2021 and 2022, and while it’s not exactly the world’s most productive activity, there’s enough mental benefit that I don’t regret it.

Routinely Solicited Inquiries


When I joined Amazon Web Services in the fall of 2019, I wanted to validate the experience I had through AWS Certification. It became something of an obsession, and I ended up passing all 12 exams within a year. I got frequent questions about this experience, enough that I developed a talk that I gave at a couple of internal conferences, and once at re:Invent. This post is my best attempt to capture what I said there, plus some additional information that might be of interest to anyone seeking to up their cloud game.

Why?

The first question I typically get is a variant of “Why did you pursue AWS certifications?” And that’s usually followed up with “And why so many of them? Did you actually absorb any of that information?”

There are plenty of places to get boilerplate answers to these questions, such as Benefits of Being AWS Certified and Four reasons you should pursue AWS certifications. For me it boiled down to wanting the broadest possible exposure to AWS services. Certifications were not the end of my cloud learning journey, they were the beginning of it. I’m a breadth person; my goal was not to get hands-on experience with everything, but rather to get high-level understanding so that I could apply the technology to customer problems. I also wanted to be able to have public verification of the skills I already had.

Is there also a depth benefit to certs? Yes, but the study approach is different than the one I present later in this article. You’ll want to focus more on hands-on workshops and less on purely information intake. Neither approach is better or worse, they simply have different end objectives.

Who?

Another common question is “Who should get AWS certified?” I’m admittedly biased, but given they’re relatively inexpensive and can only benefit one’s career, I tend to say yes to just about everyone who works in or around the cloud:

  • Technology professionals specifically tasked to build in AWS? Obviously.
  • Software engineers who might be called upon to do cloud things, and should know what’s out there so they don’t inadvertently re-create the wheel? Definitely.
  • Tech-adjacent folks like product / project / program managers and salespeople who want to be able to hold their own in cloud-related conversations? Can’t hurt.
  • Executives whose companies heavily utilize the cloud? Why not?

How?

Hopefully by now you’re convinced, so the next question is obvious: “How can I decide which certification(s) to pursue?” There’s decent guidance in the AWS Certification Paths brochure (which I like to take some credit for, as I both recommended its development to the Training & Certification team back at re:Invent in 2021, and consulted with them on its content), but I’ll add my two cents here as well.

Cloud Practitioner is the foundational exam, appropriate for anyone who wants to beef up their AWS knowledge no matter their role. Even for folks that already have cloud experience, I recommend they take this first just to learn the logistics of the exams. Lean into your bias for action here and just do it; at the time of this writing it’s only $100 and there’s no consequence for failing (which is true for all the exams).

After that one, I think Solutions Architect Associate is the best next step, as it’s a great introduction to solving problems on AWS. For technical folks especially it’s my first recommendation, and for non-techies who still need to hold their own in a cloud conversation it’s a perfect stretch goal.

From there, it starts to depend on role, but for those who have to wear a lot of hats the aim should be either Solutions Architect Professional or DevOps Engineer Professional (ideally both), and to prep for them there’s value in picking up SysOps Administrator Associate and Developer Associate as study guides. The material builds on itself, but be aware there’s a considerable jump in difficulty between the associate and professional exams.

Because passing an exam earns you a coupon for 50% off your next one, completing all the foundational, associate, and professional exams can be done for as little as $625. That’s not a large investment to pick up highly marketable certifications.

The specialty exams are different in several ways. For one, they contain far more brand-new material, compared to the way the professional exams build on the associate exams (with one exception: if you have all the general certs, you’re 80% of the way to the Security Specialty). So if you’re a novice, expect to invest quite a bit more study time. I can’t really recommend any of these over the others unless you either 1) have a specific interest area, or 2) have completionist compunctions like me.

Fun fact: Advanced Networking Specialty is the only certification exam I’ve failed. It took a month of pretty hardcore cramming to pass it the second time around, and I came away with a new respect for networking engineers. There be dragons.

When?

Once you have a certification identified, the next logical question is “When should I schedule the exam?” My hot take is to take a best guess on prep time, pick a date that works for you, and get registered.

(Quick aside: I strongly recommend creating your certification account using a personal email address, so that your certifications easily follow you if you change jobs. There are ways to fix this later, but it’s a hassle).

Actually forking over the money and scheduling a date and time will force you to get started. I’ve seen too many “good intentions” fail here due to over-cautiousness or indifference. Just do it! For one thing, you can always reschedule later. Second, as I said earlier, there’s no consequence for failure (other than a 2 week cooldown period before you can try again). Treat it as an investment to see where you’re at. If you don’t pass, at least now you know where you’re weak. And if you do pass, then you’ve passed, no more studying required! All considered, it’s the frugal choice.

If you want a rule of thumb for prep time, here’s my suggestion:

  • If you’re reasonably confident, schedule it ASAP. Seriously, right now. I’ll wait.
  • If you just need to do some brushing up, 1-2 weeks of prep should be fine.
  • If you’re starting from zero, budget 4-6 weeks for foundational tests, 6-8 for associate-level, and 10-12 for professional exams.

If you literally have no idea how ready you are, taking a practice exam is an excellent evaluative tool that you can use right now. AWS made these free last year; just register on Skill Builder and you’ll find them. They’ll give you helpful context on the areas where you’re strong and the areas where you need more study. Since there’s no cost to taking them, there’s no excuse not to try them out. Don’t think you have to prepare to take a practice exam; that’s backwards. Let the practice exam results guide your preparation instead.

Where?

Once you’ve got your timeline to an exam, it’s time to answer “Where can I find the best possible preparation materials?” Naturally Google and ChatGPT have a variety of suggestions here. Too many. The metric that matters most here is “density of relevant knowledge per unit of study time”, and while I acknowledge people learn differently and everyone’s needs are different, my opinions here are based on my own considerable experience. Take them with a grain of salt, but only one…

When it comes to high-value tools, my number one suggestion is to read AWS service FAQ pages. Most services have one, and they are written at just the right level of detail to be applicable to certification exams. They’re the nutrient-rich superfoods of certification study materials; nothing will get you up to speed faster. When I was in the midst of my test-taking prep, I would keep a dozen or so FAQ pages open in my phone’s browser, and instead of mindlessly doom-scrolling on social media, if I had a few minutes I’d read (or re-read) one or two of them.

To give you an idea of their effectiveness, when I re-certified my professional exams earlier this year, the only prep I did was to re-read FAQs across a couple dozen services, and I passed them both with equivalent or higher scores than I did three years ago.

Each certification has a guide that lists the AWS services covered on the exam. Take a look, find the FAQs, and dig in:

Another excellent resource is the set of courses on AWS Skill Builder. Many of these are free, and $29 a month unlocks a bunch more. Each exam has a corresponding “readiness training” that is well worth the few hours it takes to complete. And check out the ramp-up guides if you need a plan structured around job role, solution, or industry.

If more in-depth course material is needed, I’m a fan of the A Cloud Guru video courses: they’re comprehensive, well-produced, easy to navigate, and a subscription is reasonably priced. A number of them also include hands-on exercises.

While I’m not here to criticize quality, or tell people what ought to work for them, there are certain resources I found less helpful when “value per time unit” is considered.

The first is books on the certification exams. Unlike a video, which can play in the background (something I did often while doing other activities like laundry or dishes), a book demands focused reading time, and books age quickly. I also found their content of middling quality compared to more official sources. They’re not cheap either when compared to the broad value you get from a subscription to Skill Builder or A Cloud Guru.

AWS whitepapers, while interesting and educational, are also not a terribly rich source of certification-related knowledge. I contributed to a few whitepapers in my time at Amazon, so it’s not that I see them as valueless, nor am I biased against them. There are simply better ways to spend your time.

Finally, I discourage endless grinding on practice exams. I mentioned earlier that these exams are great evaluative tools, but their value drops significantly with repeated attempts. There’s little sense in taking a ton of them in the hopes of encountering (and memorizing) every possible question. The official ones are not comprehensive enough, and the other practice exams you can find online are of questionable quality (if not blatant rip-offs of superior material). Ultimately such an approach won’t give you the synthesis skills necessary to answer questions not previously seen, and that’s not a winning formula.

What?

The final question now becomes “What can I expect on the day of the exam?” Back in 2020 I wrote about the online experience, so if you choose that option I’d encourage you to read through it; everything there still applies. The in-person option is a pretty standard affair: you sit in a room with a bunch of other people taking tests at standalone workstations, with a person monitoring the group.

I’ve done both, and mildly prefer the online choice, but I have some privileges that make it easier (e.g. a quiet space I can have to myself, a personal laptop, reliable Internet). There’s no cost difference, so do what’s right for you.

In either case, standard testing best practices apply. Get a good night’s sleep, get enough to eat beforehand (food and drink are prohibited), and plan some downtime both before and after the exam, because your brain will be tired (some of these exams can last up to 3 hours). Speaking of, absolutely don’t forget to go to the bathroom right before the test. Your mileage may vary about whether an in-person proctor will let you step out to use the restroom, but it’s forbidden when taking the test online.

It used to be that you got your results immediately after finishing the exam, but nowadays there’s some anti-fraud analysis that’s done, and you’ll get an email the next day. If you passed, congratulations! On to the next exam. And if not, no harm no foul; take a couple weeks to redouble your study efforts, and register again.

No matter what happens, I hope you take joy in learning about some of the world’s most powerful technology. If you do, thank the good folks on the AWS Training & Certification team for all the work they’ve put into developing many of the resources I’ve shared here. I certainly am grateful.
