#005 - Cloudy with a Chance of SaaS

Agility vs Uptime

Sep 06, 2022

Image: long exposure photography of road and cars. Photo by Marc-Olivier Jodoin on Unsplash.

Beta to Production

From the first line of code you deploy until the time you deliver your MLLP (Minimum Lovable and Launchable Product), you have some hard choices to make. For those of us who have been around software a long time, a typical path to delivery might look like this: local development on developer laptops, then some formal testing, followed by shared integration for evaluation, and finally production. Each stage is a standalone environment with its own obstacles, costs, and headaches that can slow things down. And if you ask 10 architects how to deliver an MLLP, you will get 11 answers as to the right way to do it. Balancing the need for speed against uptime is a challenging problem, not unlike the old axiom, “You can have it fast, cheap, or good. Pick two.” In this cloud world, and with current consumer expectations, the idea that SaaS software has the time to go through all these stage gates is questionable at best, and a killer of morale and ideas at worst. With that said, what is recommended?

Developer to Production

A typical software deployment pipeline, when starting out, will go something like this. Developers will use Git to version control their software. They will have some unit and integration tests, with a good balance in what is covered. Going for complete coverage of simple CRUD (Create, Read, Update, and Delete) operations is probably a waste of time, but covering the business logic, and the areas where state is manipulated, is a good idea. Zero test coverage is a bad idea, and 100% test coverage is untenable. You will have to figure out where to strike the balance.

While you are in pre-launch mode, you can take code changes that were developed using one Git branch per capability and deploy them right to the production instances. This gives you maximum agility, but it will probably lead to some things breaking. Move fast and try things, right? Of course, before you deploy, you want to review what has changed, confirm what you expect to work, and make sure the test suite is passing. While in this fast-moving mode, you need to keep great documentation of your server configurations. You also want to implement solid backup and recovery processes, including taking backups offsite and ensuring that new backups do not overwrite older ones. If you do all of this, you will set a great foundation for what comes next.
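One way to keep a new backup from destroying an older one is to put a timestamp in every backup file name and refuse to overwrite anything that already exists. The Python sketch below is only an illustration under assumed names: `make_backup`, the `.bak` suffix, and the directory layout are hypothetical, and the offsite copy is left out.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional
import shutil


def timestamped_backup_path(source: Path, backup_dir: Path, now: datetime) -> Path:
    """Build a backup file name that is unique per second (hypothetical naming scheme)."""
    stamp = now.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return backup_dir / f"{source.name}.{stamp}.bak"


def make_backup(source: Path, backup_dir: Path, now: Optional[datetime] = None) -> Path:
    """Copy `source` into `backup_dir` without ever replacing an earlier backup."""
    now = now or datetime.now(timezone.utc)
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = timestamped_backup_path(source, backup_dir, now)
    if target.exists():
        # Fail loudly instead of silently overwriting an older backup.
        raise FileExistsError(f"refusing to overwrite existing backup: {target}")
    shutil.copy2(source, target)
    return target
```

Calling `make_backup(Path("app.db"), Path("backups"))` on a schedule would leave one file per run in `backups/`, which a separate job could then ship offsite.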

As you Launch

There are two things that you need to consider when you start to allow humans outside of your teams to use your software. First, have a test environment that replicates production as closely as possible. In other words, if you use NGINX or Apache, pick just one for your stack and commit to it. Also, avoid the sprawl of many languages and technologies. For example, stick with the database you are using in production: pick one (MySQL, Oracle, MongoDB, etc.) and use it everywhere. Note that the instances can be much smaller in the test environment. Be sure your URLs, folders, configuration files, and all other artifacts follow a clear ontology and naming convention. For example, I like test.api.domain.com and api.domain.com to distinguish between my test and production environments, and I carry the same names through to configuration files and folders. Unspecific words like default or configuration should be avoided at all costs. Wayfinding, knowing where you are while you are making changes, is critical for uptime and for keeping your sanity. A robust test environment like this, where you can try out infrastructure changes, is critical to ensure that simple configuration changes do not take down production. What I have described might be called integration. At Lyft, they simulate a significant load on the sub-systems to avoid having to deploy a full copy of production. A four-part series on this starts at the link below; check it out. [1]
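The naming convention described above (test.api.domain.com versus api.domain.com) can be derived from one place so that hostnames and configuration files never drift apart. This is a minimal Python sketch, not a prescribed layout: the `Environment` class and the `.conf` file naming are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Environment:
    """One deployment target, for example "production" or "test" (hypothetical model)."""
    name: str
    base_domain: str

    @property
    def api_host(self) -> str:
        # Production gets the bare name; every other environment is prefixed.
        prefix = "" if self.name == "production" else f"{self.name}."
        return f"{prefix}api.{self.base_domain}"

    @property
    def config_file(self) -> str:
        # Name the config after the host so it is obvious which environment it belongs to.
        return f"{self.api_host}.conf"


production = Environment("production", "domain.com")
test = Environment("test", "domain.com")
```

Here `test.api_host` is `test.api.domain.com` and `production.config_file` is `api.domain.com.conf`, so nothing ends up called default or configuration.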

Launch Darkly

The second item to consider when you are going live is how to test features before your customers can look at them. LaunchDarkly is the premier SaaS vendor for this: it lets you enable capabilities in your SaaS for specific groups or individual users. Hiding capabilities from most users, and giving them only to specific users in production, is a way to preserve the agility and velocity of your team while protecting the majority of end users from features that are broken or not fully vetted. Give your customers new capabilities on a set schedule instead of in a big-bang release every quarter. This lets your teams sleep at night and live up to the schedule that you set. If you think about Facebook, LinkedIn, or Amazon, they are always running experiments and delivering small, incremental improvements. Give your users a break by skipping big-bang quarterly releases that require a ton of training and reskilling. Staging and delivering small improvements instead can incrementally improve quality of life for everyone, both your customers and your teams.
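To make the idea concrete, here is a hand-rolled Python sketch of a feature flag that is switched on only for chosen users or groups. It is not LaunchDarkly's SDK or API; the `FeatureFlag` class, the flag name, and the user and group names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterable, Set


@dataclass
class FeatureFlag:
    """A capability that stays hidden unless a user or group is explicitly allowed."""
    name: str
    enabled_users: Set[str] = field(default_factory=set)
    enabled_groups: Set[str] = field(default_factory=set)

    def is_enabled(self, user_id: str, groups: Iterable[str] = ()) -> bool:
        # Allow the user directly, or through any group they belong to.
        if user_id in self.enabled_users:
            return True
        return bool(self.enabled_groups & set(groups))


new_dashboard = FeatureFlag(
    "new-dashboard",
    enabled_users={"alice"},
    enabled_groups={"beta-testers"},
)
```

Application code would wrap the new capability in a check such as `if new_dashboard.is_enabled(user_id, user_groups):` and fall back to the existing behavior otherwise, so the feature ships to production without being visible to everyone.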

Beta Trials

Once you have all of this working, and as you scale your organization, you can think about having a beta site, where you let your best customers inside the circle and give them special access to features that are still in development. These trial users would use the production database, with the latest and greatest features and capabilities. This kind of development and deployment is not for the faint of heart, but it does have the potential to create devoted users who love your offering. Don’t give it away for free. Instead, make it a source of revenue and a point of engagement for your Success team.

Maintaining agility and velocity at all development stages of your SaaS is critical, not only for customer happiness but also for developer sanity.

Thank You

Jim ‘The Designatic’ Tyrrell

[1] https://eng.lyft.com/scaling-productivity-on-microservices-at-lyft-part-1-a2f5d9a77813
