When you are first learning how to program a computer, your goal is to just make everything work correctly. There are mental hurdles you must clear to write your first program: What programming language should you use? Where do you type everything in? Where do all these semicolons go? Why does it say syntax error when everything looks correct? By the time everything actually compiles and runs, it feels like you’ve accomplished a great feat, and the task is complete. In reality, the first time your code builds and runs is just the beginning of what it means to build well-designed software.
Your first programs may get the job done, but their design will probably be a mess. When you build things in the real world, your mind grasps how the different parts of a design fit together much more easily than when you’re studying the design of a software project. If I showed you a building made out of Legos, you would have no problem seeing all its individual parts, and understanding how each piece fits together with its neighbors. Building in the real world is both concrete and sensory. We can use all of our senses to understand the inner workings of a design that is before us.
Software is not concrete, and it is not sensory. You can’t touch your code, and when you see your code, you only see symbols that represent some abstract design. The way the pieces of a program fit together must be kept in your head. When reading a book, you read the words on the page and build a mental image from them. When you’re reading a program, you likewise read the words on the screen, but then you must reconstruct a mental image of the pieces of the software. You must see all the actors, map out the dependencies, recognize the contact points, and decipher all the interactions of your code. If you’re building a bridge, it’s easy to see if a support beam is coming into contact with too many other pieces of the bridge. If you’re building software, seeing these connection points is an exercise in your ability to hold the structure clearly in your mind.
The more you program, the more you understand that the easiest way to hold a software design in your head is to make it simple. A program that is simple is easy to understand and easy to work with. Good programmers can look at a piece of code and recognize when it’s complex. Great programmers can look at a piece of code, recognize when it’s complex, and then proceed to make it simple.
An airplane pilot I knew once told me that the most dangerous pilots are not the novices, and not the veterans, but the pilots just in between. They are the pilots who no longer have the fearful respect of a novice, nor the wisdom of an expert, but instead have the cockiness of an intermediate. They’ve got enough flying time under their belt that they don’t know how much they don’t know, and are dangerous because they believe themselves to be much better than they really are.
A dangerous developer thinks they’re writing good code, perhaps because their variables are well named and their methods are short, but in reality their designs might be a tangled web of gunk. Seeing only superficial qualities like method length is how you miss the hidden complexities of your program. On the surface your code might be easy to read, but impossible to comprehend.
How can we defeat complexity in our programs? Many developers will reach for a design pattern, a codified best practice, or some other prescribed cookie-cutter recipe. Yes, applying a design pattern might help you, but rather than ask yourself which design pattern you should be using, ask yourself instead how easily you can visualize the design of the code in your head. As a starting point, some basic questions I like to ask are:
- How many moving pieces (objects, functions, actors) are in this code path?
- How much do the moving pieces know about each other?
- How much state is there to manipulate?
- How many branches are there into and out of my code?
- How many dependencies, both internal and external, are there?
These questions are merely ways to assess code complexity - they are by no means exhaustive, nor are they always the right questions to ask. Complexity falls out of a system that is not understood, and is wound too tightly on itself. A program that is easily understood is often indicative of a system that is simple. If you cannot understand the design of your software, you have no control over it.
If I could ask one thing of you it would be this: practice seeing complexity. Not just in software, but all around you. Unlike many other things in this world, software changes easily. You have the power to make your software simple: See the complexity, know the complexity, and then destroy it.
Do What You Want - Those are the most frightening four words brought to us by the connection revolution. If you want to sing, sing. If you want to lead, lead. If you want to touch, connect, describe, disrupt, give, support, build, question… do it. You will not be picked. But if you want to pick yourself, go for it. The cost is that you own the results. – Seth Godin, The Icarus Deception
It’s hard to break free from the mental entrenchment of yesterday’s industrial economy: If you work harder, and do what the boss says, your career as a software developer will go exactly where you want it to go. This is not how you win in software. You don’t win by following directions, you don’t win by waiting for someone to tell you what to do, and you certainly don’t win by producing code the same way new cars roll off the factory line.
You win in software when you recognize that what you’re building are tools that create leverage and value bigger than yourself. Software scales, and bits are cheap to spread to everyone. We live in an era where a five-person team has the potential to help millions of other people simply with the software they write and share with the world.
The scarcity isn’t access to the right tools - we have cloud computing, an abundance of open source solutions, and more new programming languages than we know what to do with. The scarcity is the developer who knows how to take what they’re building and shape it in a way that reaches more people, touches more lives, and makes a bigger impact.
Software is changing the world faster than our minds can adapt to the change. Tomorrow’s problems demand solutions that you’ve never written before, and maybe never even thought about before.
Will you make an impact?
When a developer needs to add more capacity to a computer system, they usually consider two ways to do so: horizontal scaling or vertical scaling. Which strategy is selected depends on the problem being solved and on which resources in the system are limited. In this post, we’ll go over both of these scaling strategies, and discuss the pros and cons of each. If you’re building a software system that needs to grow, you either select a scaling strategy explicitly, or a strategy is selected implicitly. Be intentional about knowing how your system is going to grow.
In a vertical scaling model, the process of adding more capacity means taking existing actors in a system and increasing their individual power. For example, let’s say you’re in charge of overseeing a lumber harvesting operation.
In this example, let’s assume you have 3 trucks that can carry 25 felled trees per load, and it takes 1 hour to move each load down the road to where it needs to be further processed. Given these numbers, we see that the maximum capacity of our system is:
3 trucks * 25 trees/load * 1 load/hour = 75 trees processed per hour
Assuming we’ve chosen a vertical scaling capacity model, how would we respond if we wanted to be able to process 150 felled trees per hour? We’d need to do one of two things: either double the carrying capacity of each truck (50 trees per load), or halve the time it takes for each truck to process each load (30 minutes, or 2 loads per hour).
3 trucks * 50 trees/load * 1 load/hour = 150 trees processed per hour
3 trucks * 25 trees/load * 2 loads/hour = 150 trees processed per hour
We haven’t increased the number of actors in the system, but we have increased the productivity of each actor to achieve the desired jump in capacity.
In a horizontal scaling model, instead of increasing the capacity of each individual actor in the system, we simply add more actors to the system. In our lumber harvesting example, this means adding more trucks to move the lumber. So when we need to increase our capacity from 75 trees per hour to 150 trees per hour, we simply add 3 more trucks:
6 trucks * 25 trees/load * 1 load/hour = 150 trees processed per hour
The productivity of each actor in the system remains the same, but we’ve added more trucks to the system.
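To make the arithmetic easy to play with, here is a small Python sketch of the lumber example. The function name `trees_per_hour` and its parameters are my own invention for illustration; the numbers are the ones used above.

```python
def trees_per_hour(trucks: int, trees_per_load: int, minutes_per_load: float) -> float:
    """Throughput of the hauling operation, in trees per hour."""
    loads_per_hour = 60 / minutes_per_load
    return trucks * trees_per_load * loads_per_hour


# Baseline: 3 trucks, 25 trees per load, 1 hour per load.
baseline = trees_per_hour(trucks=3, trees_per_load=25, minutes_per_load=60)

# Vertical scaling: same number of trucks, each truck made more productive.
bigger_trucks = trees_per_hour(trucks=3, trees_per_load=50, minutes_per_load=60)
faster_trucks = trees_per_hour(trucks=3, trees_per_load=25, minutes_per_load=30)

# Horizontal scaling: same trucks as before, just more of them.
more_trucks = trees_per_hour(trucks=6, trees_per_load=25, minutes_per_load=60)

print(baseline, bigger_trucks, faster_trucks, more_trucks)
```

All three strategies land on the same 150 trees per hour. What differs is which knob you turned to get there, and that is where the costs discussed later come from.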
Scaling Your Web Database
With a basic understanding of horizontal and vertical scaling, let’s look at scaling a web system. There are numerous components in a website that need their scalability properties considered, but I’d like to focus on the one that usually ends up being the most critical: the database. Why is the database the most critical? Because your users’ data is usually what people care about the most. Because data is often a shared resource, it becomes the main contact point for nearly every web request.
What Kind of System Is Yours?
The most important question you have to ask when considering the scalability of your database is, “What kind of system am I working with?” Are you working with a read-heavy or a write-heavy system? Examples of a read-heavy website might include an online shopping site, where most people spend the majority of their time browsing (reads) and only a small amount of their time purchasing (writes), or a blog, where the majority of the time people are consuming posts (reads), and only a small amount of the time are commenting or the author is posting (writes). On the flip side, good examples of a write-heavy system include a credit card transaction processor, where the main workload is journaling transactions (writes) and only occasionally looking up transactions (reads), or Google Analytics, where the majority of the workload is journaling traffic data (writes) and only occasionally showing graphs of the analytics (reads).
Knowing what kind of system you’re building will help you select the right technologies when your website has to grow.
If your website is primarily a read-heavy system, vertically scaling your datastore with a relational database such as MySQL or PostgreSQL can be a good choice. Couple your RDBMS with a robust caching strategy that uses memcached or a CDN and you’ll have a system that can scale pretty cheaply. In this model, when the database runs out of capacity, putting more pieces of data in the cache helps offset the burden of reads. When there are no more items left to cache, upgrading your database hardware with faster disks or more processors will usually buy you the necessary runway. Moore’s law has historically made vertical scaling with this method as simple as buying better hardware.
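One common way to put a cache in front of a database is to check the cache first and only fall back to the database on a miss. The sketch below is a toy illustration of that idea, not a production setup: the `CachedReader` class is hypothetical, and plain Python dictionaries stand in for both the relational database and memcached.

```python
class CachedReader:
    """Toy read-through cache in front of a slower datastore."""

    def __init__(self, database: dict):
        self.database = database  # stand-in for MySQL or PostgreSQL
        self.cache = {}           # stand-in for memcached
        self.db_reads = 0         # counts how often we hit the database

    def get(self, key):
        # Serve from the cache when possible; this is the cheap path.
        if key in self.cache:
            return self.cache[key]
        # Cache miss: read from the database and remember the result.
        self.db_reads += 1
        value = self.database[key]
        self.cache[key] = value
        return value

    def put(self, key, value):
        # Writes always go to the database; drop any stale cached copy.
        self.database[key] = value
        self.cache.pop(key, None)


reader = CachedReader({"post:1": "Hello, world"})
first = reader.get("post:1")   # miss, reads the database
second = reader.get("post:1")  # hit, served from the cache
print(first, second, reader.db_reads)
```

Notice that every `put` still touches the database and throws away the cached copy. That is the reason a cache pays off for a read-heavy workload and does much less for a write-heavy one.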
If your website is primarily a write-heavy system, you’re probably going to want to think about using a horizontally scalable datastore such as Riak, Cassandra or HBase. Unlike most RDBMSes, these datastores usually grow by adding more nodes. Because your system is going to be mostly writing, caching layers will not help you as much as they would in a read-heavy system. Many write-heavy systems start out using a vertical scaling strategy, but soon run out of runway. Why? Because hard drive speeds and processor counts plateau at a certain point, and the marginal cost of adding one more core or a hard drive that does a few more I/O ops per second climbs steeply. If you instead choose a horizontally scalable strategy for your write-heavy system, you reach an inflection point where the marginal cost of adding one more node to the system becomes far cheaper than the cost of a hard drive that might eke out a few more disk seeks.
Another thing to keep in mind is the often unforeseen costs of each scaling strategy. In a vertical scaling setup, extra costs are placed on the isolated individual components of the system. As we add more capacity to the system, the individual components become more costly to manage. From our lumber harvesting example, if we make our trucks carry twice the number of trees per load, our truck beds are going to have to get either longer, wider, or taller. Perhaps there’s a height restriction for the roadway based on bridge height, or a width restriction based on lane width, or a length restriction based on safe driver maneuverability. There’s a limit to how much vertical scaling you can do with the individual components of the truck. The same concepts apply to vertically scaling servers: more processors require more case room, which requires more rack space per server.
In contrast, a system that scales horizontally places extra costs on the connected shared components of the system. As we add more capacity to the system, the shared costs associated with coordinating the actors increase. In our lumber harvesting example, as we add more trucks to the road, the road is a shared resource that becomes constrained. Can that many trucks even fit on the road at the same time? Do we have enough safe loading zones that all the trucks can be receiving lumber simultaneously? If we look at our horizontally scalable database system, the often overlooked cost on the system becomes the network that connects the servers together. As you add more nodes to the system, this shared resource often becomes increasingly taxed, usually in a non-linear fashion.
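To get a feel for why the shared cost can grow non-linearly, here is a rough sketch under one loudly stated assumption: every node talks directly to every other node. Real datastores use a variety of topologies and protocols, so treat this as a worst-case illustration of the shape of the curve, not a model of any particular system. The function name `coordination_links` is made up for this example.

```python
def coordination_links(nodes: int) -> int:
    """Number of distinct node pairs in a fully connected cluster.

    Assumes, purely for illustration, that every node communicates
    directly with every other node.
    """
    return nodes * (nodes - 1) // 2


for nodes in (3, 6, 12):
    print(nodes, "nodes ->", coordination_links(nodes), "links")
```

Under that assumption, going from 3 nodes to 6 takes you from 3 links to 15, and going to 12 nodes takes you to 66. Doubling the nodes more than doubles the connections the network has to carry.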
Fitting It Together
Like most things in computers, good solutions are not usually as simple as what I’ve outlined here. I’ve attempted to simplify the ideas in order to speak to the concepts, rather than any specific tactics. Scaling is a hard problem that needs pragmatic thought at every step of the process. There is no magic scaling tactic or magic software that will help you build an entirely reliable, scalable system. Like many other problems of scale, the larger solution is usually made of hundreds of tiny solutions all working together in unison. Getting each of them right takes careful design at every step of development.