Joining Bitly Engineering


First post! (aka Introduction)

Hello everyone, my name is Peter Herndon. I recently started working at Bitly as an application engineer on Bitly’s backend systems (which are legion). My recent experience is with a series of smaller start-ups, preceded by a long stint in a much larger and more conservative enterprise setting. I bring to the table expertise in Python, systems administration (both cloudy and bare metal), databases, and systems architecture.

I’ve been interested in Bitly for quite a while, and have wanted to work here for much of that time. Since its beginning, Bitly has had a reputation for technical excellence. The engineers here have demonstrated that excellence both in the engineering challenges they have solved and in the ingenuity of their solutions. Bitly’s former chief scientist, Hilary Mason, single-handedly popularized the concepts of Big Data and Data Science and legitimized them as engineering disciplines. Her talks and blog posts sparked my own awareness of and interest in the field. So when I had the opportunity to work here, I gladly leapt at it head first.

What I Found

A Company in the Process of Renewing Itself

Bitly is a unique place to work, even among tech businesses. The company employs about 60 people, about 25 of them technical, and has been in existence for 4-5 years now. That said, Bitly is in many ways a very new company. Recently the company underwent a shift in management, resulting in a new focus on business. The new CEO, Mark Josephson, brings a laser-sharp clarity to helping Bitly’s customers become successful by providing insight into how their brands are performing. That clarity of purpose comes in addition to continuing the company’s technical leadership. We began the new year here with a renewed sense of purpose that is reflected in the number of new hires and the number of open positions.

I’ve experienced the process of watching an ailing small business shed employees and management in a downward spiral of despair, including my own exit from that company. This is the first time I’ve experienced the rebirth of a company, the upward swell of pride and energy that comes from active leadership and direction. I’m very happy to see that Bitly has retained a great deal of its technical team, providing good institutional memory and continuity. That retention speaks well of the new leadership and of the pride the folks here take in what they have already built. And what they’ve built is tremendous.

A Remarkable Technical Architecture

Bitly’s business is insight: providing customers with information that helps them make better decisions about their business by analyzing shortlink creation (referred to internally as encodes) and link click data (internally, decodes). To that end, our infrastructure must accumulate and manipulate around 6 billion decodes per month. That’s a lot of incoming HTTP requests. Not Google scale, but not pocket change by a long shot.

To handle that volume, we use a stream-based architecture rather than batch processing. That is, instead of accumulating incoming data in a data store and periodically processing it to reveal insights, we have a very deep, very long chain of processing steps. Each link in the chain (and “chain” is an oversimplification, since the structure is really a directed, mostly acyclic graph) is an asynchronous processor that accepts incoming event data and performs a single logical transformation on it. That transformation may be as simple as writing the datum to a file, or it may involve comparing it against other aggregated data to build recommendations or to detect spam and abuse. Frequently, the processed datum is then emitted back into the queue system for consumption further down the chain. The processed data are then made available via a service-oriented API, which powers the dashboards and reports we present to our customers. If any given step in the chain needs more processing power to handle the load of incoming events, we can spin up additional servers to run that particular step.
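To make that concrete, here is a rough sketch (not our actual code) of what one step in such a chain might look like, using the open-source pynsq client: it consumes an event from its queue, applies one transformation, and publishes the result to a downstream topic for the next step. The topic and channel names, the addresses, and the enrichment function are all made up for illustration.

```python
import json

import nsq

# Hypothetical downstream publisher: the transformed event is re-emitted
# onto another topic for the next processor in the chain to consume.
writer = nsq.Writer(['127.0.0.1:4150'])


def lookup_country(ip):
    # Stand-in for a real enrichment step (e.g., a GeoIP lookup).
    return 'US' if ip.startswith('192.') else 'unknown'


def on_published(conn, data):
    # Called with nsqd's response once the downstream publish completes.
    pass


def handle_decode(message):
    """Apply one logical transformation to a single click ('decode') event."""
    event = json.loads(message.body)
    event['country'] = lookup_country(event.get('ip', ''))
    writer.pub('decodes_enriched', json.dumps(event).encode('utf-8'), on_published)
    return True  # tell NSQ this message was handled successfully


# Each processor reads its own channel on a shared topic; running more
# copies of this process adds capacity to this one step alone.
reader = nsq.Reader(message_handler=handle_decode,
                    lookupd_http_addresses=['http://127.0.0.1:4161'],
                    topic='decodes', channel='enrich_geo')

nsq.run()
```

The chaining falls out naturally: the next step in the graph is just another small process like this one, reading from the topic this one publishes to.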

The advantage of stream-based processing over a traditional batch processing system is that it is a great deal more resilient to spikes in incoming data. Since each processing step is asynchronous and has a built-in capacity limit, messages remain in the queue for that step until its processor is ready to handle them. The result is that every step in the chain has its own independent capacity for handling data, and while backlogs do occur (and we monitor for them), a backlog in a given step is by no means a system-wide problem. It may signify a failure in a particular subsystem, but the rest of the Bitly world usually remains unaffected. Of course, once the problem is corrected, the backlog tends to shift into the next steps of the chain, but that is expected and fine: each step chews through its allotted tasks and moves on.
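To illustrate that capacity limit, here is another hypothetical pynsq sketch: the consumer declares how many messages it will hold in flight at once, and anything it cannot handle yet simply stays in (or returns to) the queue until the processor is ready. The topic, channel, file name, and failure case below are made up.

```python
import nsq


def write_datum(body):
    # Hypothetical "simple" step: append the raw event to a local file.
    with open('decodes.log', 'ab') as f:
        f.write(body + b'\n')


def handle_event(message):
    try:
        write_datum(message.body)
    except IOError:
        return False  # requeue: NSQ keeps the message and redelivers it later
    return True       # finished: this step is ready for its next message


# max_in_flight caps how many messages this process holds at once -- the
# built-in capacity limit described above. A backlog piles up in the queue
# for this step without affecting any other part of the system.
reader = nsq.Reader(message_handler=handle_event,
                    lookupd_http_addresses=['http://127.0.0.1:4161'],
                    topic='decodes', channel='archive_to_disk',
                    max_in_flight=100)

nsq.run()
```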

This stream processing system is powered by NSQ (documentation), about which much has been written and said, both on this very blog (here, here, and here) and elsewhere. I won’t add more, as I’m far from an expert (yet!), but I will say that I am impressed with how useful NSQ is for building large distributed systems that are remarkably resilient.

A Fanatical Attention to Code Quality

Another aspect of Bitly that has made a great impression on me is the devotion to code quality embodied in the code review process. Bitly experienced enormous growth before modern configuration management tools became popular, and as a result wound up building its own system for managing server configuration. There is a certain amount of cruft in the system (how could there not be?), but Bitly’s engineers have paid a great deal of attention over time to making the deployment system as streamlined as possible. After all, maintaining the fleets of servers necessary to keep Bitly running is no small task.

And that attention to operational maintainability spills over to the code that runs on those servers. Bitly has a code review process that places equal emphasis on functional correctness and test coverage, and on operational ease and maintainability. I’ve never had my code gone over with such a fine-toothed comb as I have here, and going through the review process made me a better programmer overnight. In previous positions, I’ve quickly produced code that works; here at Bitly, I produce code that works, is aesthetically and semantically appropriate (i.e., consistent naming, following a reasonable style guide), and fits conceptually within the greater whole that is our code base. The review process can be frustrating at times, as I try to figure out the most efficient way to get my changes merged, but overall it is a huge benefit, contributing greatly to the quality of the Bitly product.

A colleague asked me to comment on whether rigorous code review is better or worse than pair programming at improving code quality, since pair programming is something he has not done. My experience with pair programming is limited, but in that experience it does not provide a huge benefit to code quality. Instead, it is much more useful for design quality, for hashing out architectural issues, and for transferring knowledge. The kinds of issues I’ve caught in pair programming, or been caught creating, are typically typos or minor logic bugs (brainos). These are the kinds of bugs that pop up immediately on trying to run your code for the first time, or on running tests. (Tests are a given, right? Everybody writes tests nowadays.) So while there might be a tiny bit of added productivity from pair programming on the code quality front, that benefit is offset by consuming double the programmer hours. The trade-off is that rigorous code review improves code quality a great deal, but it does tend to lose sight of architecture and design issues; it encourages deep focus on the code itself without considering the design. I think code review is necessary (or at least more beneficial) for code quality, while pair programming is not. Pair programming can be swapped for design meetings, reducing the total time multiple developers spend on a single task.

A New (to Bitly) Approach to Teams

A major change we’ve instituted recently is the creation of what we are calling “feature teams”. These teams are composed of a cross-functional slice of Bitly, including back-end developers, front-end developers, product and project management, and, most importantly, business stakeholders from our Customer Success team. Each feature team is tasked with making improvements to our products, starting with different sections of the Bitly Brand Tools. I think this is the single most important change toward directing Bitly’s amazing technical talent at creating something useful for our customers, rather than just yet another neat technical tool. With our Customer Success team getting feedback on our proposed improvements directly from our customers, we are now in a perfect position to make Bitly the best source of insight it can be. And that is our ultimate goal: to provide our customers with better insight into the world around them.

In my previous experience, I’ve never seen “improvements” actually improve anything without feedback from customers. Near-misses, yes, but not actual hits. The inspiration should often come from within, as we are in the best position to improve existing features for all our customers rather than just acting on the opinion of one. But without business-side involvement, and without customer feedback, I’ve never seen a tech-driven improvement result in success for the actual end user, unless the intended end user is in fact technical. That is why a large percentage of start-ups focus on tools for other engineers: it’s easier to get started.

20 February 2014