Every day, Olark sees more than 3 million visits across thousands of different websites and browsers. We constantly have to ask ourselves: are we causing any issues or slowdowns on our customers' websites? Knowing the answer to this question is critical to our success. A careless misstep means we could be silently breaking thousands of websites, halting transactions, angering the world, causing the next recession...you know, the usual.

Previously we wrote briefly about the challenges of providing third-party Javascript as a service. This time around, we'll talk a bit more about these challenges, and what we do at Olark to thrive in spite of them.

Why is providing 3rd-party Javascript so challenging?

Living inside other companies' websites means a lot less control...and sometimes it feels a lot like running through a minefield. While we can sidestep some challenges with recent techniques, the fact remains that our code executes in a hostile environment. Here is a short list of real problems that we have seen in the wild:

- sites that override built-in functions (like window.escape)
- pages that define custom serialization methods (like toJSON)

With this long list of variables that we cannot control, unit and functional testing has slowly become less effective at uncovering real-world issues. Ever seen a browser testing tool that offers "Internet Explorer 7 with XHTML doctype plus custom JSON overrides" as one of its environments? Neither have we :P

So how can you survive (and thrive) across thousands of websites?

One word: monitoring. Deep, application-level monitoring. At Olark, we have developed a collection of tools to do application monitoring via log analysis. Our monitoring architecture looks something like this:

[architecture diagram]

In the end, we simply write code like this in Javascript:

log("something weird happened #warn")

…this will track "warn" events. Notice that we use #hashtags to name the events we want to show up in our metrics. Our system collects these log messages from Javascript and aggregates them into a central log server. Then our hashmonitor tool parses the hashtags and counts their occurrences, and finally sends the calculated metrics into tinyfeedback for viewing. This is the simplest example of how we track issues in the wild.

One thing we have found incredibly useful about this approach is that we can always go back and investigate the original log messages when we see a spike in any of our metrics. This allows us to quickly roll back a deployment, while still having enough data to dig into why the spike might be happening.

How do we break down errors and warnings?

Sometimes we want to dig deeper into a particular warning, so we need a special event name for it. To accomplish this, our monitoring system allows multiple #hashtags in a single log message. For example, we keep track of cookie issues:

log("cookie problems #nocookies_for_session #warn")

…which gives us a way to break out these specific warnings in more detail. In particular, these cookie metrics influenced our decision to test cookie-setting before booting Olark, preventing strange behaviors when cookies could not be read on subsequent pages.

How do we monitor performance?

Our monitoring system also allows value-based metrics. For example, we track the time when configuration assets are downloaded:

log("received account configuration #perf_assets=200")

…and our hashmonitor will automatically parse this as a value-based metric and calculate values for its distribution. We have used this data to make important performance decisions, like adding these configuration assets to our CDN. With geolocated CDN delivery, we were able to boost overall speed and also tighten up the 90th-percentile load time.

Does this really matter that much for user experience?

Definitely. To measure "soft" metrics like user experience, we look at the number of conversations that begin on Olark every minute. This tells us that visitors are engaging with Olark and hopefully generating more sales opportunities for our customers. Recently, we made some improvements that whittled down median load time. As a result, we improved conversation volume by nearly 10%.

Having this deep monitoring has really helped us measure when our code changes positively impact the real world (and real people!). We wouldn't have it any other way.

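To make the hashtag-parsing idea above concrete, here is a minimal sketch — our illustration, not Olark's actual hashmonitor code — of an aggregator that counts plain #hashtags as event counters and collects #key=value tags as value-based metrics:

```javascript
// Illustrative sketch of a hashmonitor-style aggregator.
// Plain #hashtags become counters; #key=value tags become value metrics.
function aggregate(logLines) {
  const counts = {}; // e.g. { warn: 2 }
  const values = {}; // e.g. { perf_assets: [200] }
  const tagPattern = /#(\w+)(?:=(\d+(?:\.\d+)?))?/g;
  for (const line of logLines) {
    for (const match of line.matchAll(tagPattern)) {
      const [, name, value] = match;
      if (value !== undefined) {
        (values[name] = values[name] || []).push(Number(value));
      } else {
        counts[name] = (counts[name] || 0) + 1;
      }
    }
  }
  return { counts, values };
}

const { counts, values } = aggregate([
  'something weird happened #warn',
  'cookie problems #nocookies_for_session #warn',
  'received account configuration #perf_assets=200',
]);
// counts.warn === 2, counts.nocookies_for_session === 1
// values.perf_assets → [200]
```

Note how a single message with multiple hashtags increments several counters at once, which is what makes the "break out specific warnings" pattern work.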
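For the value-based metrics, a distribution summary along the lines of what the post describes — median and 90th percentile — could be computed like this (the nearest-rank method is our assumption, not necessarily what hashmonitor uses):

```javascript
// Illustrative sketch of distribution stats for a value-based metric.
// Uses nearest-rank percentiles over a sorted copy of the samples.
function distribution(samples) {
  const sorted = [...samples].sort((a, b) => a - b);
  const percentile = (p) => {
    const rank = Math.ceil((p / 100) * sorted.length); // nearest-rank method
    return sorted[Math.max(rank - 1, 0)];
  };
  return { median: percentile(50), p90: percentile(90) };
}

// Hypothetical load-time samples in milliseconds.
const stats = distribution([120, 200, 180, 950, 210, 160, 300, 140, 170, 220]);
// With 10 samples, the median is the 5th sorted value (180)
// and p90 is the 9th sorted value (300).
```

Tracking p90 alongside the median is what surfaces the slow tail that geolocated CDN delivery tightened up.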