Erik Johnson


Off the shelf business software

I’m continually fascinated by this idea of “off the shelf”. The off-the-shelf, “no customizations” approach is seen as a policy to avoid complexity and laborious software upgrades, or what businesses might call “future proofing”.

In the commerce industry, we look at solutions like Shopify as a true off-the-shelf system, with not just automatic setup of the system itself, but of the hosting, deployment, and front-end themes.

But I continually challenge myself to think about what off the shelf truly is. As I develop my vision and leadership skills, I’m continually putting myself in my clients’ shoes on their policies, desires and the long-term impact of key decisions.

In this post, I’m going to break down some of the myths of off-the-shelf software. This isn’t to scare companies, or to say off the shelf isn’t the answer. It’s to provide valuable context and ideas to evaluate, with my main goal being to help you avoid red herrings.

1. The system is key

The system is a huge part of your decision in purchasing software, but the more important factor is your internal employees. By creating a stable, educated and empowered workforce in your company, your software will naturally follow the same pattern, no matter the system you choose. Let’s take some examples:

- WhatsApp is a custom-built messaging app written in Erlang. Er…what!? As one of the most successful messaging applications ever, they’ve shown that technology is important, but reducing complexity is even more so.

- Facebook was originally written in PHP. In fact, a well written PHP application is still a cornerstone of the web. With new frameworks like Laravel, “legacy” technology can still be used to create great systems.

- Shopify uses server-side rendering. The system, which doesn’t follow the fanciest MVVM Angular-style application architecture, delivers cloud ecommerce to a large share of the small-business world as one of the most popular cloud systems on earth.

What did these companies all have in common? Great leadership, great employees, and pragmatism. The takeaway is that a good enough system can be made great through proper application, engineering and a focus on simplicity.

2. System X is a silver bullet

No system is a silver bullet for your organization. The complexity in applying a great piece of software is integrating it into your company. One of the smartest investments to make in a software system is to evaluate the maturity of the market’s best practices. What are other companies doing? How are they running their business using System X?

What are System X’s weak points? Since not every part of a system can be a market leader, which parts work well, and which parts would you not see much value in?

3. Customizing will take me off the upgrade path

In fact, the largest expense you have is your company not being agile enough to compete. Without a doubt long-term maintenance is a concern, and heavy customizations are an issue in software stacks. But we should focus our efforts on key customizations that can create business value, while understanding the long-term impacts.

4. Integration is a small portion of a software’s cost

More important than the software itself is how to integrate it into the architecture landscape of the technology that currently supports the business. I’ve seen most of a project’s cost go towards integration. The need for a well-defined integration structure is what will drive a software’s cost.

The challenge is that integration is not just technical; it requires a business element as well. This is traditionally seen as change management, but integration needs to be part of a larger conversation with the vendor, one that focuses on both business and technology integration.

Software is a difficult business, and off the shelf isn’t always the answer. We need to focus our efforts on buying great software, but people and process are more important to a successful rollout of a software system.

Services

The challenge for companies in the coming decade is not customer experience, but moving into services: selling core business elements as services to monetize the technology investments they’ve made. The current IT systems we work with every day are complex, and the battle between technologists, the business and cowboys will not reduce our current migraines.

There are three things companies need to prepare for as they transition in the services revolution:

1. Selling your core business as a service
2. Challenging your technology department to embrace automation, as complexity is reaching the limits of traditional IT management paradigms
3. Building technology as a core offering of your business

The first challenge will be in setting up core business capabilities as services. Technology leaders will have to be visionary alongside their C-suite partners to ensure the business understands the possibilities and complexities in selling services. We need to help the business understand the art of the possible when it comes to services. Selling service time on our machines will be a key part of this shift.

We will have to challenge our organizations to embrace automation to manage the complexity of services. It is not possible to manage services in our traditional fashion as we start to create applications and services made up of multiple internal services and APIs.

As we open up our core business, we will have to challenge our teams to think about offering our technology as a service. Google is doing a fantastic job of selling their internal systems, from Google Cloud Platform to BigQuery to the Prediction API. Each is offered at a different level of granularity, from infrastructure, to software, to just an API. These are great examples of exposing internal technology as services for customers to take advantage of.

Immutability Part 2 “The Actor Model”

The actor model is a design pattern in software that enables high concurrency and parallelism. The model was initially proposed in academic papers, and was later formalized in practice by Erlang, the original high-performance actor language.

I’ve written about Erlang before, and how I admire the model it portrays for concurrency. Let’s break this down a bit, and see what this means for software in the future.

State of The Union

Software is growing in size and complexity. Recently we have seen a trend beyond just Service Oriented Architecture (SOA), into “microservices”. Microservices are a new model of architecture in which our programs are not one large, complex whole, but a collection of smaller services that interact. This has been enabled by SOA, and by the recent adoption of Docker as a model to run services on infrastructure. With all these moving parts, we have forgotten how to make this model work, and how we might use it. We hear CEOs, VPs and corporations touting microservices-based approaches, but in the traditional model we won’t benefit from the concurrency and we will suffer the same bottlenecks – in comes the actor model.

Actor Model

The actor model is a quite simple model of concurrency and parallelism which states that our systems’ concurrency and interactions should be made up of “actors”, instead of just objects. These actors maintain their own state, and pass messages between each other. Why is this important? By guaranteeing that state is maintained only within a single actor, and never shared between actors, we can schedule multiple actors concurrently without needing to worry about traditional race conditions, or the large memory footprint of mutability.
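
To make this concrete, here is a minimal, hand-rolled sketch in Scala (my own illustration, not from the original post; the CounterActor and its Increment/Get messages are made up). Real systems would use Akka or Erlang processes, but the point is the same: state is confined to one mailbox-draining thread, so no locks are needed.

```scala
import java.util.concurrent.LinkedBlockingQueue

// Messages this actor understands.
sealed trait Msg
case object Increment extends Msg
final case class Get(reply: Int => Unit) extends Msg

// A toy actor: `count` is touched only by the single worker thread draining
// the mailbox, so there is no shared mutable state and no race condition.
class CounterActor {
  private val mailbox = new LinkedBlockingQueue[Msg]()
  private var count = 0 // confined to the worker thread below

  private val worker = new Thread(() => {
    while (true) mailbox.take() match {
      case Increment  => count += 1
      case Get(reply) => reply(count)
    }
  })
  worker.setDaemon(true)
  worker.start()

  def !(msg: Msg): Unit = mailbox.put(msg) // fire-and-forget message send
}

object ActorDemo extends App {
  val counter = new CounterActor
  counter ! Increment
  counter ! Increment
  counter ! Get(n => println(s"count = $n")) // prints: count = 2
  Thread.sleep(100) // give the actor time to process (demo only)
}
```

The only way to observe or change the actor’s state is to send it a message, which is what lets many such actors be scheduled concurrently.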

The Actor in Micro Services & Concurrency

So what gives? Why do we care about the actor model? The answer is that we are faced with the same problems we had in the original SOA vision: no matter how carefully we define contracts between our microservices, we cannot achieve high concurrency unless we can execute our services in a non-blocking fashion. To do this, we can use the actor model, which allows us to logically architect our microservices for high performance and scalability. It is unfortunate, but the infrastructure and deployment benefits of a microservices-based architecture will hinder us on the performance end unless we apply the actor model properly.

Immutability is the future

We are seeing a shift in the software development industry. Scaling of systems is no longer a nice-to-have feature, but a requirement as our businesses become more connected. The internet, and the amount of data we deal with, poses serious problems for our technical systems. With this volume of data and connectedness, our business users are demanding “real-time” access to this data. How do we deal with this? How can we provide our users access to the data they need, while also providing consistent, performant systems that can serve our customers and systems? The answer is immutability. Immutability provides the key to multi-location, multi-threaded, consistent and scalable access in our systems.

What is immutability

The basic idea of immutability is simple: nothing changes. In software, the concept means that if we hold a reference to something, perhaps a user record, the user record will not change on us - this is immutability, no changes. While this seems counterintuitive, it makes a lot of sense when we think of large distributed systems that must scale.
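
A tiny Scala sketch of what “the record will not change on us” looks like in practice (the User record here is hypothetical, purely for illustration): an update produces a new value, and every existing reference stays valid.

```scala
object ImmutableDemo extends App {
  // Hypothetical user record; case classes are immutable by default in Scala.
  final case class User(id: Int, name: String, email: String)

  val original = User(1, "Erik", "erik@example.com")

  // "Changing" the email produces a brand-new value; `original` is untouched,
  // so any code or thread holding a reference to it still sees consistent data.
  val updated = original.copy(email = "erik@new-example.com")

  assert(original.email == "erik@example.com")
  assert(updated.email == "erik@new-example.com")
  println(s"old: $original\nnew: $updated")
}
```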

Why should you care

Immutability is a core concept of distributed systems, with big data at its heart. We perform MapReduce and other massively parallel computations on immutable data structures; these form the basis of our big data systems, and they would not be possible without immutability. This has a large effect on the storage, and the cost of storage, of your data. One of the first examples of this in computing was the data warehouse, where we kept an immutable copy of the data from our production systems to run queries and generate reports.

The next step beyond the data warehouse is moving to a real-time warehouse that influences our currently executing systems. These systems perform interactions in an immutable way, requiring coordination but favouring performance and responsiveness over strict data consistency, since consistency can be achieved by specific rules on how changes are applied.

Immutability in action

Distributed version control systems are a great example, at a macro level, of how immutability has taken hold of the software development process. Multiple developers take a copy of the master branch in a Git version-controlled project and make changes on their local machines. There is no bottleneck in the process due to writes, since the development team can work completely in parallel. The system is never overwritten: its state is saved, and a log of changes is merely appended by users in the form of commits, with no mutation of the underlying history. This is extremely powerful not only because of the history it gives us, but because it allows us to perform software development at scale.
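
A rough Scala sketch of the same idea (my illustration, not Git’s actual data model in full): history is an append-only chain where new commits reference old ones, and nothing already written is ever mutated.

```scala
object HistoryDemo extends App {
  // Each commit is an immutable value pointing at its parent.
  final case class Commit(id: String, message: String, parent: Option[Commit])

  val c1 = Commit("a1", "initial import", parent = None)
  val c2 = Commit("b2", "add feature", parent = Some(c1))

  // "Changing" history means appending a new commit; c1 and c2 are untouched.
  val c3 = Commit("c3", "fix bug", parent = Some(c2))

  // Walking the chain reads back the full, unmodified log of changes.
  def log(c: Commit): Unit = {
    println(s"${c.id}: ${c.message}")
    c.parent.foreach(log)
  }
  log(c3)
}
```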

What is next

As we start to look into our systems, we need to consider immutability as a new concept in computing: not just the end of changing state in place, but the effects it will have from a design perspective.

- Increased Storage Costs: We will need to store all of these states.

- Concurrency: Our systems should be designed to be thread safe, and immutability guarantees that we can provide this at scale, with massive parallelism and without inconsistencies.

- Scaling: With the adoption of Scala, Erlang, Clojure and other functional languages, we can start to take advantage of immutability as a core concept of the languages we write software in, and we can scale to serve more users, faster, with less of a memory footprint (see the sketch below).
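
On the storage-cost point above: persistent (immutable) collections in these functional languages soften it through structural sharing, so keeping old versions does not mean copying everything. A small Scala sketch:

```scala
object SharingDemo extends App {
  val v1 = Vector(1, 2, 3)
  val v2 = v1 :+ 4 // a new version; v1 is untouched and most internal nodes are shared

  assert(v1 == Vector(1, 2, 3))    // the old state is still fully available
  assert(v2 == Vector(1, 2, 3, 4)) // the new state is a cheap additional version
  println(s"old: $v1, new: $v2")
}
```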

Service Busses and Scalability

In this post I’ll talk a bit about service busses. While the technology has been around for well over a decade, it is worth covering the motivations behind service busses and what the current landscape in the industry looks like.

First let’s delve into what a service bus actually is. Wikipedia has a generalized definition that talks about SOA. Personally I think Microsoft has the best definition in one of their posts, describing it as the “magic bullet” to the ongoing challenge of connecting systems. This makes good sense, and we see that the original implementation of the service bus concept with XML-defined services had this idea in mind. The service bus really is about integration – an important concept in abstraction, customization and selection of off-the-shelf products for businesses.

So why choose a service bus? What are the benefits? There are two main qualities that pundits pitch as the largest benefits of a service bus: 1) scalability, and 2) reliability. So why is this? Why do service busses provide increased scalability or reliability?

Scalability

Most issues in software systems happen under load. Of course everything works well when just a single developer is trying out your new web form. But what if 1000+ users are all submitting forms that then kick off processes within your system? This is where a service bus shines, by providing a mechanism to ensure that requests are not dropped, connections are not refused, and boundary cases in your database don’t result in lost data.

Reliability

It is impossible in the service bus world to talk about scalability without talking about reliability; the two really go hand in hand. What do I mean? Imagine the scenario where your system is under heavy load. Now think about what data could be lost. What about traditional HTTP requests? With stateless applications and a true REST approach, a failed request is simply gone. This has an adverse effect on the reliability of your system as a whole: data can easily be lost, or a boundary case can execute unbeknownst to the user. The ability to queue these calls, and ensure that they are eventually executed in a proper manner, is one of the main benefits of a service bus.

Tradeoffs

Well, it can’t all be good, can it? No, there are some distinct tradeoffs. Latency is obviously increased. If your application has hard latency requirements – i.e. you can’t spare 10ms – then a service bus might not be for you; most business processes can spare it. Debugging and the asynchronous nature of service busses can also pose problems. We need more robust debugging and investigation tools to ensure that we can trace errors, since we don’t get a traditional line-by-line trace through the application.

What else?

The beauty of the service bus is that scalability and reliability aren’t the only things you get. Think about the benefits of this design and its separation of concerns. The decoupling and generalization of APIs behind the service bus mean that your system scales not just in physical calls, but as your organization grows. Interchanging components, adding workflows and swapping vendors are all added benefits of the service bus paradigm – back in line with the SOA definition from Wikipedia.

How does it Work?

Most systems operate on a publish-subscribe model of services. Looping back to the Wikipedia definition and the SOA approach, this is really based on the concept of services: a service publishes its capability, and a subscriber subscribes to it and uses it. Once a message is pushed to the bus, the publisher no longer needs to wait, or think about delivery. This decoupling of message passing and application code allows a fast, asynchronous approach with less interference. Service busses generally provide perceived application performance increases for users due to the asynchronous nature of the requests – removing the blocking scenario.
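
A minimal in-memory Scala sketch of the publish-subscribe idea described above (illustrative only; the ServiceBus class and topic names are made up, delivery here is synchronous for brevity, and a real ESB adds durable queues, routing and transformation):

```scala
import scala.collection.mutable

object BusDemo extends App {
  // A toy service bus: topics map to subscriber callbacks.
  class ServiceBus {
    private val subscribers = mutable.Map.empty[String, List[String => Unit]]

    def subscribe(topic: String)(handler: String => Unit): Unit =
      subscribers(topic) = handler :: subscribers.getOrElse(topic, Nil)

    // The publisher hands the message to the bus and moves on; in a real bus
    // this would enqueue the message and deliver it asynchronously.
    def publish(topic: String, message: String): Unit =
      subscribers.getOrElse(topic, Nil).foreach(handler => handler(message))
  }

  val bus = new ServiceBus
  bus.subscribe("orders")(msg => println(s"billing saw: $msg"))
  bus.subscribe("orders")(msg => println(s"warehouse saw: $msg"))

  bus.publish("orders", "order #42 placed") // fire-and-forget from the caller's point of view
}
```

Swapping a subscriber, or adding another workflow on the same topic, requires no change to the publisher, which is the organizational scaling benefit described above.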

This is really just a base introduction. There are several ESBs in the industry, including Microsoft BizTalk, JBoss, Oracle Fusion, IBM WebSphere and the new kid on the block, the open-source Mule ESB. Let’s hope Erlang and RabbitMQ soon provide a full ESB.

Linux Box Hardening for Digital Ocean

DigitalOcean is a great startup that provides cheap VPS solutions if you wish to host sites, applications, or staging areas – which I would recommend. For hosting a live application it isn’t my first choice, but as a development box I use it quite frequently. I didn’t usually harden my Linux VPSs to the maximum level, but I found there were many attempts from attackers to gain control for the purposes of DDoS attacks and botnets. I’d like to provide some tips on the best ways to harden your machines and recover from botnet attempts at gaining control of them.

1. Disable SSH password login; use a secret key

This is the first and perhaps most important step. Your machine should not allow anonymous FTP/SFTP logins or password-based SSH login attempts. It is a no-brainer: you don’t want people brute-forcing your login via SSH – it gives them easy access to the root account. With this, you should also disable SSH login as root. I know some sysadmins don’t like sudo, but it is necessary; you should never have to log in to the box as root.

2. Disable & remove FTP, telnet and rsh

Your box shouldn’t allow any remote logins over any of these services. The fewer attack vectors and open ports available, the better. Remove the FTP server by running, for example: “sudo apt-get remove vsftpd”

3. Patch your kernel & OpenSSL

We have all heard of the OpenSSL Heartbleed bug – if you haven’t, where have you been? You should periodically update your kernel (on Ubuntu, “sudo apt-get update && sudo apt-get upgrade”), as well as the underlying technologies such as Apache or Nginx that you depend on for serving content.

4. Close & Block unwanted ports

You can block and close off unwanted ports using iptables on Linux. This lets you ensure that ports not bound by services are blocked by default, and it covers both incoming and outgoing traffic.

5. Check auto start services

Linux has utilities to check which services are started automatically on boot. This allows you to see if any processes have configured themselves to autostart.

Your server has only a single network endpoint – unless you have redundant connections – so you should be able to capture and analyze its traffic for the presence of a botnet on the machine.

Scalability Part 1: Programming Languages

High-concurrency applications are a hard problem. We’ve recently seen modern interpreted languages such as Ruby and Python suffer when scaling on the web – especially in their popular frameworks, i.e. RoR, Django, etc. What is interesting is that although we still choose these frameworks, or others (ASP.NET MVC, Windows Forms), we don’t understand their limitations or strengths.

I’ve been working with Scala and some other functional languages, including Erlang, to really examine the underlying scalability issues. What has become apparent is that the issues are more complex than strictly framework or language issues. I’d like to examine the main issues as I see them, and hopefully provide some useful advice on where the drawbacks and benefits lie in choosing specific languages or approaches.

The Languages

It is perhaps easiest to start on the server, at the core – the language. Each of the aforementioned languages is interpreted, so what are the popular languages of the day? Here is a breakdown from the TIOBE 2014 Community Index results over the past decade:

[Image: TIOBE Community Index language rankings over the past decade]

So what does this mean? It’s pretty telling that the most adopted language is C, with Java and Objective-C close behind. While none of us can argue with the performance of C, it is far less prevalent in web-based systems, where the high cost of consultants and programmers generally wins over raw performance – so what are our options based on the list above? What is the performance of the languages that remain? I’m going to suggest that we examine the following:

  1. Compiled VM languages: Erlang (register-based), C#, Java & Scala (stack-based)
  2. Interpreted-style languages: Python (using CPython), Ruby, PHP (via a web server module)

I chose to examine these two classes since we don’t really see too many others in web programming. In the future perhaps more will need to be included, but for now these seem relevant. I’m specifically leaving Node.js out because it is exploding in popularity and probably warrants its own discussion of its non-blocking I/O, single-threaded nature – so I’ll keep moving forward.

Let’s start digging into the language performance issues:

Compiled VM Languages

While “compiled” is used loosely here, this set of languages is compiled into bytecode. So what is the performance of these languages like? Well, it’s pretty darn fast. Compiled VM languages are fast because the bytecode executed by the VM is optimized relative to the source code, and it runs in a VM that can execute it across multiple platforms. With the advent of JIT compilation and the runtime optimizations of VMs, it is possible in some cases to approach or even exceed the speed of compiled C/C++ code, so speed in this regard is quite high in VM-based languages. Obviously C/C++ has optimizations at the hardware level that cannot be matched by VM-based languages, but for the web we assume this is of little consequence; distribution and scalability matter more for serving requests.

Garbage collection is perhaps the largest drag on performance. VM collectors have advanced, but when functional code passes state entirely by copy we end up with a large number of copied objects that require collection once they are no longer needed. Erlang, being strictly functional, is the largest culprit here – as compared to Java/Scala or others, which can pass by reference and maintain a more global “state” for an object.

Erlang is perhaps the wildcard, sitting as a strictly functional language running on a register-based VM. It is probably the fastest of the class for concurrency due to its lack of shared state and the scalability of its processes. Since Erlang processes are lightweight and are not each backed by a dedicated OS thread, you can easily create thousands of them, where Java threads would fail miserably. In this comparison Java and C# suffer due to the design of the languages; in this scenario Erlang wins hands down.

Interpreted Style Languages

While these languages also have a VM, or in PHP’s case a module loaded into the web server, they act more like interpreted languages than our first class, which is compiled ahead of time into bytecode.

This class varies with a large breadth; as in the class above, JIT techniques have provided significant increases in speed. Facebook has been doing some amazing work with PHP and the HipHop VM, which as of 2011 also uses JIT techniques to speed up execution.

These interpreted languages benefit in many ways from the same JIT optimizations as the bytecode-compiled languages, and have seen massive improvements in the past years. They suffer from similar constraints as far as garbage collection goes.

We can see from the classes above that both the bytecode-compiled and interpreted languages are quite similar, and for scalability in the web world (in my opinion) raw language speed is not yet the defining feature.

So what about threads and processes? In the previous section I spoke about the thread design in Erlang vs C#/Java. I won’t break from a good thing, so I’ll continue by looking at the threading model in Python to see how it stacks up in a scalability and performance scenario. Python is perhaps the worst of the group: the Global Interpreter Lock allows only a single thread of execution at any one time, due to the design of Python’s interpreter. How does this differ in Ruby? Unfortunately green threads are much the same, with advocates moving to multiple processes instead of threads, since the model doesn’t scale – it just executes in a single OS thread. This is superficially similar to Erlang in that execution lives within a single OS process and the program itself creates “processes” – which are roughly what threads are in other languages. But Erlang behaves more like an OS: it creates an OS thread for each core within its single OS process, then schedules its lightweight processes inside the Erlang VM across those threads, with robust message passing allowing high concurrency.
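
To make the contrast concrete, here is a small Scala/JVM sketch (my own example, with arbitrary workload numbers) where CPU-bound tasks submitted as Futures run on a pool of real OS threads and can occupy every core at once – exactly what a GIL-bound interpreter cannot do with threads alone:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

object CoresDemo extends App {
  // Arbitrary CPU-bound work; no shared state is touched.
  def busyWork(seed: Long): Long =
    (1L to 5000000L).foldLeft(seed)((acc, i) => acc + i * 31L)

  val cores = Runtime.getRuntime.availableProcessors
  val tasks = (1 to cores).map(i => Future(busyWork(i.toLong)))

  // The default execution context is backed by roughly one OS thread per core,
  // so these tasks genuinely execute in parallel on the JVM.
  val results = Await.result(Future.sequence(tasks), 60.seconds)
  println(s"ran ${results.size} CPU-bound tasks across $cores cores")
}
```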

I’ve embedded a graphic showing a performance comparison of Erlang vs Stackless Python vs the multitask module on Python’s standard implementation:

[Image: performance comparison of Erlang vs Stackless Python vs Python multitask]

Ref: http://pseudogreen.org/blog/erlang_vs_stackless_vs_multitask.html

Concluding

I hope this was informative, giving a brief overview of the scalability concerns of the languages and runtime types involved in web programming. This is part of a larger series on scalability. I hate to say it, but Erlang is probably one of the most efficient, thanks to being functional and to its thread/process model’s use of cores. Its design around message passing and compilation allows it to avoid the drawbacks of the interpreted languages, which cannot take true advantage of OS threads across cores: they implement threads within a single process, with no message passing, allowing only a single true thread to execute at a time. The model offered by the interpreted languages is of little benefit for true multi-core scaling.

Next up, we will look at web frameworks in an attempt to track down the middleware, design decisions and scalability concerns as they relate to large, responsive systems.

Virtual Machines as the future

Today, while having a morning coffee at the corner coffee shop, I was reading Hacker News as the story of the new Amazon WorkSpaces hit. The concept is not groundbreaking, but it all seemed exciting for everyone online, and seemed like quite a novel and good idea. Commenters were quick to point out the tradeoffs and benefits, and to test out the system. The main question I kept struggling with is: why now, why so slow to get here? Why is this groundbreaking at all?

The trend among the recent goliaths – Amazon, Facebook, Google – seems to finally be to go after the enterprise and make inroads. Citrix is probably the largest player in this space at the moment, so a battle is being lined up in which the traditional, slow enterprise approach will be challenged. Now on to more pressing concerns.

After the news broke I got back to work and started in on the Clojure project I’ve been working on, doing a dual version in Scala and Clojure to test them both out. I’ve been thinking a lot about competition, efficiency and what the future holds for the operating system market: Chrome OS, Android, Mac, Windows, Linux, etc. Today was the epiphany I’d been waiting to have. The future is virtual machines. Let me say that again: the future is VIRTUAL MACHINES.

You may think I’m referring to the same machines that exist in Amazon’s new WorkSpaces, but I’m not. I’m referring to something fundamentally different: the application, compiler and OS are going to change significantly. The change will be quite similar to what we have just witnessed moving from SERVER -> client to server -> CLIENT and now back to SERVER -> client. In the OS domain we are moving away from an OS -> virtual machine -> application architecture to an os -> VIRTUAL MACHINE -> application type architecture. Things have already been moving in this direction with .NET, V8, the JVM, interpreters and more. We will see, though, that this will fundamentally change things – but it will be slow.

We have been witnessing the revolution slowly with VMware, VirtualBox and Parallels, but we haven’t yet noticed. One day we will. Imagine when an operating system becomes more of a kernel running virtual machines?!? It is happening – mark the date: by 2016 we’ll see a virtual-machine-based ecosystem starting to emerge in place of the Web ecosystem we were promised.

SPDY & HTTP 2.0

I figured it would be fun to buckle down and do some concrete testing and evaluation of SPDY using WebPageTest.org. I’ve summarized some of the results here, since I don’t have another bucket to put them in quite yet. This information is by no means comprehensive, but was fun to evaluate and report on.

SPDY

SPDY is an application-level protocol sitting atop TCP [1]. By focusing on multiplexing, compression, and prioritization of data, SPDY has seen performance gains in mobile as well as normal web traffic [2]. One of its key features is that it is transparent to the network itself and requires no changes to the underlying Internet. The only changes required are in the client (advertising that it supports SPDY) and at the web server, allowing the new protocol stack to be used.

One of the main goals is to reduce the number of TCP connections and re-use data channels that are already set up. In 2010 the average number of concurrent connections for a single webpage was 29 [2].

Key Features

  • Single request per connection: HTTP can only make a single request per connection at a time. TCP sockets have an inherent delay when we download only a single resource per connection instead of re-using it, so browsers have been opening multiple connections to counteract this speed issue. By reducing the number of connections, yet allowing multiple streams per connection, results have yielded near 40% savings over HTTP 1.0 [3].
  • Server push: HTTP 1.0 only supports client-initiated requests, and does not allow the server to “push” data to the client.
  • Header compression: SPDY compresses all headers. Since most websites send many header values, the headers for each request can grow to roughly 800 bytes or more.
  • Data compression: SPDY enforces data compression for all resources.
  • Header redundancy: Headers in HTTP 1.0 are mostly redundant and static. Why re-send the user agent string on every request within a page load?

Control Frames

There are many types of control frames; below we have outlined the most common and important frames used in the SPDY protocol:

  • SYN_STREAM: The synchronize-stream control frame allows the creation of a stream between two endpoints asynchronously.
  • SYN_REPLY: Acknowledgement frame that confirms the creation of a stream.
  • RST_STREAM: Control frame used to reset the stream due to abnormal errors.
  • SETTINGS: An asynchronous frame that can be sent at any time to give a client or server data about the stream or connection.
  • PING: Control frame to measure the RTT between the two endpoints.
  • CREDENTIAL: Specific control frame for presenting and verifying certificates with the server.

Data Frames

Data frames are the simplest kind of frame, and have the smallest overhead. They are variable length and have the following key fields:

  • Control bit: 0, for a data frame,
  • Stream ID: 31 bits identifying which stream the data belongs to,
  • Flags: 8 bits,
  • Length: 24 bits representing the length of the data frame,
  • Data: variable-length data.

It is important to remember that while data frames are indeed variable length, the 24-bit length field caps the maximum frame size at 2^24 bytes (about 16 MB).
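
To illustrate the field layout above, here is an illustrative Scala sketch (not a spec-complete SPDY implementation; the class and demo values are made up) that packs a data frame header into bytes, with the 24-bit length field capping the payload size:

```scala
import java.nio.ByteBuffer

object DataFrameDemo extends App {
  // One SPDY data frame: control bit 0 + 31-bit stream id, 8-bit flags, 24-bit length, data.
  final case class DataFrame(streamId: Int, flags: Byte, data: Array[Byte]) {
    require(data.length < (1 << 24), "24-bit length field caps a data frame at ~16 MB")

    def encode: Array[Byte] = {
      val buf = ByteBuffer.allocate(8 + data.length)
      buf.putInt(streamId & 0x7FFFFFFF)                // top bit 0 marks a data (not control) frame
      buf.putInt(((flags & 0xFF) << 24) | data.length) // 8 bits of flags, then 24 bits of length
      buf.put(data)
      buf.array()
    }
  }

  val frame = DataFrame(streamId = 1, flags = 0.toByte, data = "hello".getBytes("UTF-8"))
  println(s"encoded frame is ${frame.encode.length} bytes") // 8-byte header + 5-byte payload
}
```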

Streams

Streams are lightweight mechanisms for sending data bi-directionally across a TCP connection. A key feature of SPDY is that multiple streams may be interleaved on one connection. This allows an increase in performance, but increases the complexity of the server and client in comparison to the HTTP 1.0 protocol.

The key value of streams is that they require very little overhead in comparison to an HTTP 1.0 implementation. In HTTP 1.0 a “Keep-Alive” header must be issued to maintain the connection, and the connection overhead is generally as follows when not using keep-alive:

  • SYN, SYN+ACK, ACK, HTTP GET, HTTP RESPONSE, FIN, FIN+ACK, ACK, or approx. 7 packets for a single connection, with no concurrent requests. In comparison, SPDY requires the following:
  • SYN_STREAM, SYN_REPLY; now data can be sent and the stream is either kept open or closed, but no additional TCP connection overhead is required.

Server Push

SPDY benefits from the stream mechanism and a single TCP connection by allowing the server to also push data back to the client while the connection remains open. This enables better performance from server to client, further enabling technologies such as WebSockets to be more efficient.

Compression

SPDY proposes header compression, with claimed results of roughly an 85% decrease in header size across the network. This represents a substantial decrease in the control headers needed for connections, and a distinct advantage over HTTP 1.0.

Testing Summary

Below are some key tests I ran on two sites loaded onto a VPS running Ubuntu: a static site and a rich-media site, both pulled using “wget”.

These tests were run under three scenarios: plain HTTP, HTTPS, and the SPDY protocol (HTTP 2.0 style).

[Image: WebPageTest results comparing HTTP, HTTPS and SPDY page loads]

Is NoSQL for me?

The NoSQL movement has really taken over the web over the past couple of years. Unstructured data reigns as king, or so we thought.

I’ve been pretty lucky to ignore the noise coming from the NoSQL camp, and so never really had a good understanding of the practice, benefits and implementation of a NoSQL database system. While at WebPerfDays in Amsterdam this past year, I saw a very cool presentation by the CouchDB guys. I didn’t fully grasp the details – as anyone can imagine – but I understood the high-level benefits.

It wasn’t until I started a new project this fall that things became clear. NoSQL is a dangerous tool. It is dangerous because it locks you into a schema, more so than SQL. SQL forces you to structure your data, so it is easy to move to more unstructured data later. Once you commit to NoSQL and going structure-less, it is very hard to go back.

My project started to get constrained by NoSQL, so I dropped it in favour of good ol’ Postgres. I have to say, I’m happy to have learnt and understood the limitations. I’m eager to look at NoSQL performance on certain unstructured queries, e.g. regexes, in the future – for now I’m keeping my data structured.

The Slow Web Movement

From the iDoneThis blog:

If you wish to make an apple pie from scratch, you must first invent the universe. - Carl Sagan, Cosmos (1980).

It started as a vague idea framed as a joke.

When we put our site out into the world on January 2, 2011, we only processed incoming emails once per day. At the bottom of every…

The 2D printing industry and black magic

Perhaps I am just not aware, but why is there so much innovation in 3D printers, mobile phones and more, yet we have somehow missed the good old 2D printing industry entirely? How on earth have we not figured out a way to have an easily connected printer over the network, USB, or ad hoc wifi without blasted drivers?

Now I’m no printer expert, but it seems silly that network printing is not just done over HTTP. Honestly, we have computers with standardized protocols that can send messages to each other. Why on earth do we still have these archaic printer protocols? Why don’t we build a network printer that doesn’t suck? All it would do is spin up a web server on a port and IP; you could see the log output, it would be easily debugged, and we would get rid of this black magic shit. There is no reason to have massive service businesses where nobody can understand why the heck the printing failed, or why the connection is dead on the network. Why is IPP not the default? This is all confusing to me.

Last night learnt myself some Erlang

Functional languages always seem to be on most developers’ radar, but few seem to take the plunge – at least until things become mainstream. Personally I find functional languages somewhat cryptic and mysterious. While I can probably pompously claim to understand JavaScript closures as well as or better than most, I lack a bit in the functional realm when it comes to Lisp, Clojure, Erlang and more.

Last night I decided to explore the “real-time” mysteriousness that is Erlang, starting obviously with the basics of syntax and structure. I was fortunate enough to stumble upon the following resources, which you may find extremely useful:

The Joy of Erlang

ChicagoBoss

Verdict

If you follow The Joy of Erlang with Evan, you will find the syntax a bit weird, and the flow a bit strange. What I did find is that, for the first time in my life, I’m writing fewer if statements. I feel as though the code I’ve written over the past day somehow looks less prone to error.
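
Pattern matching is what replaces most of those if statements. A rough rendering of the style in Scala (my own sketch, not from Evan’s guide; Erlang’s function clauses work along the same lines):

```scala
object MatchDemo extends App {
  sealed trait Shape
  final case class Circle(radius: Double) extends Shape
  final case class Rectangle(width: Double, height: Double) extends Shape

  // One clause per case instead of a chain of if/else-if branches; the compiler
  // warns if a Shape case is forgotten, which is part of why it feels less error-prone.
  def area(shape: Shape): Double = shape match {
    case Circle(r)       => math.Pi * r * r
    case Rectangle(w, h) => w * h
  }

  println(area(Circle(1.0)))         // 3.141592653589793
  println(area(Rectangle(2.0, 3.0))) // 6.0
}
```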

Compilation can be an annoying thing for developers coming from the scripting world, but it is quite refreshing to have compile-time errors. I find it actually forces me to do better work.

My favourite quote from Evan’s blog is this:

“Admittedly, learning to ride a bird-brained pterodactyl can be tough business, but once you master it, you’ll wonder how you ever got along before.” - Evan Miller

Wealth

If you would like to get a person excited, there is perhaps no better way than to discuss wealth. The topic evokes emotional reactions of varying degrees based on opinions, social class and status, and people’s personal experience with wealth. How then do we get at the root of the issues regarding wealth? I would like to break the conversation down as follows, and draw my own conclusions.

Philosophy

Wealth is something that is created. We should all be able to agree on that. Perhaps someone else created yours – inherited, or stolen – or you made it yourself. Perhaps your wealth is created from natural resources in the ground, or from the added value you apply to an already finished product. This is what I would coin wealth.

With this agreement on wealth, can we not agree that people will differ greatly in their creations of wealth? Unless we are all equal – which would render this conversation futile – there must be a large difference in our wealth creation. What effect does this have on our lives? Or perhaps on the lives of all humans?

The notion of wealth therefore breeds inequality. This is a fundamental concept, and perhaps why it evokes such an emotional response.

Inequality

This seems to be the root of the issue: not just that some people create more than others, but that wealth underpins the basic things that define us and our lives: food, water, shelter and Porsches. This view of wealth inequality seems unjust at first glance. Why do YOU, or anyone else, deserve more than me? The problem is, it has nothing to do with deserving; it is all about wealth creation. People misunderstand the market as having some referee – there is not one. Wealth creation is an idea rooted in freedom, capitalism and liberty. Humans are free to own their wealth, and the differences it produces in relation to others.

Justice and Equality

If you believe in equality, perhaps you don’t believe in freedom. If we all have the same wealth accumulation output, regardless of our input, are we then not creating an equal system? Fundamentally, at the core, should we believe that all people should be equal, therefore marginalizing the great and uplifting the crappy? Does equality === fair? I would propose not; this is unjust. Equality is unjust. People, wealth and society are not equal, and I do not think we should attempt to make them so.

A line

At first glance these views seem quite intense and extreme. Is there a line that exists? What about a grey area? I would argue that in a technologically advancing world we want inequality, and that perhaps there is no fixed line for poverty. We need people to create large amounts of wealth. This wealth as a whole will not trickle down so much as it will allow even poor people to drive crappy cars, the same way poor people nowadays can afford iPods and calculators, devices which at one point were merely a dream for their inventors.

Environment

The environment is the wild card in this scenario. I’m not sure yet where it fits in, but I know it needs a lot more thought. Is the environment in conflict with wealth? That would make sense, as the repercussions of profitable enterprises and “cheap” goods seem to be an icky thing to consider right now. I’d like to think about why this is, and how we solve it. Is it a cultural thing? What about the measuring stick for wealth? We don’t measure wealth in clean water – perhaps we should, in ocean, land and earth health.

Conclusion

Wealth allows our society as a whole to move up from poverty. We need inequality and wealth creation. Absolute poverty in the world will be reduced as we create more wealth. Perhaps your relative poverty will increase in comparison to the great wealth producers, but why should you care? Should you not care only about your absolute poverty?

Off the shelf business software

I’m continually fascinated by this idea of “off the shelf”. The off the shelf and “no customizations” approach is seen as a policy to avoid complexity and laborious upgrades of software. Or what businesses might call “future proofing”.

In the commerce industry, we look at solutions like Shopify as a true off the shelf type system. With not just a automatic setup of the system, but of the hosting, deployment, and front end themes.

But I continually challenge myself to think about what off the shelf truly is. As I develop my vision and leadership skills, I’m continually putting myself in my clients shoes on their policies, desires and the long term impact of key decisions.

In this post, I’m going to break down some of the myths of off the shelf software. This isn’t to scare companies, or say off the shelf isn’t the answer. It’s to provide valuable context and ideas to evaluate. With my main goal to help you avoid red herrings.

1. The system is key

The system is a huge part of your decision in purchasing software. The more important decision is your internal employees. By creating a stable, educated and empowered workforce in your company – your software will naturally follow this same pattern. No matter the system you choose. Let’s take some examples:

- WhatsApp is a custom built messaging app in Erlang. Er…what!? As the most successful messaging application ever, they’ve shown that technology is important, but reducing complexity is even more so.

- Facebook was originally written in PHP. In fact, a well written PHP application is still a cornerstone of the web. With new frameworks like Laravel, “legacy” technology can still be used to create great systems.

- Shopify uses server side rendering. This system, which doesn’t follow the fanciest MVVM Angular style application, delivers cloud ecommerce for a majority of the small business world as one of the most popular cloud system so on earth.

What did these companies all have in common? Great leadership, great employees, and pragmatism. The takeaway is that a good enough system can be made great through proper application, engineering and a focus on simplicity.

2. System X is a silver bullet

No system is a silver bullet for your organization. The complexity in applying a great piece of software is integrating it into your company. One of the smartest investments to make in a software system is to evaluate the maturity of the markets best practices. What are other companies doing? How are they running their business using System X.

What are System X’s weak points. Since not every part of a system can be a market leader, what parts do work well, and what parts would be not see much value in?

3. Customizing will take me off the upgrade path

In fact, the largest expense you have is your company not being agile enough to compete. Without a doubt long term maintenance is a concern, and severe customizations are an issue in software stacks. But we should focus our efforts on key customizations that can create business value – while understanding the long term impacts.

4. Integration is a small portion of a softwares cost

More important then the software itself is how to integrate this into the Architecture landscape of the technology that currently supports business. I’ve seen most projects cost go towards integration. The need for a well defined integration structure is what will drive a softwares cost.

The challenge is the integration is not just technical, but requires a business element to integration. This is traditionally seen as change management, but integration needs to be part of a larger conversation with the vendor – where conversations need to focus on business and technology integration.

Software is a difficult business, and off the shelf isn’t always the answer. We need to focus our effort on buying great software, but people and process are more important to a successful roll out of a software system.

Services

The challenge for companies in the coming decade is not customer experience, but moving into services. Selling core business elements as services to monetize the technology investments they’ve made. The current IT systems we work with everyday are complex, and the battle between technologists, business and cowboys will not reduce our current migraines.

There are three things companies need to prepare for as they transition in the services revolution:

1. Selling your core business as a service
2. Challenging your technology department to embrace automation, as complexity is reaching the peak of traditional it management paradigms
3. Building technology as a core offering of your business

The first challenge will be in setting up core business as services. Technology leaders will have to be visionary alongside their C suite partners to ensure the business understands the possibilities & complexities in selling services. We need to help the business understand the art of the possible when it comes to services. Selling service time on our machines will be a

We will have to challenge our organizations to embrace automation to manage the complexity of services. IT is not possible to manage services in our traditional fashion as we start to create applications & service made up of multiple services & API’s on our internal business.

As we open up our core business, we will have to challenge our teams to think about offering our technology as a service. Google is doing a fantastic job of selling their internal systems, from Google Cloud Platform to Big Query, to Predictions API. Each of these services is a separate service at a different level of granularity from Infrastructure, to software to just an API. These are great examples of enabling internal technology as services for our customers to take advantage of.

Immutability Part 2 “The Actor Model”

The actor model is a design pattern in software to enable high concurrency and parallelism. The model was initially proposed in academic papers, and better formalized in Erlang the original high performance language.

I’ve written about Erlang before, and how I admire the model it portrays for concurrency. Let’s break this down a bit, and see what this means for software in the future.

State of The Union

Software is growing in size and complexity. Recently we have seen a trend beyond just Service Oriented Architecture (SOA), into “Micro-Services”. These Micro Services are a new model of architecture within our programs, to not be so large an complex, but a small collection of smaller services that interact. This has been enabled by SOA, and the recent adoption of Docker as a model to run services on infrastructure. With all these moving parts, we have forgotten how to make this model work, and how we might use it. We hear CEO’s, VP’s and Corporations touting Micro Services based approaches, but in the traditional model, we won’t benefit form the concurrency and we will suffer the same bottlenecks – in comes the Actor Model.

Actor Model

The actor model is a quite simple model on concurrency and parallelism that states our systems concurrency and interactions should be made up of “actors”, instead of just objects. These actors maintain a state, and pass messages between each others. Why is this important? By guaranteeing that we maintain state only within a single actor, and not between actors we can schedule multiple actors concurrently without needing to worry about traditional race conditions, or large memory foot prints of mutability.

The Actor in Micro Services & Concurrency

So what gives? Why do we care about the actor model? The answer is that we are faced with the same problems we got in the original SOA vision. That no matter if we make a contract in our micro-services, we cannot introduce high concurrency unless we can properly execute our services in a non-blocking fashion. To do this, we can use the actor model, which allows us to logically architect our micro services for high performance and scalability. It is unfortunate, but the benefits for a micro services based architecture for infrastructure and deployment will hinder us in the performance end, unless we consider the actor model properly.

Immutability is the future

We are seeing a shift in the software development industry. Scaling of systems is no longer a nice to have features, but a requirement as our businesses move to be more connected. The internet, and the amount of data we deal with poses serious problems for our technical systems. With this volume of data, a connectedness, our business users are desiring a “Real-time” access to this data. How do we deal with this? How can we provide our users access to the data they need, while also providing consistent, performant systems that can deal with our customers and systems. The answer is immutability. Immutability provides the key to multi-locations, multi-threaded and consistent scalable access in our systems.

What is immutability

The basics of immutability is simple: Nothing changes. In software, the concept means if we hold a reference to something, perhaps a user record, the user record will not change on us - his is immutability, no changes. While this seems counterintuitive, it makes much sense when we think of large distributed systems that must scale.

Why should you care

Immutability is the core concept of distributed systems, with big data at it’s heart. We perform MapReduce and other largely parallel computations on immutable data structures, these are what form the basis of our big data systems. They would not be possible without immutability. This has a large effect on the storage, and cost of storage of your data. One of the first concepts of this in computing was the data warehouse. In the data warehouse we had an immutable copy of the data from our production systems to run queries, and generate reports.

The next step beyond the data warehouses, is moving to a real time warehouse that influences our currently executing systems. These systems perform interactions in an immutable way. Requiring coordination, but focusing on performance and responses over data consistency, since that can be achieved by specific rules on how to approach changes.

Immutability in action

Distributed Version Control systems are a great example at a macro level of how immutability has taken hold on the software development process. Multiple developers take out a copy of the master branch in a Git version controlled project and make changes on their local machines. There is no bottle-neck in the process due to writes, since the development team can be completely paralleled. We see that the system is never over-written, as its state is saved, and a log of changes is merely written to the system by the users in the form of commits, with no mutation of the underlying system. This is extremely powerful not only because of the history it gives us in software development, but because it allows us to perform software development at scale.

What is next

As we start to look into our systems, we need to consider immutability as a new concept in computing, beyond just changing states, but the effects it will have from a design perspective.

- Increased Storage Costs: We will need to store these states

- Concurrency: Our systems should be designed to be thread safe, and immutability will guarantee that we can provide this to our users at scale, with massive parallelism and avoid inconsistencies.

- Scaling: With the adoption of Scala, Erlang, Clojure and more functional based languages, we can start to take advantage of immutability as a core concept of the languages we write software in and we can scale to server more users, faster, with less of a memory footprint.

Service Busses and Scalability

In this latest post I’ll talk a bit about service busses. While the technology has been around for well over a decade, we will talk about the motivations behind service busses, and what the current landscape is in the industry.

First let’s delve into what a service bus actually is? Wikipedia has a generalized definition that talks about SOA. Personally I think that Microsoft has the best definition in one of their posts:  as the “magic bullet” to the ongoing challenge of connecting systems". This makes good sense and we see that the original implementation of the service bus concept with XML defined services had this concept in mind. The service bus really is the concept of integration – an important concept in abstraction, customization and selection of off the shelf products for businesses.

So why choose a service bus? What are the benefits? Well there are two main concepts that the pundits pit as the largest benefits of a service bus. 1) Scalability, and 2) Reliability. So why is this? Why do service busses provide increased scalability or reliability?

Scalability

Most issues in software system happen under load. Of course everything works well when just a single developer is trying out your new web form. But what if 1000+ are all submitting forms that then kick off processes within your system. This is where a service bus shines by providing a mechanism to ensure that requests are not dropped, connections are not refused, and any boundary cases in your database don’t result in lost data.

Reliability

It is impossible in the service bus world to talk scalability without talking reliability, the two really go hand in hand. What do I mean? Well imagine the scenario where your system is under heavy load. Now think about what lost data could occur? What about traditional HTTP requests? The move to stateless applications and a true REST approach ensures that those requests are gone. This has an adverse effect on the reliability of your system as a whole. Data can easily be lost, or hidden as a boundary case gets executed un-denounced to the user. The ability to cache these calls, and ensure that they are executed in a proper manner is one of the main benefits of a service bus.

Tradeoffs

Well it can’t be all good can it? No, there are some distinct tradeoffs. Latency is obviously increased. If your application has hard latency requirements then service busses might not be for you ie: you can’t spare 10ms – most business processes can. Debugging and the asynchronous nature of service busses can also pose problems. We need more robust debugging and investigation tools to ensure that we can trace errors since we don’t see a traditional trace-through line by line of the application.

What else?

The beauty of the service bus is that Scalability and Reliability aren’t the only things you get. Think about the benefits of this design, and separation of concerns. The disconnection, and generalization of API’s with the service bus provide the benefit that your systems scales not just in physical calls, but as your organization grows. Interchanging components, adding workflows and swapping vendors are all added benefits to the Service Bus paradigm – back in line with the SOA definition from Wikipedia.

How does it Work?

Most system operate on a publish subscribe model of services. Looping back to the Wikipedia definition and the SOA approach, this is really based on the concept of services. A service publishes its ability and a subscriber will subscribe to this service and use it. Once the message is pushed to the Bus, the subscriber no longer needs to wait, or think about delivery. This de-coupling of message passing and application code allows a fast, decoupled approach with less interference in an asynchronous manner. Service Busses generally provide application performance increases perceived to users due to the asynchronous nature of the requests – removing the blocking scenario.

This is really a base introduction, there are several ESB’s in the industry that hail from Microsoft BizTalk, JBoss, Oracle Fusion, IBM WebSphere and the new kid on the block Mule Open Source ESB. Let’s hope Erlang and RabbitMQ soon provide a full ESB.

Linux Box Hardening for Digital Ocean

Digital Ocean is a great startup that provides cheap VPS solutions if you wish to host sites, applications, or staging areas – which I would recommend. As far as hosting a live application it isn’t my first choice but as a development box I use it quite frequently. I didn’t usually harden my linux VPS’s to maximum level, but I found their were many attempts from hackers to gain control for the purposes of DDOS attacks and botnets. I’d like to provide some tips on the best way to harden and recover from botnets attempt at gaining control of your machines.

1. Disable SSH password login in favour of keys

This is the first and perhaps most important step. Your machine should not allow any anonymous FTP/SFTP logins or password-based SSH login attempts. It’s a no-brainer: you don’t want people brute-forcing your login via SSH – it gives them easy access to the root account. Along with this, you should also disable SSH login as root. I know some sysadmins don’t like sudo, but it is necessary – you should never have to log in to the box as root.
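The directives involved are OpenSSH’s PasswordAuthentication and PermitRootLogin in /etc/ssh/sshd_config. As a quick, purely illustrative sketch (the script itself is hypothetical, but the directives are real), something like this can audit a box:

```python
import re

# Audit /etc/ssh/sshd_config for the two directives this step is about.
# Hardened values:
#   PasswordAuthentication no   -> key-based login only, nothing to brute force
#   PermitRootLogin no          -> always log in as a normal user and use sudo
WANTED = {"passwordauthentication": "no", "permitrootlogin": "no"}

def audit(path="/etc/ssh/sshd_config"):
    found = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            parts = re.split(r"\s+", line, maxsplit=1)
            if len(parts) == 2:
                found[parts[0].lower()] = parts[1].strip().lower()
    for key, wanted in WANTED.items():
        actual = found.get(key, "<default>")
        status = "OK " if actual == wanted else "FIX"
        print(f"{status}: {key} = {actual} (want {wanted})")

if __name__ == "__main__":
    audit()
```

Remember to reload the SSH daemon after editing the config so the changes take effect.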

2. Disable & remove FTP, telnet and rsh

Your box shouldn’t allow any remote logins via these services. The fewer attack vectors and open ports available, the better. Remove the FTP daemon, for example, with: “sudo apt-get remove vsftpd”

3. Patch your kernel & OpenSSL

We’ve all heard of the OpenSSL Heartbleed bug – if you haven’t, where have you been? You should periodically update your kernel (on Ubuntu, “sudo apt-get update && sudo apt-get upgrade”), as well as the underlying technologies you depend on for serving content, such as Apache or Nginx.

4. Close & Block unwanted ports

You can block and close off unwanted ports using iptables on Linux; this lets you ensure that ports not bound to a service are blocked by default. It supports rules for both incoming and outgoing traffic.
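As a rough illustration (the specific ports and rules below are assumptions – adjust them to your own services), here is a sketch that shells out to iptables to set a default-deny policy for incoming traffic while keeping SSH and web traffic open:

```python
import subprocess

# Illustrative default-deny firewall: drop all incoming traffic except SSH (22)
# and HTTP/HTTPS (80/443), and keep established connections working.
# Run as root; rules are not persisted across reboots without extra tooling.
RULES = [
    ["iptables", "-F"],                                             # start from a clean slate
    ["iptables", "-A", "INPUT", "-i", "lo", "-j", "ACCEPT"],        # allow loopback
    ["iptables", "-A", "INPUT", "-m", "state",
     "--state", "ESTABLISHED,RELATED", "-j", "ACCEPT"],             # keep existing sessions
    ["iptables", "-A", "INPUT", "-p", "tcp", "--dport", "22", "-j", "ACCEPT"],
    ["iptables", "-A", "INPUT", "-p", "tcp", "--dport", "80", "-j", "ACCEPT"],
    ["iptables", "-A", "INPUT", "-p", "tcp", "--dport", "443", "-j", "ACCEPT"],
    ["iptables", "-P", "INPUT", "DROP"],                            # default-deny everything else
]

for rule in RULES:
    subprocess.run(rule, check=True)
```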

5. Check auto-start services

Linux has utilities to check what is started automatically on boot – which tool depends on your init system (systemd, Upstart or SysV init). This lets you see whether any processes have configured themselves to auto-start, which is a common persistence trick.
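A small, purely illustrative sketch that just picks whichever tool is installed and prints the enabled services for manual review:

```python
import shutil
import subprocess

# List what starts automatically on boot, depending on the init system:
# systemd uses systemctl, older Ubuntu releases used Upstart (initctl),
# and SysV init uses S* symlinks in /etc/rc*.d.
if shutil.which("systemctl"):
    cmd = ["systemctl", "list-unit-files", "--type=service", "--state=enabled"]
elif shutil.which("initctl"):
    cmd = ["initctl", "list"]
else:
    cmd = ["ls", "/etc/rc2.d"]   # SysV: S* links are started at the default runlevel

subprocess.run(cmd, check=True)
```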

Your server typically has only a single network interface – unless you have redundant connections – so you should be able to capture and analyze its traffic (with tcpdump, for example) for signs of a botnet running on the machine.

Scalability Part 1: Programming Languages

High-concurrency applications are a hard problem. We’ve recently seen modern interpreted languages such as Ruby and Python suffer when scaling on the web – especially in their popular frameworks, i.e. Rails, Django, etc. What is interesting is that although we still choose these frameworks, or others (ASP.NET MVC, Windows Forms), we don’t understand their limitations or strengths.

I’ve been working with Scala, and some other functional languages including Erlang, to really examine the underlying scalability issues. What has become apparent is that the issues are more complex than strictly framework or language issues. I’d like to examine the main issues as I see them, and hopefully provide some useful advice on where the drawbacks and benefits lie in choosing specific languages or approaches.

The Languages

It is perhaps easiest to start on the server, and at the core – the language. Each of the aforementioned languages is interpreted, so what are the popular languages of the day? Here is a breakdown from the 2014 TIOBE Community Index results over the past decade:

[Image: TIOBE Community Index – language popularity over the past decade]

So what does this mean? It’s pretty telling that the most adopted language is C, with Java and Objective-C close behind. While none of us can argue with the performance of C, it is less than prevalent in web-based systems, where the high cost of consultants and programmers generally wins over raw performance. So what are our options based on the list above, and what is the performance of the languages that remain? I’m going to go ahead and suggest that we examine the following:

  1. Compiled VM Languages: Erlang (Register based), C#, Java & Scala (Stack based),
  2. Interpreted Style Languages: Python (using CPython), Ruby, PHP (by Web Server module)

I chose to examine these two classes since we don’t really see too many others in web programming. In the future perhaps more will need to be included, but as of now these seem relevant. I’m specifically leaving Node.js out: it is exploding in popularity, and its non-blocking, single-threaded I/O model probably warrants its own discussion – so I’ll keep moving forward.

Let’s start digging into the language performance issues:

Compiled VM Languages

While “compiled” is used loosely here, this set of languages is compiled into bytecode. So what is the performance of these languages like? Well, it’s pretty darn fast. Compiled VM languages are fast because the bytecode executed by the VM is optimized over the source code, and it runs on a VM that can execute it across multiple platforms. With the advent of JIT compilation and runtime optimizations, a VM can approach – and in some cases match – the speed of compiled C/C++ code, so raw speed is quite high in VM-based languages. Obviously C/C++ has hardware-level optimizations that VM-based languages cannot compete with, but for the web we assume this is of little consequence: distribution and scalability matter more for serving requests.

Garbage collection is perhaps the largest drag on performance. VM collectors have advanced, but with functional languages passing state entirely by copy, we end up with a large number of copied objects that need collecting once they are no longer in use. Erlang is the largest culprit here, being strictly functional – compared to Java/Scala and others, which can pass by reference and maintain a more global “state” for an object.

Erlang is perhaps the wildcard, sitting as a strictly functional language on a register-based VM. It is probably the fastest in this class due to its lack of shared state and the scalability of processes inside its VM. Since Erlang processes are lightweight constructs scheduled by the VM rather than individual OS threads, you can easily create thousands of them, where a thread-per-task design in Java would fail miserably. In this comparison Java/C# suffer due to the design of the language; in this scenario Erlang wins hands down.

Interpreted Style Languages

While these languages also have a VM – or, in PHP’s case, a module bolted onto the web server – they behave more like interpreted languages than our first class, which is compiled ahead of time into bytecode.

This class varies with a large breadth; as in the class above, JIT techniques have provided significant increases in speed. Facebook has been doing some very impressive work with PHP and the HipHop VM, which since 2011 has used JIT techniques to speed up execution.

These interpreted languages benefit in many ways from the same JIT optimizations as the bytecode-compiled languages, and have seen massive improvements in the past few years. They suffer from similar constraints as far as garbage collection is concerned.

We can see that the bytecode-compiled and interpreted classes are quite similar, and for scalability in the web world (in my opinion) raw language speed is not yet the defining feature.

So what about threads and processes? In the previous section I spoke about the thread design in Erlang vs C#/Java. I won’t break from a good thing, so I’ll continue by looking at the thread model in Python to see just how it stacks up in a scalability and performance scenario. Python is perhaps the worst of the group: the Global Interpreter Lock allows only a single thread to execute Python code at any one time, due to the design of Python’s interpreter. How does this differ in Ruby? Unfortunately green threads are much the same – they all execute in a single OS thread – with advocates moving to multiple processes instead of threads since the model doesn’t scale. This is superficially similar to Erlang in that a single OS process is used and the program itself is expected to create “processes”, which are technically more like threads in other languages. The difference is that Erlang behaves more like an OS: it creates an OS thread for each core within its single OS process, then schedules its lightweight processes across those threads, with robust message passing allowing high concurrency.
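A tiny, illustrative benchmark of the GIL effect (the workload is made up and the numbers will vary by machine): CPU-bound work gains nothing from threads in CPython, but does gain from processes.

```python
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def burn(n=5_000_000):
    """A purely CPU-bound task: the GIL lets only one thread run this at a time."""
    total = 0
    for i in range(n):
        total += i * i
    return total

def timed(executor_cls, workers=4):
    start = time.perf_counter()
    with executor_cls(max_workers=workers) as pool:
        list(pool.map(burn, [5_000_000] * workers))
    return time.perf_counter() - start

if __name__ == "__main__":
    # With CPython's GIL, four threads take roughly as long as running the work
    # serially; four processes actually use four cores.
    print(f"threads:   {timed(ThreadPoolExecutor):.2f}s")
    print(f"processes: {timed(ProcessPoolExecutor):.2f}s")
```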

I’ve embedded a graphic comparing the performance of Erlang vs Stackless Python vs a multitasking approach on the standard Python implementation:

[Image: Erlang vs Stackless Python vs multitask Python benchmark results]

Ref: http://pseudogreen.org/blog/erlang_vs_stackless_vs_multitask.html

Concluding

I hope this was informative, giving a brief overview of the scalability concerns of the languages and classes involved in web programming. This is part of a larger series which looks at scalability. I hate to say it, but Erlang is probably by far one of the most efficient here, thanks to its functional design and its thread/process model for making use of the cores. Its design around message passing and compilation lets it avoid the drawbacks of the JIT-ed interpreted languages, which cannot take true advantage of OS threads across cores: they implement threads inside a single process with no message passing, allowing just a single true thread to execute at a time. The model offered by the interpreted languages is of little benefit for true multi-core scaling.

Next up we will look at web frameworks, in an attempt to track down the middleware, design decisions and scalability concerns as they relate to large, responsive systems.

Virtual Machines as the future

Today, while having a morning coffee at the corner coffee shop, I was reading Hacker News as the story of the new Amazon WorkSpaces hit. The concept is not groundbreaking, but everyone online seemed excited, and it seemed like quite a novel and good idea. Commenters were quick to point out the trade-offs and benefits, and to test out the system. The question I kept struggling with is: why now, why so slow to get here? Why is this groundbreaking at all?

The trend among the recent goliaths – Amazon, Facebook, Google – seems finally to be to go after the enterprise and make inroads. Citrix is probably the largest player in this space at the moment, so a battle is lining up in which the traditional, slow enterprise approach will be challenged. Now on to more pressing concerns.

After the news broke I got back to work on the Clojure project I’ve been working on – I’m doing a dual version in Scala and Clojure to test them both out. I’ve been thinking a lot about competition, efficiency and what the future holds for the operating system market, i.e. Chrome OS, Android, Mac, Windows, Linux, etc. Today was the epiphany I’d been waiting to have. The future is virtual machines. Let me say that again: the future is VIRTUAL MACHINES.

You may think I’m referring to the same machines that exist in Amazon’s new WorkSpaces, but I’m not. I’m referring to something fundamentally different: the application, the compiler and the OS are going to change significantly. The change will be quite similar to what we have just witnessed moving from SERVER -> client, to server -> CLIENT, and now back to SERVER -> client. In the OS domain we are moving away from an OS -> virtual machine -> application architecture towards an os -> VIRTUAL MACHINE -> application architecture. Things have already been moving in this direction with .NET, V8, the JVM, interpreters and more. We will see, though, that this will fundamentally change things – but it will be slow.

We have been witnessing the revolution slowly with VMware, VirtualBox and Parallels, but we haven’t yet noticed. One day we will. Imagine when an operating system becomes little more than a kernel running virtual machines?! It is happening – mark the date: by 2016 we’ll see a virtual-machine-based ecosystem starting to emerge in place of the Web ecosystem we were promised.

SPDY & HTTP 2.0

I figured it would be fun to buckle down and do some concrete testing and evaluation of SPDY using WebPageTest.org. I’ve summarized some of the results here, since I don’t have another bucket to put them in quite yet. This information is by no means comprehensive, but was fun to evaluate and report on.

SPDY

SPDY is an application-level protocol sitting atop TCP [1]. By focusing on multiplexing, compression, and prioritization of data, SPDY has shown gains for mobile as well as normal web traffic [2]. One of the key features of SPDY is that it is transparent to the network itself and requires no changes to the underlying Internet. The only changes required are in the client user agent (to advertise SPDY support) and at the web server, to allow the new protocol stack to be used.

One of the main goals is to reduce the number of TCP connections and re-use data channels that are already set up. In 2010 the average number of concurrent connections for a single webpage was 29 [2].

Key Features

  • Single request per connection: HTTP can only make a single request at a time per connection. TCP sockets have an inherent cost in setting up and re-using a connection when we only download a single resource per connection, so browsers have been opening multiple connections to counteract this speed issue. By reducing the number of connections, yet allowing multiple streams per connection, results have yielded near 40% savings over HTTP 1.0 [3].
  • Server push: HTTP 1.0 only supports client-initiated requests, and does not allow the server to “push” data to the client.
  • Header compression: SPDY compresses all headers; since most websites send many header values, headers can add up to roughly 800 bytes or more per request.
  • Data compression: SPDY enforces data compression for all resources.
  • Header redundancy: headers in HTTP 1.0 are mostly redundant and static – why resend a user agent string on every request for a page load?

Control Frames

There are many types of control frames; below are the most common and important frames used in the SPDY protocol:

  • SYN_STREAM: The Synchronize stream control frame allows the creation of a stream between 2 endpoints asynchronously.
  • SYN_REPLY: Acknowledgement frame that confirms the creation of a stream.
  • RST_STREAM: Control frame used to reset the stream due to abnormal errors.
  • SETTINGS: An asynchronous frame that can be sent at any time to send a client or server data about the stream or connection.
  • PING: Control frame to measure the RTT between the two endpoints.
  • CREDENTIAL: specific control frame for the creation and verification of certificates on the server.

Data Frames

Data frames are the simplest type of frame, and have the smallest overhead. They are variable length and have the following key fields:

  • Control bit: 0, for a data frame,
  • Stream ID: 31 bits identifying the stream the data belongs to,
  • Flags: 8 bits,
  • Length: a 24-bit value representing the length of the data that follows,
  • Data: variable-length payload.

It is important to remember that while data frames are indeed variable length, the 24-bit length field caps the maximum payload at 16 MB (2^24 bytes).
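To make the layout concrete, here is a small, illustrative sketch using Python’s struct module to pack and unpack the 8-byte data frame header described above (the stream ID, flags and payload are made up for the example):

```python
import struct

# SPDY data frame header (8 bytes), per the layout described above:
#   bit 0        : control bit = 0 for data frames
#   bits 1-31    : stream ID
#   next 8 bits  : flags
#   next 24 bits : length of the payload that follows
def pack_data_frame(stream_id, flags, data):
    if len(data) >= 1 << 24:
        raise ValueError("payload exceeds the 24-bit length field")
    word1 = stream_id & 0x7FFFFFFF          # control bit stays 0
    word2 = (flags << 24) | len(data)
    return struct.pack(">II", word1, word2) + data

def unpack_data_frame(frame):
    word1, word2 = struct.unpack(">II", frame[:8])
    stream_id = word1 & 0x7FFFFFFF
    flags = word2 >> 24
    length = word2 & 0xFFFFFF
    return stream_id, flags, frame[8:8 + length]

frame = pack_data_frame(stream_id=1, flags=0x01, data=b"hello spdy")  # 0x01 = FLAG_FIN
print(unpack_data_frame(frame))
```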

Streams

Streams are lightweight mechanisms for sending data bi-directionally across a TCP connection. A key feature of SPDY is that multiple streams may be interwoven on one connection. This improves performance, but it does increase the complexity of the server and client in comparison to the HTTP 1.0 protocol.

The key value of streams is that they require very little overhead in comparison to an HTTP 1.0 implementation. In HTTP 1.0 a “Keep-Alive” header must be issued to maintain the connection, and connection overhead generally looks as follows when keep-alive is not used:

  • SYN, SYN+ACK, ACK, HTTP GET, HTTP RESPONSE, FIN, FIN ACK, ACK – approximately 7 control packets for a single connection with no concurrency; in comparison, SPDY allows the following:
  • STREAM SYN, STREAM REPLY – now data can be sent and the stream is either left open or closed, but no additional TCP handshake overhead is required.

Server Push

SPDY benefits from the stream mechanism and a single TCP connection by allowing the server to also push data back to the client while the connection remains open. This enables better server-to-client performance, and further enables technologies such as WebSockets to be more efficient.

Compression

SPDY proposes header compression, with claimed reductions in header size across the network of ~85%. This represents a substantial decrease in the control headers needed for connections, and a distinct advantage over HTTP 1.0.
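As a rough illustration of why this works (the headers and numbers below are made up, and this uses plain zlib rather than SPDY’s exact dictionary-seeded scheme), keeping one compression context open across requests lets later requests reference earlier ones:

```python
import zlib

# Headers are highly redundant across requests on the same page load. A zlib
# stream kept open across frames (as SPDY does) exploits that redundancy.
requests = [
    {"method": "GET", "path": f"/assets/img/{i}.png", "host": "www.example.com",
     "user-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36",
     "accept": "image/png,image/*;q=0.8", "cookie": "session=abc123; theme=dark"}
    for i in range(30)
]

raw = 0
compressed = 0
compressor = zlib.compressobj()          # one shared context for the whole connection
for headers in requests:
    block = "".join(f"{k}: {v}\r\n" for k, v in headers.items()).encode()
    raw += len(block)
    compressed += len(compressor.compress(block) + compressor.flush(zlib.Z_SYNC_FLUSH))

print(f"raw: {raw} bytes, compressed: {compressed} bytes, "
      f"saved: {100 * (1 - compressed / raw):.0f}%")
```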

Testing Summary

Below are some key tests I ran on two sites I loaded onto a VPS running Ubuntu: a static site and a rich-media site, both pulled with “wget”.

These tests were run under three scenarios: plain HTTP, HTTPS, and the SPDY (HTTP 2.0-style) protocol.

[Image: WebPageTest results comparing HTTP, HTTPS and SPDY]

Is NoSQL for me?

The NoSQL movement has really taken over the web over the past couple of years. Unstructured data reigns king – or so we thought.

I’ve been pretty lucky to ignore the noise coming from the NoSQL camp, and so never had a good understanding of the practice, benefits and implementation of a NoSQL database system. While at WebPerfDays in Amsterdam this past year I saw a very cool presentation by the CouchDB guys. I didn’t fully grasp the details – as anyone can imagine – but I understood the high-level benefits.

It wasn’t until I started a new project this fall that things became clear. NoSQL is a dangerous tool. It is dangerous because it locks you into a schema – more so than SQL. SQL forces you to structure your data, so it is easy to move towards more unstructured data later. Once you commit to NoSQL and a structure-less design, it is very hard to go back.

My project started to get constrained by NoSQL, so I dropped it in favour of good ol’ Postgres. I have to say, I’m happy to have learnt and understood the limitations. I’m eager to look at NoSQL performance on certain unstructured queries (regexes, for example) in the future – for now I’m keeping my data structured.

The Slow Web Movement

(Reblogged from the iDoneThis blog:)

If you wish to make an apple pie from scratch, you must first invent the universe. - Carl Sagan, Cosmos (1980).

It started as a vague idea framed as a joke.

When we put our site out into the world on January 2, 2011, we only processed incoming emails once per day. At the bottom of every…

The 2D printing industry and black magic

Perhaps I am just not aware, but why is there so much innovation in 3D printers, mobile phones and more, yet we have somehow missed the good old 2D printing industry entirely? How on earth have we not figured out a way to have an easily connected printer over the network, USB, or ad-hoc Wi-Fi, without blasted drivers?

Now I’m no printer expert, but it seems silly that network printing isn’t just done over HTTP. Honestly, we have computers with standardized protocols that can send messages to each other. Why on earth do we still have these archaic printer protocols? Why don’t we build a network printer that doesn’t suck? All it would do is spin up a web server on a port and IP; you could see the log output, it would be easily debugged, and we’d get rid of this black magic shit. There is no reason to have massive service businesses where nobody can understand why the heck printing failed, or why the connection to the network is dead. Why is IPP not the default? This is all confusing to me.

Last night learnt myself some Erlang

Functional languages always seem to be on most developers’ radar, but few seem to take the plunge – perhaps until things become mainstream. Personally I find functional languages somewhat cryptic and mysterious. While I can probably pompously claim to understand JavaScript closures as well as or better than most, I lack a bit in the functional realm when it comes to Lisp, Clojure, Erlang and more.

Last night I decided to explore the “real time” mysteriousness that is Erlang, starting, obviously, with the basics of syntax and structure. I was fortunate enough to stumble upon the following resources, which you may find extremely useful:

The Joy of Erlang

ChicagoBoss

Verdict

If you follow The Joy of Erlang with Evan, you will find the syntax to be a bit weird, and the flow a bit strange. What I did find was that, for the first time in my life, I’m writing fewer if statements. I feel as though the code I’ve written over the past day somehow looks less prone to error.

Compilation can be an annoying thing for developers coming from the scripting world. It is quite refreshing to have compile time errors. I find it actually forces me to do some better work.

My favourite quote from Evan’s blog is this:

“Admittedly, learning to ride a bird-brained pterodactyl can be tough business, but once you master it, you’ll wonder how you ever got along before.” - Evan Miller

Wealth

If you would like to get a person excited, there is perhaps no better way than to discuss wealth. The topic evokes emotional reactions of varying degrees based on opinions, social class and status, and people’s personal experience with wealth. How then do we get at the root of the issues regarding wealth? I would like to break the conversation down as follows, and draw my own conclusions.

Philosophy

Wealth is something that is created. We should all be able to agree on that. Perhaps someone else created yours – inherited, or stolen – or you made it yourself. Perhaps your wealth is created from natural resources in the ground, or from the added value you apply to an already finished product. This is what I would coin wealth.

With this agreement on wealth, can we not agree that people will differ greatly in their creations of wealth? Unless we are all equal – which would render this conversation futile – there must be a large difference in our wealth creation. What effect does this have on our lives? Or perhaps on the lives of all humans?

The notion of wealth therefore breeds inequality. This is a fundamental concept, and perhaps why it evokes such an emotional response.

Inequality

This seems to be the root of the issue: not just that some people create more than others, but that wealth is one of the base things that defines us and our lives – food, water, shelter and Porsches. This inequality of wealth seems unjust at first glance. Why do YOU, or anyone else, deserve more than me? The problem is, it has nothing to do with deserving; it is all about wealth creation. People misunderstand and assume that in a market there is some referee – there is not. Wealth creation is an idea rooted in freedom, capitalism and liberty. Humans are free to own their wealth, and the differences it produces in relation to others.

Justice and Equality

If you believe in equality, perhaps you don’t believe in freedom. If we all have the same wealth output regardless of our input, are we then not creating an equal system? Fundamentally, at the core, should we believe that all people should be equal, thereby marginalizing the great and uplifting the crappy? Does equality === fair? I would propose not – that this is unjust. Equality is unjust. People, wealth and society are not equal, and I do not think we should attempt to make them so.

A line

At first glance these views seem quite intense and extreme. Is there a line that exists? What about a grey area? I would argue that in a technologically advancing world we want inequality – that perhaps there is no line for poverty. We need people to create large amounts of wealth. This wealth as a whole will not so much trickle down as it will allow even poor people to drive crappy cars – the same way poor people nowadays can afford iPods and calculators, devices which at one point were merely a dream for their inventors.

Environment

The environment is the wild card in this scenario. I’m not sure yet where it fits in, but I know it needs a lot more thought. Is the environment in conflict with wealth? That would make sense, as the repercussions of profitable enterprises and “cheap” goods are an icky thing to consider right now. I’d like to think about why this is, and how we solve it. Is it a cultural thing? What about the measuring stick for wealth? We don’t measure wealth in clean water – perhaps we should, and in ocean, land and earth health.

Conclusion

Wealth allows our society to move up from poverty as a whole. We need inequality and wealth creation. The absolute poverty in the world will reduce as we create more wealth. Perhaps relatively your poverty will increase in comparison to great wealth producers, but why should you care? Should you not only care about your absolute poverty?
