Friday 2 January 2015

My Dream Project - Part 1


Disclaimer: I want to keep this focused on a development/design point of view, so I have explicitly made the decision to not disclose details of the resulting project or the products used. 

In 2014 I got to work on what I personally considered to be a dream project of huge significance. There was a tight deadline, and a lot to do to integrate a number of existing products into a web platform that would be exposed to a very large audience. The dream here was that it was a greenfield project to replicate an existing feature set. The scale of the platform required pretty much everything except the products' feature set (and even that was improved/changed) to be redesigned from the ground up.

I wanted to take some time to record my thoughts about the project in hindsight. So this is what I learned, why I had to learn it and whether what I did worked. I think that there is a lot to cover, so I am going to do this through a series of posts over the coming weeks/months. This post will start with an introduction to the scope of the task.

The Task

The task was to build a publicly accessible version of my company's products. Up until this point, the products had operated completely privately. The visibility of the platform was quite high and it would be receiving press coverage. It had a large pre-existing user base, and the environment that it needed to operate in was much more active than what I had been used to.

The web platform that was being built required us to run with the expectation that we would be seeing approximately 200,000 times more concurrent logged-in and active users than any existing system. These users would expect sub-1-second response times, and this would need to be maintained 24/7, 365 days a year.

I did a lot of research into various technologies, stack choices, infrastructure options and coding practices in general. I read a lot about resilience, scalability and performance optimizations for web applications. The problem I found was that not a lot of it actually says specifically and explicitly how that is achieved.

The reason for this, I found - and perhaps you already knew - is that it really is dependent on your stack, your practices, your resources and most importantly your product. There are a variety of options for any given problem, each with its own pros and cons. It's up to you as the developer or system architect to know your tool set, and then select the right tool for the job. So the reason it is never spelled out explicitly is that whatever person A did likely isn't going to work as well - or at all - for person B.

At the time, the products were deployed to a number of web platforms running in various environments, but no single web platform had anything close to even half of that kind of activity. That was due to a number of factors. The user environment meant users would interact with the product infrequently during the course of the day. Only a portion of the entire user base would be logged into the platform at any one time. When they were logged in, their activity had long periods of silence in it as they consumed content. Overall this meant that our platforms were not getting 'pounded' by heavy, consistent usage.

The platform that needed to be built and operated was expecting a much higher volume of constantly active users who never went 'silent'. I remember thinking at the time how much of a leap that was going to be, and how, as the system architect, I was going to explain the gravity of that task to non-technical people. In the end that part was fairly easy.
