Archive for the ‘cloud computing’ Category

Cloud computing is most often associated with scalability (see Amazon CTO Werner Vogel’s definition of scalability).  One commonly held view is that you can simply move an application onto cloud based infrastructure and it will then “magically” scale on demand.   The reality is that there is no free lunch.  Simply throwing additional CPU cycles or storage at an application is not going to deliver linear scalability unless the application was designed to scale in such a manner.

The cloud era heralds the development of new enterprise application platforms available on demand as well as new social platforms.  However this isn’t as simple as taking the current crop of relational database centric solutions and deploying them on Amazon EC2.    Of course this isn’t stopping vendors from taking that approach and offering on demand versions of their products.  The challenge is that these applications are not designed to scale dynamically and in a distributed manner.   The result is that as traffic and usage grows there will be a continual cycle of monitoring and patches to try and keep the application performing to an acceptable level.   While this will always be necessary to monitor and improve there are lessons to be learnt from some of the largest concurrent, multi-user sites that can help reduce the pain.

Consideration of cloud based scaling is clearly dependent on the nature of the application and the anticipated volume of usage.  If the application for example is very read heavy and low on write transactions then replicating databases with good caching could well be sufficient.  However for solutions that require massively concurrent heavy write based access to the database consideration needs to be given to architecting to achieve scalability.

Distributed database versus relational database

Relational databases are primarily designed for managing updates and transactions on a single instance.  This is a problem when you need massively concurrent access with millions of users initiating write transactions.  The approach taken to address this is usually clustering or sharding.  But this is really attempting to patch up the problem rather than addressing it full on.  That said there are many large scale examples using a relational database and applying these approaches.

Given a clean sheet and current developments what approaches can be used to address massively concurrent write heavy applications.  Well there are a number of different distributed database solutions that have emerged in the last few years either based on some form of key-value distributed hash table (DHT), column-oriented store or document centric.  They are often built to address precisely the issue of scaling for write heavy applications.  However they should not be considered a direct replacement for a relational database.  They often lack support for complex joins, foreign keys as well as reporting and aggregation – although some of these areas are beginning to be addressed.   Also there is not currently an SQL or object mapping such as Active Record to cleanly and transparently access them from code, so extra development effort is required.   However they should certainly be considered as part of on an overall architecture and leveraged to reduce write heavy bottlenecks in the solution.

Amazon SimpleDB – simple key value DHT, based on the Dynamo solution created by Amazon

Apache CouchDB – document centric approach built using Erlang

Cassandra – DHT variant that supports a rich data model,  originally born at Facebook, now an Apache incubator project

HBase – column-oriented store similar to Google’s BigTable, uses Hadoop as a distributed map/reduce file system.

Here is a great blog from the cofounder of Last.fm on the multitude of alternatives to a traditional RDBMS for heavy distributed write based applications http://www.metabrew.com/article/anti-rdbms-a-list-of-distributed-key-value-stores/

Another blog worth reading on distributed key stores is http://randomfoo.net/2009/04/20/some-notes-on-distributed-key-stores.

Stateless immutable services

One of the guiding principles for linear scalability is to have lightweight, independent, stateless operations that can be executed anywhere and run on newly deployed threads/processes/cores/machines transparently as needed in order to service an increasing number of requests.   These type of services should share nothing with any other services they simply process asynchronous messages.  This type of async message passing has been proven to scale in languages such as Erlang.  One paradigm that is closely aligned to this approach is known as the Actor model.  The actor model is all about passing immutable messages and the share-nothing philosophy.  A lightweight stateless protocol such as REST is well suited to allowing these services to be accessed across the internet through HTTP.

Speaking the language of scalability

As always choice of programming language can end up being more an emotional rather than necessary decision.  But it is true that it can help to pick the right tool for the job at hand.  Some languages have better support for developing highly concurrent distributed and scalable applications. The characteristics to look for are languages that encourage immutable data structures and referentially transparent methods, typically being functional in nature and supporting asynchronous message passing.  Two popular languages that are receiving alot of attention are Scala and Erlang.  Scala runs on the JVM and was famously used to provide scalability for Twitter by implementing a message queuing solution.  Erlang has it routes in embedded systems and so was optimised to run on minimal resources.  It utilises processes which are much lighter and faster than even O/S threads supporting both multiple cores or multiple machines transparently.  Both Scala and Erlang have good support for the Actor model again encouraging scalable independent async message driven design.

In the end there is still more learning and maturing to be done in developing the next generation of cloud based solutions and not all will need to scale to high volumes.  It will be an interesting time and there is much that can be learnt from others who are already dipping their feet in this pool.  A good site for keeping track of what others are doing in the whole space of scalability is http://highscalability.com/.  Being aware of these changes is especially important when embarking on new projects where consideration to scale and using cloud infrastructure are factors.


Read Full Post »

There are couple of different areas that people are typically looking for when they investigate cloud computing for their business, namely:

  1. Flexible infrastructure that will scale dynamically as needed with pricing that scales accordingly.   They can then deploy existing or new applications/services onto this infrastructure.
  2. Applications/services available on the cloud that they can start using straight away with minimal change while scaling the number of users dynamically as needed – sometimes known as SaaS.

Right now the second option is limited in terms of the number and type of solutions available, with a few established players such as Salesforce.com getting most attention.  There are of course a number of niche SaaS solutions available but these tend to be quite closed in nature and don’t really offer any kind of platform for growth.  Hopefully this will change over time providing greater choice to customers with more vendors and services available including open source options.

The first option of using cloud computing infrastructure is growing in popularity as it provides a more cost effective and scalable solution for deploying new applications whilst not constraining enterprises from having to pick from the somewhat limited selection of SaaS solutions available in the market.

The large established solutions in the cloud infrastructure solution space include Amazon EC2, Google AppEngine and Microsoft Azure.  However there is much more choice becoming available with a number of interesting open source solutions emerging.

Slight digression but does anyone care if the infrastructure is open source or not ?  Well you should care as having people share the code underpinning these technologies will allow a greater number of solutions to be developed with faster evolution and improvement of the technology which is better for everyone.   Good analogy is to think of science where information and discoveries are freely shared allowing for further and faster progress and advancement in many areas – open source brings that free sharing and progress philosophy to software.

So here are some of the open source initiatives worth looking at in the cloud computing infrastructure space.  Not only are they interesting of themselves but some of them like Hadoop are building blocks that can and are being used in assembling other initiatives and solutions.

  • Open source cloud computing platforms
    • Joyent – uses OpenSolaris and OpenSocial
    • Eucalyptus – education driven project supporting Amazon’s EC2 interface
    • Reservoir – European initiative with backing from IBM
    • Enomaly – uses RedHat’s libvert for virtualisation
  • Open source massively parallel distributed database technology
  • Open source cloud computing components
    • Reductivelabs – provides cloud infrastructure tools Puppet and Facter
    • Hadoop – Apache highly scalable distributed filesystem used by HBase and other projects
    • OpenNebula – virtual machine management, used in Reservoir project

Learn->Share->Improve – collectively we can build and assemble the open cloud computing platform giving everyone more choice at lower cost.

Read Full Post »

Consciously or unconsciously businesses and people in general are using more and more applications within the cloud. Everything from corporate intranets, CRM and company email through to Facebook, LinkedIn and Twitter are all delivered through the cloud to a browser. The challenge for businesses is how best to leverage the solutions available within the cloud, both those running inside the company firewall and those beyond.

There is significant interest in cloud computing infrastructure, applications and services with many new businesses emerging addressing these areas and the well established software companies responding by offering SaaS based solutions for their current products.   This ever growing marketplace while offering greater choice requires deeper investigation to determine which services to leverage. Equally important in addition to deciding which services/applications to use is how to seemlessly stitch these solutions together along with existing non-cloud based solutions.  This requires taking a broader perspective to enterprise architecture that includes services beyond the company firewall as an integral part of the solution not just an afterthought.   Assembling the services you need in the cloud is made easier due to the open standards typically being adopted such as SOAP and REST.  It also requires consideration to be given to security of data being transferred and stored in the cloud.

What makes the cloud so interesting is it’s openness and the ability to truly assemble not only the best services but not be locked into one particular platform or have to accept a monolithic solution which is providing capabilities you don’t need.   As the breadth and variety of services available grows businesses can decide to utilise the services they need when they need them at the right time for their business.  Assembling or even pre-assembling compelling services and applications in the cloud will offer businesses the business agility and control of cost they need both this year and beyond.

Read Full Post »