We at CYBOSOL have always been strong proponents of open source and believe that knowledge should be shared. We have decided to put together this series of write-ups to share our experience and learning. It is our way of saying a small thank-you to the open source community for all the benefits and goodness we have enjoyed through CYBOSOL's 9-year journey.
As a start, the CYBOSOL R&D team is embarking on a project to explore how far Drupal can be scaled without running into the traditional bottlenecks of the database and the file store.
Why Drupal?
The answer is simple: Drupal is one of the simplest, yet most powerful and widely used, content management systems in the world. But users often get stuck when trying to scale it to meet today's demands, which are frequently amplified by random vulnerability scans and rogue scans that try to bring down the servers.
Why did we choose these tools?
The primary goals of this project were identified as:
- Maintain and support an unmodified Drupal core.
- Retain all of Drupal's core functionality.
- Scale on demand and tackle database and static file bottlenecks.
- Provide a cost-effective model that is ready to deploy on popular PaaS platforms.
To achieve these goals, we chose the following tools, based on our experience and the resources available to us.
OpenStack - Nova, Swift & Neutron
OpenStack was chosen simply because it was readily available in our lab, with all the necessary services, including Nova, Swift and Neutron, already powering our other projects. All we have to do is create another project and fire up the VMs and the object store. We will probably use Ansible to automate the process, though.
More importantly, OpenStack Swift is a good choice for keeping files outside the stateless web server pods, which will move around the cluster to keep up with varying demand. This also means we don't have to worry about shared folders or long-running pods and containers with stateful content.
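To make the stateless-pod idea concrete, here is a toy sketch in Python. It is not the Swift API and not our actual setup: `ObjectStore`, `WebPod` and the `drupal-files` container name are hypothetical names invented for illustration, and the store is just an in-memory dictionary. It only shows why a file uploaded through one pod stays reachable after that pod is gone.

```python
"""Toy model: stateless web pods sharing one external object store."""


class ObjectStore:
    """In-memory stand-in for an external object store (hypothetical, not Swift)."""

    def __init__(self):
        self._objects = {}

    def put(self, container, name, data):
        self._objects[(container, name)] = data

    def get(self, container, name):
        return self._objects[(container, name)]


class WebPod:
    """A web server pod that keeps no uploaded files of its own."""

    CONTAINER = "drupal-files"  # assumed container name, for illustration only

    def __init__(self, pod_id, store):
        self.pod_id = pod_id
        self.store = store  # shared storage that lives outside the pod

    def upload(self, name, data):
        self.store.put(self.CONTAINER, name, data)

    def serve(self, name):
        return self.store.get(self.CONTAINER, name)


store = ObjectStore()

# A file is uploaded through the first pod.
pod_a = WebPod("pod-a", store)
pod_a.upload("logo.png", b"fake image bytes")

# The first pod goes away and a fresh one is scheduled elsewhere.
del pod_a
pod_b = WebPod("pod-b", store)

# The new pod can still serve the file, because it never lived in a pod.
served = pod_b.serve("logo.png")
print(served == b"fake image bytes")
```

Because the pods hold no files themselves, the scheduler is free to start, stop and move them without any shared folder between hosts.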
CoreOS Beta version 835.1.0 (beta as of this writing; remember that we are still in the R&D phase, so using it is not an issue) is our operating system of choice, as it has all the necessary tools (etcd2, fleet, Docker and kubelet) built in and ready to start a Kubernetes cluster. So why look for another when everything is served on a single plate :). We have already written a couple of cloud-config YAML files to speed up the build process.
We were in a dilemma as to which orchestrator and scheduler to use. Recently we came across a nice post from Adrian Mouat, and it convinced us that Kubernetes would, for now, be a better choice than Swarm, fleet or even Mesos.
Yes, of course we will be using Docker containers, as Docker is the container of choice in Kubernetes. Please do not get us wrong: we are not going to put everything inside one container and say we are done. Instead, we will follow the Kube way of building Pods and Service endpoints.
We are planning to use YouTube's Vitess, the MySQL-based database engine that powers YouTube's massive metadata store. We are not yet sure whether it will fit our needs, but from the descriptions and notes we think it is a good bet.
A key-value caching engine to offload the database. We may or may not use it, as Vitess provides the majority of the offloading functionality on its own.
Since the project objective is to keep Drupal deployment simple, we will use Apache 2.4 with the PHP 5 module. As you know, Drupal was built with Apache in mind, especially those nifty .htaccess rules that ship with Drupal.
Like the icing on top of a cake, Varnish Cache will be the simple and elegant caching engine that complements Drupal and takes the brunt of the web traffic.
Nginx - In case we have to offload HTTPS
If time permits, we will implement Nginx pods for offloading SSL traffic. The idea is to put Nginx in front of Varnish, so that Nginx decrypts all incoming SSL traffic and passes it on to the Varnish cache.
One could argue that, to avoid complexity, we should simply use Nginx for both caching and SSL offloading. Well, yes, we could. But we will leave that decision to the reader.
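As a rough illustration of what SSL offloading means for the request itself, the sketch below models the bookkeeping a TLS-terminating proxy typically does before handing a request to a plain-HTTP cache. It is plain Python, not Nginx configuration, and `offload_tls` is a hypothetical helper written for this post. The `X-Forwarded-Proto` and `X-Forwarded-For` headers are a widely used convention; whether our final setup uses exactly these headers is an assumption at this stage.

```python
def offload_tls(headers, client_ip):
    """Return the headers a TLS-terminating proxy would pass upstream.

    The proxy has already decrypted the request, so the cache behind it only
    sees plain HTTP. These headers let the backend know the visitor
    originally connected over HTTPS, and from which address.
    """
    forwarded = dict(headers)  # copy, so the caller's dict is left untouched
    forwarded["X-Forwarded-Proto"] = "https"

    # Append this client to any existing proxy chain instead of overwriting it.
    prior = headers.get("X-Forwarded-For")
    forwarded["X-Forwarded-For"] = f"{prior}, {client_ip}" if prior else client_ip
    return forwarded


incoming = {"Host": "www.example.com"}
upstream = offload_tls(incoming, "203.0.113.7")
print(upstream)
```

Without a hint like this, the application behind the cache cannot tell an HTTPS visitor from an HTTP one, since every request reaches it unencrypted.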
So that's the project in brief. We are planning to start next week, and our plan is to release the full set of how-tos by 20th November. Please stay tuned! Comments and suggestions are welcome.
CYBOSOL R&D Team.