Cloud Tech IV – Cloud Computing Conference Summary
Last Saturday (Apr 20th), I attended Cloud Tech IV, a conference for learning, sharing, and discussing the latest and greatest in cloud computing technologies. The conference was organized by the Silicon Valley Cloud Computing Group, with support from some big players in this field, including Amazon AWS, Rackspace, VMware, Scalr, and HP. (Notably, Microsoft, Google, and other big cloud service providers were missing from this conference.)
As with other Meetup groups, the attendee list was a mix of developers, ops guys, employers, marketers, venture capitalists, and job seekers, and the underlying motives were many. The event was partially successful in meeting those objectives. For me, it was useful. Below, I have summarized my thoughts on a few of the talks that interested me.
Amazon AWS Discussion:
The AWS talk was given by Apolak Borthakur, the head of the AWS Bay Area development center. He touched upon a few web-scale challenges unique to Amazon AWS, such as the need to develop custom software-based routing, the need to customize virtualization software, and the reasons why standard database scaling practices didn’t work for them. I was a bit disappointed that the speaker couldn’t provide more details due to confidentiality issues.
Some interesting snippets from the AWS talk:
- AWS has real, serious web-scale issues to solve, to the point where they require a custom OS, software, VMs, networking, DNS, and even physical hardware such as routers!
- S3 storage has grown to more than 2 trillion objects lately (1.3T in Nov ’12, 1T in Jun ’12).
- AWS infrastructure can handle 1.1 million requests/sec at peak!
- The virtualization overhead is less than 5% (i.e., minimal, but not stellar)
It became clear to me that, in spite of all the world-class infrastructure and service excellence, the real brain behind AWS’s success is its software, especially the way it adds intelligence to commodity hardware. Amazon knows this very well! The company is a big believer in building technologies from scratch. It makes sense: from my Yahoo! experience, I can say that standard out-of-the-box software only works to a certain extent.
The big reason behind AWS sponsoring this event (in my opinion) was to let the Bay Area know that Amazon has opened a new Silicon Valley development center for AWS (Amazon already has a presence in the SF Bay Area through companies such as A9, Lab126, Amazon Music, etc.). This is good news for Bay Area engineers who have been interested in working on the great engineering challenges faced by Amazon AWS.
FYI, the speaker is not planning to post the slides from this talk online.
How Box.com implemented MySQL Sharding
This talk was presented by Tamar Bercovici, an engineer at Box.com, and it was about handling exponential database growth using sharding. The concept of sharding is not new. It reminded me of my days at Y! Personals, where we used sharding to split the mailbox data into multiple “farms” based on a hash of the unique UserID. Back then we used MySQL 4.x and ran into various operational issues due to immature replication technology. Sharding is used successfully in many Y! properties such as Flickr, Y! Finance, and Y! Sports. Facebook is known for taking MySQL performance to extreme levels using sharding (among other techniques). The latest number I saw on the web is that Facebook’s MySQL infrastructure can handle 60 million queries/sec of peak reads (in 2010, it was close to 13 million).
What’s interesting about this talk was how the presenter put together a complete picture: analysis, design, implementation, testing, phased deployment, and finally, the lessons learned from the process.
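For readers new to the idea, UserID-hash routing of the kind described above can be sketched in a few lines of Python. This is a minimal illustration, not Box.com’s or Yahoo!’s actual code. The farm count (`NUM_FARMS`) and the function name `farm_for_user` are hypothetical, and MD5 is used here only as a stable, well-distributed hash.

```python
import hashlib

# Hypothetical number of MySQL "farms" (shards); a real deployment would size
# this from data volume and growth projections.
NUM_FARMS = 8

def farm_for_user(user_id: int, num_farms: int = NUM_FARMS) -> int:
    """Map a UserID to a farm index using a stable hash.

    MD5 is used only for its stable, evenly distributed output, not for
    security. Python's built-in hash() is randomized per process for str,
    so it is unsuitable for routing that must agree across servers.
    """
    digest = hashlib.md5(str(user_id).encode("utf-8")).hexdigest()
    return int(digest, 16) % num_farms

if __name__ == "__main__":
    for uid in (1001, 1002, 42):
        print(uid, "->", f"farm_{farm_for_user(uid)}")
```

One design consequence is worth noting: with plain modulo routing, changing the number of farms remaps most users, which is a big part of why resharding a live system is hard. Consistent hashing or a lookup directory are common ways to soften that.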
Two things to point out here:
- It is easy to come up with an exotic design to solve a pressing problem, but seeing it all the way through to the finish line is not easy; it requires a lot of planning, perseverance, and creative problem solving. In my opinion, this is what differentiates a normal engineer from an EXCEPTIONAL engineer.
- In almost all cases, you have to roll out a major new design WITHOUT ANY interruption to existing site functionality, usage, revenue, etc. It’s sort of like changing the tires of a car running at 60 miles/hr on a freeway. EXCEPTIONAL engineers acknowledge this difficulty and plan well ahead of time.
I thought the incremental testing/implementation/rollout approach taken by the Box.com team was smart. It helped handle the above two scenarios very well. If you are interested in how they did it, check out the presentation (might not be the same, but close).
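This is not the approach from the Box.com talk (the presentation is the place to see that). It is a generic Python sketch of one common way to do an incremental cutover: a deterministic percentage ramp with shadow reads. Everything here is hypothetical (`in_rollout`, `read_user`, and the legacy/sharded read callables). Users inside the rollout bucket are served from the new sharded path, while everyone else is served from the legacy path and silently compared against the new one.

```python
import zlib
from typing import Callable, Dict, List

def in_rollout(user_id: int, percent: int) -> bool:
    """Return True if this user falls inside the first `percent` of 100 buckets.

    crc32 is stable across processes, so a user admitted at 10% stays
    admitted as the ramp grows to 25%, 50%, and 100%.
    """
    bucket = zlib.crc32(str(user_id).encode("utf-8")) % 100
    return bucket < percent

def read_user(user_id: int, percent: int,
              legacy_read: Callable[[int], Dict],
              sharded_read: Callable[[int], Dict],
              mismatches: List[int]) -> Dict:
    """Serve rolled-out users from the sharded path. For everyone else,
    serve the legacy result, shadow-read the sharded path, and record any
    disagreement so problems surface before the ramp is widened."""
    if in_rollout(user_id, percent):
        return sharded_read(user_id)
    legacy = legacy_read(user_id)
    if sharded_read(user_id) != legacy:
        mismatches.append(user_id)
    return legacy

if __name__ == "__main__":
    legacy_db = {1: {"name": "ann"}, 2: {"name": "bob"}}
    sharded_db = {1: {"name": "ann"}, 2: {"name": "b0b"}}  # simulated copy bug
    found: List[int] = []
    for uid in legacy_db:
        read_user(uid, 0, lambda u: legacy_db[u], lambda u: sharded_db[u], found)
    print("mismatched users:", found)
```

Because the shadow reads run at 0% before any real traffic moves, a data-migration bug like the one simulated above shows up as a mismatch instead of a user-facing error.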
AirBNB – Using Chronos to orchestrate ETL Jobs
This talk was presented by Florian Leibert of AirBNB.com. It was about the challenges AirBNB encountered in scheduling and running massive data processing jobs that require complex dependency management. They ended up building a new fault-tolerant scheduler called “Chronos” to replace plain old cron jobs, and also open sourced the project (that’s very nice of them). Chronos is built on top of “Mesos”, a cluster manager from Apache that I am not familiar with.
The presenter clearly summarized the pain points of cron jobs: the lack of dependency management and retries, and the synchronous flow. Chronos can run alongside multiple frameworks on the same cluster, and is agnostic about the environment it runs on (VMs or bare metal).
I still don’t understand why AirBNB did not use Apache’s Oozie workflow scheduler; it’s something I need to spend time on (Y!, Cloudera, and a few other Hadoop distributions mostly use Oozie as the standard). Overall, I learned a good bit from this talk.
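To make the cron pain points concrete, here is a toy Python sketch of two things plain cron does not give you: running jobs in dependency order and retrying failures. This is not Chronos’s API (Chronos is configured through its own job definitions on Mesos). `run_pipeline` and the job names are hypothetical, and the ordering comes from the standard-library `graphlib.TopologicalSorter` (Python 3.9+).

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Set

def run_pipeline(jobs: Dict[str, Callable[[], None]],
                 deps: Dict[str, Set[str]],
                 max_retries: int = 2) -> Dict[str, str]:
    """Run each job after its upstream jobs, retrying failures.

    `deps` maps a job name to the set of job names that must succeed first
    (all names are assumed to be keys of `jobs`). A job whose upstream did
    not succeed is marked "skipped" rather than run on incomplete data.
    A real scheduler would also add retry backoff, timeouts, and persistence.
    """
    graph = {name: deps.get(name, set()) for name in jobs}
    status: Dict[str, str] = {}
    for name in TopologicalSorter(graph).static_order():
        if any(status.get(up) != "ok" for up in graph[name]):
            status[name] = "skipped"
            continue
        for _attempt in range(max_retries + 1):
            try:
                jobs[name]()
                status[name] = "ok"
                break
            except Exception:
                status[name] = "failed"
    return status

if __name__ == "__main__":
    attempts = {"extract": 0}

    def extract() -> None:
        attempts["extract"] += 1
        if attempts["extract"] == 1:
            raise RuntimeError("transient failure")  # succeeds on the retry

    def transform() -> None:
        pass

    def load() -> None:
        pass

    print(run_pipeline(
        {"extract": extract, "transform": transform, "load": load},
        {"transform": {"extract"}, "load": {"transform"}},
    ))
```

With cron, `transform` would simply fire at its scheduled time whether or not `extract` had finished; here it waits for `extract`, which itself gets a second chance after a transient failure.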
Real-time analytics at Facebook
Jun Fang, an engineer at Facebook, along with Eddie Ma, a manager at Facebook, presented how they extended Facebook’s analytics infrastructure to deliver near real-time analytics. Facebook’s problem has been two-fold. On one hand, the data volume is growing exponentially; on the other hand, the business teams and analysts are demanding complex data requirements. Getting metrics ASAP was key to business success, and Facebook solved this by providing a near real-time analytics solution.
The solution involved building a custom “Subscriber” layer (a pub-sub model) that streamed the web logs in real time to an HBase merge table. The data was then processed using a custom Hive-HBase handler, moved/appended to the data warehouse, and finally made available to Hive using a read queue. It might sound complex, but it is easy to understand with a diagram. Unfortunately, I don’t have the diagram handy.
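In place of the missing diagram, here is a rough, in-memory Python sketch of the flow as I understood it: a subscriber drains log events into a merge table of running per-minute counts, and completed buckets are periodically appended to an append-only warehouse. It is a toy under stated assumptions. A `Queue` stands in for the log stream, a dict for the HBase merge table, and a list for the data warehouse. The function names and event fields are hypothetical, and the Hive-HBase handler and read queue are not modeled.

```python
from collections import defaultdict
from queue import Queue
from typing import DefaultDict, Dict, List, Tuple

Key = Tuple[str, int]  # (metric name, minute bucket)

def subscriber(log_stream: Queue, merge_table: DefaultDict[Key, int]) -> None:
    """Drain pending log events and merge them into running per-minute counts
    (stand-in for the streaming "Subscriber" writing to the merge table)."""
    while not log_stream.empty():
        event: Dict = log_stream.get()
        merge_table[(event["metric"], event["ts"] // 60)] += 1

def flush_to_warehouse(merge_table: DefaultDict[Key, int],
                       warehouse: List[Tuple[str, int, int]],
                       before_minute: int) -> None:
    """Append buckets older than `before_minute` to the append-only warehouse
    in time order and drop them from the merge table; newer buckets keep
    accumulating."""
    done = sorted((k for k in merge_table if k[1] < before_minute),
                  key=lambda k: (k[1], k[0]))
    for key in done:
        warehouse.append((key[0], key[1], merge_table.pop(key)))

if __name__ == "__main__":
    stream: Queue = Queue()
    for ts, metric in [(5, "page_view"), (30, "page_view"), (70, "click"), (125, "page_view")]:
        stream.put({"metric": metric, "ts": ts})
    table: DefaultDict[Key, int] = defaultdict(int)
    dw: List[Tuple[str, int, int]] = []
    subscriber(stream, table)
    flush_to_warehouse(table, dw, before_minute=2)
    print("warehouse:", dw)
    print("still merging:", dict(table))
```

The point of the merge table in this sketch is that recent counts are queryable while still changing, instead of waiting for a nightly batch to land in the warehouse.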
Real-time analytics is fascinating. Let’s consider a practical use case where a product owner (PM) needs to A/B test a feature before it goes prime time, and has 25 unique variations/sequences to test. If there is a 1-day delay in gathering full metrics, and the variations are tested one after another, it will take close to a month to make a feature decision! Not good. Fast forward to real-time analytics, and now the PM can make the same decision in about 1 day. That is the power of real-time analytics.
I have to say, though, I was underwhelmed by the presentation. The speaker didn’t communicate well on stage. Also, the panel discussion was a waste of time, mainly because the conversation was not audible and the speakers swallowed their words.
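The arithmetic behind this example is simple enough to write down. Here is a minimal Python sketch, assuming the variations are tested one after another and each must wait for its metrics before the next one starts. The ~1-hour near real-time delay is a hypothetical figure chosen for illustration, not a number from the talk.

```python
def decision_time_hours(variations: int, feedback_delay_hours: float) -> float:
    """Total time when variations are tested sequentially and each must wait
    for its metrics before the next can start."""
    return variations * feedback_delay_hours

batch = decision_time_hours(25, 24)    # 1-day metrics delay
realtime = decision_time_hours(25, 1)  # hypothetical ~1-hour delay
print(f"batch: {batch / 24:.0f} days, near real-time: {realtime / 24:.1f} days")
# prints: batch: 25 days, near real-time: 1.0 days
```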
Percona XtraDB – the MySQL Cluster
Vadim, the CTO/co-founder of Percona, presented Percona’s XtraDB Cluster product and its features. Since I had no working knowledge of XtraDB, I was blown away by its features and capabilities. I didn’t know that such a feature-rich product for MySQL existed in the market today.
Basically, XtraDB is the next level up from InnoDB. It solves some of the inherent shortcomings of InnoDB, and more. While InnoDB provides ACID compliance and crash recovery, it sacrifices performance and synchrony. If you implement a multi-master replicated database setup with standard MySQL replication, there is no guarantee that the data in system A is fully in sync with the data in system B. With XtraDB Cluster, you get virtual synchrony.
I have not tried any of Percona’s products yet, but I plan to in the future. If you are interested in this topic, here is the slide deck from this talk (might not be the same, but close).
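To illustrate the difference, here is a deliberately simplified Python model that contrasts asynchronous replication with a commit that only returns once every node has the write. The class names are hypothetical, and this is not how Percona XtraDB Cluster is implemented. Its Galera-based replication is commonly described as “virtually” synchronous because a transaction’s write set is replicated to and certified by all nodes at commit time, while applying it on the other nodes can lag slightly. The toy ignores that nuance (along with multi-master conflicts, failures, and networking) and uses a single primary to show only why a read from node B can miss a committed write in the asynchronous case.

```python
from typing import Dict, List, Tuple

class Node:
    """One database server holding a simple key/value copy of the data."""
    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, str] = {}

class AsyncReplication:
    """Commit on the primary immediately; replicas catch up later."""
    def __init__(self, primary: Node, replicas: List[Node]) -> None:
        self.primary = primary
        self.replicas = replicas
        self.backlog: List[Tuple[str, str]] = []

    def write(self, key: str, value: str) -> None:
        self.primary.data[key] = value     # client is told "committed" here
        self.backlog.append((key, value))  # replicas have not seen it yet

    def ship_backlog(self) -> None:
        for key, value in self.backlog:
            for replica in self.replicas:
                replica.data[key] = value
        self.backlog.clear()

class SyncReplication:
    """Commit returns only after every node holds the write."""
    def __init__(self, nodes: List[Node]) -> None:
        self.nodes = nodes

    def write(self, key: str, value: str) -> None:
        for node in self.nodes:
            node.data[key] = value

if __name__ == "__main__":
    a, b = Node("A"), Node("B")
    async_setup = AsyncReplication(a, [b])
    async_setup.write("balance", "100")
    print("async, before catch-up, B sees:", b.data.get("balance"))
    async_setup.ship_backlog()
    print("async, after catch-up, B sees:", b.data.get("balance"))

    c, d = Node("C"), Node("D")
    SyncReplication([c, d]).write("balance", "100")
    print("sync, D sees:", d.data.get("balance"))
```

The window between `write` and `ship_backlog` is exactly the “no guarantee that system A is in sync with system B” problem described above.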
Unfortunately, I didn’t get an opportunity to listen to the remaining talks, so I can’t comment on them. But here are the links to the presentations if you are interested:
I hope this summary helps those in the Cloud Computing community who couldn’t make it to the conference last weekend.