Scale Up or Scale Out?

Lots of Little Virtual Web Applications Scale Out Better than Scaling Up

Lots of little virtual Web applications scale out better than scaling up. Surprised? I was, but I shouldn’t have been.

While working on other topics I ran across an interesting slide in a presentation given by Microsoft at TechEd Europe 2009 on virtualization and Exchange. Specifically, the presenter called out an average 12% overhead incurred by the hypervisor on systems in internal testing. Intuitively it seems obvious that a hypervisor will incur overhead; it is, after all, an application that is executing and thus requires CPU, I/O, and RAM to perform its tasks. That led me to wonder if there was more data on the overhead from other virtualization vendors.

I ended up reading an enlightening white paper from VMware on consolidating web applications with virtualization, which observes that configurations of multiple virtual machines actually outperformed, in both capacity and performance, a native server configured with a similar number of CPUs. Note that this finding is specific to web applications, though I suspect that any TCP-heavy application would exhibit similar performance characteristics.

Although virtualization overhead varies depending on the workload, the observed 16 percent performance degradation is an expected result when running the highly I/O-intensive SPECweb2005 workload. But when we added the second processor, the performance difference between the two-CPU native configuration and the virtual configuration that consisted of two virtual machines running in parallel quickly diminished to 9 percent. As we further increased the number of processors, the configuration using multiple virtual machines did not exhibit the scalability bottlenecks observed on the single native node, and the cumulative performance of the configuration with multiple virtual machines well exceeded the performance of a single native node.

-- “Consolidating Web Applications Using VMware Infrastructure” [PDF, VMware]

We know there’s overhead associated with the hypervisor. Fact. But what’s interesting here is that the overhead turns out to be irrelevant – at least in the case of web applications. What’s important is the initial degradation of performance and its subsequent improvement as additional virtual instances are added. We need to understand why that’s the case, because it has – or should have – an impact on our overall architectural strategy.
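To make the quoted result concrete, here's a back-of-the-envelope sketch in Python. The 9 percent overhead comes from the VMware quote above; the native scaling efficiency is an invented, purely illustrative number standing in for the "scalability bottlenecks" the paper observed, not a measured figure.

# Illustrative sketch only: hypothetical numbers showing how per-VM overhead
# can be outweighed by better scaling across multiple VMs. The 9% overhead is
# from the VMware paper quoted above; the 80% native scaling efficiency is
# invented for illustration.

NATIVE_1CPU = 100.0          # baseline score on one native CPU (arbitrary units)

# Single native node: assume diminishing returns as CPUs are added
# (the scalability bottlenecks the paper observed).
def native_score(cpus, efficiency=0.80):
    return NATIVE_1CPU * sum(efficiency ** i for i in range(cpus))

# Multiple 1-vCPU VMs: each pays hypervisor overhead but scales near-linearly.
def virtual_score(vms, overhead=0.09):
    return vms * NATIVE_1CPU * (1 - overhead)

for n in (1, 2, 4, 8):
    print(f"{n} CPUs: native={native_score(n):7.1f}  virtual={virtual_score(n):7.1f}")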


SCALE OUT VIRTUALLY FOR BEST RESULTS

So why would multiple instances of a web server – virtual, no less – scale better for performance than simply scaling up, i.e. adding more CPUs? If we look at typical performance patterns from nearly any TCP connection-oriented device or application, we see very similar behavior. Capacity tends to follow a steep growth curve that plateaus rather quickly and then remains roughly constant. The associated performance pattern begins with very low latency and good response times, but latency gradually increases as the device or application approaches and reaches that capacity plateau.
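If you want to play with the shape of that curve yourself, a standard single-queue (M/M/1) approximation – my assumption, not a model from the testing cited above – reproduces it nicely: response time stays flat at low utilization and climbs steeply as load approaches capacity.

# Minimal sketch of the latency pattern described above, using the standard
# M/M/1 queueing approximation (an assumed model; all numbers illustrative).

def response_time(load_rps, capacity_rps, service_ms=10.0):
    """Approximate response time as utilization approaches 1."""
    utilization = load_rps / capacity_rps
    if utilization >= 1.0:
        return float("inf")      # saturated: the queue grows without bound
    return service_ms / (1.0 - utilization)

for load in (100, 500, 800, 900, 950, 990):
    print(f"{load:4d} rps -> {response_time(load, 1000):8.1f} ms")

Latency roughly doubles between 80 and 90 percent utilization, then doubles again by 95 percent – the "hockey stick" anyone who has run a load test will recognize.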

This pattern, with few aberrations, should be familiar to anyone who has performed load or performance testing on a connection-oriented (TCP-based) solution. In fact, an obvious deviation from the pattern often indicates a problem in the network or solution that needs to be addressed. Garbage collection in Java EE application servers, for example, has traditionally shown up as regular inverse spikes in the overall number of TCP connections and in CPU utilization on the host server, coupled with an increase in response time while the CPU is fully consumed for the brief interval it takes the collection to complete.

The reason this behavior is consistent across connection-oriented devices and applications is precisely because they are connection oriented. Connections must be tracked, i.e. stored in memory, and subsequently looked up as messages flow across them. This requires RAM and, in some cases, I/O resources. As the number of connections grows, the "table" in which they are stored grows, increasing both the time necessary to "find" a given connection and the resources consumed. Moreover, more connections mean more serialization and locking, and that serialization is another primary bottleneck for the web server.
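To see that serialization concretely, here is a toy sketch (hypothetical names, deliberately simplified) of a lock-protected connection table. Every reader and writer funnels through a single lock, which is exactly the kind of contention that adding CPUs alone cannot relieve.

# Toy illustration of the serialization described above: every operation on a
# shared connection table passes through one lock, so added CPUs and threads
# queue up behind it instead of adding throughput. Names are hypothetical.

import threading

class ConnectionTable:
    def __init__(self):
        self._lock = threading.Lock()    # the serialization point
        self._conns = {}                 # conn_id -> connection state

    def add(self, conn_id, state):
        with self._lock:                 # every writer waits here...
            self._conns[conn_id] = state

    def lookup(self, conn_id):
        with self._lock:                 # ...and every reader, too
            return self._conns.get(conn_id)

    def remove(self, conn_id):
        with self._lock:
            self._conns.pop(conn_id, None)

With many threads hammering add/lookup/remove, a profile would show time spent contending for the lock rather than doing useful work – the bottleneck more hardware cannot remove.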

Hence, the more connections made to a given solution, the more its performance tends to degrade.

Virtualization appears to address this issue by capping connection capacity: limited per-instance resources keep the connection table small. Adding more CPU and RAM, on the other hand, raises connection capacity and thus produces larger connection tables, which degrade performance further as serialization increases. Rather than simply adding CPUs, then, it is probably the better option – from a performance standpoint – to add another virtual instance, and another as CPUs increase, to maintain consistent capacity and a predictable performance pattern.
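A rough way to model that trade-off is Amdahl's law – again an assumed model with illustrative numbers, not data from the papers cited above. The serialized portion of connection handling caps what a single large instance can gain from extra CPUs, while independent small instances scale nearly linearly minus the hypervisor overhead.

# Rough Amdahl's-law sketch of scale-up vs. scale-out (assumed model; the
# 20% serial fraction is an illustrative guess, the 9% overhead is from the
# VMware quote earlier in this post).

def scale_up_speedup(cpus, serial_fraction=0.20):
    """Amdahl's law: serialized work caps the benefit of extra CPUs."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cpus)

def scale_out_speedup(instances, per_vm_overhead=0.09):
    """Independent instances: near-linear, minus hypervisor overhead."""
    return instances * (1.0 - per_vm_overhead)

for n in (1, 2, 4, 8, 16):
    print(f"{n:2d}x: up={scale_up_speedup(n):5.2f}  out={scale_out_speedup(n):5.2f}")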

You need to scale up the hardware capacity, but you should scale out at the virtual and application layers to optimize the efficiency of those resources and maintain the end-user experience. By load balancing across multiple, smaller, homogeneous server instances you also make capacity planning much simpler: you know exactly what the capacity of a given instance will be, and you can use that information to prepare in advance a plan for increasing capacity on demand. Scaling up does not offer the same consistency, because capacity will be highly dependent on the CPU and RAM provisioned as well as on load.
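That consistency is what makes the planning arithmetic easy. A minimal sketch, assuming a per-instance capacity you have measured in your own load tests and a target utilization that keeps you on the flat part of the latency curve:

# Capacity planning with homogeneous instances, as argued above.
# All figures are placeholders for your own load-test measurements.

import math

PER_INSTANCE_CAPACITY = 2_000    # requests/sec one instance handles (measured)
TARGET_UTILIZATION = 0.70        # headroom to stay below the latency knee

def instances_needed(peak_rps):
    usable = PER_INSTANCE_CAPACITY * TARGET_UTILIZATION
    return math.ceil(peak_rps / usable)

print(instances_needed(9_000))   # -> 7 instances for a 9,000 rps peak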

More Stories By Lori MacVittie

Lori MacVittie is responsible for education and evangelism of application services available across F5’s entire product suite. Her role includes authorship of technical materials and participation in a number of community-based forums and industry standards organizations, among other efforts. MacVittie has extensive programming experience as an application architect, as well as network and systems development and administration expertise. Prior to joining F5, MacVittie was an award-winning Senior Technology Editor at Network Computing Magazine, where she conducted product research and evaluation focused on integration with application and network architectures, and authored articles on a variety of topics aimed at IT professionals. Her most recent area of focus included SOA-related products and architectures. She holds a B.S. in Information and Computing Science from the University of Wisconsin at Green Bay, and an M.S. in Computer Science from Nova Southeastern University.
