Sunday, October 18, 2009

VMware: five biggest challenges of server virtualization

Although the benefits of virtualizing x86 servers have been pushed relentlessly for the past five years or so, much less discussed have been the challenges involved in moving to a world where resources are pooled and everything is linked.

The complexity that such a scenario generates can have a knock-on effect on issues ranging from infrastructure and licensing to skills, which means that migrating to the new environment can end up being an expensive upfront proposition.

Adrian Polley, chief executive at IT services provider Plan-Net, says, "You are often talking about a complete change in infrastructure, which is why people who started on this path before the recession may have continued, but not many have plunged in since."

A key challenge is that virtualization involves sharing resources, whether that relates to hosts, storage or networks, but changing one element of the whole can have repercussions elsewhere.

"All of this sharing means that if you give to one thing, you take away from something else, so it becomes a balancing act to understand how resources should be properly allocated," Polley says. "There are always bottlenecks and you can end up just moving them around. Because things are so interconnected, you can end up chasing your tail."

As a result, we have come up with a guide to help you work your way through the mire. Below we look at five of the biggest challenges relating to x86 server virtualization and what you can do about them.

1. Network connections

"If the network is not up to snuff, you are in trouble from the start. But the bad thing is that, if you have virtualized your servers without doing your homework, you will not know whether it is the network that is to blame for performance issues or something else," says Dan Hidlebaugh, network server manager at Hertford Regional College.

The educational establishment virtualized its x86 servers about two years ago in a bid to cut escalating utility bills, reduce its carbon footprint and improve its disaster recovery provision.

A campus-wide agreement with Microsoft meant that licensing fees were lower than those of rival vendors. So it agreed to become a European test site for the supplier's Hyper-V offering, helped by IBM, which provided the college with a free six-month trial of its BladeCenters. The organization has now consolidated its 120 physical servers down to about 55 virtual servers and expects more to follow.

But Hidlebaugh warns that the success of such projects is not just dependent on ensuring that the virtualization software works effectively.

"You have to look at what hardware you want to use, the storage area network (San), how you connect the two, how they connect to the network, how the network reaches the end-user, etc," he says. "You can have a great virtualization platform, but if clients cannot access it due to a network bottleneck, it is useless."

The college had already decided to upgrade its network as part of a planned move to new premises and undertook a thorough review. As a result, it introduced an enterprise-class Cisco router, a dual-band wireless network and 10Gbit network-to-edge switches to connect the system to users in each classroom. Twelve core fiber cables were also laid for redundancy purposes and the network was tested "mercilessly" for a month to push it to its limits.

Another performance consideration, however, related to the communications backplane of the host.

"We had to ensure that the servers' backplane could handle the same speeds as the router. If you just throw memory and processing power at it but are stuck with a 1Gbit network connection, you will end up with big performance issues,"
says Hidlebaugh. The BladeCenters in question have a backplane of 700Gbits.
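
To put rough numbers on that, here is a minimal back-of-the-envelope sketch, with purely hypothetical traffic figures rather than the college's, of the kind of check that shows why a single 1Gbit uplink becomes the bottleneck once several servers' traffic is funnelled through one host:

```python
# Back-of-the-envelope check (hypothetical traffic figures): does the aggregate
# network demand of the virtual machines consolidated onto one host fit within
# the host's uplink?

# Expected peak traffic per virtual machine, in Mbit/s (hypothetical figures).
vm_peak_mbits = {
    "file-server": 400,
    "web-server": 250,
    "mail-server": 300,
    "print-server": 50,
}

UPLINK_MBITS = 1_000          # a single 1Gbit connection
TARGET_UTILISATION = 0.7      # keep roughly 30% headroom for bursts

demand = sum(vm_peak_mbits.values())
budget = UPLINK_MBITS * TARGET_UTILISATION

print(f"Aggregate peak demand: {demand} Mbit/s against a budget of {budget:.0f} Mbit/s")
if demand > budget:
    print("The uplink is a likely bottleneck: add or team NICs, or move to 10Gbit")
```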

2. Network storage

A further concern when going down the virtualization route relates to storage. Hypervisor suppliers generally recommend implementing network storage such as SANs for larger production deployments, particularly if organizations are keen to deploy availability and mobility tools such as VMware's VMotion. Direct attached storage may suffice for smaller development and test environments, however.

VMotion enables running workloads to be moved between servers, for example when a host needs to be taken down for maintenance, while VMware's high-availability features can restart virtual machines elsewhere should a host crash. Both capabilities require that virtual machines be stored as disc images on the SAN: each host in the cluster needs to be able to see each disc image so that a virtual machine can be started or moved onto whichever host has spare capacity.
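
As a minimal sketch of that visibility requirement, with hypothetical host and datastore names and standing in for checks the hypervisor management tools perform for real, the following works out which hosts could take over each virtual machine:

```python
# Illustrative sketch (hypothetical host and datastore names): before a virtual
# machine can be restarted or migrated onto another host, that host must be able
# to see the shared datastore holding the VM's disc image.

# Shared datastores each host has mounted.
host_datastores = {
    "esx01": {"san-lun-01", "san-lun-02"},
    "esx02": {"san-lun-01", "san-lun-02"},
    "esx03": {"san-lun-01"},              # cannot see san-lun-02
}

# The datastore on which each virtual machine's disc image lives.
vm_datastore = {
    "mail-vm": "san-lun-01",
    "db-vm": "san-lun-02",
}

def possible_hosts(vm):
    """Hosts that can see the VM's disc image and so could take it over."""
    ds = vm_datastore[vm]
    return [host for host, mounted in host_datastores.items() if ds in mounted]

for vm in vm_datastore:
    hosts = possible_hosts(vm)
    print(f"{vm}: can run on {hosts}")
    if len(hosts) < len(host_datastores):
        print(f"  warning: {vm_datastore[vm]} is not visible to every host,"
              " so migration options are limited")
```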

But SANs - and personnel with the appropriate skills - are expensive to acquire, especially if organizations opt for higher-performance Fibre Channel-based systems rather than cheaper iSCSI equivalents.

Even if such a system is already in place, it may be necessary to upgrade it to ensure that performance is adequate and that all components are certified to run in a virtualized environment, which is not always the case. Checking suppliers' hardware compatibility lists is a must, as is following configuration recommendations.

3. Sizing storage capacity

Another must is to size the SAN adequately, not least to avoid wasting money by over-provisioning the system. Getting this right also matters because some organizations find their applications run more slowly after a virtualization implementation, despite the use of server-based memory management techniques such as page sharing.

Hidlebaugh says, "Disc issues tend to be the problem." The challenge in this context is that virtual machines generate a high number of I/O requests to be processed each second, but the San's physical discs may be unable to keep up.

One way of getting around the problem is to use workload analysis and planning tools such as Novell's PlateSpin. These tools estimate the capacity a virtualized environment is likely to need based on the profile of the current physical servers in terms of memory, disc, processor and network bandwidth usage.
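
As a rough illustration of the arithmetic such tools automate, the sketch below uses hypothetical per-server figures and an assumed per-disc IOPS rating; it is not PlateSpin output, just the shape of the calculation:

```python
# Rough sizing arithmetic (hypothetical figures, not tool output): sum the
# measured peak I/O rate of each physical server earmarked for virtualization
# and compare the total with what the SAN's discs can deliver.

peak_iops = {                  # measured peak IOPS per physical server
    "file-server": 300,
    "sql-server": 1200,
    "exchange": 900,
    "web-farm": 450,
}

IOPS_PER_DISC = 150            # rough assumption for one 15k rpm disc
HEADROOM = 1.3                 # keep roughly 30% spare for growth and bursts

total_iops = sum(peak_iops.values())
discs_needed = -(-int(total_iops * HEADROOM) // IOPS_PER_DISC)   # ceiling division

print(f"Aggregate peak demand: {total_iops} IOPS")
print(f"Discs needed at {IOPS_PER_DISC} IOPS each, "
      f"with {round((HEADROOM - 1) * 100)}% headroom: {discs_needed}")
```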

An array that supports mixed workloads can also help. I/O-intensive applications such as databases and high-throughput software such as backup all appear to the array as a single big workload, despite their very different requirements.

But because priority is given to processing big blocks of data, the smaller random transactions are generally made to wait behind the large sequential transfers, which hurts their performance. A system able to handle both kinds of workload simultaneously can help to address the issue.
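
A toy calculation makes the effect visible. The service times below are assumed and real arrays are far more sophisticated, but the queuing problem is the same:

```python
# Toy illustration (all service times are assumed) of why small random I/Os
# suffer when they share a single queue with large sequential transfers.

LARGE_MS = 8.0    # time to service one large sequential backup block
SMALL_MS = 0.5    # time to service one small random database read

# Single shared FIFO queue: a burst of backup blocks lands just before the reads.
shared_queue = ["large"] * 10 + ["small"] * 10

clock = 0.0
small_latencies = []
for op in shared_queue:
    clock += LARGE_MS if op == "large" else SMALL_MS
    if op == "small":
        small_latencies.append(clock)   # completion time seen by the database

print(f"Shared queue: average small-I/O latency "
      f"{sum(small_latencies) / len(small_latencies):.1f} ms")

# Separate queues, which a mixed-workload array approximates: the small reads
# no longer wait behind the backup stream.
separate = [SMALL_MS * (i + 1) for i in range(10)]
print(f"Separate queues: average small-I/O latency "
      f"{sum(separate) / len(separate):.1f} ms")
```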

4. Back-up challenges

Many organizations continue to back up their virtualized servers in the same way as their physical ones, but this approach has its downsides. In a physical environment, back-up is typically handled by software agents installed on each host operating system, which copy applications and data to either disc or tape.

The problem with doing things this way in a virtual world is that virtual machines are complete logical environments that include not just the applications and data, but also the VM file system itself. Because traditional software does not back up the VM file system, should the virtual machine go down, the file system has to be rebuilt from scratch, the system restored and configured, and the relevant data and applications copied over before it can run again.
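
As a rough sketch of the difference, the snippet below uses hypothetical file names, with the .vmx, .vmdk and .nvram extensions standing in for the usual VMware virtual machine files, to contrast what an in-guest agent sees with what an image-level copy contains:

```python
# Illustrative contrast (hypothetical file names) between what an in-guest
# back-up agent captures and what an image-level copy of the same virtual
# machine includes.

# Agent-based back-up: only the files visible inside the guest operating system.
agent_backup = [
    "/var/lib/app/data.db",       # application data
    "/etc/app/config.ini",        # application configuration
]

# Image-level back-up: the whole logical environment as it sits on the SAN,
# so a failed VM can be restored in one step rather than rebuilt from scratch.
image_backup = [
    "webserver.vmx",              # virtual machine configuration
    "webserver.vmdk",             # virtual disc holding the OS, apps and data
    "webserver.nvram",            # the VM's BIOS/firmware state
]

missing_from_agent = [f for f in image_backup if f not in agent_backup]
print("Files a traditional agent never captures:", missing_from_agent)
```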

Northern Ireland-based car dealership company Isaac Agnew was unhappy with the time-consuming nature of this process and so introduced specialist back-up tools from Veeam.

The organization initially virtualized a Dell blade server in the latter half of 2007 to try the technology out, but is now running 20 virtual machines on three VMware ESX-based hosts used mainly for development and test purposes.

Tim Carter, senior systems administrator at Isaac Agnew, says, "Before Veeam, we had scripts that one of the team had written to automatically copy some files from the virtual machine onto one of the servers that was backed up periodically using CommVault. But we had to manually choose what to synchronise into the back-up folder, and if we missed something, we were in trouble."

Backing up the virtual machines in this way would have meant purchasing a back-up license for every machine on which they ran, which was considered too expensive.

But the snapshotting capabilities of the new tools now mean that, "We can restore the file system in a minute as opposed to hours of rebuilding the virtual machine and copying files, which often resulted in staff having to do overtime in the evenings and weekends," Carter says.

Although more storage capacity is needed to back up virtual machines in this way, the compression functionality provided by the tools mitigates this requirement nicely, he adds.

5. Application support

Although most applications will run in a virtualized environment, obtaining full support for them is another matter. There will be no problem with packages that are certified as "virtualization-ready", but some suppliers are unwilling to commit themselves, either because they have not fully tested their software on virtualized hosts or because their applications have already run into some kind of problem in a virtualized environment.

Other companies offer a kind of half-way house, in which users are asked to reproduce any fault on a physical server if it is suspected that the issue is associated with the move to virtualization.

As a result, Hertford Regional College's Hidlebaugh believes that it is necessary for organizations to go through "a whole process" to decide which applications are suitable candidates for migration and which are not.

"Suppliers of things like domain controllers told us that their applications were not proven yet and so to please wait. There are about 30 of our servers that we are not going to virtualize and about 10 of them relate to applications that have not been tested," he says.

"It is crucial to talk to your suppliers and anyone else who is supporting your applications,"
Hidlebaugh warns, otherwise you could end up putting yourself at risk.

He would also be wary about virtualizing I/O-intensive applications such as the college's Microsoft SQL Server databases and Exchange e-mail servers without extensive testing, due to SAN-related performance issues.

Skills

The knock-on effects of moving to a world where everything is interconnected do not end here. Another important thing to think about is skills, particularly in large enterprises, where IT staff tend to specialize in key functional areas such as storage, servers and networking.

Because all of these areas begin to overlap in the virtualized world, it is easy to end up with support duplicated in some areas while falling through the gaps in others. It is crucial to delineate roles clearly and decide who is responsible for what. It may also be necessary to train personnel across the IT department in new disciplines.

Plan-Net's Polley says, "The skills issue is hard to overstate because people end up having to have a much greater breadth of knowledge. They really do need to be expert in a bunch of areas if they are going to solve problems in a virtualized world successfully."
