Bobby Boucher, persistent virtual desktops ARE THE DEVIL!

No they’re not. In fact, most desktops are persistent, they’re just not virtual. Other industry experts have already had this discussion many times and I think it’s my turn to sound off.

I have to be completely honest, I’m just plain tired of the headache that non-persistent just ends up causing from a management perspective, especially when a group of users is placed in the wrong use case bucket because the proper strategic use-case analysis wasn’t performed in the first place. Sometimes that stateless headache is totally worth it, and other times it isn’t. The concept of deleting a user’s desktop when they log off is noble from a clean and tidy “fresh desktop” configuration standpoint, but it also inherently causes a number of different issues that plenty of customers just can’t seem to get past in the long haul, especially if use case is incorrectly assessed initially. It takes a lot of administrative work to have a near-persistent auto-refreshing desktop experience, so let’s just break this down for a second with some simple pros and cons for each.

Non/Near Persistent Desktops

PROS

Can meet very low RTO/RPO disaster recovery requirements with abstracted user files/app data
In environments with rolling concurrent session counts it can reduce overall costs
Will allow for the fastest provisioning of desktops (especially with project Fargo)
Storage capacity requirements can be reduced because of natively thin desktops
Logging onto recently refreshed desktops ensure a consistent operating environment
Fits a large majority of task worker and non-desktop replacement use cases extremely well

CONS

A UEM is required to persist application data and user files, staff must learn and become comfortable with the tool
Application delivery and updates must (for the most part) be abstracted and delivered and updated quickly. This often times requires additional software and training (App-V, AppVolumes, Thinapp, etc)
Any appdata problems that persist in the profile completely negate the “fresh desktop” stateless desktop idea
Does not typically integrate well with traditional desktop management solutions such as SCCM, LANDesk and Altiris
The smallest configuration change for a single user in a stateless desktop can result in hours of work for the VDI admin
Applications that require a non-redirected local data cache must be re-cached at each login or painfully synched via the UEM

Persistent Desktops

PROS

Users live on their desktop, not partly on a share and partly in a desktop
Consistent experience; the smallest changes (including those performed by the helpdesk) unquestionably persist
Simple to administer, provision and allocate
Fits all desktop replacement use cases extremely well
Desktop administrators do not need to introduce new management tools
The user pretty much never has to be kicked out of their desktop except for reboots required by patches, software installs

CONS

Any disaster recovery requirements will add a lot of complexity, especially without OTV
Non-metrocluster persistent desktop DR plans typically dictate 2 hour+ RTO for environments at scale
Persistent is resource overkill for the task worker and occasional use non-desktop replacement use cases
Can be cost prohibitive for rolling concurrent session counts it can increase overall infrastructure costs to give 1:1
Year 2 operations can fail to be met; operational “this person left 2 years ago and is still on his desktop” type problems
Relies on existing application distribution expertise
Problems in the desktop must be handled with traditional troubleshooting techniques
Without proper storage infrastructure sizing underneath, simple things like definition updates or a zero day patch can make an entire environment unusable

My big picture take aways:

Perform strategic business level EUC/mobility use case analysis up front. If an optimized method of delivering EUC is realized, select the best combination of desktop and/or server and software virtualization for the business. Once the technology has been selected, do the detailed plan and design for the chosen technology. Don’t slam both the strategic business analysis and the product design together because you need take a good hard look at what you actually need and where you can optimize before you put a blueprint together. As a part of the business use case analysis, ensure to get sign off on the virtual desktop availability and recoverability requirements. The outcome of such an analysis should provide information in regards to which specific departments will have a great outcomes with a virtualized desktop experience and which ones have less or no value.

Are you building an infra that will let you provision virtual desktops with amazing speed and IOPS galore? Great! That’s not the only goal. Make sure it’s merely a dependable component of a solution that is manageable and meets the business requirements.

Going with a non-persistent desktop to save on storage space is ludicrous, especially with the prevalence of high performing storage platforms with block level deduplication available today and the amazing low TCO of HCI.

The idea that a non-persistent desktop is easier to manage because of the available tools is complete nonsense. There are risks and rewards for both choices. For example, there are plenty of medical use cases that are terrific fits for non-persistent desktop because of the lack of personalization and individuality required; most of the time the staff members use kiosks so a generic desktop is nothing new to them. However, take a senior corporate finance officer who has a critical Access 97 database with custom macros and software dependencies that took a VBA consultant 1 year to write, and who also has more manually installed applications his desktop support has ever seen and you’re going to be a sad person if you put then on a non-persistent desktop.

If you ask me, once the DR complexity for persistent desktops is fixed, it will drastically shift the conversation again.

As usual, if there are any pros and cons I forgot, feel free to sound off, argue, high five me, or not.

The Year of the VDI

I seriously cannot stand the phrase “The Year of the VDI”, and I’m pretty sure most of you reading this might be here because you feel the same way. In the past the notion has been very annoying to me because at each step we’ve solved portions of the infrastructure and application delivery problems, that now is the year for us to relax and push a giant VDI easy button and this will be the year that VDI finally “takes off” like it never has before. VDI is definitely increasing with adoption, with a large part of that due to the fact that more people are actually in the pool (HAH, no pun intended) rather than sitting on the pool chairs wondering if it’s actually a good idea to go swimming.

I seem to be having this conversation with customers more and more in regards to use cases. Let’s start with the basics about what a technology use case really is. I usually consider a use case to be how a technology can solve a specific business problem or a specific business enablement or technology optimization. A virtual desktop use case is not a compilation of user data in regards to OS, applications, devices, and desktop resource requirements. A good virtual desktop use case is a set of conditions which end up with a positive, efficient, rapid delivery EUC experience for the end user, and decreased operational overhead for IT. Simply put, VDI doesn’t fit everywhere, it’s the exact reason we should be providing a full palette of desktop delivery options and each has a supporting group of folks. For the purposes of this article, I don’t even take alternative presentation delivery methods into account, such as XenApp or RDSH, which can be combined with any of the options below.

Physical PC
Persistent Desktop
Near Persistent Desktop
Non-Persistent Desktop

For example, a beefy physical PC is a requirement for some engineers that require offline access to large drawings. It’s also a requirement for someone like myself who is a delivery consultant who travels and integrates with many different networks and customer environments. There are other examples of each use case I could give, but you guys get it. The point is the operational and functional requirements for each department, IT organization, security division are different. Knowing which bucket or combination of buckets to put the users in is a paramount skill that the project team must have.

Brian Madden has it spot on when he says that executives pretty much care about desktops just as much as they care about desks, lamps, office chairs, office space, cell phones; it’s just a part of the necessary equipment that employees need to get their job done. When a VDI project goes sideways, the operational value-add can often be questioned by the executive layer. For this reason, I don’t think there will ever be “A Year of the VDI”. Virtual desktop projects take the problems that you know and love on each individual piece of hardware desktop hardware and MOVE them to the datacenter, not remove them. For some use cases, they remove way more problems than they introduce, and for some use cases it’s vice versa.

There are some aspects of desktop management that VDI simplifies drastically, there are other things that it complicates and makes more work for the administrative team that didn’t exist otherwise, specifically any involvement of any UEM product. The year of the VDI? How about we take a year to learn how to manage traditional desktops first before we tackle the virtualization of all our desktop applications and operating systems. Here’s a screenshot of Netmarketshare’s desktop OS data, August, 2014 to June, 2015. Anybody notice that Windows XP actually gained market share at the end of last year and since then has only slowly tricked down?

For the shops that have it together and can place users in the correct use case buckets, keep it up. For the shops still leveraging outdated desktop operating systems, you need to take a long hard think about why you’re doing what you’re doing from an SDLC perspective. In my opinion, VDI will continue to gain adoption, but will continue to increase at a pace slower than server virtualization did. Capex benefits which were readily realized with server virtualization and consolidation, but VDI is primarily opex savings which is more difficult to measure and less justifiable to executives without an enablement initiative.

So what is my recommendation? Stop thinking about how to deliver desktops and focus on the delivery of Windows and applications, then work backwards to a capable solution needed to provide the services to the business, whether the users are leveraging physical desktops, virtual desktops or virtual servers. At that point you should start to see the use cases fall together in a much clearer fashion.

AppVolumes 2.9 – Near 0 RTO Multi-Datacenter Design Options

So this is more of me just thinking out loud on how this could be accomplished. Any thoughts or comments would be appreciated. Check Dale’s blog “VMware App Volumes Multi-vCenter and Multi-Site Deployments” for a great pre-read resource.

The requirements for this solution I’m working on are for a highly available single AppVolumes instance between two highly available datacenters with a 100% VMware View non-persistent deployment. A large constraint is that block level active/active storage replication is desired to be avoided, which is largely why UIA writable volumes is such a large piece of the puzzle; we need another way to replicate the UIA persistence without leveraging persistent desktops. CPA will be leveraged for global entitlements and home sites to ensure users only ever connect to one datacenter unless there is a datacenter failure. AppVolumes and Writable UIA volumes are to be utilized to get the near persistent experience. There can be no loss functionality (including UIA applications on writable volumes) for any user if a datacenter completely fails. RTO = near 0, RPO = 24 Hours for UIA data. In this case assume <1ms latency and dark fiber bandwidth available between sites.

Another important point is that as of AppVolumes 2.9, if leveraging a multi-vCenter deployment, the recommended scale per the release notes is 1,700 concurrent connections per AppVolumes Manager, so I bumped up my number to 3 per side.

So, let’s talk about the options I’ve come up with leveraging tools in the VMware toolchest.

Option 1a: Leveraging AppVolumes storage groups with replication
Option 1b: Leveraging AppVolumes storage groups with replication and transfer storage
Option 2: Use home sites and In-Guest VHD with DFS replication
Option 3a: Use home sites and native storage replication
Option 3b: Use home sites and SCP/PowerCLI scripted replication for writable volumes, manual replication for appstacks
Option 4: Use a stretched cluster with active/active storage across the two datacenters

Option 1a: Leveraging AppVolumes Storage groups with Replication

The problem is AppVolumes does not allow storage group replication two different vCenter servers, and having multiple Pod vCenters is a requirement for the Always-On architecture. I read that in this community post, the error message in the System Messages reads: “Cannot copy files between different vCenters”, and a fellow technologist posted up recently and said he tried it with 2.9 and the error message still exists. So in short, Dale’s blog remains true in that storage groups can only replicate within a single vCenter, so this isn’t a valid option for multi pod View.

Option 1b: Leveraging AppVolumes Storage groups with Replication and Transfer Storage

So that’s a another pretty good workaround option. Requires a native storage connectivity between the two pods, which means there must be a lot of available bandwidth and low latency between the sites, as some users in POD2 might actually be leveraging storage in POD1 during production use. Providing you can get native storage connectivity between the two sites, this is a very doable option. In the event of a failure of POD1, the remaining datastores in POD2 will sufficiently handle the storage requests. Seems like this is actually a pretty darn good option.

** EDIT: Except that storage groups will ONLY replicate AppStacks, not writable volumes per a test in my lab, so that won’t work. Also there’s this KB stating that to even move a writable volume from one location to another, you need to copy it, delete it from the manager, re-import it and re-assign it. Not exactly runbook friendly when dealing in the thousands.

Option 2: Use Home Sites and In-Guest VHD with DFS Replication

The problem with option 2 is In-Guest VHD seems like a less favorable option given that block storage should be faster than SMB network based storage, particularly when the software was written primarily to leverage hypervisor integration. I’m also not sure this configuration will even work. When I tried it in my lab, the AppVolumes agent kept tagging itself as a vCenter type of endpoint, even though I wanted it to use In-Guest mode. Tried to force it to VHD without luck. So I’m pretty sure this option as well is out for now, but the jury is still out on how smart this configuration would be compared to other options. Leveraging DFS seems pretty sweet though, the automatic replication for both AppStacks and Writables seems like a very elegant solution, if it works.

Option 3a: Use Home Sites and native storage replication

The problem with option 3 is the RTO isn’t met and it adds recovery complexity. From an AppVolumes perspective it is the easiest to carry out, as the writable volume LUNs can be replicated and the AppStack LUNs can be copied manually via SCP or the datastore browser without automatic replication. As it stands right now, I believe this is the first supported and functional configuration in the list, but the near 0 RTO isn’t met due to storage failover orchestration and appvolumes reconfiguration.

The aforementioned KB titled Moving writable volumes to new location in AppVolumes and some extra lab time indicates to me that in order to actually fail over properly, the Writable volume (since it is supposed to exist in one place only, and the datastore location is tied to the AppVolumes configuration in the database) would need to be deleted, imported and reassigned for every user of writable volumes. Not exactly a quick and easy recovery runbook.

Option 3b: Use Home Sites and SCP/PowerCLI scripted replication for writable volumes, manual replication for appstacks

This is basically option 3, but instead of leveraging block level replication, we could try to use an SCP/PowerCLI script that would copy only the writable volumes and metadata files from one vCenter to the other in one direction. Pretty sure this would just be disastrous. Not recommended.

Option 4: Use a stretched cluster with active/active storage across the two datacenters for all app stacks and writable volumes

This pretty much defeats the whole point, because if we’re going to leverage a metrocluster we can simply protect persistent desktops with the same active/active technology and the conversation is over. It does add complexity because a separate entire pod must be leveraged so that the vCenter/Connection servers can failover with the desktops, which is a large amount of added complexity. A simplified config would be just to leverage active/active storage replication for the AppVolumes storage, but that’s super expensive for such a small return. At that point might as well put the whole thing on stretched clusters.

EDIT: But then again, near zero RTO dictates synchronous replication for the persistence. If the users’ UIA applications cannot be persisted via writable volumes, they must be persisted via Persistent desktops, which will end up requiring the same storage solution. I’ve pretty much ended up back here, which is right where I started from a design standpoint.

So! If any of you guys have any input I’d be interested to hear about it. The real kicker here is trying to avoid the complexity block level active/active storage configuration. If we remove that constraint or the writable volumes from the equation, this whole thing becomes a lot easier.

EDIT: The active/active complexity for near 0 RTO can’t be avoided just yet in my opinion. At least with option 4 we would only be replicating VMDK files instead of full virtual machines. To me that’s at least a little simpler, because we don’t have to fail over connection brokers and a separate vCenter server just for managing those persistent desktops.

Entering VSAN Maintenance Mode Hangs at 65%

Ran into a weird situation in my lab where entering maintenance mode from the Web Client while attempting doing a full data evacuation WHILE there’s a failed disk in the disk group. It looked like it was unable to enter maintenance mode because one VM couldn’t get moved off the host, the reason? I’m not exactly sure.

Short very un-completely analyzed answer for me was either don’t do a full evacution, or vmotion the VMs to a different host BEFORE attempting to evacuate it. My situation was also weird because I had 1 out of the 2 magnetic disks in a failed state in the single disk group in the host that I was attempting to move away from.

I canceled the task after it ran for 12 hours, moved the powered off VM with a normal vmotion, then entered maintenance mode AOK. Sorry for the complete lack of detail, but nothing came up on teh interwebz for VSAN maintenance mode hanging at 65%.

RSAT Tools on an App Volume?

App Volumes is probably one of the coolest technologies out right now. When VMware bought them, the first thing I thought is that we now finally have the means to control the virtual desktop, soup to nuts. We can make minute changes, have versioning, and quickly react to new challenges presented to us in various use cases.

Administrators for a long time have wanted the granularity of complete control of applications on a desktop, but that control leads to issues when managing at scale with traditional architecture. App Volumes gives us the ability to get down and dirty. To try various setups and ultimately allow for the flexibility of layering applications. But what happens if what we want to put in an App Volume isn’t an application at all? What if you want to enable operating system features like Microsoft Remote Server Administration Tools?

Recommended practice for your provisioning machine is to keep it as clean as possible, only putting the bare essentials in the OS so that the app layer you are trying to capture only contains the application information. Unfortunately with RSAT Tools, there is no application to install, but rather features that you can enable with a Windows Update that can be found here: http://www.microsoft.com/en-us/download/details.aspx?id=7887

To enable RSAT Tools in an App Stack, install the above Windows Update on both the provisioning PC and the Gold Image that is used as the parent machine for your vdi pool. Create a new App Stack or update an existing Stack and attach it to the provisioning machine. Head over to the Control Panel, open Programs and Features, and click on Turn Windows features on or off. Under Windows Features, choose Remote Server Administration Tools, and then enable the features you want users that will subscribe to the App Stack to have. At this point, you can finish the provisioning, and reboot the provisioning machine. The only thing left to do is attach the App Stack to a user desktop and test the functionality.

~ DJ Gillit, VCP5-DCV, VCP5-DT, VCP-NV

Follow me on Twitter: @djgillit