Why I Chose vSphere Replication Over SRM
If you’re managing a cluster of 8–10 ESXi hosts like I am, one of the hardest problems you’ll face is Disaster Recovery (DR). When a Datacenter (DC) goes down, how do you get systems back online as fast as possible? VMware Site Recovery Manager (SRM) is the gold standard, but its license cost of several thousand USD per year is a massive barrier for mid-sized organizations.
After six months of running vSphere Replication (VR) in a production environment, I realized that it is a far more practical option. It’s bundled with vSphere Essentials Plus and above. VR supports VM synchronization between two sites with intervals ranging from 5 minutes to 24 hours. Best of all, you don’t need identical or expensive storage systems at both ends.
VR operates at the hypervisor level, so it doesn’t care whether the underlying storage is SAN, NAS, or local disk. As long as two vCenter instances can reach each other over the network, you’re ready to deploy. If you haven’t set up a central management instance yet, the VCSA 8.0 installation guide is a good starting point before proceeding.
Deploying the vSphere Replication Appliance
Start by downloading the VR Appliance OVF from the Broadcom portal (which replaced the VMware download site). Check the Interoperability Matrix carefully to confirm the VR version is compatible with your current vCenter version.
Step 1: Deploy the Appliance
In your primary site’s vCenter, right-click the Cluster and select Deploy OVF Template. The process is similar to creating a new VM, but there are 3 settings you absolutely must not get wrong:
- IP Address: Always use a static IP. If you use DHCP and the appliance’s IP changes later, all replication connections will break.
- Password: Used for the admin account on the management page at port 5480.
- Resources: The default configuration of 2 vCPU / 4 GB RAM can manage up to 2,000 VMs (on version 8.0). For smaller environments, this is more than enough.
Step 2: Pair the Two Sites
Once the appliance is deployed at both ends, open the vSphere Client and navigate to Site Recovery -> Open Site Recovery. Select New Site Pair to connect vCenter Site A and Site B.
# Make sure to open the required firewall ports before pairing
# Port 8043: VR management
# Port 31031, 44046: Replication data transfer
telnet <Remote_VR_IP> 8043
When you see a green checkmark with a “Connected” status, both sites have successfully established the pairing.
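The telnet check above covers only port 8043. As a rough sketch, the functions below probe all three ports from a Linux jump host using bash’s built-in /dev/tcp redirection, so telnet does not need to be installed. The function names and the example address are placeholders of mine, not part of any VMware tooling, and the sketch assumes bash and the coreutils timeout command are available. Keep in mind that the replication ports are typically used between the source ESXi hosts and the remote VR appliance, so a pass from a jump host only approximates the real path.

```shell
# check_port HOST PORT [TIMEOUT_SECONDS]
# Returns 0 if a TCP connection can be opened, non-zero otherwise.
# Host and port are passed as arguments to the inner shell, not
# interpolated into the command string.
check_port() {
  local host="$1" port="$2" wait="${3:-3}"
  timeout "$wait" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# report_ports HOST PORT...
# Prints one line per port: "<port> open" or "<port> blocked".
report_ports() {
  local host="$1" port
  shift
  for port in "$@"; do
    if check_port "$host" "$port"; then
      echo "${port} open"
    else
      echo "${port} blocked"
    fi
  done
}

# Example (hypothetical address from the documentation range):
#   report_ports 192.0.2.10 8043 31031 44046
```

A "blocked" result means only that the TCP handshake did not complete within the timeout; it does not tell you whether a firewall or a stopped service is the cause.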
Configuring Replication for Individual VMs
In practice, not every VM needs to sync every 5 minutes, so I set the RPO per VM according to how critical it is. Here’s how I typically configure replication:
- Right-click the VM you want to protect -> Site Recovery -> Configure Replication.
- Target Site: Point to the recovery site you’ve already paired.
- Replication Settings:
  - RPO (Recovery Point Objective): For critical databases, I set this to 15 minutes. For file servers or static web servers, 1–4 hours is a reasonable tradeoff to conserve bandwidth.
  - Point in Time (PIT) Instances: This feature is incredibly valuable. It retains historical snapshots (up to 24 points). If a VM gets hit by ransomware, you can roll it back to a clean state from before the infection. This pairs well with a solid VMware backup and restore strategy for comprehensive protection.
  - Network Compression: Enable this if the WAN link between your two sites is limited. It can reduce the amount of data transmitted by 30–50%.
Pro tip: If a VM’s disk is several terabytes, don’t try to sync it over the network from scratch. Use the Seed feature instead. Copy the vmdk file to the recovery site manually via an external drive, then configure VR to point to that file — it will only sync the changed data (delta) going forward. For datastores running thin-provisioned disks, keep in mind that converting thin to thick provisioning can significantly affect both replication size and I/O performance at the recovery site.
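To see why seeding pays off, it helps to estimate how long the initial full sync would take over the WAN. The function below is a back-of-the-envelope sketch: the name initial_sync_hours, the 80% link efficiency default, and the example figures are my own assumptions, and real throughput will also depend on compression and competing traffic.

```shell
# initial_sync_hours SIZE_GB LINK_MBIT_PER_S [EFFICIENCY]
# Estimates the hours needed to copy SIZE_GB (decimal gigabytes) over a
# link of LINK_MBIT_PER_S, assuming only EFFICIENCY (default 0.8) of the
# nominal bandwidth is usable. LC_ALL=C keeps the decimal point stable.
initial_sync_hours() {
  LC_ALL=C awk -v gb="$1" -v mbps="$2" -v eff="${3:-0.8}" 'BEGIN {
    bits = gb * 1000 * 1000 * 1000 * 8          # gigabytes to bits
    secs = bits / (mbps * 1000 * 1000 * eff)    # usable bits per second
    printf "%.1f\n", secs / 3600
  }'
}
```

For example, initial_sync_hours 2000 100 estimates roughly 55.6 hours for a 2 TB disk over a 100 Mbit/s link at 80% efficiency, which is usually reason enough to ship a seed copy instead.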
Monitoring with PowerCLI
Instead of manually clicking through each VM to check its status, I run a PowerCLI script every morning and have the result sent to me as a status report. This ensures I’m never caught off guard by a VM that silently failed to replicate. The snippet below is the core status query; the scheduling and mail delivery around it are omitted. If you want to go further, automating VMware management with PowerCLI covers a range of practical scenarios beyond just replication monitoring.
# Connect to vCenter
Connect-VIServer -Server vcenter.yourdomain.com
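# Check replication status
# Get-VDXReplication is assumed to be a custom or third-party function already
# loaded in the session; it is not one of the core VMware PowerCLI cmdlets.
Get-VDXReplication | Select-Object Name, State, LastSyncTime, RPOStatus | Format-Table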
# Disconnect
Disconnect-VIServer -Confirm:$false
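Whatever produces the report, the check that matters is whether each VM’s last successful sync is older than its RPO. The sketch below shows that logic on a hypothetical CSV export with three columns: VM name, last sync time as a Unix epoch, and RPO in minutes. The file format and the function name are assumptions for illustration, not something VR or PowerCLI emits as-is.

```shell
# rpo_violations [NOW_EPOCH]
# Reads "name,last_sync_epoch,rpo_minutes" lines on stdin and prints the
# VMs whose last sync is older than their RPO. NOW_EPOCH defaults to the
# current time; passing it explicitly makes the check reproducible.
rpo_violations() {
  local now="${1:-$(date +%s)}"
  LC_ALL=C awk -F, -v now="$now" '
    NF >= 3 {
      age_min = int((now - $2) / 60)   # minutes since the last sync
      if (age_min > $3)
        printf "%s last synced %d min ago (RPO %d min)\n", $1, age_min, $3
    }'
}

# Example with a hypothetical export file:
#   rpo_violations < replication_status.csv
```

An empty result means every listed VM synced within its RPO, so a morning report can simply mail the output whenever it is non-empty.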
Hard-Won Lessons from Running This in Production
After many DR drills, here are the three most important lessons I’ve learned:
- IP Conflicts: When you power on a VM at the recovery site, it retains its original IP address. If the two sites are on different subnets, you must reassign the IP as part of recovery; otherwise the VM will boot but its services will be unreachable. Note that automated IP Customization is a feature of SRM recovery plans, so with standalone VR you should plan to re-IP manually or with a script of your own.
- Snapshot Cleanup: VR doesn’t play well with VMs that have long-lived snapshots. Always commit snapshots before enabling replication to avoid the frustrating “Generic Error” message. For a deeper look at managing snapshots safely, see the VMware clone and snapshot guide.
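- Bandwidth Planning: A database VM writing at 10 MB/s can generate up to roughly 80 Mbit/s of replication traffic before compression, which is more than many inter-site WAN links can sustain. Always calculate your data change rate before setting an aggressive RPO.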
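The bandwidth lesson comes down to simple arithmetic. As a rough sketch (the function names are mine, and the compression saving is just the 30–50% range mentioned earlier, not a guarantee), the helpers below convert a sustained write rate into the WAN bandwidth needed to keep up and into the data accumulated per RPO window. They treat every write as changed data, so the result is an upper bound: blocks rewritten several times within one window should only need to be sent once.

```shell
# required_mbps WRITE_MB_PER_S [COMPRESSION_SAVING]
# WAN bandwidth in Mbit/s needed to keep up with a sustained write rate.
# COMPRESSION_SAVING is a fraction, e.g. 0.3 for a 30% reduction.
required_mbps() {
  LC_ALL=C awk -v w="$1" -v s="${2:-0}" 'BEGIN {
    printf "%.1f\n", w * 8 * (1 - s)   # megabytes to megabits, minus savings
  }'
}

# delta_gb_per_window WRITE_MB_PER_S RPO_MINUTES
# Decimal gigabytes of changed data accumulated during one RPO window.
delta_gb_per_window() {
  LC_ALL=C awk -v w="$1" -v m="$2" 'BEGIN {
    printf "%.1f\n", w * m * 60 / 1000
  }'
}
```

With the 10 MB/s example, required_mbps 10 prints 80.0, required_mbps 10 0.5 prints 40.0, and delta_gb_per_window 10 15 prints 9.0, meaning up to 9 GB would have to cross the WAN every 15 minutes.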
To sum it up, vSphere Replication is not a full replacement for a backup solution, but it is an extremely reliable last line of defense. For organizations that can’t justify the cost of SRM, VR is the pragmatic choice for keeping systems resilient when a site goes down.
