All About High Availability (HA): Requirements, Fundamental Components, Configuring

Here I will tell you all about High Availability (HA): its requirements, its fundamental components, and how to configure it.

  1. Define HA and its Requirements.
  2. Components of HA with Diagram.
    • vCenter Server
    • Hostd and VPXA
    • FDM Agent
  3. Explain the fundamental components
      • Files for both Master and Slave.
      • Explain protectedlist file.
      • Responsibilities of Master and Slave Agents.
      • Election process of Master.
      • Master/ Slave Agents.
        • a) Remote files and b)  Local files
      • Datastore Heartbeating.
      • Election process of datastores for heartbeating.
      • Administrator selection settings for datastore heartbeating.
      • Isolated and partitioned network with diagram.
      • Virtual machine protection and unprotection workflow.
  4. Configuring vSphere HA
    • Admission control and over commitment in HA enabled cluster.
    • Admission control policy.
    • VM options.
    • VM Monitoring.
    • VM Restart Priority and Order.
  5. HA failure Scenarios
    • Host failure scenarios.
    • Guest Operating System failure scenario.
    • Application failure scenarios.

Components of HA

  • vCenter Server
  • Hostd and VPXA
  • FDM Agent or HA Agent


vCenter Server:- vCenter is responsible for several HA-related tasks:

  • Deploying and configuring HA agents.
  • Protecting VMs.
  • Communicating cluster configuration changes to the master host.

Hostd and VPXA:-

Hostd is the most crucial agent on an ESXi host, and VPXA is the vCenter management agent; both run on every ESXi host. The FDM agent relies on hostd for the list of all the VMs that are registered on that host. If hostd is not operational, the FDM agent pauses all of its functions and waits until hostd becomes available and operational again.

FDM agent:-

FDM stands for Fault Domain Manager. It replaced AAM (Automated Availability Manager), which was used in versions prior to vSphere 5.0. FDM runs as an agent on each ESXi host and is separate and decoupled from VPXA, the vCenter Server management agent. FDM offers several improvements over AAM:-

  1. FDM uses a master/slave architecture and does not rely on primary/secondary host designations.
  2. It supports IPv6.
  3. FDM addresses the issues of network partitioning and network isolation.
  4. FDM uses both the management network and storage devices for communication.

Explain the following fundamental concepts

  1. Master / Slave agents.
  2. Heartbeating.
  3. Isolated and partitioned network.
  4. Virtual Machine protection

1. Master and Slave Agents :-

Once HA is enabled on a cluster, the HA agents take part in the election of a master. Once the master is elected, all the other hosts that have management connectivity with it are considered slaves of that master. A new master election also takes place in the following situations :-

  • The master host fails.
  • The master is disconnected from vCenter.
  • The master becomes isolated or partitioned from the network.
  • The master is placed in maintenance mode or standby mode.
  • HA is reconfigured on the cluster.

The master election takes about 15 secs and is conducted over UDP. HA does not react to failures while the election is in progress; once a master is elected, it takes care of any failures that occurred during or after the election.

The master is responsible for the following tasks

  • Monitors the slave hosts and restarts their VMs in the event of a slave host failure.
  • Monitors the power state of the VMs that are protected by HA and restarts a protected VM if it fails.
  • Manages the list of protected VMs in the cluster and updates the list each time a user initiates a power ON or power OFF operation. vCenter Server requests these updates in order to protect and unprotect the VMs.
  • Receives notifications from vCenter about changes to the cluster configuration.
  • Sends heartbeats to the slave hosts so that the slaves know the master is alive.
  • Acts as vCenter's usual point of contact. vCenter typically communicates only with the master and contacts the slave hosts only in some circumstances, for example when the master is isolated or partitioned from the network, or when the master informs vCenter that slave hosts are not reachable.

The slaves' responsibilities are as follows

  • Each slave monitors the power state of all the protected VMs that are running locally and informs the master host of any changes.
  • Each slave monitors the health of the master. If the master fails, the slave participates in the election of the new master host.

Election process of Master

Among the hosts participating in the election, the host with the greatest number of connected datastores is elected master. If more than one host has the same number of datastores connected, the one with the greatest Managed Object ID (MOID) is chosen. The MOIDs are compared lexically, which means 99 beats 100 because 9 is greater than 1. You can see the HA status of a host on its Summary tab.
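The election rule described above can be sketched in a few lines of Python. This is only an illustration of the two criteria (datastore count, then lexical MOID comparison), not the FDM implementation; the `Host` class, the host names, and the `host-NN` MOID format are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str             # display name, illustrative only
    moid: str             # Managed Object ID, e.g. "host-99" (format assumed)
    datastore_count: int  # number of datastores connected to the host


def elect_master(hosts):
    """Pick the host with the most datastores; break ties on the MOID."""
    # Python compares strings character by character, which mirrors the
    # lexical comparison in the text: "host-99" sorts above "host-100".
    return max(hosts, key=lambda h: (h.datastore_count, h.moid))


hosts = [
    Host("esx-a", "host-100", 4),
    Host("esx-b", "host-99", 4),
    Host("esx-c", "host-250", 3),
]
print(elect_master(hosts).name)
```

Here esx-a and esx-b tie on datastore count, so the MOID decides, and esx-c loses despite its larger MOID because it sees fewer datastores.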

After the master is elected, every other host that has management network connectivity with it sets up a single secured, encrypted TCP connection to the master. This connection is SSL based. The slaves typically do not communicate with each other unless a re-election of the master needs to take place. Once elected, the master tries to acquire ownership of all the datastores that it can access directly or by proxying requests through one of the slave hosts connected to it over the management network. It does this by locking a file called “protectedlist”.

What is protectedlist File?

The master uses this file to store the inventory: the list of all the VMs protected by HA, along with their CPU reservation and memory overhead information. The master distributes the protectedlist file to all the datastores that are in use by the VMs in the cluster.

Protected list naming format and location of the file :-

/<root of datastore>/.vSphere-HA/<cluster specific directory>/protectedlist

How cluster specific directory is constructed :-

<UUID of vCenter server>-<number part of MOID>-<8 char string>-<name of the running vCenter Server>
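As a minimal sketch, the four parts listed above can be joined as follows. The function name, its parameters, and the example MOID format (`domain-c26`) are assumptions for illustration; only the order of the parts and the hyphen separators come from the format shown above.

```python
import re


def cluster_directory_name(vc_uuid, moid, random_suffix, vc_name):
    """Join the four parts of the cluster-specific directory name."""
    # Keep only the trailing digits of the MOID ("number part of MOID").
    match = re.search(r"(\d+)$", moid)
    if match is None:
        raise ValueError(f"no numeric part in MOID: {moid!r}")
    if len(random_suffix) != 8:
        raise ValueError("the string part must be exactly 8 characters")
    return f"{vc_uuid}-{match.group(1)}-{random_suffix}-{vc_name}"


# All values below are made up for the example.
print(cluster_directory_name("1234abcd", "domain-c26", "a1b2c3d4", "vcenter01"))
```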

What happens if the master fails or is isolated from the network?

If the master fails, its lock on the file expires, and the new master relocks the file if the datastore is accessible to it. If the master becomes isolated instead, it releases the lock on the file so that the new master can relock it and determine which VMs are protected by HA by reading the file. Whether the master fails or becomes isolated from the network, the restart of the VMs is delayed until a new master has been elected.

The process of reelection of the master is as follows :-

If the slaves stop receiving network heartbeats from the master, they try to elect a new master. The new master relocks the protectedlist file, reads the information it needs, and then initiates the restart of the VMs within 10 secs.

What happens if a slave host fails or is isolated from the network?

The master determines which VMs need to be restarted and when they need to be restarted. The master is also responsible for VM placement and uses a placement engine that tries to distribute the VMs to be restarted across all the available hosts.

Files for both Master and Slave :-

The master and the slaves use files not only to store VM state but also as a communication mechanism. Just as the master uses the protectedlist file to store the list of protected VMs, both the master and the slaves create the files described here.

Remote Files :- These are stored on the shared datastore and not locally. The remote files are the “PoweredON” files, of which there is one per host. A “PoweredON” file not only tracks which VMs are powered on but also notifies the master that the slave host is isolated from the network. If the file contains 0, the host is not isolated from the network; if it contains 1, the host is isolated. The master informs vCenter about the isolation of the hosts.

Local Files :- When HA is configured on a host, the host stores specific information about its cluster locally. Each host, including the master, stores the following files locally. These files are not human readable.

  • ClusterConfig
  • Compactlist
  • Hostlist
  • Vmmetadata
  • cfg

2. Heartbeating :-

Datastore heartbeating enables the master to determine whether a host has failed or is only isolated from the network.

By default HA selects two datastores for heartbeating. You can configure more than two through an advanced option, but this is not recommended; let vCenter handle the selection. vCenter uses a selection algorithm to choose heartbeat datastores that are visible to all the hosts. The selection process gives preference to the following :-

  • Datastores that are visible to all of the hosts or, failing that, to as many hosts as possible.
  • VMFS datastores rather than NFS datastores.
  • Datastores that are backed by different LUNs or NFS servers.
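The three preferences above can be pictured as a ranking followed by a pick. The sketch below is not vCenter's actual algorithm: the `Datastore` fields, the `backing` identifier, and the order in which the preferences are applied are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Datastore:
    name: str
    visible_hosts: int  # how many cluster hosts can see this datastore
    fs_type: str        # "VMFS" or "NFS"
    backing: str        # hypothetical id of the backing LUN or NFS server


def pick_heartbeat_datastores(candidates, count=2):
    """Rank by visibility, then VMFS before NFS, and spread across backings."""
    ranked = sorted(
        candidates,
        key=lambda d: (-d.visible_hosts, d.fs_type != "VMFS", d.name),
    )
    chosen, used_backings = [], set()
    # First pass: take the best-ranked datastores with distinct backings.
    for ds in ranked:
        if len(chosen) < count and ds.backing not in used_backings:
            chosen.append(ds)
            used_backings.add(ds.backing)
    # Second pass: if still short, fill up with whatever is left.
    for ds in ranked:
        if len(chosen) < count and ds not in chosen:
            chosen.append(ds)
    return chosen


candidates = [
    Datastore("ds1", 4, "VMFS", "lunA"),
    Datastore("ds2", 4, "VMFS", "lunA"),
    Datastore("ds3", 4, "NFS", "nas1"),
    Datastore("ds4", 3, "VMFS", "lunB"),
]
print([d.name for d in pick_heartbeat_datastores(candidates)])
```

In this example ds2 is skipped because it shares a LUN with ds1, so the second pick falls to ds3 even though it is an NFS datastore.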

HA provides three settings that let the administrator control which datastores are used for heartbeating :-

  • Automatically select datastores accessible from the host. With this setting, manual selection from the preferred list is disabled.
  • Use datastores only from the preferred list. If any of these datastores becomes unavailable, HA does not perform datastore heartbeating through a different datastore.
  • Use datastores from the preferred list and complement automatically if needed. HA uses the preferred list first, but if one of those datastores becomes unavailable, it chooses a different available datastore until the datastore from the preferred list becomes available again.
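To make the difference between the three settings concrete, here is a small sketch of how each one narrows the candidate pool. The enum names and the `heartbeat_candidates` function are hypothetical and are not the names used by vSphere; the behaviour follows the three descriptions above.

```python
from enum import Enum


class HeartbeatPolicy(Enum):
    AUTOMATIC = "automatic"
    PREFERRED_ONLY = "preferred_only"
    PREFERRED_WITH_FALLBACK = "preferred_with_fallback"


def heartbeat_candidates(policy, preferred, available, count=2):
    """Return up to `count` datastore names allowed under the given policy."""
    usable_preferred = [d for d in preferred if d in available]
    others = [d for d in available if d not in preferred]
    if policy is HeartbeatPolicy.AUTOMATIC:
        pool = list(available)            # preferred list is ignored
    elif policy is HeartbeatPolicy.PREFERRED_ONLY:
        pool = usable_preferred           # never falls back to other datastores
    else:
        pool = usable_preferred + others  # preferred first, then complement
    return pool[:count]


preferred = ["ds1", "ds2"]
available = ["ds3", "ds2", "ds4"]  # ds1 is currently unavailable
for policy in HeartbeatPolicy:
    print(policy.name, heartbeat_candidates(policy, preferred, available))
```

With ds1 unavailable, the preferred-only setting is left with a single heartbeat datastore, while the third setting tops the list up with ds3.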

3. Isolated Versus Partitioned Network :-

HA uses the management network as well as storage devices for communication. When the master cannot communicate with the slaves across the management network, HA uses datastore heartbeating to determine whether a host has failed or is isolated from the network. This is where the distinction between an isolated and a partitioned network comes into play :-

Partitioned Network:-

A network is partitioned when one or more slaves cannot communicate with the master across the management network even though they still have connectivity with the other slaves. In this case HA uses datastore heartbeating to determine whether the hosts are still alive, and whether the master needs to take action to protect the VMs running on the partitioned hosts or a new master needs to be elected within the partition. Let's say your network splits into four segments: each partitioned segment elects its own master, which means the cluster has four masters. When the network is repaired, one of the four masters takes over the role and becomes responsible for the whole cluster again.

Isolated Network :-

A host is isolated when it has lost all management network connectivity. An isolated host can communicate neither with the master host nor with the other slave hosts. In this case the slave host uses datastore heartbeating to notify the master that it is isolated. It does so through a special binary file, the HOST-X-PowerON file. The master can then take the appropriate action to protect the VMs.
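From the master's point of view, the cases in this section reduce to three observations about a slave: whether network heartbeats arrive, whether datastore heartbeats arrive, and whether the slave has flagged itself as isolated. The sketch below is a simplified decision table built from the descriptions above; the function and enum names are made up for the example.

```python
from enum import Enum


class HostState(Enum):
    LIVE = "live"
    FAILED = "failed"
    ISOLATED = "isolated"
    PARTITIONED = "partitioned"


def classify_host(network_heartbeat, datastore_heartbeat, isolation_flag):
    """Classify a slave as the master might, using the signals in the text."""
    if network_heartbeat:
        return HostState.LIVE         # reachable over the management network
    if not datastore_heartbeat:
        return HostState.FAILED       # silent on both network and storage
    if isolation_flag:
        return HostState.ISOLATED     # host reported its own isolation
    return HostState.PARTITIONED      # alive, but in another network segment


print(classify_host(False, True, True))
print(classify_host(False, True, False))
print(classify_host(False, False, False))
```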

4. Virtual Machine Protection :-

  • The VM protection and unprotection workflows are described below. Both protecting and unprotecting a VM require an update to the protectedlist file.
  • The VM protection process changed in vSphere 5.0. In earlier versions of vSphere it was handled by vpxd, which was notified by AAM (Automated Availability Manager) through a vpxa module called vmap. Vpxd is the vCenter Server service that runs on vCenter itself, and vpxa is the vCenter management agent that is installed on each host. In vSphere 5 the protection of VMs is the ultimate responsibility of vCenter Server.
  • When the state of a VM changes, vCenter Server directs the master to protect or unprotect the VM; in other words, vCenter tells the master to enable or disable HA protection for that VM. Protection is only guaranteed once the master has committed the change of state to disk.
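The last point, that a VM only counts as protected after the change is on disk, is a commit-before-acknowledge pattern. The sketch below illustrates that ordering only. The `ProtectedList` class and the JSON file format are hypothetical stand-ins; the real protectedlist file is not human readable and its format is not described here.

```python
import json
import os
import tempfile


class ProtectedList:
    """Toy protected-VM list that acknowledges only after a durable write."""

    def __init__(self, path):
        self.path = path
        self.vms = set()

    def _commit(self, vms):
        # Write to a temporary file, force it to disk, then swap it in
        # so a crash never leaves a half-written list behind.
        directory = os.path.dirname(self.path) or "."
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as handle:
            json.dump(sorted(vms), handle)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_path, self.path)

    def protect(self, vm):
        updated = self.vms | {vm}
        self._commit(updated)  # if this raises, the VM is not protected
        self.vms = updated

    def unprotect(self, vm):
        updated = self.vms - {vm}
        self._commit(updated)
        self.vms = updated


with tempfile.TemporaryDirectory() as workdir:
    plist = ProtectedList(os.path.join(workdir, "protectedlist.json"))
    plist.protect("vm-1")
    plist.protect("vm-2")
    plist.unprotect("vm-1")
    with open(plist.path) as handle:
        print(json.load(handle))
```

The in-memory set is updated only after `_commit` returns, so a failed write leaves the VM in its previous protection state.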

SandeepKaushik and ShaswatiMukherjee