
- AWS 101
This blog article covers the various services offered by AWS.

AWS Global Infrastructure

Regions & Availability Zones: Each Region is a separate geographic area. An AWS Region has multiple, isolated locations known as Availability Zones; an Availability Zone is essentially a datacenter.

Edge Locations: Edge Locations are AWS endpoints used for caching content, typically through CloudFront, Amazon's content delivery network (CDN). There are many more Edge Locations than Regions.

Compute Services

EC2: Elastic Compute Cloud - virtual machines inside the AWS platform.
EC2 Container Service: Run and manage Docker containers at scale.
Elastic Beanstalk: One layer of abstraction above EC2. Elastic Beanstalk sets up an "environment" for you that can contain a number of EC2 instances, an optional database, and a few other AWS components such as an Elastic Load Balancer, Auto Scaling group, and Security Group. Elastic Beanstalk then manages these items for you whenever you update your software running in AWS.
Lambda: A serverless compute service that runs your code in response to events and automatically manages the underlying compute resources for you. You can use Lambda to extend other AWS services with custom logic, or create your own back-end services that operate at AWS scale, performance, and security.
Lightsail: A lightweight, simplified product offering - hard disks are fixed-size EBS SSD volumes, instances are still billable when stopped, security group rules are much less flexible, and only a very limited subset of EC2 features and options is accessible.
Batch: Used for batch computing in the cloud.

Storage Services

S3: Simple Storage Service, the oldest storage service. Object-based storage; users upload their files into buckets.
EFS: Elastic File System - essentially network-attached storage. Users store their data on EFS volumes and mount them on multiple EC2 instances or virtual machines.
Glacier: Used for data archival.
Snowball: Used to migrate large amounts of data into AWS datacenters.
Storage Gateway: Virtual appliances installed in your on-premise datacenter that replicate data to AWS infrastructure.

Database Services

RDS: Relational Database Service (MySQL, Aurora, SQL Server, Oracle).
DynamoDB: Non-relational (NoSQL) database service.
ElastiCache: Fully managed in-memory data store and cache service.
Redshift: Service used for data warehousing and business intelligence.

Migration Services

AWS Migration Hub: Tracks your applications as they are migrated to AWS.
Application Discovery Service: Discovers which applications you have, along with their dependencies.
Database Migration Service: Simplifies migrating databases from on-premise to AWS.
Server Migration Service: Migrates physical and virtual on-premise servers into AWS.
Snowball: Write large volumes of on-premise data to physical appliances and ship them to AWS.

Networking and Content Delivery

VPC: Virtual Private Cloud. Lets you define firewalls, subnets across Availability Zones, network CIDR address ranges, route tables, and ACLs.
CloudFront: Amazon's content delivery network.
Route 53: DNS service.
API Gateway: Create your own APIs for service integrations.
Direct Connect: Dedicated connection to transfer data from on-premise to AWS datacenters.

Developer Tools

CodeStar: Cloud-based service for creating, managing, and working with software development projects on AWS. You can quickly develop, build, and deploy applications on AWS with an AWS CodeStar project. An AWS CodeStar project creates and integrates AWS services for your project's development toolchain.
CodeCommit: Stores the code you develop in private repositories.
CodeBuild: Once your code is ready, this service compiles it and makes it ready for deployment.
CodeDeploy: Used to automate application deployments.
CodePipeline: Continuous delivery service to model, visualize, and automate application releases.
X-Ray: Debugging service to analyze and troubleshoot issues, find root causes, and identify performance bottlenecks.
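The S3 workflow described above (uploading files into buckets) can be sketched with the AWS CLI. The bucket name and file below are placeholders, and the commands assume the aws CLI is installed and configured with credentials:

```shell
# Create a bucket, upload an object into it, then list the bucket contents.
# "my-example-bucket" and report.pdf are placeholder names for illustration.
aws s3 mb s3://my-example-bucket --region us-east-1
aws s3 cp ./report.pdf s3://my-example-bucket/
aws s3 ls s3://my-example-bucket/
```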
- ! Error searching for Sync Results from Storage
When you browse directories in the vRA UI and click on Sync Log, you may see a message stating "! Error searching for Sync Results from storage". This can happen if the elasticsearch service on your Master / Sync Replica / Potential Replica is not running as expected. Elasticsearch is only used for the Sync log in vRA.

Log in to each of your vRA appliances and verify the status of the elasticsearch service:

service elasticsearch status

If the status says "Not running (but PID file exists)", restart the elasticsearch service and see if that helps:

`service elasticsearch restart`

If the above step fails, try removing the PID file and starting the service again:

rm /opt/vmware/elasticsearch/elasticsearch.pid
service elasticsearch start

The above steps should fix the problem. You may also perform the following health check to verify that elasticsearch on all the nodes is in a healthy state - ssh to each appliance and execute:

curl -k http://localhost:9200/_cluster/health?pretty=true

#vRealizeAutomation
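The cluster health endpoint returns a JSON document whose "status" field is green, yellow, or red. As a rough illustration, the status can be extracted in plain shell; the HEALTH variable below holds a hypothetical sample of the response shape, not output captured from a real vRA node:

```shell
# Hypothetical sample of the JSON returned by
#   curl -k http://localhost:9200/_cluster/health?pretty=true
HEALTH='{ "cluster_name" : "vra", "status" : "green", "number_of_nodes" : 3 }'

# Pull out the "status" field with sed; green/yellow is acceptable, red is not
STATUS=$(printf '%s' "$HEALTH" | sed -n 's/.*"status" *: *"\([a-z]*\)".*/\1/p')
echo "cluster status: $STATUS"
```

In practice you would pipe the live curl output into the same sed expression on each appliance.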
- Increasing /storage/log partition size inside a vRealize Automation appliance
Follow the steps below to successfully increase the size of the existing /storage/log partition of a vRealize Automation appliance.

Pre-requisites

Take a full backup or clone of the vRA nodes. This activity must not be performed during business hours or on weekdays; downtime has to be taken and the work performed under a scheduled change. Note: if you have an HA setup, start this activity on a secondary node.

Procedure

1. Select the vRA appliance using the vSphere Client and add an additional disk. I am adding a 20 GB disk for this example.
2. Run df -h and make a note of the existing /storage/log partition name; if you go by the defaults, it will be /dev/sdb1.
3. Run this command to manually scan for the new device: echo "- - -" > /sys/class/scsi_host/host0/scan
4. Check that the new disk is seen in the VM by running "dmesg".
5. Run fdisk /dev/sde (assuming the new disk is /dev/sde) and create the partition.
6. Once the above step has been completed, execute "partprobe" to commit the partition-table changes to the kernel.
7. Create a filesystem by executing mkfs -t ext3 /dev/sde1 (in my case it's /dev/sde1; it might vary if you have previously added additional disks to the vRA appliance).
8. Mount the new filesystem and verify:
mkdir /tmp/log/
mount -t ext3 /dev/sde1 /tmp/log/
df -h
9. Stop all services:
vcac-vami service-manage stop vco-server vcac-server horizon-workspace elasticsearch
service vpostgres stop
10. Copy the existing logs over:
cp -a /storage/log/* /tmp/log
cd /
mount
11. Update /etc/fstab: sed -i -e 's#/dev/sdb1#/dev/sde1#' /etc/fstab (make sure you use the correct filesystem names here).
12. Reboot the vRA appliance.
13. Validate the /storage/log partition and check that application logging is working as expected.

!! Hope this helps !!
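The sed substitution against /etc/fstab is the riskiest step, so it is worth rehearsing it on a scratch copy first. A minimal sketch, assuming the default /dev/sdb1 layout from the article (the sample line below is illustrative, not a full fstab):

```shell
# Build a scratch file mimicking the /storage/log line in /etc/fstab
printf '/dev/sdb1 /storage/log ext3 defaults 0 2\n' > /tmp/fstab.sample

# The same substitution the article applies in place to the real /etc/fstab
sed -e 's#/dev/sdb1#/dev/sde1#' /tmp/fstab.sample > /tmp/fstab.updated

# The device name should now read /dev/sde1
cat /tmp/fstab.updated
```

Only once the output looks right would you run the in-place `sed -i` form against the real /etc/fstab.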
- Timed out while waiting for Event Broker response
Came across a problem recently where provisioning and reconfiguration of virtual machines through vRealize Automation were failing. Exceptions seen in catalina.out:

2019-06-02 09:53:25,643 vcac: [component="cafe:event-broker" priority="INFO" thread="event-broker-service-taskExecutor8" tenant="" context="" parent="" token=""] com.vmware.vcac.core.event.broker.integration.PublishReplyEventServiceActivator.onApplicationEvent:96 - Message Broker unavailable[internal stop]

2019-04-02 09:53:25,711 vcac: [component="cafe:console-proxy" priority="WARN" thread="Grizzly(1)" tenant="" context="" parent="" token=""] com.vmware.vcac.platform.event.broker.client.stomp.StompEventSubscribeHandler.handleException:493 - Error during message processing: session:[f4329ebb-4e2e-7690-8b6a-3f420c8bd226], command[null], headers[{message=[Connection to broker closed.], content-length=[0]}], payload [{}]. Reason : [Connection to broker closed.]

2019-04-02 09:53:25,712 vcac: [component="cafe:console-proxy" priority="ERROR" thread="Grizzly(1)" tenant="" context="" parent="" token=""] com.vmware.vcac.core.service.event.ServerEventBrokerServiceFacade.handleError:337 - Error for command 'null', headers: '{message=[Connection to broker closed.], content-length=[0]}' java.lang.Exception: Connection to broker closed.

The above exceptions clearly state that the problem is with the messaging broker, which is RabbitMQ. Performing a RabbitMQ reset and then adding the second or third node (if available) back to the master would eventually resolve the problem.
After a fair bit of research in the RabbitMQ logs, we see:

On node psvra01.nukescloud.com:

=INFO REPORT==== 11-Jun-2019::16:31:41 ===
rabbit on node 'rabbit@psvra03.nukescloud.com' down
=INFO REPORT==== 11-Jun-2019::16:31:41 ===
Keep rabbit@psvra03.nukescloud.com listeners: the node is already back

On node psvra03.nukescloud.com:

=INFO REPORT==== 11-Jun-2019::16:54:12 ===
rabbit on node 'rabbit@psvra01.nukescloud.com' down
=INFO REPORT==== 11-Jun-2019::16:54:12 ===
Keep rabbit@rabbit@psvra01.nukescloud.com listeners: the node is already back
...
=INFO REPORT==== 11-Jun-2019::18:55:09 ===
rabbit on node 'rabbit@rabbit@psvra01.nukescloud.com' down
=INFO REPORT==== 11-Jun-2019::18:55:09 ===
Keep rabbit@rabbit@psvra01.nukescloud.com listeners: the node is already back

The above snippets clearly show network partition events occurring. RabbitMQ does not tolerate network partitions and does not recover from them properly. Executing the command below makes the cluster somewhat more resilient to these partitions:

rabbitmqctl set_policy ha-all "" '{"ha-mode":"all","ha-sync-mode":"automatic","ha-promote-on-failure":"always","ha-promote-on-shutdown":"always"}'

Newer versions of vRealize Automation have mechanisms in place to detect this sort of issue and attempt an automated recovery. The command above helps to a certain extent, but a fully available and redundant network between the vRealize Automation nodes is still required.
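To spot these partition events quickly, the RabbitMQ log can be grepped for "node ... down" reports. A small sketch; the LOG variable below holds a hypothetical excerpt in the format shown above, whereas in practice you would grep the actual log file on each appliance:

```shell
# Hypothetical log excerpt in the report format shown above
LOG="=INFO REPORT==== 11-Jun-2019::16:31:41 ===
rabbit on node 'rabbit@psvra03.nukescloud.com' down
=INFO REPORT==== 11-Jun-2019::16:31:41 ===
Keep rabbit@psvra03.nukescloud.com listeners: the node is already back"

# Count node-down reports - repeated hits suggest network partitions
DOWNS=$(printf '%s\n' "$LOG" | grep -c "rabbit on node .* down")
echo "node-down events: $DOWNS"
```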
- The number of nics you requested are more than specified in the blueprint that was used to provision
Reconfiguring a virtual machine managed by vRealize Automation fails with an exception stating "The data specified within the request is invalid. The number of nics you requested are more than specified in the blueprint that was used to provision this machine".

In vRealize Automation 7.5, when any Day-2 action is performed (change of CPU, memory, network, etc.), the network size is always checked against the blueprint maximum allowed for the VM. The above exception is thrown when a given VM has more than the maximum NICs allowed; vRA will not allow the maximum NICs to be exceeded at any point in time. Even when the Day-2 action does not touch the NICs, vRA still performs the network size check and throws the exception.

This behaviour has changed in vRealize Automation 7.6. In that version, the network check is ONLY performed when a network change is being made through a Day-2 action; if attributes other than networks are being changed, the network size check is not performed.
- After upgrading vRA from 7.5 to 7.6 several services show up as unavailable in the VAMI
After upgrading vRA from 7.5 to 7.6, the following services might show up as unavailable:

advanced-designer-service - UNAVAILABLE
o11n-gateway-service - UNAVAILABLE
shell-ui-app - UNAVAILABLE
vco - null

Going through the logs you might see the following exception:

com.vmware.o11n.sdk.exception.RestApiException: com.vmware.o11n.service.version.ContentVersionException: org.springframework.dao.IncorrectResultSizeDataAccessException: query did not return a unique result: 2; nested exception is javax.persistence.NonUniqueResultException: query did not return a unique result: 2

To resolve this, check the vmo_contentversioncontrol table in vRA's Postgres database to see if there is a duplicate __SYSTEM record; if so, remove the one with id = 2. Once the duplicate id is removed, the services should start properly.
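A sketch of the check as a psql session on the appliance. The table name follows the article; the delete is left commented out and the id = 2 value should be verified against your own query output before running anything, and only after a database backup:

```shell
# Connect to the vcac database on the vRA appliance (run as root).
su - postgres -c "/opt/vmware/vpostgres/current/bin/psql vcac" <<'SQL'
-- Look for duplicate __SYSTEM rows in the version-control table
select * from vmo_contentversioncontrol;

-- If a duplicate __SYSTEM record exists, remove the one with id = 2
-- (uncomment only after verifying which row is the duplicate):
-- delete from vmo_contentversioncontrol where id = 2;
SQL
```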
- vRA portal "Machine" tab exception
We get the following exception when clicking on the Machine tab in the vRA portal:

Error while retrieving resources from provider [Infrastructure Service] for resource type [Machine].

/var/log/vmware/vcac/catalina.out throws the following exception:

2018-03-15 09:25:31,027 vcac: [component="cafe:catalog" priority="INFO" thread="tomcat-http--37" tenant="vsphere.local" context="qXyeQEHe" parent="" token="qXyeQEHe"] com.vmware.vcac.catalog.controller.consumer.ConsumerResourceController.getResourcesByResourceType:267 - Retrieving resources for type [Infrastructure.Machine], managedOnly [true], withExtendedData[true], withOperations[true]
2018-03-15 09:25:31,070 vcac: [component="cafe:catalog" priority="ERROR" thread="tomcat-http--37" tenant="vsphere.local" context="qXyeQEHe" parent="" token="qXyeQEHe"] com.vmware.vcac.catalog.service.impl.ConsumerResourceServiceImpl.extendResourcesWithProviderData:803 - Error retrieving Resources from provider. Reason: org.springframework.web.client.HttpClientErrorException: 404
at org.springframework.web.client.DefaultResponseErrorHandler.handleError(DefaultResponseErrorHandler.java:91) ~[spring-web-4.2.8.RELEASE.jar:4.2.8.RELEASE]
at com.vmware.vcac.platform.rest.client.error.ResponseErrorHandler.handleError(ResponseErrorHandler.java:61) ~[platform-rest-client-7.2.0-SNAPSHOT.jar:?]
at org.springframework.web.client.RestTemplate.handleResponse(RestTemplate.java:641) ~[spring-web-4.2.8.RELEASE.jar:4.2.8.RELEASE]
at org.springframework.web.client.RestTemplate.doExecute(RestTemplate.java:597) ~[spring-web-4.2.8.RELEASE.jar:4.2.8.RELEASE]
at org.springframework.web.client.RestTemplate.execute(RestTemplate.java:572) ~[spring-web-4.2.8.RELEASE.jar:4.2.8.RELEASE]
at org.springframework.web.client.RestTemplate.getForObject(RestTemplate.java:280) ~[spring-web-4.2.8.RELEASE.jar:4.2.8.RELEASE]
at com.vmware.vcac.platform.rest.client.impl.RestClientImpl.get(RestClientImpl.java:313) ~[platform-rest-client-7.2.0-SNAPSHOT.jar:?]
at com.vmware.vcac.platform.rest.client.services.AbstractService.get(AbstractService.java:67) ~[platform-rest-client-7.2.0-SNAPSHOT.jar:?]
at com.vmware.vcac.core.componentregistry.rest.client.service.RegistryService.getEndPoint(RegistryService.java:141) ~[component-registry-client-rest-service-7.2.0-SNAPSHOT.jar:?]
at com.vmware.vcac.core.componentregistry.rest.client.SolutionRestClientManager.getEndpoint(SolutionRestClientManager.java:166) ~[component-registry-client-rest-service-7.2.0-SNAPSHOT.jar:?]
at com.vmware.vcac.core.componentregistry.rest.client.SolutionRestClientManager.credentialPropagatingRestClientByService(SolutionRestClientManager.java:209) ~[component-registry-client-rest-service-7.2.0-SNAPSHOT.jar:?]
at com.vmware.vcac.catalog.provider.gateway.impl.ProviderResourceGatewayImpl.createRestClient(ProviderResourceGatewayImpl.java:134) ~[classes/:?]
at com.vmware.vcac.catalog.provider.gateway.impl.ProviderResourceGatewayImpl.getResourcesFromProvider(ProviderResourceGatewayImpl.java:100) ~[classes/:?]
at com.vmware.vcac.catalog.service.impl.ConsumerResourceServiceImpl.extendResourcesWithProviderData(ConsumerResourceServiceImpl.java:801) [classes/:?]
at com.vmware.vcac.catalog.service.impl.ConsumerResourceServiceImpl.getResourcesByType(ConsumerResourceServiceImpl.java:307) [classes/:?]

Before getting into the resolution it is mandatory to back up the vRA vPostgres database. Also ensure snapshots are taken of the vRA components, including the IaaS servers.

Resolution

After connecting to the vRA database, execute the following query:

select * from cat_provider;

Verify whether you have two com.vmware.csp.iaas.blueprint.service providertype_id entries. Check the time mentioned in the lastsync column; this gives an idea of which is the latest id for com.vmware.csp.iaas.blueprint.service. Now make a note of the two IaaS proxy services:

Old one: XXXX
New one: YYYY

Perform the following steps to clean up. First, verify that the state of the database is still intact.
select * from cat_provider where name ='{com.vmware.csp.component.iaas.proxy.provider@iaasservice.name}';
Expected result: two entries are still there, XXXX and YYYY.

select * from service_info where name = 'iaas-service';
Expected result: only one entry returned, YYYY.

If the results in the previous steps are not as expected then please STOP. Otherwise proceed to the next step.

Back up the vRA vPostgres database before starting this procedure (skip if you have done this already). Then:

ssh to the vRA appliance where the DB is present using the root account
Change directory to the psql location: cd /opt/vmware/vpostgres/current/bin/
Switch to the postgres user: su postgres
Start psql: ./psql
Connect to the vcac database: \connect vcac

With the above commands you now have a psql session connected to the vRA Cafe vcac database. Run the following SQL commands:

delete from cat_catalogitem where provider_id='YYYY';
update cat_catalogitem set provider_id='YYYY' where provider_id='XXXX';
update cat_catalogitemtype SET provider_id='YYYY' WHERE provider_id='XXXX';
update cat_resource set provider_id='YYYY' where provider_id='XXXX';
update cat_request set provider_id='YYYY' where provider_id='XXXX';
update cat_requestcomponent set providerid='YYYY' where providerid='XXXX';
update comp_bprequest set callbackserviceid='YYYY' where callbackserviceid='XXXX';
update comp_component_res set provider_id='YYYY' where provider_id='XXXX';
update comp_componenttype set serviceid='YYYY' where serviceid='XXXX';
update comp_comprequest set provider_id='YYYY' where provider_id='XXXX';
delete from cat_provider where id = 'XXXX';

After cleaning up the database, the Machines and Items tabs should be functional again. If in doubt about any of these steps, get in touch with VMware Support.

#vRealizeAutomation
- Resource not found for the segment 'VirtualMachineExts'
Requests fail during the allocation phase with the following exceptions.

catalina.out (vRA appliance):

com.vmware.vcac.iaas.service.impl.BaseCompositionRequestServiceImpl.processAsyncAllocationTask:737 - Allocation request [Composition RequestId: [XXXX-XXX], CompTypeId: [Infrastructure.CatalogItem.Machine.Virtual.vSphere], BlueprintId: [XXXX-XXX], CompId: [VirtualMachine], BlueprintRequestId: [XXXX-XXX], RootCafeRequestId: [XXXX-XXX], SubtenantId: [XXXX-XXX] with binding id [XXXX-XXX] failed with [HTTP/1.1 404 Not Found : Resource not found for the segment 'VirtualMachineExts'.].
org.odata4j.exceptions.NotFoundException: HTTP/1.1 404 Not Found : Resource not found for the segment 'VirtualMachineExts'.
at org.odata4j.exceptions.NotFoundException$Factory.createException(NotFoundException.java:46) ~[odata4j-core-7.3.1-SNAPSHOT.jar:?]
at org.odata4j.exceptions.NotFoundException$Factory.createException(NotFoundException.java:37) ~[odata4j-core-7.3.1-SNAPSHOT.jar:?]
at org.odata4j.exceptions.ODataProducerExceptions.create(ODataProducerExceptions.java:92) ~[odata4j-core-7.3.1-SNAPSHOT.jar:?]
at org.odata4j.consumer.ErrorMessageParser.parse(ODataCxfClientEx.java:383) ~[platform-odata4j-7.3.1-SNAPSHOT.jar:?]
at org.odata4j.consumer.ODataCxfClientEx.doRequest(ODataCxfClientEx.java:308) ~[platform-odata4j-7.3.1-SNAPSHOT.jar:?]
at org.odata4j.consumer.AbstractODataClient.getEntity(AbstractODataClient.java:65) ~[odata4j-core-7.3.1-SNAPSHOT.jar:?]

Repository.log:

[UTC:2019-05-16 05:13:44 Local:2019-05-16 13:13] [Error]: [sub-thread-Id="27" context="" token=""] System.Data.Services.DataServiceException: Resource not found for the segment 'VirtualMachineExts'.
at System.Data.Services.Providers.DataServiceExecutionProviderWrapper.GetSingleResultFromRequest(SegmentInfo segmentInfo)
at System.Data.Services.DataService`1.CompareETagAndWriteResponse(RequestDescription description, IDataService dataService, IODataResponseMessage responseMessage)
at System.Data.Services.DataService`1.SerializeResponseBody(RequestDescription description, IDataService dataService, IODataResponseMessage responseMessage)
at System.Data.Services.DataService`1.HandleRequest()

Looking at the blueprint configuration, the user had configured the build information as:

Blueprint Type: Server
Action: Create
Provisioning Workflow: BasicVmWorkflow

They were using component profiles to choose between different OS types (2008 R2, 2012 and 2016) when submitting requests through the service catalog. When we examined the available Image component profiles, we found that the template field was blank. Once appropriate templates were mapped to their corresponding value sets, provisioning started to work as expected.
- A newer version of the product is already installed on this machine
After installing vRA 7.5 HF5, it is not possible to install any additional IaaS components, e.g. a proxy agent or DEM. Attempting to launch the installer fails with the error: "A newer version of the product is already installed on this machine".

This is due to the proxy agent launcher being updated to version 7.5.0.16144 in HF5 ("C:\Program Files (x86)\VMware\vCAC\Agents\\VRMAgent.exe"). Since only the launcher is updated, "Add/Remove Programs" is not updated with this information.

To work around this problem:

1. Take a snapshot of the IaaS machine.
2. Take a backup of the file "C:\Program Files (x86)\VMware\vCAC\Agents\\VRMAgent.exe".
3. From the vRA appliance master node, download the file: /usr/lib/vcac/patches/repo/contents/vRA-patch/f759789753d68625f0136e9bfe159b19c4607b4cebf06566c4cf6c43fa5b4b5b/patch-vRA-7.5.9501892.12933355-HF5-Reverse/vcac/components/iaas.net/build/bin/Release/legacy/VRMAgent.exe
4. Stop the "VMware vCloud Automation Center Agent - " service on the IaaS machine.
5. Replace the VRMAgent.exe file with the downloaded one.
6. Install the new vSphere agent.
7. Replace VRMAgent.exe with the backup taken in step 2.
8. Start the service.

Installing a DEM or proxy agent should now work as expected.
- vRealize Automation 7.4 HF6 released
vRealize Automation 7.4 HF6 has been officially released. New issues resolved in this patch:

- Users cannot see owned items when "Owned By Me" is selected.
- When trying to complete a new request form, the wizard returns to the initial request screen.
- When requesting 20 instances of a Windows blueprint, most instances fail with an 'optimistic locking failed' message.
- When a scale-out operation on a nested blueprint deployment partially succeeds and the scale-out is retried, the scale-out request form hangs, with the error at the back end in the vRA server log.
- Cannot change the lease of an expired deployment.

Read the "Cumulative Update for vRA 7.4" Knowledge Base article for more information on pre-requisites and installation.
- L1 Terminal Fault a.k.a L1TF
Intel has disclosed details of a new class of CPU speculative-execution vulnerabilities known collectively as "L1 Terminal Fault" (L1TF) that can occur on past and current Intel processors (from at least 2009 to 2018).

Like Meltdown, Rogue System Register Read, and "Lazy FP state restore", the "L1 Terminal Fault" vulnerability can occur when affected Intel microprocessors speculate beyond an unpermitted data access. By continuing the speculation in these cases, the affected Intel microprocessors expose a new side-channel for attack.

Three CVEs collectively cover this form of vulnerability for Intel CPUs:

CVE-2018-3646
CVE-2018-3620
CVE-2018-3615

Let's discuss these CVEs one at a time.

CVE-2018-3646

Vulnerability Summary

Referred to as L1 Terminal Fault - VMM, this is the variant of the vulnerability that impacts hypervisors. It may allow a malicious VM running on a given CPU core to effectively infer the contents of the hypervisor's or another VM's privileged information residing at the same time in the same core's L1 data cache. Because current Intel processors share the physically-addressed L1 data cache across both logical processors of a Hyper-Threading (HT) enabled core, indiscriminate simultaneous scheduling of software threads on both logical processors creates the potential for further information leakage.

CVE-2018-3646 has two currently known attack vectors:

Sequential-Context Attack: a malicious VM can potentially infer recently accessed L1 data of a previous context (hypervisor thread or other VM thread) on either logical processor of a processor core.
Concurrent-Context Attack: a malicious VM can potentially infer recently accessed L1 data of a concurrently executing context (hypervisor thread or other VM thread) on the other logical processor of a Hyper-Threading-enabled core.

Mitigation Summary

Mitigation of the Sequential-Context attack vector is achieved through vSphere updates and patches. This mitigation is enabled by default and does not impose a significant performance impact.

Mitigation of the Concurrent-Context attack vector requires enabling a new feature known as the ESXi Side-Channel-Aware Scheduler. The initial version of this feature will only schedule the hypervisor and VMs on one logical processor of an Intel Hyper-Threading-enabled core. This feature may impose a non-trivial performance impact and is not enabled by default.

Mitigation Process

Update Phase

The Sequential-Context attack vector is mitigated by a vSphere update to the product versions listed in VMware Security Advisory VMSA-2018-0020. This mitigation depends on Intel microcode updates (provided in separate ESXi patches for most Intel hardware platforms), which are also documented in VMSA-2018-0020.

IMPORTANT NOTE: vCenter Server should be updated prior to applying ESXi patches. Notification messages were added in the aforementioned updates and patches to explain that the ESXi Side-Channel-Aware Scheduler must be enabled to mitigate the Concurrent-Context attack vector of CVE-2018-3646. If ESXi is updated prior to vCenter, you may receive cryptic notification messages relating to this; after vCenter has been updated, the notifications will be shown correctly.

Planning Phase

The Concurrent-Context attack vector is mitigated by enabling the ESXi Side-Channel-Aware Scheduler, which is included in the updates and patches listed in VMSA-2018-0020. This scheduler is not enabled by default.
Enabling this scheduler may impose a non-trivial performance impact on applications running in a vSphere environment. The goal of the Planning Phase is to understand whether your current environment has sufficient CPU capacity to enable the scheduler without operational impact. The following list summarizes potential problem areas after enabling the ESXi Side-Channel-Aware Scheduler:

- VMs configured with more vCPUs than the physical cores available on the ESXi host
- VMs configured with custom affinity or NUMA settings
- VMs with a latency-sensitive configuration
- ESXi hosts with average CPU usage greater than 70%
- Hosts with custom CPU resource management options enabled
- HA clusters where a rolling upgrade would increase average CPU usage above 100%

IMPORTANT NOTE: the above list is meant to be a brief overview of potential problem areas related to enabling the ESXi Side-Channel-Aware Scheduler. The VMware Performance Team has provided an in-depth guide as well as performance data in KB 55767; it is strongly suggested to thoroughly review that document prior to enabling the scheduler. It may be necessary to acquire additional hardware, or rebalance existing workloads, before enabling the ESXi Side-Channel-Aware Scheduler. Organizations can choose not to enable the scheduler after performing a risk assessment and accepting the risk posed by the Concurrent-Context attack vector. This is NOT RECOMMENDED, and VMware cannot make this decision on behalf of an organization.

Scheduler Enablement Phase

After addressing the potential problem areas described in the Planning Phase, the ESXi Side-Channel-Aware Scheduler must be enabled to mitigate the Concurrent-Context attack vector of CVE-2018-3646. The scheduler can be enabled on an individual ESXi host via the advanced configuration option hyperthreadingMitigation.
This can be done by performing the following steps.

Enabling the ESXi Side-Channel-Aware Scheduler using the vSphere Web Client or vSphere Client:
1. Connect to the vCenter Server using either the vSphere Web Client or vSphere Client.
2. Select an ESXi host in the inventory.
3. Click the Manage (5.5/6.0) or Configure (6.5/6.7) tab.
4. Click the Settings sub-tab.
5. Under the System heading, click Advanced System Settings.
6. Click in the Filter box and search for VMkernel.Boot.hyperthreadingMitigation.
7. Select the setting by name and click the Edit pencil icon.
8. Change the configuration option to true (default: false).
9. Click OK.
10. Reboot the ESXi host for the configuration change to take effect.

Enabling the ESXi Side-Channel-Aware Scheduler using the ESXi Embedded Host Client:
1. Connect to the ESXi host by opening a web browser to https://HOSTNAME.
2. Click the Manage tab.
3. Click the Advanced settings sub-tab.
4. Click in the Filter box and search for VMkernel.Boot.hyperthreadingMitigation.
5. Select the setting by name and click the Edit pencil icon.
6. Change the configuration option to true (default: false).
7. Click Save.
8. Reboot the ESXi host for the configuration change to take effect.

Enabling the ESXi Side-Channel-Aware Scheduler using ESXCLI:
1. SSH to an ESXi host, or open a console where the remote ESXCLI is installed. For more information, see http://www.vmware.com/support/developer/vcli/.
2. Check the current runtime value of the mitigation setting:
esxcli system settings kernel list -o hyperthreadingMitigation
3. To enable the HT-aware mitigation, run:
esxcli system settings kernel set -s hyperthreadingMitigation -v TRUE
4. Reboot the ESXi host for the configuration change to take effect.
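After setting the option via ESXCLI, the configured value flips to TRUE while the runtime value stays FALSE until the host is rebooted. A hedged sketch of parsing that state; the OUT variable below is a hypothetical, abridged rendering of the esxcli list output, not output captured from a real host:

```shell
# Hypothetical, abridged output of:
#   esxcli system settings kernel list -o hyperthreadingMitigation
OUT='Name                      Type  Configured  Runtime  Default
hyperthreadingMitigation  Bool  TRUE        FALSE    FALSE'

# Column 3 holds the configured value, column 4 the live (runtime) value
CONFIGURED=$(printf '%s\n' "$OUT" | awk '/^hyperthreadingMitigation/ {print $3}')
RUNTIME=$(printf '%s\n' "$OUT" | awk '/^hyperthreadingMitigation/ {print $4}')
echo "configured=$CONFIGURED runtime=$RUNTIME"

# Configured TRUE but runtime FALSE means the host still needs a reboot
if [ "$CONFIGURED" = "TRUE" ] && [ "$RUNTIME" = "FALSE" ]; then
  echo "reboot required"
fi
```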
Refer to the following KB articles for product-specific mitigation procedures and/or vulnerability analysis:

vSphere: KB 55806
Hosted (Workstation/Fusion): KB 57138
VMware SaaS offerings: KB 55808

CVE-2018-3620

Referred to as L1 Terminal Fault - OS (operating system-specific mitigations). VMware has investigated the impact CVE-2018-3620 may have on virtual appliances. Details of this investigation, including a list of unaffected virtual appliances, can be found in KB 55807. Products that ship as an installable Windows or Linux binary are not directly affected, but patches may be required from the respective operating system vendor that these products are installed on. VMware recommends contacting your third-party operating system vendor to determine appropriate actions for mitigating CVE-2018-3620. This issue may also be applicable to customer-controlled environments running in a VMware SaaS offering; review KB 55808.

CVE-2018-3615

Referred to as L1 Terminal Fault - SGX. CVE-2018-3615 does not affect VMware products or services. See KB 54913 for more information.
- Remediation for Spectre Vulnerability
As you may be aware, VMware released Spectre patches for ESXi and vCenter on 20th March 2018.

Related Articles

Security Advisory (updated VMSA-2018-0004.3)
VMware KB describing Hypervisor-Assisted Guest Mitigation for branch target injection

Suggested Update Sequence

It is mandatory to follow the order below to deploy the fix for Meltdown and Spectre:

1. Deploy the updated version of vCenter Server listed in VMSA-2018-0004.
2. Deploy the ESXi patches listed in VMSA-2018-0004 (even though earlier patches may already have been applied to ESXi, this patch is required as well).
3. Deploy the guest OS patches for CVE-2017-5715. These patches are to be obtained from your OS vendor.

VMware recommends applying the firmware update that includes the CPU microcode over the software patch with microcode. Ensure that VMs are using hardware version 9 or higher; for best performance, hardware version 11 or higher is recommended. VMware Knowledge Base article 1010675 discusses hardware versions. Follow this sequence to update the hardware version:

1. Update VMware Tools to the latest version available with the patched host.
2. Update the VM hardware version to 9 or above.
3. Shut down the VM using the guest OS console.
4. Wait for the VM to appear powered off in the vCenter Server UI.
5. Power on the VM.

The new versions of vCenter Server set restrictions on ESXi hosts joining an Enhanced vMotion Compatibility (EVC) cluster; see VMware Knowledge Base article 52085 for details. You will not be able to migrate a VM from a patched host to a non-patched host, so keep this in mind when preparing for upgrades.

#vSphere