PKS and NSX-T: Error: Timed out pinging after 600 seconds

I’m still playing with PKS 1.3 and NSX-T 2.3.1 in my lab. One issue I kept encountering was that, when deploying my Kubernetes cluster, the master and worker nodes kept failing with a “timed out” error while trying to ping. The bosh task command showed the errors, as shown here.

cormac@pks-cli:~$ bosh task
Using environment '192.50.0.140' as client 'ops_manager'
Task 845
Task 845 | 16:56:36 | Preparing deployment: Preparing deployment
Task 845 | 16:56:37 | Warning: DNS address not available for the link provider instance: pivotal-container-service/0c23ed00-d40a-4bfe-abee-1c
Task 845 | 16:56:37 | Warning: DNS address not available for the link provider instance: pivotal-container-service/0c23ed00-d40a-4bfe-abee-1c
Task 845 | 16:56:37 | Warning: DNS address not available for the link provider instance: pivotal-container-service/0c23ed00-d40a-4bfe-abee-1c
Task 845 | 16:56:49 | Preparing deployment: Preparing deployment (00:00:13)
Task 845 | 16:57:24 | Preparing package compilation: Finding packages to compile (00:00:00)
Task 845 | 16:57:24 | Creating missing vms: master/f46b8a5f-f864-4217-8b54-753f535e69d8 (0)
Task 845 | 16:57:24 | Creating missing vms: worker/9b58c1ac-8fcc-41db-9e52-8e8deb4e944b (0)
Task 845 | 16:57:24 | Creating missing vms: worker/e61b4262-5bcc-4567-bd1c-fea7058a743f (2)
Task 845 | 16:57:24 | Creating missing vms: worker/8982a101-05ea-416d-ba9c-b6e6a98059cf (1)
Task 845 | 17:08:42 | Creating missing vms: worker/9b58c1ac-8fcc-41db-9e52-8e8deb4e944b (0) (00:11:18)
L Error: Timed out pinging to 320d032a-60e0-4c8c-bfa5-bd94e1d0ae13 after 600 seconds
Task 845 | 17:08:42 | Creating missing vms: worker/8982a101-05ea-416d-ba9c-b6e6a98059cf (1) (00:11:18)
L Error: Timed out pinging to dbc1d5cf-3e2a-4618-86d2-280172fb3ffd after 600 seconds
Task 845 | 17:08:43 | Creating missing vms: worker/e61b4262-5bcc-4567-bd1c-fea7058a743f (2) (00:11:19)
L Error: Timed out pinging to 54342511-1768-48b9-8009-27979ccf2542 after 600 seconds
Task 845 | 17:08:51 | Creating missing vms: master/f46b8a5f-f864-4217-8b54-753f535e69d8 (0) (00:11:27)
L Error: Timed out pinging to 2f86ef1a-9636-46ed-993c-d2f11811eae0 after 600 seconds
Task 845 | 17:08:51 | Error: Timed out pinging to 320d032a-60e0-4c8c-bfa5-bd94e1d0ae13 after 600 seconds
Task 845 Started Thu Feb 7 16:56:36 UTC 2019
Task 845 Finished Thu Feb 7 17:08:51 UTC 2019
Task 845 Duration 00:12:15
Task 845 error
Capturing task '845' output:
Expected task '845' to succeed but state is 'error'
Exit code 1
cormac@pks-cli:~$

I eventually traced it to the MTU size configured in the Edge Profile. Because my edge was connected to a trunk port rather than a particular VLAN, I should have been using an MTU size of 1500. Instead, I had it set to 1600. Once I corrected that misconfiguration, the K8s cluster deployment proceeded beyond this point.

I also heard from a customer who saw a similar issue when the MTU size was misconfigured on the underlying vSphere standard switch (VSS) or distributed switch (VDS) to which the Edge is connected. So check your MTU sizes if you run into this issue; they may be the cause.
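
If you want to check what MTU a path actually carries, one common approach is to send pings with the "don't fragment" bit set and a payload sized to exactly fill the MTU under test. The sketch below is only an illustration, not part of my original troubleshooting: it works out the ICMP payload size for a given MTU (MTU minus a 20-byte IPv4 header without options and an 8-byte ICMP header) and builds the matching Linux iputils ping command. The function names and the target address 192.0.2.10 are placeholders I made up for the example, and other platforms use different ping flags.

```python
import shlex

IPV4_HEADER_BYTES = 20  # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo request header


def icmp_payload_for_mtu(mtu: int) -> int:
    """Return the largest ICMP payload that fits in one unfragmented packet."""
    overhead = IPV4_HEADER_BYTES + ICMP_HEADER_BYTES
    if mtu <= overhead:
        raise ValueError(f"MTU {mtu} leaves no room for an ICMP payload")
    return mtu - overhead


def build_df_ping_command(host: str, mtu: int, count: int = 3) -> str:
    """Build a Linux (iputils) ping command that prohibits fragmentation.

    '-M do' sets the don't-fragment behaviour, '-s' sets the payload size,
    and '-c' limits the number of echo requests.
    """
    payload = icmp_payload_for_mtu(mtu)
    args = ["ping", "-M", "do", "-s", str(payload), "-c", str(count), host]
    return " ".join(shlex.quote(arg) for arg in args)


if __name__ == "__main__":
    # 192.0.2.10 is a documentation-only placeholder; substitute a real target.
    for mtu in (1500, 1600):
        print(build_df_ping_command("192.0.2.10", mtu))
```

If the command built for 1500 succeeds but the one for 1600 fails, something along the path is not passing the larger frames, which is a good hint to review the MTU settings on the Edge Profile and the virtual switches.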
