Error code “NetworkNotFound” on Photon Controller 1.0

I mentioned yesterday that Photon Controller version 1.0 is now available. I rolled it out yesterday and, just as I did with previous versions, started to deploy some frameworks on top. My first task was to put a Mesos framework on top of Photon Controller. I’d done this many times before, and had successfully rolled out this same framework with the exact same settings on Photon Controller v0.9. But yesterday I hit the following error when creating my cluster:

cormac@cs-dhcp32-29:~$ photon cluster create -n mesos -k mesos --dns 10.27.51.252 \
--gateway 10.27.51.254 --netmask 255.255.255.0 --zookeeper1 10.27.51.118 -s 2 
Using target 'http://10.27.51.117' 
Zookeeper server 2 static IP address (leave blank for none):   
Creating cluster: mesos (MESOS)   
Slave count: 2   
Are you sure [y/n]? y 

2016/09/27 14:59:54 photon: Task 'eb2a1acd-e6f6-4ecb-b6bc-b6b35f7b4ded' is in \ 
error state: {@step=={"sequence"=>"1","state"=>"ERROR","errors"=>[photon: { \ 
HTTP status: '0', code: 'InternalError', message: 'Failed to rollout \ 
MesosZookeeper. Error: MultiException[java.lang.IllegalStateException: \ 
VmProvisionTaskService failed with error [Task "CREATE_VM": step "RESERVE_RESOURCE" \ 
failed with error code "NetworkNotFound", message "Network default (physical) \
not found"]. /photon/clustermanager/vm-provision-tasks/79e4229c-361a-4cc9-9926-\ 
fd8dff13d114]', data: 'map[]' }],"warnings"=>[],"operation"=>"CREATE_MESOS_CLUSTER\ 
_SETUP_ZOOKEEPERS","startedTime"=>"1474984788404","queuedTime"=>"1474984788388",\ 
"endTime"=>"1474984793408","options"=>map[]}} 

API Errors: [photon: { HTTP status: '0', code: 'InternalError', message: 'Failed \ 
to rollout MesosZookeeper. Error: MultiException[java.lang.IllegalStateException: \ 
VmProvisionTaskService failed with error [Task "CREATE_VM": step "RESERVE_RESOURCE" \ 
failed with error code "NetworkNotFound", message "Network default (physical) \
not found"]. /photon/clustermanager/vm-provision-tasks/79e4229c-361a-4cc9-9926-\ 
fd8dff13d114]', data: 'map[]' }]

I wasn’t sure what this error was; I certainly had not encountered it before: Network default (physical) not found? I spoke to some of the Photon Controller engineers, and they mentioned that the API semantics around networks changed slightly in Photon Controller v1.0. Now, when deploying a cluster, you can either create a default network beforehand, or specify a network when creating the cluster using the new --network_id option on the command line. To create a default network, you can use the following as an example. Here I am making the “VM Network” the default network:

cormac@cs-dhcp32-29:~$ photon network create --name vm-network -p "VM Network"
8eb5e3d8-3b06-4743-94ce-8c8b1034331c
cormac@cs-dhcp32-29:~$ photon network set-default 8eb5e3d8-3b06-4743-94ce-8c8b1034331c

Note, however, that you will need the latest Photon Controller CLI to use the set-default argument. It is available on GitHub. If you wish to specify --network_id on the command line instead, which is what I decided to do, you can do the following:

cormac@cs-dhcp32-29:~$ photon network create --name vm-network -p "VM Network"
Description of network: "VM Network"
Using target 'http://10.27.51.117'
CREATE_NETWORK completed for 'subnet' entity 180246d4-e125-4faa-8b72-716b1d57102e

cormac@cs-dhcp32-29:~$ photon network list
Using target 'http://10.27.51.117'
ID Name State PortGroups Descriptions
180246d4-e125-4faa-8b72-716b1d57102e vm-network READY [VM Network] "VM Network"
Total: 1

cormac@cs-dhcp32-29:~$ photon cluster create -n mesos -k mesos --dns 10.27.51.252 \
--gateway 10.27.51.254 --netmask 255.255.255.0 --zookeeper1 10.27.51.118 -s 2 \
--network_id 180246d4-e125-4faa-8b72-716b1d57102e
Using target 'http://10.27.51.117'
Zookeeper server 2 static IP address (leave blank for none):

Creating cluster: mesos (MESOS)
 Slave count: 2

Are you sure [y/n]? y
CREATE_CLUSTER completed for 'cluster' entity 3aeb3062-b0dc-4399-ab65-155bf6e0ebc2
Note: the cluster has been created with minimal resources. You can use the cluster now.
A background task is running to gradually expand the cluster to its target capacity.
You can run 'cluster show ' to see the state of the cluster.
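
The lookup-then-create steps above can be chained in a small shell sketch. This is only an illustration, not part of the Photon Controller CLI: the function names `network_id_by_name` and `create_mesos_cluster` are hypothetical, and the parsing assumes `photon network list` prints rows as "ID Name State ..." exactly as in the transcript above. The interactive prompts of `photon cluster create` are left to be answered by hand.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, and the column layout of
# `photon network list` is assumed to match the transcript (ID Name State ...).

network_id_by_name() {
    # Reads `photon network list` output on stdin and prints the ID of the
    # first row whose Name column equals $1 (prints nothing if no row matches).
    awk -v name="$1" '$2 == name { print $1; exit }'
}

create_mesos_cluster() {
    # $1 = network name, e.g. vm-network
    net_id=$(photon network list | network_id_by_name "$1")
    if [ -z "$net_id" ]; then
        echo "network '$1' not found" >&2
        return 1
    fi
    # Same options as in the transcript, with the looked-up network ID appended.
    photon cluster create -n mesos -k mesos --dns 10.27.51.252 \
        --gateway 10.27.51.254 --netmask 255.255.255.0 \
        --zookeeper1 10.27.51.118 -s 2 --network_id "$net_id"
}
```

With the functions loaded, `create_mesos_cluster vm-network` would run the same cluster creation shown above without copying the ID by hand.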

And just to confirm that the Mesos cluster did indeed deploy successfully:

cormac@cs-dhcp32-29:~$ photon cluster show 3aeb3062-b0dc-4399-ab65-155bf6e0ebc2
Using target 'http://10.27.51.117'
Cluster ID: 3aeb3062-b0dc-4399-ab65-155bf6e0ebc2
 Name: mesos
 State: READY
 Type: MESOS
 Slave count: 2
 Extended Properties: map[netmask:255.255.255.0 dns:10.27.51.252 \
 zookeeper_ips:10.27.51.118 gateway:10.27.51.254]

VM ID VM Name VM IP
0f789d45-567b-4145-81b6-f0f840d325ff \
master-e6078abe-95fa-43f4-accc-96c6fc6dbf5e 10.27.51.95
5beddbc4-d7b7-4972-9508-631796d41693 \
master-69900ba2-999b-4dea-b09d-2f6c6a0735f0 10.27.51.99
6cbe983f-f6c3-4450-92b2-52a37b82bee4 \
master-8f228fa3-9327-40fd-bab8-71aad70db916 10.27.51.98
df7ecb03-da0a-4d48-ab2c-3c245d2129c7 \
zookeeper-828a315d-7d82-4065-9107-fa9e567aad45 10.27.51.118
f5473630-ec4f-439c-ad0c-31555dc4aaf7 \
marathon-5aafbb4a-fb7c-4289-b377-94639e482ad1 10.27.51.101
cormac@cs-dhcp32-29:~$
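
Since the cluster is expanded by a background task, the same confirmation could be scripted rather than checked by eye. The following is a minimal sketch under a few assumptions: `cluster_state` and `wait_for_ready` are hypothetical helper names, the output of `photon cluster show` is assumed to contain a "State:" line as in the transcript above, and the 30-second polling interval is arbitrary.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical; the "State:" line format is
# taken from the `photon cluster show` transcript.

cluster_state() {
    # Reads `photon cluster show <id>` output on stdin and prints the value
    # of the first "State:" line (leading spaces are ignored by awk).
    awk '$1 == "State:" { print $2; exit }'
}

wait_for_ready() {
    # $1 = cluster ID, $2 = maximum number of attempts (default 20).
    tries=${2:-20}
    state=""
    while [ "$tries" -gt 0 ]; do
        state=$(photon cluster show "$1" | cluster_state)
        [ "$state" = "READY" ] && return 0
        tries=$((tries - 1))
        sleep 30
    done
    echo "cluster $1 not READY (last state: $state)" >&2
    return 1
}
```

For the cluster above, this would be called as `wait_for_ready 3aeb3062-b0dc-4399-ab65-155bf6e0ebc2`.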

And now if I browse to a master on port 5050 or the marathon VM on port 8080, I should see everything up and running. First is the Mesos master (you can connect to any of them):

[Image: mesos-master]

And next is the Marathon framework, which is included in this distro:

[Image: marathon]

All looks good. So, just to recap: some of the network semantics have changed in Photon Controller 1.0, so if you run into this issue, hopefully this post will help you out.