Writing an Operator

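This article shows how to create an operator that automatically manages the number of pods of an application. It then deploys that operator on a Kubernetes/OpenShift cluster so that it actually runs.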

Install operator-sdk

On macOS, simply install it with `brew`. For other platforms, see https://github.com/operator-framework/operator-sdk/blob/master/doc/user/install-operator-sdk.md

$ brew install operator-sdk
$ operator-sdk version
operator-sdk version: "v0.12.0", commit: "2445fcda834ca4b7cf0d6c38fba6317fb219b469", go version: "go1.13.4 darwin/amd64"

Note: Go 1.13 or later is recommended here.

Create a new operator project, e.g. learn-operator

operator-sdk currently supports operators written in Go or built with Ansible or Helm. In this example I use the default, Go, to write the operator. The project directory is $GOPATH/src/github.com/example-inc.

mac:example-inc jianzhang$ operator-sdk new learn-operator
INFO[0000] Creating new Go operator 'learn-operator'.   
INFO[0000] Created go.mod                               
INFO[0000] Created tools.go                             
INFO[0000] Created cmd/manager/main.go                  
INFO[0000] Created build/Dockerfile                     
INFO[0000] Created build/bin/entrypoint                 
INFO[0000] Created build/bin/user_setup                 
INFO[0000] Created deploy/service_account.yaml          
INFO[0000] Created deploy/role.yaml                     
INFO[0000] Created deploy/role_binding.yaml             
INFO[0000] Created deploy/operator.yaml                 
INFO[0000] Created pkg/apis/apis.go                     
INFO[0000] Created pkg/controller/controller.go         
INFO[0000] Created version/version.go                   
INFO[0000] Created .gitignore                           
INFO[0000] Validating project                           
INFO[0022] Project validation successful.               
INFO[0022] Project creation complete. 

Inspect the directory structure

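operator-sdk has already scaffolded the entire project, including the code that talks to Kubernetes or OpenShift, which is very convenient. As application developers we don't need a deep understanding of the underlying platform's APIs. We can focus on our own logic.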

mac:example-inc jianzhang$ tree learn-operator/
learn-operator/
├── build
│   ├── Dockerfile
│   └── bin
│       ├── entrypoint
│       └── user_setup
├── cmd
│   └── manager
│       └── main.go
├── deploy
│   ├── operator.yaml
│   ├── role.yaml
│   ├── role_binding.yaml
│   └── service_account.yaml
├── go.mod
├── go.sum
├── pkg
│   ├── apis
│   │   └── apis.go
│   └── controller
│       └── controller.go
├── tools.go
└── version
    └── version.go
 
9 directories, 14 files

Your business logic only needs to hook into two places:

pkg/apis/apis.go
package apis
 
import (
	"k8s.io/apimachinery/pkg/runtime"
)
 
// AddToSchemes may be used to add all resources defined in the project to a Scheme
var AddToSchemes runtime.SchemeBuilder
 
// AddToScheme adds all Resources to the Scheme
func AddToScheme(s *runtime.Scheme) error {
	return AddToSchemes.AddToScheme(s)
}
pkg/controller/controller.go
package controller
 
import (
	"sigs.k8s.io/controller-runtime/pkg/manager"
)
 
// AddToManagerFuncs is a list of functions to add all Controllers to the Manager
var AddToManagerFuncs []func(manager.Manager) error
 
// AddToManager adds all Controllers to the Manager
func AddToManager(m manager.Manager) error {
	for _, f := range AddToManagerFuncs {
		if err := f(m); err != nil {
			return err
		}
	}
	return nil
}

Start writing the logic

Create a new API resource with `add api`

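Use --kind to specify the name of the new API type; here it is named `Learn`. --api-version sets its group and version, app.learn.com/v1.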

mac:learn-operator jianzhang$ operator-sdk add api --api-version=app.learn.com/v1 --kind=Learn
INFO[0000] Generating api version app.learn.com/v1 for kind Learn. 
INFO[0000] Created pkg/apis/app/group.go                
INFO[0033] Created pkg/apis/app/v1/learn_types.go       
INFO[0033] Created pkg/apis/addtoscheme_app_v1.go       
INFO[0033] Created pkg/apis/app/v1/register.go          
INFO[0033] Created pkg/apis/app/v1/doc.go               
INFO[0033] Created deploy/crds/app.learn.com_v1_learn_cr.yaml 
INFO[0037] Created deploy/crds/app.learn.com_learns_crd.yaml 
INFO[0037] Running deepcopy code-generation for Custom Resource group versions: [app:[v1], ] 
INFO[0045] Code-generation complete.                    
INFO[0045] Running OpenAPI code-generation for Custom Resource group versions: [app:[v1], ] 
INFO[0054] Created deploy/crds/app.learn.com_learns_crd.yaml 
INFO[0054] Code-generation complete.                    
INFO[0054] API generation complete.  

operator-sdk has generated the corresponding CR (custom resource) manifest, along with the CRD deploy/crds/app.learn.com_learns_crd.yaml:

deploy/crds/app.learn.com_v1_learn_cr.yaml
apiVersion: app.learn.com/v1
kind: Learn
metadata:
  name: example-learn
spec:
  # Add fields here
  size: 3

Create the corresponding controller with `add controller`

mac:learn-operator jianzhang$ operator-sdk add controller --api-version=app.learn.com/v1 --kind=Learn
INFO[0000] Generating controller version app.learn.com/v1 for kind Learn. 
INFO[0000] Created pkg/controller/learn/learn_controller.go 
INFO[0000] Created pkg/controller/add_learn.go          
INFO[0000] Controller generation complete. 
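The generated pkg/controller/add_learn.go connects the new controller to the AddToManagerFuncs slice in controller.go. In operator-sdk v0.x this file typically contains an init() function that appends learn.Add to that slice. The generated addtoscheme_app_v1.go does the same for AddToSchemes.

The sketch below models this registration pattern using only the standard library. Manager and addLearn are hypothetical stand-ins for controller-runtime's manager.Manager and the generated learn.Add; they are not the real API.

```go
package main

import (
	"errors"
	"fmt"
)

// Manager is a hypothetical stand-in for controller-runtime's manager.Manager.
type Manager struct {
	Controllers []string
}

// AddToManagerFuncs mirrors the slice declared in pkg/controller/controller.go.
var AddToManagerFuncs []func(*Manager) error

// AddToManager runs every registered function and stops at the first error,
// like the generated AddToManager.
func AddToManager(m *Manager) error {
	for _, f := range AddToManagerFuncs {
		if err := f(m); err != nil {
			return err
		}
	}
	return nil
}

// addLearn plays the role of learn.Add (hypothetical): it registers the controller.
func addLearn(m *Manager) error {
	if m == nil {
		return errors.New("nil manager")
	}
	m.Controllers = append(m.Controllers, "learn-controller")
	return nil
}

// init mimics add_learn.go: registration happens as a side effect of compiling
// the file into the package, so main.go never has to name the controller.
func init() {
	AddToManagerFuncs = append(AddToManagerFuncs, addLearn)
}

func main() {
	m := &Manager{}
	if err := AddToManager(m); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(m.Controllers) // [learn-controller]
}
```

Because each generated file registers itself in init(), adding another kind later only adds another file; the manager bootstrap code stays unchanged.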

Add the code

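Define your resource's structure in the types file. The operator in this example watches Learn resources and adjusts the number of pods according to each resource's size field. The LearnStatus struct reports the observed state, namely the names of the pods. As the generated comments note, run `operator-sdk generate k8s` after modifying this file to regenerate the deepcopy code.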

pkg/apis/app/v1/learn_types.go
type LearnSpec struct {
	// INSERT ADDITIONAL SPEC FIELDS - desired state of cluster
	// Important: Run "operator-sdk generate k8s" to regenerate code after modifying this file
	// Add custom validation using kubebuilder tags: https://book-v1.book.kubebuilder.io/beyond_basics/generating_crd.html
	// Size is the size of the learn deployment
	Size int32 `json:"size"`
}

// LearnStatus defines the observed state of Learn
// +k8s:openapi-gen=true
type LearnStatus struct {
	// INSERT ADDITIONAL STATUS FIELD - define observed state of cluster
	// Important: Run "operator-sdk generate k8s" to regenerate code after modifying this file
	// Add custom validation using kubebuilder tags: https://book-v1.book.kubebuilder.io/beyond_basics/generating_crd.html
	// PodNames are the names of the learn pods
	PodNames []string `json:"podnames"`
}

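The control logic is an excerpt from the Reconcile method in pkg/controller/learn/learn_controller.go. Here `learn` is the Learn instance being reconciled and `found` is its existing Deployment: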

	// Ensure the deployment size is the same as the spec
	size := learn.Spec.Size
	if *found.Spec.Replicas != size {
		found.Spec.Replicas = &size
		err = r.client.Update(context.TODO(), found)
		if err != nil {
			reqLogger.Error(err, "Failed to update Deployment", "Deployment.Namespace", found.Namespace, "Deployment.Name", found.Name)
			return reconcile.Result{}, err
		}
		// Spec updated - return and requeue
		return reconcile.Result{Requeue: true}, nil
	}

	// Update the Learn status with the pod names
	// List the pods for this learn's deployment
	podList := &corev1.PodList{}
	listOpts := []client.ListOption{
		client.InNamespace(learn.Namespace),
		client.MatchingLabels(labelsForLearn(learn.Name)),
	}
	if err = r.client.List(context.TODO(), podList, listOpts...); err != nil {
		reqLogger.Error(err, "Failed to list pods", "Learn.Namespace", learn.Namespace, "Learn.Name", learn.Name)
		return reconcile.Result{}, err
	}
	podNames := getPodNames(podList.Items)

	// Update status.PodNames if needed
	if !reflect.DeepEqual(podNames, learn.Status.PodNames) {
		learn.Status.PodNames = podNames
		err := r.client.Status().Update(context.TODO(), learn)
		if err != nil {
			reqLogger.Error(err, "Failed to update Learn status")
			return reconcile.Result{}, err
		}
	}

Build the operator image

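Now that the code is written, let's get it running. On a cloud platform, components run as containers, so first we need an image. `operator-sdk build` compiles the operator binary into build/_output/bin and then packages it into an image using build/Dockerfile. You can modify the Dockerfile for special requirements; here we keep the default configuration. The build process looks like this: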

mac:learn-operator jianzhang$ operator-sdk build quay.io/jiazha/learn-operator
INFO[0001] Building OCI image quay.io/jiazha/learn-operator 
Sending build context to Docker daemon  40.14MB
Step 1/7 : FROM registry.access.redhat.com/ubi8/ubi-minimal:latest
latest: Pulling from ubi8/ubi-minimal
645c2831c08a: Pull complete 
5e98065763a5: Pull complete 
Digest: sha256:32fb8bae553bfba2891f535fa9238f79aafefb7eff603789ba8920f505654607
Status: Downloaded newer image for registry.access.redhat.com/ubi8/ubi-minimal:latest
 ---> 469119976c56
Step 2/7 : ENV OPERATOR=/usr/local/bin/learn-operator     USER_UID=1001     USER_NAME=learn-operator
 ---> Running in 0238e3a3b78a
Removing intermediate container 0238e3a3b78a
 ---> a5a49d29df84
Step 3/7 : COPY build/_output/bin/learn-operator ${OPERATOR}
 ---> b9f310c13223
Step 4/7 : COPY build/bin /usr/local/bin
 ---> 085a9494584e
Step 5/7 : RUN  /usr/local/bin/user_setup
 ---> Running in 564f938ba278
+ mkdir -p /root
+ chown 1001:0 /root
+ chmod ug+rwx /root
+ chmod g+rw /etc/passwd
+ rm /usr/local/bin/user_setup
Removing intermediate container 564f938ba278
 ---> 2ddceb6ddd43
Step 6/7 : ENTRYPOINT ["/usr/local/bin/entrypoint"]
 ---> Running in 50e82b9c4b58
Removing intermediate container 50e82b9c4b58
 ---> 01889797cc39
Step 7/7 : USER ${USER_UID}
 ---> Running in 9d9917ada91b
Removing intermediate container 9d9917ada91b
 ---> d34a0831ba52
Successfully built d34a0831ba52
Successfully tagged quay.io/jiazha/learn-operator:latest
INFO[0038] Operator build complete. 

Push the image to an image registry. Here we use Quay.

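Note that a public repository is used here. If you use a private one, you also need to give the cluster credentials for your registry, for example as an image pull secret.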

mac:learn-operator jianzhang$ docker push  quay.io/jiazha/learn-operator
The push refers to repository [quay.io/jiazha/learn-operator]
89ed084dc713: Pushed 
6c1790c8ff98: Pushed 
198c24bacf4a: Pushed 
a066f3d73913: Pushed 
26b543be03e2: Pushed 
latest: digest: sha256:1bc419f412b5fe6efeb310783095d94523d6e059c6e974ca444a287bab80dd0d size: 8377

Deploy the operator

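We deploy the operator to the cluster with YAML manifests; you could also use Helm. Operator-SDK has already generated all the required manifests, so we only need to set the image built above in deploy/operator.yaml. The `-i ""` form below is for the BSD sed shipped with macOS. With GNU sed on Linux, use `sed -i 's|REPLACE_IMAGE|quay.io/jiazha/learn-operator|g' deploy/operator.yaml` instead.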

$ sed -i "" 's|REPLACE_IMAGE|quay.io/jiazha/learn-operator|g' deploy/operator.yaml

Before deploying, the cluster has no `learn` resource type yet:

mac:learn-operator jianzhang$ oc get learn
error: the server doesn't have a resource type "learn"

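Deploy it. deploy/operator.yaml runs the operator under the learn-operator service account, which role_binding.yaml grants permissions to. If that service account does not already exist in the namespace, create it first with `oc create -f deploy/service_account.yaml`.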

mac:learn-operator jianzhang$ oc create -f deploy/role.yaml 
role.rbac.authorization.k8s.io/learn-operator created
mac:learn-operator jianzhang$ oc create -f deploy/role_binding.yaml 
rolebinding.rbac.authorization.k8s.io/learn-operator created
mac:learn-operator jianzhang$ oc create -f deploy/operator.yaml 
deployment.apps/learn-operator created
mac:learn-operator jianzhang$ oc create -f deploy/crds/app.learn.com_learns_crd.yaml 
customresourcedefinition.apiextensions.k8s.io/learns.app.learn.com created

The operator is now running, and the cluster now knows about the learn resource type, although no Learn objects exist yet:

mac:learn-operator jianzhang$ oc get pods
NAME                              READY   STATUS    RESTARTS   AGE
learn-operator-768d88c6d6-8g9lz   1/1     Running   0          10m
mac:learn-operator jianzhang$ oc get learn
No resources found.

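Being able to define your own API resources this easily is a big part of Kubernetes' appeal! How to quickly set up your own Kubernetes or OpenShift cluster will be covered in a later article.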

Now let's start using this learn resource!

Using the custom resource

Let's set the size of the resource to 2 and see what happens:

deploy/crds/app.learn.com_v1_learn_cr.yaml
apiVersion: app.learn.com/v1
kind: Learn
metadata:
  name: example-learn
spec:
  # Add fields here
  size: 2

mac:learn-operator jianzhang$ oc create -f  deploy/crds/app.learn.com_v1_learn_cr.yaml 
learn.app.learn.com/example-learn created
mac:learn-operator jianzhang$ oc get learn
NAME            AGE
example-learn   2m12s

Looking at the example-learn object, its status lists the names of two pods. Listing the pods confirms that these are exactly the two newly created pods!

mac:learn-operator jianzhang$ oc get learn example-learn -o yaml
apiVersion: app.learn.com/v1
kind: Learn
metadata:
  creationTimestamp: "2019-11-09T14:20:07Z"
  generation: 1
  name: example-learn
  namespace: learn
  resourceVersion: "3098847"
  selfLink: /apis/app.learn.com/v1/namespaces/learn/learns/example-learn
  uid: ce6c8b2b-b5f1-4fba-8ded-649849920186
spec:
  size: 2
status:
  podnames:
  - example-learn-6764b9858-l9xpj
  - example-learn-6764b9858-tzdnv
  
mac:learn-operator jianzhang$ oc get pods
NAME                              READY   STATUS    RESTARTS   AGE
example-learn-6764b9858-l9xpj     1/1     Running   0          2m42s
example-learn-6764b9858-tzdnv     1/1     Running   0          2m42s
learn-operator-768d88c6d6-cfl6n   1/1     Running   0          3m37s

Now change size to 3 with `oc edit`. The number of pods grows to 3!

mac:learn-operator jianzhang$ oc edit learn example-learn
learn.app.learn.com/example-learn edited
mac:learn-operator jianzhang$ oc get pods
NAME                              READY   STATUS              RESTARTS   AGE
example-learn-6764b9858-l9xpj     1/1     Running             0          12m
example-learn-6764b9858-pbpzd     0/1     ContainerCreating   0          9s
example-learn-6764b9858-tzdnv     1/1     Running             0          12m
learn-operator-768d88c6d6-cfl6n   1/1     Running             0          13m
 
mac:learn-operator jianzhang$ oc get learn example-learn -o yaml
apiVersion: app.learn.com/v1
kind: Learn
metadata:
  creationTimestamp: "2019-11-09T14:20:07Z"
  generation: 2
  name: example-learn
  namespace: learn
  resourceVersion: "3113493"
  selfLink: /apis/app.learn.com/v1/namespaces/learn/learns/example-learn
  uid: ce6c8b2b-b5f1-4fba-8ded-649849920186
spec:
  size: 3
status:
  podnames:
  - example-learn-6764b9858-l9xpj
  - example-learn-6764b9858-tzdnv
  - example-learn-6764b9858-pbpzd

And what happens if, instead of modifying the example-learn object, I delete one of the pods directly?

At this point the operator is running normally in the cluster. All of the code for this operator is available at https://github.com/jianzhangbjz/learn-operator
