Skip to content

Custom Resource (CR) 和 Custom Resource Definition (CRD)

概述

Custom Resource Definition (CRD) 是 Kubernetes 的扩展机制,允许用户定义自己的资源类型。通过 CRD,可以创建 Custom Resource (CR),扩展 Kubernetes API,使其支持特定于应用程序的资源。

关键点内容
核心作用扩展 Kubernetes API,定义自定义资源
资源类型CRD(定义)、CR(实例)
API 版本apiextensions.k8s.io/v1
作用域集群级别(CRD)、命名空间或集群级别(CR)
适用场景应用程序特定资源、Operator 开发

CRD 的本质

设计理念

  • API 扩展:扩展 Kubernetes API,支持自定义资源类型
  • 声明式管理:使用 YAML 声明式定义和管理自定义资源
  • 原生集成:与 Kubernetes 原生资源一样使用 kubectl 管理
  • 版本控制:支持多版本 API 和版本转换

工作原理

CRD 定义 → API Server 注册 → 创建 CR 实例 → Controller 处理 → 业务逻辑
    ↓           ↓              ↓             ↓           ↓
  资源模式    扩展API        自定义资源     监听变化    执行操作

CRD vs ConfigMap

特性CRDConfigMap
用途定义新的资源类型存储配置数据
API 集成完全集成到 Kubernetes API内置资源类型
验证支持 OpenAPI 模式验证无结构验证
版本控制支持多版本无版本概念
控制器可以有专门的控制器通常被其他资源引用

基本配置

1. 简单的 CRD 定义

yaml
# 基础 CRD 定义
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: webapps.example.com  # 必须是 <plural>.<group> 格式
  labels:
    app: webapp-operator
    version: v1.0.0
spec:
  # 资源组
  group: example.com
  
  # 支持的版本
  versions:
  - name: v1
    served: true      # 是否提供 API 服务
    storage: true     # 是否作为存储版本
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              # 应用名称
              appName:
                type: string
                minLength: 1
                maxLength: 63
              # 副本数
              replicas:
                type: integer
                minimum: 1
                maximum: 100
                default: 1
              # 镜像
              image:
                type: string
                pattern: '^[a-zA-Z0-9._/-]+:[a-zA-Z0-9._-]+$'
              # 端口
              port:
                type: integer
                minimum: 1
                maximum: 65535
                default: 8080
              # 环境变量
              env:
                type: array
                items:
                  type: object
                  properties:
                    name:
                      type: string
                    value:
                      type: string
                  required: ["name", "value"]
            required: ["appName", "image"]
          status:
            type: object
            properties:
              # 状态
              phase:
                type: string
                enum: ["Pending", "Running", "Failed"]
              # 就绪副本数
              readyReplicas:
                type: integer
              # 消息
              message:
                type: string
              # 最后更新时间
              lastUpdateTime:
                type: string
                format: date-time
    
    # 额外的打印列
    additionalPrinterColumns:
    - name: App Name
      type: string
      description: The name of the application
      jsonPath: .spec.appName
    - name: Replicas
      type: integer
      description: The number of replicas
      jsonPath: .spec.replicas
    - name: Image
      type: string
      description: The container image
      jsonPath: .spec.image
    - name: Status
      type: string
      description: The status of the webapp
      jsonPath: .status.phase
    - name: Age
      type: date
      jsonPath: .metadata.creationTimestamp
  
  # 资源作用域
  scope: Namespaced  # 或 Cluster
  
  # 资源名称
  names:
    plural: webapps      # 复数形式
    singular: webapp     # 单数形式
    kind: WebApp         # 资源类型名
    shortNames:          # 简写
    - wa
    - webapp
    categories:          # 资源分类
    - all

---
# 对应的 Custom Resource 实例
apiVersion: example.com/v1
kind: WebApp
metadata:
  name: my-webapp
  namespace: production
  labels:
    app: my-webapp
    version: v1.0.0
  annotations:
    description: "示例 Web 应用程序"
spec:
  appName: "my-web-application"
  replicas: 3
  image: "nginx:1.20"
  port: 80
  env:
  - name: "ENV"
    value: "production"
  - name: "LOG_LEVEL"
    value: "info"
  - name: "DB_HOST"
    value: "postgres.database.svc.cluster.local"
status:
  phase: "Running"
  readyReplicas: 3
  message: "WebApp is running successfully"
  lastUpdateTime: "2024-01-15T10:30:00Z"

2. 多版本 CRD

yaml
# 支持多版本的 CRD
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: databases.storage.example.com
  labels:
    component: database-operator
spec:
  group: storage.example.com
  
  versions:
  # v1 版本(当前存储版本)
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              # 数据库类型
              type:
                type: string
                enum: ["mysql", "postgresql", "mongodb"]
              # 版本
              version:
                type: string
              # 存储配置
              storage:
                type: object
                properties:
                  size:
                    type: string
                    pattern: '^[0-9]+[KMGT]i$'
                  storageClass:
                    type: string
                required: ["size"]
              # 资源配置
              resources:
                type: object
                properties:
                  requests:
                    type: object
                    properties:
                      cpu:
                        type: string
                      memory:
                        type: string
                  limits:
                    type: object
                    properties:
                      cpu:
                        type: string
                      memory:
                        type: string
              # 高可用配置
              highAvailability:
                type: object
                properties:
                  enabled:
                    type: boolean
                    default: false
                  replicas:
                    type: integer
                    minimum: 1
                    maximum: 9
                    default: 3
            required: ["type", "version", "storage"]
          status:
            type: object
            properties:
              phase:
                type: string
                enum: ["Pending", "Creating", "Running", "Updating", "Failed"]
              conditions:
                type: array
                items:
                  type: object
                  properties:
                    type:
                      type: string
                    status:
                      type: string
                    reason:
                      type: string
                    message:
                      type: string
                    lastTransitionTime:
                      type: string
                      format: date-time
              endpoints:
                type: object
                properties:
                  primary:
                    type: string
                  readonly:
                    type: array
                    items:
                      type: string
    
    additionalPrinterColumns:
    - name: Type
      type: string
      jsonPath: .spec.type
    - name: Version
      type: string
      jsonPath: .spec.version
    - name: Storage
      type: string
      jsonPath: .spec.storage.size
    - name: HA
      type: boolean
      jsonPath: .spec.highAvailability.enabled
    - name: Status
      type: string
      jsonPath: .status.phase
    - name: Age
      type: date
      jsonPath: .metadata.creationTimestamp
  
  # v1beta1 版本(兼容性版本)
  - name: v1beta1
    served: true
    storage: false
    deprecated: true
    deprecationWarning: "storage.example.com/v1beta1 Database is deprecated; use storage.example.com/v1 Database"
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              # 简化的配置(向后兼容)
              dbType:
                type: string
                enum: ["mysql", "postgres"]
              dbVersion:
                type: string
              storageSize:
                type: string
              replicas:
                type: integer
                minimum: 1
                default: 1
            required: ["dbType", "dbVersion", "storageSize"]
          status:
            type: object
            properties:
              state:
                type: string
              message:
                type: string
    
    additionalPrinterColumns:
    - name: Type
      type: string
      jsonPath: .spec.dbType
    - name: Version
      type: string
      jsonPath: .spec.dbVersion
    - name: Storage
      type: string
      jsonPath: .spec.storageSize
    - name: Replicas
      type: integer
      jsonPath: .spec.replicas
    - name: State
      type: string
      jsonPath: .status.state
    - name: Age
      type: date
      jsonPath: .metadata.creationTimestamp
  
  scope: Namespaced
  
  names:
    plural: databases
    singular: database
    kind: Database
    shortNames:
    - db
    categories:
    - all
    - storage
  
  # 版本转换策略
  conversion:
    strategy: None  # None, Webhook

---
# v1 版本的 Database 实例
apiVersion: storage.example.com/v1
kind: Database
metadata:
  name: production-postgres
  namespace: database
  labels:
    app: postgres
    environment: production
spec:
  type: postgresql
  version: "13.7"
  storage:
    size: "100Gi"
    storageClass: "fast-ssd"
  resources:
    requests:
      cpu: "1000m"
      memory: "2Gi"
    limits:
      cpu: "2000m"
      memory: "4Gi"
  highAvailability:
    enabled: true
    replicas: 3
status:
  phase: "Running"
  conditions:
  - type: "Ready"
    status: "True"
    reason: "DatabaseReady"
    message: "Database is ready and accepting connections"
    lastTransitionTime: "2024-01-15T10:30:00Z"
  endpoints:
    primary: "postgres-primary.database.svc.cluster.local:5432"
    readonly:
    - "postgres-replica-1.database.svc.cluster.local:5432"
    - "postgres-replica-2.database.svc.cluster.local:5432"

3. 集群级别的 CRD

yaml
# 集群级别的自定义资源
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: clusters.infrastructure.example.com
  labels:
    component: cluster-manager
spec:
  group: infrastructure.example.com
  
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              # 集群名称
              clusterName:
                type: string
                minLength: 1
                maxLength: 63
                pattern: '^[a-z0-9]([-a-z0-9]*[a-z0-9])?$'
              # 集群类型
              clusterType:
                type: string
                enum: ["development", "staging", "production"]
              # 区域
              region:
                type: string
              # 可用区
              availabilityZones:
                type: array
                items:
                  type: string
                minItems: 1
                maxItems: 5
              # 节点组配置
              nodeGroups:
                type: array
                items:
                  type: object
                  properties:
                    name:
                      type: string
                    instanceType:
                      type: string
                    minSize:
                      type: integer
                      minimum: 0
                    maxSize:
                      type: integer
                      minimum: 1
                    desiredSize:
                      type: integer
                      minimum: 0
                    diskSize:
                      type: integer
                      minimum: 20
                      maximum: 1000
                    labels:
                      type: object
                      additionalProperties:
                        type: string
                    taints:
                      type: array
                      items:
                        type: object
                        properties:
                          key:
                            type: string
                          value:
                            type: string
                          effect:
                            type: string
                            enum: ["NoSchedule", "PreferNoSchedule", "NoExecute"]
                  required: ["name", "instanceType", "minSize", "maxSize", "desiredSize"]
                minItems: 1
              # 网络配置
              networking:
                type: object
                properties:
                  vpcId:
                    type: string
                  subnetIds:
                    type: array
                    items:
                      type: string
                  securityGroupIds:
                    type: array
                    items:
                      type: string
                  serviceIpv4Cidr:
                    type: string
                    pattern: '^([0-9]{1,3}\.){3}[0-9]{1,3}/[0-9]{1,2}$'
              # 加密配置
              encryption:
                type: object
                properties:
                  secretsEncryption:
                    type: boolean
                    default: true
                  kmsKeyId:
                    type: string
              # 日志配置
              logging:
                type: object
                properties:
                  enable:
                    type: boolean
                    default: true
                  types:
                    type: array
                    items:
                      type: string
                      enum: ["api", "audit", "authenticator", "controllerManager", "scheduler"]
            required: ["clusterName", "clusterType", "region", "availabilityZones", "nodeGroups"]
          status:
            type: object
            properties:
              phase:
                type: string
                enum: ["Creating", "Active", "Updating", "Deleting", "Failed"]
              version:
                type: string
              endpoint:
                type: string
              certificateAuthority:
                type: string
              nodeGroups:
                type: array
                items:
                  type: object
                  properties:
                    name:
                      type: string
                    status:
                      type: string
                    capacity:
                      type: integer
                    ready:
                      type: integer
              conditions:
                type: array
                items:
                  type: object
                  properties:
                    type:
                      type: string
                    status:
                      type: string
                    reason:
                      type: string
                    message:
                      type: string
                    lastTransitionTime:
                      type: string
                      format: date-time
    
    additionalPrinterColumns:
    - name: Cluster Name
      type: string
      jsonPath: .spec.clusterName
    - name: Type
      type: string
      jsonPath: .spec.clusterType
    - name: Region
      type: string
      jsonPath: .spec.region
    - name: Status
      type: string
      jsonPath: .status.phase
    - name: Version
      type: string
      jsonPath: .status.version
    - name: Age
      type: date
      jsonPath: .metadata.creationTimestamp
  
  # 集群级别资源
  scope: Cluster
  
  names:
    plural: clusters
    singular: cluster
    kind: Cluster
    shortNames:
    - cls
    categories:
    - all
    - infrastructure

---
# 集群级别的 Custom Resource 实例
apiVersion: infrastructure.example.com/v1
kind: Cluster
metadata:
  name: production-cluster-east
  labels:
    environment: production
    region: us-east-1
    managed-by: cluster-operator
spec:
  clusterName: "prod-east-k8s"
  clusterType: "production"
  region: "us-east-1"
  availabilityZones:
  - "us-east-1a"
  - "us-east-1b"
  - "us-east-1c"
  nodeGroups:
  - name: "system"
    instanceType: "m5.large"
    minSize: 3
    maxSize: 6
    desiredSize: 3
    diskSize: 50
    labels:
      node-type: "system"
      workload: "system-pods"
    taints:
    - key: "node-type"
      value: "system"
      effect: "NoSchedule"
  - name: "application"
    instanceType: "m5.xlarge"
    minSize: 5
    maxSize: 20
    desiredSize: 10
    diskSize: 100
    labels:
      node-type: "application"
      workload: "user-pods"
  networking:
    vpcId: "vpc-12345678"
    subnetIds:
    - "subnet-12345678"
    - "subnet-87654321"
    - "subnet-11111111"
    securityGroupIds:
    - "sg-12345678"
    serviceIpv4Cidr: "10.100.0.0/16"
  encryption:
    secretsEncryption: true
    kmsKeyId: "arn:aws:kms:us-east-1:123456789012:key/12345678-1234-1234-1234-123456789012"
  logging:
    enable: true
    types:
    - "api"
    - "audit"
    - "authenticator"
status:
  phase: "Active"
  version: "1.24.7"
  endpoint: "https://12345678.gr7.us-east-1.eks.amazonaws.com"
  certificateAuthority: "LS0tLS1CRUdJTi..."
  nodeGroups:
  - name: "system"
    status: "Active"
    capacity: 3
    ready: 3
  - name: "application"
    status: "Active"
    capacity: 10
    ready: 10
  conditions:
  - type: "Ready"
    status: "True"
    reason: "ClusterReady"
    message: "Cluster is ready and all node groups are active"
    lastTransitionTime: "2024-01-15T10:30:00Z"

高级配置

1. 带有子资源的 CRD

yaml
# 支持子资源的 CRD
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: applications.app.example.com
  labels:
    component: application-operator
spec:
  group: app.example.com
  
  versions:
  - name: v1
    served: true
    storage: true
    
    # 子资源配置
    subresources:
      # 状态子资源
      status: {}
      
      # 扩缩容子资源
      scale:
        # 指定副本数字段
        specReplicasPath: .spec.replicas
        statusReplicasPath: .status.replicas
        # 标签选择器路径
        labelSelectorPath: .status.selector
    
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              # 应用配置
              application:
                type: object
                properties:
                  name:
                    type: string
                  version:
                    type: string
                  image:
                    type: string
                required: ["name", "version", "image"]
              
              # 副本数(用于扩缩容)
              replicas:
                type: integer
                minimum: 0
                maximum: 100
                default: 1
              
              # 服务配置
              service:
                type: object
                properties:
                  enabled:
                    type: boolean
                    default: true
                  type:
                    type: string
                    enum: ["ClusterIP", "NodePort", "LoadBalancer"]
                    default: "ClusterIP"
                  port:
                    type: integer
                    minimum: 1
                    maximum: 65535
                    default: 80
              
              # 资源配置
              resources:
                type: object
                properties:
                  requests:
                    type: object
                    properties:
                      cpu:
                        type: string
                      memory:
                        type: string
                  limits:
                    type: object
                    properties:
                      cpu:
                        type: string
                      memory:
                        type: string
              
              # 健康检查
              healthCheck:
                type: object
                properties:
                  enabled:
                    type: boolean
                    default: true
                  path:
                    type: string
                    default: "/health"
                  port:
                    type: integer
                    default: 8080
                  initialDelaySeconds:
                    type: integer
                    minimum: 0
                    default: 30
                  periodSeconds:
                    type: integer
                    minimum: 1
                    default: 10
            
            required: ["application", "replicas"]
          
          status:
            type: object
            properties:
              # 当前副本数
              replicas:
                type: integer
              # 就绪副本数
              readyReplicas:
                type: integer
              # 可用副本数
              availableReplicas:
                type: integer
              # 标签选择器
              selector:
                type: string
              # 应用状态
              phase:
                type: string
                enum: ["Pending", "Running", "Updating", "Failed"]
              # 条件
              conditions:
                type: array
                items:
                  type: object
                  properties:
                    type:
                      type: string
                    status:
                      type: string
                    reason:
                      type: string
                    message:
                      type: string
                    lastTransitionTime:
                      type: string
                      format: date-time
              # 观察到的生成
              observedGeneration:
                type: integer
    
    additionalPrinterColumns:
    - name: App Name
      type: string
      jsonPath: .spec.application.name
    - name: Version
      type: string
      jsonPath: .spec.application.version
    - name: Replicas
      type: string
      jsonPath: .status.replicas
    - name: Ready
      type: string
      jsonPath: .status.readyReplicas
    - name: Status
      type: string
      jsonPath: .status.phase
    - name: Age
      type: date
      jsonPath: .metadata.creationTimestamp
  
  scope: Namespaced
  
  names:
    plural: applications
    singular: application
    kind: Application
    shortNames:
    - app
    categories:
    - all

---
# Application 实例
apiVersion: app.example.com/v1
kind: Application
metadata:
  name: web-frontend
  namespace: production
  labels:
    app: web-frontend
    tier: frontend
spec:
  application:
    name: "web-frontend"
    version: "v1.2.0"
    image: "myregistry/web-frontend:v1.2.0"
  
  replicas: 5
  
  service:
    enabled: true
    type: "LoadBalancer"
    port: 80
  
  resources:
    requests:
      cpu: "200m"
      memory: "256Mi"
    limits:
      cpu: "500m"
      memory: "512Mi"
  
  healthCheck:
    enabled: true
    path: "/api/health"
    port: 8080
    initialDelaySeconds: 15
    periodSeconds: 5

status:
  replicas: 5
  readyReplicas: 5
  availableReplicas: 5
  selector: "app=web-frontend"
  phase: "Running"
  conditions:
  - type: "Available"
    status: "True"
    reason: "MinimumReplicasAvailable"
    message: "Deployment has minimum availability."
    lastTransitionTime: "2024-01-15T10:30:00Z"
  - type: "Progressing"
    status: "True"
    reason: "NewReplicaSetAvailable"
    message: "ReplicaSet has successfully progressed."
    lastTransitionTime: "2024-01-15T10:25:00Z"
  observedGeneration: 1

2. 带有验证和默认值的 CRD

yaml
# 高级验证和默认值的 CRD
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: microservices.platform.example.com
  labels:
    platform: microservices
spec:
  group: platform.example.com
  
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              # 服务基本信息
              service:
                type: object
                properties:
                  name:
                    type: string
                    minLength: 1
                    maxLength: 63
                    pattern: '^[a-z0-9]([-a-z0-9]*[a-z0-9])?$'
                    description: "服务名称,必须符合 DNS 标准"
                  description:
                    type: string
                    maxLength: 256
                    description: "服务描述"
                  version:
                    type: string
                    pattern: '^v[0-9]+\.[0-9]+\.[0-9]+$'
                    description: "服务版本,格式:v1.0.0"
                  team:
                    type: string
                    description: "负责团队"
                  owner:
                    type: string
                    format: email
                    description: "服务负责人邮箱"
                required: ["name", "version", "team", "owner"]
              
              # 部署配置
              deployment:
                type: object
                properties:
                  image:
                    type: string
                    pattern: '^[a-zA-Z0-9._/-]+:[a-zA-Z0-9._-]+$'
                    description: "容器镜像"
                  replicas:
                    type: integer
                    minimum: 1
                    maximum: 50
                    default: 2
                    description: "副本数量"
                  strategy:
                    type: object
                    properties:
                      type:
                        type: string
                        enum: ["RollingUpdate", "Recreate"]
                        default: "RollingUpdate"
                      rollingUpdate:
                        type: object
                        properties:
                          maxUnavailable:
                            oneOf:
                            - type: integer
                              minimum: 0
                            - type: string
                              pattern: '^[0-9]+%$'
                            default: "25%"
                          maxSurge:
                            oneOf:
                            - type: integer
                              minimum: 0
                            - type: string
                              pattern: '^[0-9]+%$'
                            default: "25%"
                    default:
                      type: "RollingUpdate"
                      rollingUpdate:
                        maxUnavailable: "25%"
                        maxSurge: "25%"
                  resources:
                    type: object
                    properties:
                      requests:
                        type: object
                        properties:
                          cpu:
                            type: string
                            pattern: '^[0-9]+m?$'
                            default: "100m"
                          memory:
                            type: string
                            pattern: '^[0-9]+[KMGT]i$'
                            default: "128Mi"
                        default:
                          cpu: "100m"
                          memory: "128Mi"
                      limits:
                        type: object
                        properties:
                          cpu:
                            type: string
                            pattern: '^[0-9]+m?$'
                            default: "500m"
                          memory:
                            type: string
                            pattern: '^[0-9]+[KMGT]i$'
                            default: "512Mi"
                        default:
                          cpu: "500m"
                          memory: "512Mi"
                    default:
                      requests:
                        cpu: "100m"
                        memory: "128Mi"
                      limits:
                        cpu: "500m"
                        memory: "512Mi"
                required: ["image"]
              
              # 网络配置
              networking:
                type: object
                properties:
                  service:
                    type: object
                    properties:
                      enabled:
                        type: boolean
                        default: true
                      type:
                        type: string
                        enum: ["ClusterIP", "NodePort", "LoadBalancer"]
                        default: "ClusterIP"
                      ports:
                        type: array
                        items:
                          type: object
                          properties:
                            name:
                              type: string
                            port:
                              type: integer
                              minimum: 1
                              maximum: 65535
                            targetPort:
                              oneOf:
                              - type: integer
                                minimum: 1
                                maximum: 65535
                              - type: string
                            protocol:
                              type: string
                              enum: ["TCP", "UDP"]
                              default: "TCP"
                          required: ["port"]
                        minItems: 1
                        default:
                        - port: 80
                          targetPort: 8080
                          protocol: "TCP"
                    default:
                      enabled: true
                      type: "ClusterIP"
                      ports:
                      - port: 80
                        targetPort: 8080
                        protocol: "TCP"
                  ingress:
                    type: object
                    properties:
                      enabled:
                        type: boolean
                        default: false
                      className:
                        type: string
                        default: "nginx"
                      hosts:
                        type: array
                        items:
                          type: object
                          properties:
                            host:
                              type: string
                              format: hostname
                            paths:
                              type: array
                              items:
                                type: object
                                properties:
                                  path:
                                    type: string
                                    default: "/"
                                  pathType:
                                    type: string
                                    enum: ["Exact", "Prefix", "ImplementationSpecific"]
                                    default: "Prefix"
                                default:
                                  path: "/"
                                  pathType: "Prefix"
                              default:
                              - path: "/"
                                pathType: "Prefix"
                          required: ["host"]
                      tls:
                        type: array
                        items:
                          type: object
                          properties:
                            secretName:
                              type: string
                            hosts:
                              type: array
                              items:
                                type: string
                                format: hostname
                    default:
                      enabled: false
                      className: "nginx"
                default:
                  service:
                    enabled: true
                    type: "ClusterIP"
                    ports:
                    - port: 80
                      targetPort: 8080
                      protocol: "TCP"
                  ingress:
                    enabled: false
                    className: "nginx"
              
              # 监控配置
              monitoring:
                type: object
                properties:
                  enabled:
                    type: boolean
                    default: true
                  metrics:
                    type: object
                    properties:
                      enabled:
                        type: boolean
                        default: true
                      path:
                        type: string
                        default: "/metrics"
                      port:
                        type: integer
                        minimum: 1
                        maximum: 65535
                        default: 9090
                    default:
                      enabled: true
                      path: "/metrics"
                      port: 9090
                  healthCheck:
                    type: object
                    properties:
                      enabled:
                        type: boolean
                        default: true
                      livenessProbe:
                        type: object
                        properties:
                          path:
                            type: string
                            default: "/health"
                          port:
                            type: integer
                            minimum: 1
                            maximum: 65535
                            default: 8080
                          initialDelaySeconds:
                            type: integer
                            minimum: 0
                            default: 30
                          periodSeconds:
                            type: integer
                            minimum: 1
                            default: 10
                        default:
                          path: "/health"
                          port: 8080
                          initialDelaySeconds: 30
                          periodSeconds: 10
                      readinessProbe:
                        type: object
                        properties:
                          path:
                            type: string
                            default: "/ready"
                          port:
                            type: integer
                            minimum: 1
                            maximum: 65535
                            default: 8080
                          initialDelaySeconds:
                            type: integer
                            minimum: 0
                            default: 5
                          periodSeconds:
                            type: integer
                            minimum: 1
                            default: 5
                        default:
                          path: "/ready"
                          port: 8080
                          initialDelaySeconds: 5
                          periodSeconds: 5
                    default:
                      enabled: true
                      livenessProbe:
                        path: "/health"
                        port: 8080
                        initialDelaySeconds: 30
                        periodSeconds: 10
                      readinessProbe:
                        path: "/ready"
                        port: 8080
                        initialDelaySeconds: 5
                        periodSeconds: 5
                default:
                  enabled: true
                  metrics:
                    enabled: true
                    path: "/metrics"
                    port: 9090
                  healthCheck:
                    enabled: true
                    livenessProbe:
                      path: "/health"
                      port: 8080
                      initialDelaySeconds: 30
                      periodSeconds: 10
                    readinessProbe:
                      path: "/ready"
                      port: 8080
                      initialDelaySeconds: 5
                      periodSeconds: 5
            
            required: ["service", "deployment"]
            
            # 全局默认值
            default:
              deployment:
                replicas: 2
                strategy:
                  type: "RollingUpdate"
                  rollingUpdate:
                    maxUnavailable: "25%"
                    maxSurge: "25%"
                resources:
                  requests:
                    cpu: "100m"
                    memory: "128Mi"
                  limits:
                    cpu: "500m"
                    memory: "512Mi"
              networking:
                service:
                  enabled: true
                  type: "ClusterIP"
                  ports:
                  - port: 80
                    targetPort: 8080
                    protocol: "TCP"
                ingress:
                  enabled: false
                  className: "nginx"
              monitoring:
                enabled: true
                metrics:
                  enabled: true
                  path: "/metrics"
                  port: 9090
                healthCheck:
                  enabled: true
                  livenessProbe:
                    path: "/health"
                    port: 8080
                    initialDelaySeconds: 30
                    periodSeconds: 10
                  readinessProbe:
                    path: "/ready"
                    port: 8080
                    initialDelaySeconds: 5
                    periodSeconds: 5
          
          status:
            type: object
            properties:
              phase:
                type: string
                enum: ["Pending", "Deploying", "Running", "Updating", "Failed"]
              conditions:
                type: array
                items:
                  type: object
                  properties:
                    type:
                      type: string
                    status:
                      type: string
                    reason:
                      type: string
                    message:
                      type: string
                    lastTransitionTime:
                      type: string
                      format: date-time
              deploymentStatus:
                type: object
                properties:
                  replicas:
                    type: integer
                  readyReplicas:
                    type: integer
                  availableReplicas:
                    type: integer
              serviceStatus:
                type: object
                properties:
                  clusterIP:
                    type: string
                  externalIPs:
                    type: array
                    items:
                      type: string
              ingressStatus:
                type: object
                properties:
                  loadBalancer:
                    type: object
                    properties:
                      ingress:
                        type: array
                        items:
                          type: object
                          properties:
                            ip:
                              type: string
                            hostname:
                              type: string
    
    additionalPrinterColumns:
    - name: Service
      type: string
      jsonPath: .spec.service.name
    - name: Version
      type: string
      jsonPath: .spec.service.version
    - name: Team
      type: string
      jsonPath: .spec.service.team
    - name: Replicas
      type: integer
      jsonPath: .spec.deployment.replicas
    - name: Status
      type: string
      jsonPath: .status.phase
    - name: Age
      type: date
      jsonPath: .metadata.creationTimestamp
  
  scope: Namespaced
  
  names:
    plural: microservices
    singular: microservice
    kind: Microservice
    shortNames:
    - ms
    - micro
    categories:
    - all
    - platform

---
# Microservice 实例(使用默认值)
apiVersion: platform.example.com/v1
kind: Microservice
metadata:
  name: user-service
  namespace: production
  labels:
    app: user-service
    tier: backend
spec:
  service:
    name: "user-service"
    description: "用户管理服务"
    version: "v1.0.0"
    team: "backend-team"
    owner: "backend-team@example.com"
  
  deployment:
    image: "myregistry/user-service:v1.0.0"
    replicas: 3  # 覆盖默认值 2
  
  networking:
    service:
      ports:
      - port: 80
        targetPort: 8080
        name: "http"
    ingress:
      enabled: true  # 覆盖默认值 false
      hosts:
      - host: "user-service.example.com"
        paths:
        - path: "/api/users"
          pathType: "Prefix"
      tls:
      - secretName: "user-service-tls"
        hosts:
        - "user-service.example.com"
  
  # monitoring 使用所有默认值

status:
  phase: "Running"
  conditions:
  - type: "Available"
    status: "True"
    reason: "MinimumReplicasAvailable"
    message: "Deployment has minimum availability."
    lastTransitionTime: "2024-01-15T10:30:00Z"
  deploymentStatus:
    replicas: 3
    readyReplicas: 3
    availableReplicas: 3
  serviceStatus:
    clusterIP: "10.96.100.50"
  ingressStatus:
    loadBalancer:
      ingress:
      - ip: "203.0.113.10"

实际应用场景

1. 应用程序管理平台

yaml
# 应用程序生命周期管理
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: applifecycles.platform.example.com
spec:
  group: platform.example.com
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              # 应用元数据
              metadata:
                type: object
                properties:
                  name:
                    type: string
                  version:
                    type: string
                  description:
                    type: string
                  maintainer:
                    type: string
                  repository:
                    type: string
                    format: uri
              
              # 环境配置
              environments:
                type: array
                items:
                  type: object
                  properties:
                    name:
                      type: string
                      enum: ["development", "staging", "production"]
                    namespace:
                      type: string
                    config:
                      type: object
                      properties:
                        replicas:
                          type: integer
                        resources:
                          type: object
                        env:
                          type: array
                          items:
                            type: object
                            properties:
                              name:
                                type: string
                              value:
                                type: string
              
              # 部署策略
              deploymentStrategy:
                type: object
                properties:
                  type:
                    type: string
                    enum: ["BlueGreen", "Canary", "RollingUpdate"]
                  canary:
                    type: object
                    properties:
                      steps:
                        type: array
                        items:
                          type: object
                          properties:
                            weight:
                              type: integer
                            duration:
                              type: string
              
              # CI/CD 集成
              cicd:
                type: object
                properties:
                  enabled:
                    type: boolean
                  pipeline:
                    type: string
                  triggers:
                    type: array
                    items:
                      type: string
                      enum: ["push", "tag", "manual"]
          
          status:
            type: object
            properties:
              environments:
                type: array
                items:
                  type: object
                  properties:
                    name:
                      type: string
                    status:
                      type: string
                    version:
                      type: string
                    lastDeployment:
                      type: string
                      format: date-time
  scope: Namespaced
  names:
    plural: applifecycles
    singular: applifecycle
    kind: AppLifecycle
    shortNames: ["alc"]

2. 数据库管理

yaml
# 数据库集群管理
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: dbclusters.database.example.com
spec:
  group: database.example.com
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              # 数据库类型和版本
              database:
                type: object
                properties:
                  type:
                    type: string
                    enum: ["mysql", "postgresql", "mongodb", "redis"]
                  version:
                    type: string
                  charset:
                    type: string
                    default: "utf8mb4"
              
              # 集群配置
              cluster:
                type: object
                properties:
                  mode:
                    type: string
                    enum: ["standalone", "master-slave", "cluster"]
                  nodes:
                    type: integer
                    minimum: 1
                    maximum: 9
                  topology:
                    type: object
                    properties:
                      primary:
                        type: integer
                        default: 1
                      secondary:
                        type: integer
                        default: 2
              
              # 存储配置
              storage:
                type: object
                properties:
                  size:
                    type: string
                  storageClass:
                    type: string
                  backup:
                    type: object
                    properties:
                      enabled:
                        type: boolean
                      schedule:
                        type: string
                      retention:
                        type: string
              
              # 安全配置
              security:
                type: object
                properties:
                  tls:
                    type: boolean
                    default: true
                  authentication:
                    type: object
                    properties:
                      enabled:
                        type: boolean
                        default: true
                      method:
                        type: string
                        enum: ["password", "certificate", "ldap"]
                  encryption:
                    type: object
                    properties:
                      atRest:
                        type: boolean
                        default: true
                      inTransit:
                        type: boolean
                        default: true
              
              # 监控配置
              monitoring:
                type: object
                properties:
                  enabled:
                    type: boolean
                    default: true
                  metrics:
                    type: array
                    items:
                      type: string
                  alerts:
                    type: array
                    items:
                      type: object
                      properties:
                        name:
                          type: string
                        condition:
                          type: string
                        threshold:
                          type: string
          
          status:
            type: object
            properties:
              phase:
                type: string
                enum: ["Creating", "Running", "Updating", "Failed"]
              nodes:
                type: array
                items:
                  type: object
                  properties:
                    name:
                      type: string
                    role:
                      type: string
                    status:
                      type: string
                    endpoint:
                      type: string
              backup:
                type: object
                properties:
                  lastBackup:
                    type: string
                    format: date-time
                  status:
                    type: string
  scope: Namespaced
  names:
    plural: dbclusters
    singular: dbcluster
    kind: DBCluster
    shortNames: ["dbc"]

3. 监控和告警

yaml
# 监控规则管理
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: monitoringrules.monitoring.example.com
spec:
  group: monitoring.example.com
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              # 监控目标
              targets:
                type: array
                items:
                  type: object
                  properties:
                    type:
                      type: string
                      enum: ["service", "pod", "node", "custom"]
                    selector:
                      type: object
                      additionalProperties:
                        type: string
                    metrics:
                      type: array
                      items:
                        type: string
              
              # 告警规则
              alerts:
                type: array
                items:
                  type: object
                  properties:
                    name:
                      type: string
                    condition:
                      type: string
                    threshold:
                      type: object
                      properties:
                        value:
                          type: number
                        operator:
                          type: string
                          enum: [">=", "<=", ">", "<", "==", "!="]
                    duration:
                      type: string
                    severity:
                      type: string
                      enum: ["info", "warning", "critical"]
                    actions:
                      type: array
                      items:
                        type: object
                        properties:
                          type:
                            type: string
                            enum: ["email", "slack", "webhook", "scale"]
                          config:
                            type: object
              
              # 仪表板
              dashboards:
                type: array
                items:
                  type: object
                  properties:
                    name:
                      type: string
                    panels:
                      type: array
                      items:
                        type: object
                        properties:
                          title:
                            type: string
                          type:
                            type: string
                            enum: ["graph", "table", "gauge", "stat"]
                          query:
                            type: string
          
          status:
            type: object
            properties:
              activeAlerts:
                type: integer
              lastEvaluation:
                type: string
                format: date-time
              conditions:
                type: array
                items:
                  type: object
                  properties:
                    type:
                      type: string
                    status:
                      type: string
                    message:
                      type: string
  scope: Namespaced
  names:
    plural: monitoringrules
    singular: monitoringrule
    kind: MonitoringRule
    shortNames: ["mr"]

命令行操作

基本操作

bash
# 查看 CRD
kubectl get crd
kubectl get customresourcedefinitions

# 查看特定 CRD
kubectl describe crd webapps.example.com
kubectl get crd webapps.example.com -o yaml

# 查看 CRD 支持的版本
kubectl get crd webapps.example.com -o jsonpath='{.spec.versions[*].name}'

# 查看 Custom Resource
kubectl get webapps
kubectl get webapp my-webapp -o yaml

# 使用简写
kubectl get wa  # 如果定义了 shortNames

CRD 管理

bash
# 创建 CRD
kubectl apply -f webapp-crd.yaml

# 更新 CRD
kubectl apply -f updated-webapp-crd.yaml

# 删除 CRD(会删除所有相关的 CR)
kubectl delete crd webapps.example.com

# 验证 CRD 状态
kubectl get crd webapps.example.com -o jsonpath='{.status.conditions}'

# 查看 CRD 的 OpenAPI 模式
kubectl get crd webapps.example.com -o jsonpath='{.spec.versions[0].schema.openAPIV3Schema}' | jq .

Custom Resource 操作

bash
# 创建 Custom Resource
kubectl apply -f my-webapp.yaml

# 查看 Custom Resource
kubectl get webapp my-webapp
kubectl describe webapp my-webapp

# 更新 Custom Resource
kubectl patch webapp my-webapp --type='merge' -p='{
  "spec": {
    "replicas": 5
  }
}'

# 删除 Custom Resource
kubectl delete webapp my-webapp

# 使用 kubectl scale(如果支持 scale 子资源)
kubectl scale webapp my-webapp --replicas=10

# 查看状态(如果支持 status 子资源)
kubectl get webapp my-webapp -o jsonpath='{.status}'

验证和调试

bash
# 验证 CRD 配置
kubectl apply --dry-run=client -f webapp-crd.yaml

# 验证 Custom Resource
kubectl apply --dry-run=client -f my-webapp.yaml

# 查看 API 资源
kubectl api-resources | grep example.com

# 查看 API 版本
kubectl api-versions | grep example.com

# 获取 CRD 的详细信息
kubectl explain webapp
kubectl explain webapp.spec
kubectl explain webapp.spec.replicas

# 查看事件
kubectl get events --field-selector involvedObject.kind=WebApp

批量操作脚本

bash
#!/bin/bash
# CRD 和 CR 管理脚本

CRD_DIR="./crds"
CR_DIR="./custom-resources"
NAMESPACE="default"

echo "=== CRD 和 Custom Resource 管理 ==="

# 1. 部署所有 CRD
echo "1. 部署 CRD..."
for crd_file in $CRD_DIR/*.yaml; do
    if [[ -f "$crd_file" ]]; then
        echo "部署 CRD: $crd_file"
        kubectl apply -f "$crd_file"
        
        # 等待 CRD 就绪
        crd_name=$(kubectl get -f "$crd_file" -o jsonpath='{.metadata.name}')
        echo "等待 CRD $crd_name 就绪..."
        kubectl wait --for=condition=Established crd/$crd_name --timeout=60s
    fi
done

# 2. 验证 CRD 状态
echo "2. 验证 CRD 状态"
for crd_file in $CRD_DIR/*.yaml; do
    if [[ -f "$crd_file" ]]; then
        crd_name=$(kubectl get -f "$crd_file" -o jsonpath='{.metadata.name}')
        echo "检查 CRD $crd_name 状态..."
        
        # 检查 CRD 是否建立
        if kubectl get crd $crd_name &>/dev/null; then
            status=$(kubectl get crd $crd_name -o jsonpath='{.status.conditions[?(@.type=="Established")].status}')
            if [[ "$status" == "True" ]]; then
                echo "✓ CRD $crd_name 已建立"
            else
                echo "✗ CRD $crd_name 未建立"
            fi
        else
            echo "✗ CRD $crd_name 不存在"
        fi
    fi
done

# 3. 部署 Custom Resources
echo "3. 部署 Custom Resources..."
for cr_file in $CR_DIR/*.yaml; do
    if [[ -f "$cr_file" ]]; then
        echo "部署 CR: $cr_file"
        kubectl apply -f "$cr_file" -n $NAMESPACE
    fi
done

# 4. 验证 Custom Resources
echo "4. 验证 Custom Resources..."
kubectl get all -n $NAMESPACE

# 5. 显示自定义资源
echo "5. 显示自定义资源..."
for crd_file in $CRD_DIR/*.yaml; do
    if [[ -f "$crd_file" ]]; then
        crd_name=$(kubectl get -f "$crd_file" -o jsonpath='{.metadata.name}')
        resource_name=$(kubectl get crd $crd_name -o jsonpath='{.spec.names.plural}')
        echo "=== $resource_name ==="
        kubectl get $resource_name -n $NAMESPACE
    fi
done

echo "完成!"

CRD 版本管理脚本

bash
#!/bin/bash
# CRD 版本管理和迁移脚本

CRD_NAME="$1"
OLD_VERSION="$2"
NEW_VERSION="$3"

if [[ -z "$CRD_NAME" || -z "$OLD_VERSION" || -z "$NEW_VERSION" ]]; then
    echo "用法: $0 <crd-name> <old-version> <new-version>"
    echo "示例: $0 webapps.example.com v1beta1 v1"
    exit 1
fi

echo "=== CRD 版本迁移: $CRD_NAME ($OLD_VERSION -> $NEW_VERSION) ==="

# 1. 检查当前 CRD 状态
echo "1. 检查当前 CRD 状态..."
if ! kubectl get crd $CRD_NAME &>/dev/null; then
    echo "错误: CRD $CRD_NAME 不存在"
    exit 1
fi

# 获取当前版本信息
current_versions=$(kubectl get crd $CRD_NAME -o jsonpath='{.spec.versions[*].name}')
echo "当前支持的版本: $current_versions"

storage_version=$(kubectl get crd $CRD_NAME -o jsonpath='{.spec.versions[?(@.storage==true)].name}')
echo "当前存储版本: $storage_version"

# 2. 备份现有资源
echo "2. 备份现有资源..."
resource_plural=$(kubectl get crd $CRD_NAME -o jsonpath='{.spec.names.plural}')
backup_dir="./backup-$(date +%Y%m%d-%H%M%S)"
mkdir -p $backup_dir

# 备份所有命名空间的资源
for ns in $(kubectl get namespaces -o jsonpath='{.items[*].metadata.name}'); do
    resources=$(kubectl get $resource_plural -n $ns -o name 2>/dev/null)
    if [[ -n "$resources" ]]; then
        echo "备份命名空间 $ns 中的资源..."
        kubectl get $resource_plural -n $ns -o yaml > "$backup_dir/$ns-$resource_plural.yaml"
    fi
done

# 备份集群级别资源(如果适用)
scope=$(kubectl get crd $CRD_NAME -o jsonpath='{.spec.scope}')
if [[ "$scope" == "Cluster" ]]; then
    echo "备份集群级别资源..."
    kubectl get $resource_plural -o yaml > "$backup_dir/cluster-$resource_plural.yaml"
fi

echo "备份完成,保存在: $backup_dir"

# 3. 验证新版本兼容性
echo "3. 验证版本兼容性..."
if [[ "$OLD_VERSION" == "$storage_version" ]]; then
    echo "警告: 正在迁移存储版本,请确保已测试兼容性"
    read -p "是否继续? (y/N): " confirm
    if [[ "$confirm" != "y" && "$confirm" != "Y" ]]; then
        echo "取消迁移"
        exit 0
    fi
fi

# 4. 更新 CRD(添加新版本)
echo "4. 更新 CRD 以支持新版本..."
echo "请手动更新 CRD 定义文件以包含新版本 $NEW_VERSION"
echo "然后运行: kubectl apply -f <updated-crd.yaml>"

read -p "CRD 已更新? (y/N): " updated
if [[ "$updated" != "y" && "$updated" != "Y" ]]; then
    echo "请先更新 CRD"
    exit 1
fi

# 5. 验证新版本可用
echo "5. 验证新版本..."
new_versions=$(kubectl get crd $CRD_NAME -o jsonpath='{.spec.versions[*].name}')
if [[ "$new_versions" == *"$NEW_VERSION"* ]]; then
    echo "✓ 新版本 $NEW_VERSION 已添加"
else
    echo "✗ 新版本 $NEW_VERSION 未找到"
    exit 1
fi

# 6. 迁移资源(如果需要)
echo "6. 资源迁移..."
echo "如果需要迁移现有资源到新版本,请手动执行"
echo "示例命令:"
echo "kubectl get $resource_plural -o yaml | sed 's/$OLD_VERSION/$NEW_VERSION/g' | kubectl apply -f -"

echo "迁移完成!"
echo "备份位置: $backup_dir"

故障排查

常见问题

1. CRD 创建失败

问题现象

bash
error validating data: ValidationError(CustomResourceDefinition.spec.versions[0].schema.openAPIV3Schema): 
unknown field "example" in io.k8s.apiextensions-apiserver.pkg.apis.apiextensions.v1.JSONSchemaProps

解决方案

bash
# 检查 OpenAPI 模式语法
kubectl apply --dry-run=client -f crd.yaml

# 验证 JSON Schema
cat crd.yaml | yq eval '.spec.versions[0].schema.openAPIV3Schema' - | jq .

# 常见错误修复
# 1. 移除不支持的字段(如 example)
# 2. 确保 type 字段正确
# 3. 检查 required 字段是否存在于 properties 中

2. Custom Resource 验证失败

问题现象

bash
error validating data: ValidationError(WebApp.spec.replicas): 
invalid value: 0, must be greater than or equal to 1

解决方案

bash
# 检查资源定义
kubectl explain webapp.spec.replicas

# 验证资源配置
kubectl apply --dry-run=client -f webapp.yaml

# 查看 CRD 验证规则
kubectl get crd webapps.example.com -o jsonpath='{.spec.versions[0].schema.openAPIV3Schema.properties.spec}' | jq .

3. 版本转换问题

问题现象

bash
conversion webhook for webapps.example.com failed: Post "https://webhook.example.com/convert": 
connection refused

解决方案

bash
# 检查转换 webhook 状态
kubectl get crd webapps.example.com -o jsonpath='{.spec.conversion}'

# 验证 webhook 服务
kubectl get svc webhook-service -n webhook-system
kubectl get endpoints webhook-service -n webhook-system

# 检查 webhook 证书
kubectl get secret webhook-certs -n webhook-system -o yaml

# 测试 webhook 连接
kubectl run test-pod --image=curlimages/curl --rm -it -- \
  curl -k https://webhook-service.webhook-system.svc.cluster.local/convert

诊断步骤

bash
#!/bin/bash
# CRD 诊断脚本

CRD_NAME="$1"
if [[ -z "$CRD_NAME" ]]; then
    echo "用法: $0 <crd-name>"
    echo "示例: $0 webapps.example.com"
    exit 1
fi

echo "=== CRD 诊断: $CRD_NAME ==="

# 1. 检查 CRD 存在性
echo "1. 检查 CRD 存在性..."
if kubectl get crd $CRD_NAME &>/dev/null; then
    echo "✓ CRD $CRD_NAME 存在"
else
    echo "✗ CRD $CRD_NAME 不存在"
    exit 1
fi

# 2. 检查 CRD 状态
echo "2. 检查 CRD 状态..."
established=$(kubectl get crd $CRD_NAME -o jsonpath='{.status.conditions[?(@.type=="Established")].status}')
names_accepted=$(kubectl get crd $CRD_NAME -o jsonpath='{.status.conditions[?(@.type=="NamesAccepted")].status}')

echo "Established: $established"
echo "NamesAccepted: $names_accepted"

if [[ "$established" != "True" || "$names_accepted" != "True" ]]; then
    echo "✗ CRD 状态异常"
    kubectl describe crd $CRD_NAME
    exit 1
fi

# 3. 检查版本信息
echo "3. 检查版本信息..."
versions=$(kubectl get crd $CRD_NAME -o jsonpath='{.spec.versions[*].name}')
storage_version=$(kubectl get crd $CRD_NAME -o jsonpath='{.spec.versions[?(@.storage==true)].name}')
served_versions=$(kubectl get crd $CRD_NAME -o jsonpath='{.spec.versions[?(@.served==true)].name}')

echo "所有版本: $versions"
echo "存储版本: $storage_version"
echo "服务版本: $served_versions"

# 4. 检查 API 可用性
echo "4. 检查 API 可用性..."
resource_plural=$(kubectl get crd $CRD_NAME -o jsonpath='{.spec.names.plural}')
group=$(kubectl get crd $CRD_NAME -o jsonpath='{.spec.group}')

for version in $served_versions; do
    echo "测试 API: $group/$version"
    if kubectl api-resources --api-group=$group | grep -q $resource_plural; then
        echo "✓ API $group/$version 可用"
    else
        echo "✗ API $group/$version 不可用"
    fi
done

# 5. 检查现有资源
echo "5. 检查现有资源..."
scope=$(kubectl get crd $CRD_NAME -o jsonpath='{.spec.scope}')

if [[ "$scope" == "Namespaced" ]]; then
    echo "命名空间级别资源:"
    for ns in $(kubectl get namespaces -o jsonpath='{.items[*].metadata.name}'); do
        count=$(kubectl get $resource_plural -n $ns --no-headers 2>/dev/null | wc -l)
        if [[ $count -gt 0 ]]; then
            echo "  $ns: $count 个资源"
        fi
    done
else
    echo "集群级别资源:"
    count=$(kubectl get $resource_plural --no-headers 2>/dev/null | wc -l)
    echo "  总计: $count 个资源"
fi

# 6. 检查相关事件
echo "6. 检查相关事件..."
kubectl get events --all-namespaces --field-selector involvedObject.kind=$CRD_NAME

# 7. 检查 OpenAPI 模式
echo "7. 验证 OpenAPI 模式..."
for version in $served_versions; do
    echo "检查版本 $version 的模式..."
    schema=$(kubectl get crd $CRD_NAME -o jsonpath="{.spec.versions[?(@.name=='$version')].schema.openAPIV3Schema}")
    if [[ -n "$schema" ]]; then
        echo "✓ 版本 $version 有有效模式"
    else
        echo "✗ 版本 $version 缺少模式"
    fi
done

echo "诊断完成!"

常见错误和解决方案

1. 模式验证错误

yaml
# 错误示例
properties:
  replicas:
    type: integer
    minimum: 1
    example: 3  # 错误:不支持 example 字段

# 正确示例
properties:
  replicas:
    type: integer
    minimum: 1
    default: 3  # 使用 default 而不是 example

2. 版本兼容性问题

yaml
# 问题:存储版本变更导致数据丢失
versions:
- name: v1beta1
  served: true
  storage: false  # 从 true 改为 false
- name: v1
  served: true
  storage: true   # 新的存储版本

# 解决方案:使用转换 webhook
conversion:
  strategy: Webhook
  webhook:
    clientConfig:
      service:
        name: webhook-service
        namespace: webhook-system
        path: /convert
    conversionReviewVersions: ["v1", "v1beta1"]

3. 子资源配置错误

yaml
# 错误示例
subresources:
  scale:
    specReplicasPath: .spec.replicas
    statusReplicasPath: .status.replicas
    # 缺少 labelSelectorPath

# 正确示例
subresources:
  scale:
    specReplicasPath: .spec.replicas
    statusReplicasPath: .status.replicas
    labelSelectorPath: .status.selector  # 必需字段

最佳实践

设计原则

1. API 设计

yaml
# 良好的 API 设计示例
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: applications.platform.example.com
  # 使用标签进行分类
  labels:
    app.kubernetes.io/name: application-operator
    app.kubernetes.io/component: crd
    app.kubernetes.io/version: v1.0.0
spec:
  group: platform.example.com  # 使用有意义的组名
  
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              # 使用清晰的字段名
              applicationName:
                type: string
                description: "应用程序名称"
                minLength: 1
                maxLength: 63
                pattern: '^[a-z0-9]([-a-z0-9]*[a-z0-9])?$'
              
              # 提供合理的默认值
              replicas:
                type: integer
                minimum: 1
                maximum: 100
                default: 1
                description: "副本数量"
              
              # 使用枚举限制选项
              environment:
                type: string
                enum: ["development", "staging", "production"]
                default: "development"
                description: "部署环境"
              
              # 嵌套对象结构清晰
              resources:
                type: object
                description: "资源配置"
                properties:
                  requests:
                    type: object
                    properties:
                      cpu:
                        type: string
                        pattern: '^[0-9]+m?$'
                        default: "100m"
                      memory:
                        type: string
                        pattern: '^[0-9]+[KMGT]i$'
                        default: "128Mi"
                  limits:
                    type: object
                    properties:
                      cpu:
                        type: string
                        pattern: '^[0-9]+m?$'
                        default: "500m"
                      memory:
                        type: string
                        pattern: '^[0-9]+[KMGT]i$'
                        default: "512Mi"
            
            # 明确必需字段
            required: ["applicationName"]
          
          status:
            type: object
            description: "应用程序状态"
            properties:
              phase:
                type: string
                enum: ["Pending", "Running", "Failed"]
                description: "当前阶段"
              
              conditions:
                type: array
                description: "状态条件"
                items:
                  type: object
                  properties:
                    type:
                      type: string
                    status:
                      type: string
                      enum: ["True", "False", "Unknown"]
                    reason:
                      type: string
                    message:
                      type: string
                    lastTransitionTime:
                      type: string
                      format: date-time
                  required: ["type", "status"]
    
    # 提供有用的打印列
    additionalPrinterColumns:
    - name: App Name
      type: string
      description: Application name
      jsonPath: .spec.applicationName
    - name: Environment
      type: string
      description: Deployment environment
      jsonPath: .spec.environment
    - name: Replicas
      type: integer
      description: Number of replicas
      jsonPath: .spec.replicas
    - name: Status
      type: string
      description: Current status
      jsonPath: .status.phase
    - name: Age
      type: date
      jsonPath: .metadata.creationTimestamp
  
  scope: Namespaced
  
  names:
    plural: applications
    singular: application
    kind: Application
    # 提供有用的简写
    shortNames: ["app", "apps"]
    # 分类便于发现
    categories: ["all", "platform"]

2. 版本管理策略

yaml
# 版本演进策略
versions:
# 当前稳定版本
- name: v1
  served: true
  storage: true
  schema:
    # 完整的 v1 模式

# 下一个版本(开发中)
- name: v1alpha1
  served: true
  storage: false
  schema:
    # 新功能的模式

# 废弃版本(向后兼容)
- name: v1beta1
  served: true
  storage: false
  deprecated: true
  deprecationWarning: "platform.example.com/v1beta1 Application is deprecated; use platform.example.com/v1 Application"
  schema:
    # 简化的向后兼容模式

3. 安全考虑

yaml
# RBAC 配置示例
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: application-operator
rules:
# CRD 管理权限
- apiGroups: ["apiextensions.k8s.io"]
  resources: ["customresourcedefinitions"]
  verbs: ["get", "list", "watch", "create", "update", "patch"]

# Custom Resource 权限
- apiGroups: ["platform.example.com"]
  resources: ["applications"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]

# 状态更新权限
- apiGroups: ["platform.example.com"]
  resources: ["applications/status"]
  verbs: ["get", "update", "patch"]

# 扩缩容权限(如果支持)
- apiGroups: ["platform.example.com"]
  resources: ["applications/scale"]
  verbs: ["get", "update", "patch"]

---
# 用户权限示例
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: production
  name: application-user
rules:
# 只读权限
- apiGroups: ["platform.example.com"]
  resources: ["applications"]
  verbs: ["get", "list", "watch"]

# 特定资源的管理权限
- apiGroups: ["platform.example.com"]
  resources: ["applications"]
  resourceNames: ["my-app"]
  verbs: ["update", "patch"]

监控和告警

yaml
# Prometheus 监控规则
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: crd-monitoring
  labels:
    prometheus: kube-prometheus
spec:
  groups:
  - name: crd.rules
    rules:
    # CRD 可用性监控
    - alert: CRDNotEstablished
      expr: |
        apiserver_crd_established{condition="False"} == 1
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "CRD {{ $labels.crd }} is not established"
        description: "CRD {{ $labels.crd }} has not been established for more than 5 minutes."
    
    # Custom Resource 数量监控
    - alert: CustomResourceCountHigh
      expr: |
        apiserver_current_inflight_requests{requestKind="Application"} > 1000
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: "High number of Custom Resources"
        description: "There are {{ $value }} Application resources, which is unusually high."
    
    # API 请求错误率
    - alert: CustomResourceAPIErrors
      expr: |
        rate(apiserver_request_total{resource="applications",code=~"5.."}[5m]) > 0.1
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "High error rate for Custom Resource API"
        description: "Error rate for Application API is {{ $value }} errors per second."

自动化管理

bash
#!/bin/bash
# CRD 自动化部署和管理脚本

set -euo pipefail

# 配置
CRD_DIR="./crds"
CR_DIR="./examples"
NAMESPACE="default"
DRY_RUN=${DRY_RUN:-false}
VERBOSE=${VERBOSE:-false}

# 日志函数
log() {
    echo "[$(date +'%Y-%m-%d %H:%M:%S')] $*"
}

verbose() {
    if [[ "$VERBOSE" == "true" ]]; then
        log "$*"
    fi
}

error() {
    log "ERROR: $*" >&2
}

# 验证 CRD 函数
validate_crd() {
    local crd_file="$1"
    
    verbose "验证 CRD 文件: $crd_file"
    
    # 语法验证
    if ! kubectl apply --dry-run=client -f "$crd_file" &>/dev/null; then
        error "CRD 文件语法错误: $crd_file"
        return 1
    fi
    
    # 检查必需字段
    local crd_name
    crd_name=$(yq eval '.metadata.name' "$crd_file")
    if [[ "$crd_name" == "null" ]]; then
        error "CRD 缺少名称: $crd_file"
        return 1
    fi
    
    # 检查版本
    local versions
    versions=$(yq eval '.spec.versions | length' "$crd_file")
    if [[ "$versions" -eq 0 ]]; then
        error "CRD 缺少版本定义: $crd_file"
        return 1
    fi
    
    # 检查存储版本
    local storage_versions
    storage_versions=$(yq eval '.spec.versions[] | select(.storage == true) | .name' "$crd_file" | wc -l)
    if [[ "$storage_versions" -ne 1 ]]; then
        error "CRD 必须有且仅有一个存储版本: $crd_file"
        return 1
    fi
    
    verbose "✓ CRD 验证通过: $crd_file"
    return 0
}

# 部署 CRD 函数
deploy_crd() {
    local crd_file="$1"
    local crd_name
    
    crd_name=$(yq eval '.metadata.name' "$crd_file")
    log "部署 CRD: $crd_name"
    
    if [[ "$DRY_RUN" == "true" ]]; then
        log "[DRY RUN] 将部署 CRD: $crd_name"
        return 0
    fi
    
    # 应用 CRD
    if kubectl apply -f "$crd_file"; then
        log "✓ CRD 应用成功: $crd_name"
    else
        error "CRD 应用失败: $crd_name"
        return 1
    fi
    
    # 等待 CRD 建立
    log "等待 CRD 建立: $crd_name"
    if kubectl wait --for=condition=Established crd/$crd_name --timeout=60s; then
        log "✓ CRD 已建立: $crd_name"
    else
        error "CRD 建立超时: $crd_name"
        return 1
    fi
    
    return 0
}

# 验证 Custom Resource 函数
validate_cr() {
    local cr_file="$1"
    
    verbose "验证 Custom Resource: $cr_file"
    
    # 语法验证
    if ! kubectl apply --dry-run=client -f "$cr_file" &>/dev/null; then
        error "Custom Resource 语法错误: $cr_file"
        return 1
    fi
    
    verbose "✓ Custom Resource 验证通过: $cr_file"
    return 0
}

# 主函数
main() {
    log "开始 CRD 自动化管理"
    
    # 检查依赖
    for cmd in kubectl yq; do
        if ! command -v $cmd &>/dev/null; then
            error "缺少依赖命令: $cmd"
            exit 1
        fi
    done
    
    # 验证目录
    if [[ ! -d "$CRD_DIR" ]]; then
        error "CRD 目录不存在: $CRD_DIR"
        exit 1
    fi
    
    # 1. 验证和部署 CRD
    log "=== 阶段 1: 验证和部署 CRD ==="
    
    local crd_files=()
    while IFS= read -r -d '' file; do
        crd_files+=("$file")
    done < <(find "$CRD_DIR" -name "*.yaml" -o -name "*.yml" -print0)
    
    if [[ ${#crd_files[@]} -eq 0 ]]; then
        log "未找到 CRD 文件"
    else
        for crd_file in "${crd_files[@]}"; do
            if validate_crd "$crd_file"; then
                deploy_crd "$crd_file"
            else
                error "跳过无效的 CRD: $crd_file"
            fi
        done
    fi
    
    # 2. 验证和部署示例 Custom Resource
    if [[ -d "$CR_DIR" ]]; then
        log "=== 阶段 2: 验证和部署示例 Custom Resource ==="
        
        local cr_files=()
        while IFS= read -r -d '' file; do
            cr_files+=("$file")
        done < <(find "$CR_DIR" -name "*.yaml" -o -name "*.yml" -print0)
        
        if [[ ${#cr_files[@]} -eq 0 ]]; then
            log "未找到 Custom Resource 文件"
        else
            for cr_file in "${cr_files[@]}"; do
                if validate_cr "$cr_file"; then
                    if [[ "$DRY_RUN" == "true" ]]; then
                        log "[DRY RUN] 将部署 Custom Resource: $cr_file"
                    else
                        log "部署 Custom Resource: $cr_file"
                        kubectl apply -f "$cr_file" -n "$NAMESPACE"
                    fi
                else
                    error "跳过无效的 Custom Resource: $cr_file"
                fi
            done
        fi
    fi
    
    # 3. 验证部署结果
    if [[ "$DRY_RUN" != "true" ]]; then
        log "=== 阶段 3: 验证部署结果 ==="
        
        # 显示 CRD 状态
        log "CRD 状态:"
        kubectl get crd
        
        # 显示 Custom Resource
        log "Custom Resource 状态:"
        for crd_file in "${crd_files[@]}"; do
            local crd_name resource_name
            crd_name=$(yq eval '.metadata.name' "$crd_file")
            resource_name=$(yq eval '.spec.names.plural' "$crd_file")
            
            if kubectl get crd "$crd_name" &>/dev/null; then
                log "$resource_name 资源:"
                kubectl get "$resource_name" -A 2>/dev/null || log "  无资源"
            fi
        done
    fi
    
    log "CRD 自动化管理完成"
}

# 显示帮助
show_help() {
    cat << EOF
CRD 自动化管理脚本

用法: $0 [选项]

选项:
  -h, --help     显示帮助信息
  -d, --dry-run  干运行模式,不实际部署
  -v, --verbose  详细输出
  -n, --namespace NAMESPACE  指定命名空间 (默认: default)
  --crd-dir DIR  CRD 文件目录 (默认: ./crds)
  --cr-dir DIR   Custom Resource 文件目录 (默认: ./examples)

环境变量:
  DRY_RUN=true   启用干运行模式
  VERBOSE=true   启用详细输出

示例:
  $0                           # 正常部署
  $0 --dry-run                 # 干运行
  $0 --verbose --namespace prod # 详细输出并部署到 prod 命名空间
EOF
}

# 解析命令行参数
while [[ $# -gt 0 ]]; do
    case $1 in
        -h|--help)
            show_help
            exit 0
            ;;
        -d|--dry-run)
            DRY_RUN=true
            shift
            ;;
        -v|--verbose)
            VERBOSE=true
            shift
            ;;
        -n|--namespace)
            NAMESPACE="$2"
            shift 2
            ;;
        --crd-dir)
            CRD_DIR="$2"
            shift 2
            ;;
        --cr-dir)
            CR_DIR="$2"
            shift 2
            ;;
        *)
            error "未知选项: $1"
            show_help
            exit 1
            ;;
    esac
done

# 运行主函数
main

总结

Custom Resource Definition (CRD) 是 Kubernetes 最强大的扩展机制之一,它允许用户:

核心价值

  1. API 扩展:无需修改 Kubernetes 核心代码即可添加新的资源类型
  2. 声明式管理:使用熟悉的 YAML 和 kubectl 管理自定义资源
  3. 原生集成:与 Kubernetes 生态系统完全集成
  4. 版本控制:支持 API 版本演进和向后兼容

使用建议

  1. 设计阶段

    • 仔细设计 API 结构,考虑未来扩展性
    • 提供合理的默认值和验证规则
    • 使用清晰的字段名和描述
  2. 开发阶段

    • 从简单的 CRD 开始,逐步添加复杂功能
    • 充分测试 OpenAPI 模式验证
    • 考虑版本兼容性和迁移策略
  3. 运维阶段

    • 建立完善的监控和告警
    • 定期备份 Custom Resource
    • 制定版本升级计划

适用场景

  • 应用程序管理:定义应用程序生命周期管理资源
  • 基础设施管理:管理数据库、消息队列等基础设施
  • 平台抽象:为开发者提供高级抽象接口
  • Operator 开发:作为 Operator 模式的基础

CRD 是构建 Kubernetes 原生应用和平台的基石,正确使用可以大大提升系统的可扩展性和可维护性。