This repository has been archived by the owner on Jul 16, 2021. It is now read-only.

Nginx cannot be deployed #12

Open
Noiri opened this issue Mar 7, 2021 · 19 comments

Comments

@Noiri
Collaborator

Noiri commented Mar 7, 2021

Overview
Having joined the intercollegiate technology club ICTSC, you have been assigned to the K8s cluster team.
A senior member of the team asked you to build a website introducing the team on K8s.
They would like the K8s manifests they handed you to be applied without modification.
However, when you applied them to the cluster, the Pods for the website ended up Pending.
Your senior, whose graduation research is behind schedule, cannot step away from their advisor's assignments, so please do your best to solve this yourself.
Once it is resolved, make sure the website showing the team name is visible.
Since we would like to share the solution within the club, please report the cause and the fix.
The manifests handed over by the senior and applied to k8s are saved in /home/user/manifest.

Initial state
The Nginx Deployment is stuck in Pending.
Final state
The Deployment that was Pending is running normally.
The Nginx served by the recovered Deployment shows a website with the answering team's name on it.
The same manifests as at the start are applied without modification.

@Noiri Noiri added the 50 label Mar 7, 2021
@kitakou0313 kitakou0313 self-assigned this Mar 7, 2021
@ritsuxis ritsuxis added the Day2 problem at Day2 label Mar 7, 2021
@kitakou0313
Member

user@kzz-k8s-master:~$ kubectl get pods
NAME                                READY   STATUS    RESTARTS   AGE
nginx-deployment-64c5ffb9fd-666ct   0/1     Pending   0          6d16h
nginx-deployment-64c5ffb9fd-vgsdz   0/1     Pending   0          6d16h
nginx-deployment-64c5ffb9fd-wfcqf   0/1     Pending   0          6d16h
user@kzz-k8s-master:~$ kubectl get deployments
NAME               READY   UP-TO-DATE   AVAILABLE   AGE
nginx-deployment   0/3     3            0           6d16h
user@kzz-k8s-master:~$

Confirmed, it's not running.

@kitakou0313
Member

user@kzz-k8s-master:~$ ls /home/user/manifest/
cluster.yaml         kube-flannel.yml     namespace.yaml       test-nginx.yaml
common.yaml          metallb-config.yaml  operator.yaml        toolbox.yaml
filesystem.yaml      metallb.yaml         storageclass.yaml

There are a lot of them.

@kitakou0313
Member

kitakou0313 commented Mar 7, 2021

Looking at the details of each of the three pods.

user@kzz-k8s-master:~$ kubectl describe pod nginx-deployment-64c5ffb9fd-666ct
Name:           nginx-deployment-64c5ffb9fd-666ct
Namespace:      default
Priority:       0
Node:           <none>
Labels:         app=nginx
                pod-template-hash=64c5ffb9fd
Annotations:    <none>
Status:         Pending
IP:
IPs:            <none>
Controlled By:  ReplicaSet/nginx-deployment-64c5ffb9fd
Containers:
  nginx:
    Image:        nginx:1.7.9
    Port:         80/TCP
    Host Port:    0/TCP
    Environment:  <none>
    Mounts:
      /usr/share/nginx/html from nginx-data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-8jgmm (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  nginx-data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  cephfs-pvc
    ReadOnly:   false
  default-token-8jgmm:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-8jgmm
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age                    From               Message
  ----     ------            ----                   ----               -------
  Warning  FailedScheduling  3m45s (x543 over 11h)  default-scheduler  0/4 nodes are available: 4 pod has unbound immediate PersistentVolumeClaims.
user@kzz-k8s-master:~$ kubectl describe pods nginx-deployment-64c5ffb9fd-vgsdz
Name:           nginx-deployment-64c5ffb9fd-vgsdz
Namespace:      default
Priority:       0
Node:           <none>
Labels:         app=nginx
                pod-template-hash=64c5ffb9fd
Annotations:    <none>
Status:         Pending
IP:
IPs:            <none>
Controlled By:  ReplicaSet/nginx-deployment-64c5ffb9fd
Containers:
  nginx:
    Image:        nginx:1.7.9
    Port:         80/TCP
    Host Port:    0/TCP
    Environment:  <none>
    Mounts:
      /usr/share/nginx/html from nginx-data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-8jgmm (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  nginx-data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  cephfs-pvc
    ReadOnly:   false
  default-token-8jgmm:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-8jgmm
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age                  From               Message
  ----     ------            ----                 ----               -------
  Warning  FailedScheduling  18s (x547 over 11h)  default-scheduler  0/4 nodes are available: 4 pod has unbound immediate PersistentVolumeClaims.
user@kzz-k8s-master:~$
user@kzz-k8s-master:~$ kubectl describe pods nginx-deployment-64c5ffb9fd-wfcqf
Name:           nginx-deployment-64c5ffb9fd-wfcqf
Namespace:      default
Priority:       0
Node:           <none>
Labels:         app=nginx
                pod-template-hash=64c5ffb9fd
Annotations:    <none>
Status:         Pending
IP:
IPs:            <none>
Controlled By:  ReplicaSet/nginx-deployment-64c5ffb9fd
Containers:
  nginx:
    Image:        nginx:1.7.9
    Port:         80/TCP
    Host Port:    0/TCP
    Environment:  <none>
    Mounts:
      /usr/share/nginx/html from nginx-data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-8jgmm (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  nginx-data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  cephfs-pvc
    ReadOnly:   false
  default-token-8jgmm:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-8jgmm
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age                  From               Message
  ----     ------            ----                 ----               -------
  Warning  FailedScheduling  90s (x548 over 11h)  default-scheduler  0/4 nodes are available: 4 pod has unbound immediate PersistentVolumeClaims.

@kitakou0313
Member

kitakou0313 commented Mar 7, 2021

They all fail with the same error, so it smells like something around the Volumes.
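
A quick way to confirm this across the namespace, assuming everything lives in default, is to pull the scheduling events and the claim in one go rather than describing each pod:

# All FailedScheduling events in the namespace, newest last
kubectl get events -n default --field-selector reason=FailedScheduling --sort-by=.lastTimestamp
# The claim every replica is waiting on
kubectl get pvc -n default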

@kitakou0313
Member

kitakou0313 commented Mar 7, 2021

Let's take a look at the manifest.

user@kzz-k8s-master:~$ cat /home/user/manifest/test-nginx.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cephfs-pvc
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  storageClassName: csi-cephfs
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 3
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9
        ports:
        - containerPort: 80
        volumeMounts:
        - name: nginx-data
          mountPath: /usr/share/nginx/html
      volumes:
      - name: nginx-data
        persistentVolumeClaim:
          claimName: cephfs-pvc
          readOnly: false
---
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  ports:
    - port: 80
      protocol: TCP
      targetPort: 80
  selector:
    app: nginx
  type: LoadBalancer

@kitakou0313
Member

kitakou0313 commented Mar 7, 2021

List of PVs

user@kzz-k8s-master:~$ kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS     CLAIM                STORAGECLASS   REASON   AGE
pvc-186fb034-73c9-4534-b64e-e89a070df5e5   1Gi        RWX            Retain           Released   default/cephfs-pvc   csi-cephfs              6d17h
pvc-5527b451-238f-4cc5-b4d8-c5ff4097d1da   1Gi        RWX            Retain           Released   default/cephfs-pvc   csi-cephfs              6d18h

List of PVCs (stuck in Pending)

user@kzz-k8s-master:~$ kubectl get pvc
NAME         STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
cephfs-pvc   Pending                                      csi-cephfs     6d16h
user@kzz-k8s-master:~$
user@kzz-k8s-master:~$ kubectl describe pvc cephfs-pvc
Name:          cephfs-pvc
Namespace:     default
StorageClass:  csi-cephfs
Status:        Pending
Volume:
Labels:        <none>
Annotations:   volume.beta.kubernetes.io/storage-provisioner: rook-ceph.cephfs.csi.ceph.com
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode:    Filesystem
Used By:       nginx-deployment-64c5ffb9fd-666ct
               nginx-deployment-64c5ffb9fd-vgsdz
               nginx-deployment-64c5ffb9fd-wfcqf
Events:
  Type    Reason                Age                    From                                                                                                                 Message
  ----    ------                ----                   ----                                                                                                                 -------
  Normal  Provisioning          3m37s (x143 over 11h)  rook-ceph.cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-558b4777b-mh8sp_5f078395-d0d9-4c74-ab2f-d8421bdff716      External provisioner is provisioning volume for claim "default/cephfs-pvc"
  Normal  ExternalProvisioning  117s (x2721 over 11h)  persistentvolume-controller                                                                                          waiting for a volume to be created, either by external provisioner "rook-ceph.cephfs.csi.ceph.com" or manually created by system administrator
user@kzz-k8s-master:~$

@kitakou0313
Member

The PVC is stuck in provisioning, so the cause seems to be that the PV never finishes being created?
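
If so, the next place to look is the external provisioner itself; a minimal check, assuming the usual Rook labels and namespace:

# Is the CephFS provisioner running?
kubectl -n rook-ceph get pods -l app=csi-cephfsplugin-provisioner
# What is CreateVolume getting stuck on?
kubectl -n rook-ceph logs -l app=csi-cephfsplugin-provisioner -c csi-provisioner --tail=50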

@kitakou0313
Member

kitakou0313 commented Mar 7, 2021

Looks like dynamic provisioning is being done with Ceph.

user@kzz-k8s-master:~$ cat /home/user/manifest/storageclass.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-cephfs
provisioner: rook-ceph.cephfs.csi.ceph.com
parameters:
  # clusterID is the namespace where operator is deployed.
  clusterID: rook-ceph

  # CephFS filesystem name into which the volume shall be created
  fsName: myfs

  # Ceph pool into which the volume shall be created
  # Required for provisionVolume: "true"
  pool: myfs-data0

  # Root path of an existing CephFS volume
  # Required for provisionVolume: "false"
  # rootPath: /absolute/path

  # The secrets contain Ceph admin credentials. These are generated automatically by the operator
  # in the same namespace as the cluster.
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-cephfs-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph

  # (optional) The driver can use either ceph-fuse (fuse) or ceph kernel client (kernel)
  # If omitted, default volume mounter will be used - this is determined by probing for ceph-fuse
  # or by setting the default mounter explicitly via --volumemounter command-line argument.
  # mounter: kernel
reclaimPolicy: Retain
allowVolumeExpansion: true
mountOptions:
  # uncomment the following line for debugging
  #- debug
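
So the chain is PVC → StorageClass csi-cephfs → external provisioner rook-ceph.cephfs.csi.ceph.com → CephFS filesystem myfs. Each link can be listed to confirm it exists, assuming the Rook CRDs are installed as in these manifests:

kubectl get storageclass csi-cephfs
kubectl -n rook-ceph get cephfilesystem
kubectl -n rook-ceph get cephcluster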

@kitakou0313
Member

user@kzz-k8s-master:~$ kubectl describe pv pvc-186fb034-73c9-4534-b64e-e89a070df5e5
Name:            pvc-186fb034-73c9-4534-b64e-e89a070df5e5
Labels:          <none>
Annotations:     pv.kubernetes.io/provisioned-by: rook-ceph.cephfs.csi.ceph.com
Finalizers:      [kubernetes.io/pv-protection]
StorageClass:    csi-cephfs
Status:          Released
Claim:           default/cephfs-pvc
Reclaim Policy:  Retain
Access Modes:    RWX
VolumeMode:      Filesystem
Capacity:        1Gi
Node Affinity:   <none>
Message:
Source:
    Type:              CSI (a Container Storage Interface (CSI) volume source)
    Driver:            rook-ceph.cephfs.csi.ceph.com
    FSType:            ext4
    VolumeHandle:      0001-0009-rook-ceph-0000000000000001-17cd4cd5-799a-11eb-9a73-e6b541d9a6c7
    ReadOnly:          false
    VolumeAttributes:      clusterID=rook-ceph
                           fsName=myfs
                           pool=myfs-data0
                           storage.kubernetes.io/csiProvisionerIdentity=1614496901078-8081-rook-ceph.cephfs.csi.ceph.com
Events:                <none>
user@kzz-k8s-master:~$

@kitakou0313
Member

user@kzz-k8s-master:~$ cat /home/user/manifest/cluster.yaml
#################################################################################################################
# Define the settings for the rook-ceph cluster with common settings for a production cluster.
# All nodes with available raw devices will be used for the Ceph cluster. At least three nodes are required
# in this example. See the documentation for more details on storage settings available.

# For example, to create the cluster:
#   kubectl create -f common.yaml
#   kubectl create -f operator.yaml
#   kubectl create -f cluster.yaml
#################################################################################################################

apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    # The container image used to launch the Ceph daemon pods (mon, mgr, osd, mds, rgw).
    # v13 is mimic, v14 is nautilus, and v15 is octopus.
    # RECOMMENDATION: In production, use a specific version tag instead of the general v14 flag, which pulls the latest release and could result in different
    # versions running within the cluster. See tags available at https://hub.docker.com/r/ceph/ceph/tags/.
    # If you want to be more precise, you can always use a timestamp tag such ceph/ceph:v14.2.5-20190917
    # This tag might not contain a new Ceph version, just security fixes from the underlying operating system, which will reduce vulnerabilities
    image: ceph/ceph:v14.2.9
    # Whether to allow unsupported versions of Ceph. Currently mimic and nautilus are supported, with the recommendation to upgrade to nautilus.
    # Octopus is the version allowed when this is set to true.
    # Do not set to true in production.
    allowUnsupported: false
  # The path on the host where configuration files will be persisted. Must be specified.
  # Important: if you reinstall the cluster, make sure you delete this directory from each host or else the mons will fail to start on the new cluster.
  # In Minikube, the '/data' directory is configured to persist across reboots. Use "/data/rook" in Minikube environment.
  dataDirHostPath: /var/lib/rook
  # Whether or not upgrade should continue even if a check fails
  # This means Ceph's status could be degraded and we don't recommend upgrading but you might decide otherwise
  # Use at your OWN risk
  # To understand Rook's upgrade process of Ceph, read https://rook.io/docs/rook/master/ceph-upgrade.html#ceph-version-upgrades
  skipUpgradeChecks: false
  # Whether or not continue if PGs are not clean during an upgrade
  continueUpgradeAfterChecksEvenIfNotHealthy: false
  # set the amount of mons to be started
  mon:
    count: 3
    allowMultiplePerNode: true
  mgr:
    modules:
    # Several modules should not need to be included in this list. The "dashboard" and "monitoring" modules
    # are already enabled by other settings in the cluster CR and the "rook" module is always enabled.
    - name: pg_autoscaler
      enabled: true
  # enable the ceph dashboard for viewing cluster status
  dashboard:
    enabled: true
    # serve the dashboard under a subpath (useful when you are accessing the dashboard via a reverse proxy)
    # urlPrefix: /ceph-dashboard
    # serve the dashboard at the given port.
    # port: 8443
    # serve the dashboard using SSL
    ssl: false
  # enable prometheus alerting for cluster
  monitoring:
    # requires Prometheus to be pre-installed
    enabled: false
    # namespace to deploy prometheusRule in. If empty, namespace of the cluster will be used.
    # Recommended:
    # If you have a single rook-ceph cluster, set the rulesNamespace to the same namespace as the cluster or keep it empty.
    # If you have multiple rook-ceph clusters in the same k8s cluster, choose the same namespace (ideally, namespace with prometheus
    # deployed) to set rulesNamespace for all the clusters. Otherwise, you will get duplicate alerts with multiple alert definitions.
    rulesNamespace: rook-ceph
  network:
    # toggle to use hostNetwork
    hostNetwork: false
  rbdMirroring:
    # The number of daemons that will perform the rbd mirroring.
    # rbd mirroring must be configured with "rbd mirror" from the rook toolbox.
    workers: 0
  # enable the crash collector for ceph daemon crash collection
  crashCollector:
    disable: true
  # To control where various services will be scheduled by kubernetes, use the placement configuration sections below.
  # The example under 'all' would have all services scheduled on kubernetes nodes labeled with 'role=storage-node' and
  # tolerate taints with a key of 'storage-node'.
#  placement:
#    all:
#      nodeAffinity:
#        requiredDuringSchedulingIgnoredDuringExecution:
#          nodeSelectorTerms:
#          - matchExpressions:
#            - key: role
#              operator: In
#              values:
#              - storage-node
#      podAffinity:
#      podAntiAffinity:
#      tolerations:
#      - key: storage-node
#        operator: Exists
# The above placement information can also be specified for mon, osd, and mgr components
#    mon:
# Monitor deployments may contain an anti-affinity rule for avoiding monitor
# collocation on the same node. This is a required rule when host network is used
# or when AllowMultiplePerNode is false. Otherwise this anti-affinity rule is a
# preferred rule with weight: 50.
#    osd:
#    mgr:
  annotations:
#    all:
#    mon:
#    osd:
# If no mgr annotations are set, prometheus scrape annotations will be set by default.
#   mgr:
  resources:
# The requests and limits set here, allow the mgr pod to use half of one CPU core and 1 gigabyte of memory
#    mgr:
#      limits:
#        cpu: "500m"
#        memory: "1024Mi"
#      requests:
#        cpu: "500m"
#        memory: "1024Mi"
# The above example requests/limits can also be added to the mon and osd components
#    mon:
#    osd:
#    prepareosd:
#    crashcollector:
  # The option to automatically remove OSDs that are out and are safe to destroy.
  removeOSDsIfOutAndSafeToRemove: false
#  priorityClassNames:
#    all: rook-ceph-default-priority-class
#    mon: rook-ceph-mon-priority-class
#    osd: rook-ceph-osd-priority-class
#    mgr: rook-ceph-mgr-priority-class
  storage: # cluster level storage configuration and selection
    useAllNodes: true
    useAllDevices: true
    #deviceFilter:
    config:
      # The default and recommended storeType is dynamically set to bluestore for devices and filestore for directories.
      # Set the storeType explicitly only if it is required not to use the default.
      storeType: filestore
      # metadataDevice: "md0" # specify a non-rotational storage so ceph-volume will use it as block db device of bluestore.
      # databaseSizeMB: "1024" # uncomment if the disks are smaller than 100 GB
      # journalSizeMB: "1024"  # uncomment if the disks are 20 GB or smaller
      # osdsPerDevice: "1" # this value can be overridden at the node or device level
      # encryptedDevice: "true" # the default value for this option is "false"
# Cluster level list of directories to use for filestore-based OSD storage. If uncomment, this example would create an OSD under the dataDirHostPath.
    directories:
    - path: /var/lib/rook
# Individual nodes and their config can be specified as well, but 'useAllNodes' above must be set to false. Then, only the named
# nodes below will be used as storage resources.  Each node's 'name' field should match their 'kubernetes.io/hostname' label.
#    nodes:
#    - name: "172.17.4.101"
#      directories: # specific directories to use for storage can be specified for each node
#      - path: "/rook/storage-dir"
#      resources:
#        limits:
#          cpu: "500m"
#          memory: "1024Mi"
#        requests:
#          cpu: "500m"
#          memory: "1024Mi"
#    - name: "172.17.4.201"
#      devices: # specific devices to use for storage can be specified for each node
#      - name: "sdb"
#      - name: "nvme01" # multiple osds can be created on high performance devices
#        config:
#          osdsPerDevice: "5"
#      config: # configuration can be specified at the node level which overrides the cluster level config
#        storeType: filestore
#    - name: "172.17.4.301"
#      deviceFilter: "^sd."
  # The section for configuring management of daemon disruptions during upgrade or fencing.
  disruptionManagement:
    # If true, the operator will create and manage PodDisruptionBudgets for OSD, Mon, RGW, and MDS daemons. OSD PDBs are managed dynamically
    # via the strategy outlined in the [design](https://github.com/rook/rook/blob/master/design/ceph/ceph-managed-disruptionbudgets.md). The operator will
    # block eviction of OSDs by default and unblock them safely when drains are detected.
    managePodBudgets: false
    # A duration in minutes that determines how long an entire failureDomain like `region/zone/host` will be held in `noout` (in addition to the
    # default DOWN/OUT interval) when it is draining. This is only relevant when  `managePodBudgets` is `true`. The default value is `30` minutes.
    osdMaintenanceTimeout: 30
    # If true, the operator will create and manage MachineDisruptionBudgets to ensure OSDs are only fenced when the cluster is healthy.
    # Only available on OpenShift.
    manageMachineDisruptionBudgets: false
    # Namespace in which to watch for the MachineDisruptionBudgets.
    machineDisruptionBudgetNamespace: openshift-machine-api

@kitakou0313
Member

user@kzz-k8s-master:~$ kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS     CLAIM                STORAGECLASS   REASON   AGE
pvc-186fb034-73c9-4534-b64e-e89a070df5e5   1Gi        RWX            Retain           Released   default/cephfs-pvc   csi-cephfs              6d19h
pvc-5527b451-238f-4cc5-b4d8-c5ff4097d1da   1Gi        RWX            Retain           Released   default/cephfs-pvc   csi-cephfs              6d20h

Let's try deleting them for now.
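
With reclaimPolicy: Retain, a Released PV keeps its old claimRef and will never be bound by a new claim, so the usual options are to delete it or clear the claimRef by hand. A sketch, using the PV names from the listing above (deleting the PV object does not delete the data in CephFS when the policy is Retain):

# Option 1: drop the stale PVs
kubectl delete pv pvc-186fb034-73c9-4534-b64e-e89a070df5e5 pvc-5527b451-238f-4cc5-b4d8-c5ff4097d1da
# Option 2: make a PV Available again by clearing its claimRef
kubectl patch pv pvc-186fb034-73c9-4534-b64e-e89a070df5e5 --type merge -p '{"spec":{"claimRef":null}}'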

@kitakou0313
Member

Let's look at the Ceph logs.

kubectl logs csi-cephfsplugin-provisioner-558b4777b-mh8sp csi-provisioner -n  rook-ceph
I0307 03:44:50.230665       1 event.go:255] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"cephfs-pvc", UID:"a04be798-2db4-4f0e-942f-be5febdf6df1", APIVersion:"v1", ResourceVersion:"28190", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "csi-cephfs": rpc error: code = Aborted desc = an operation with the given Volume ID pvc-a04be798-2db4-4f0e-942f-be5febdf6df1 already exists
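
That error usually means an earlier CreateVolume call for the same claim never completed and still holds its lock, which points at the storage backend rather than at the PVC itself. One common way to rule out a wedged provisioner, assuming the standard deployment name:

# Restart the CSI provisioner so any stuck in-flight operation is dropped
kubectl -n rook-ceph rollout restart deploy/csi-cephfsplugin-provisioner
kubectl -n rook-ceph rollout status deploy/csi-cephfsplugin-provisioner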

@kitakou0313
Member

sh-4.2# ceph osd tree
ID CLASS WEIGHT  TYPE NAME              STATUS REWEIGHT PRI-AFF
-1       0.08459 root default
-4       0.02820     host kzz-k8s-node1
 2       0.02820         osd.2            down        0 1.00000
-3       0.02820     host kzz-k8s-node2
 0       0.02820         osd.0            down        0 1.00000
-2       0.02820     host kzz-k8s-node3
 1       0.02820         osd.1            down        0 1.00000
sh-4.2#

down…?
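
From inside the toolbox, a few read-only checks give the wider picture:

# Overall cluster health and which daemons are missing
ceph status
ceph health detail
# OSD counts: how many exist vs. how many are up/in
ceph osd stat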

@kitakou0313
Member

kubectl exec rook-ceph-tools-c8dff9fb6-c5wrd -n rook-ceph -it sh

gets you into the Ceph debug tools.
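
The pod name in that command is specific to this cluster; a more portable way in, assuming the standard rook-ceph-tools label:

# Find the toolbox pod without hard-coding its hash
TOOLS_POD=$(kubectl -n rook-ceph get pod -l app=rook-ceph-tools -o jsonpath='{.items[0].metadata.name}')
kubectl -n rook-ceph exec -it "$TOOLS_POD" -- sh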

@kitakou0313
Member

The cause is that the OSDs on each node are down. The pods themselves are running, so it's probably something like them not being able to connect.
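
Worth cross-checking what Kubernetes sees for the OSD daemons themselves; a sketch, assuming the usual rook-ceph-osd label:

# Are the OSD pods scheduled and Running, and on which nodes?
kubectl -n rook-ceph get pods -l app=rook-ceph-osd -o wide
# Recent log lines from all OSD pods (look for mon connection / heartbeat errors)
kubectl -n rook-ceph logs -l app=rook-ceph-osd --tail=50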

@kitakou0313
Member

user@kzz-k8s-master:~$ cat /home/user/manifest/common.yaml | grep host
                hostNetwork:
    - hostPath
  # allowedHostPaths can be set to Rook's known host volume mount points when they are fully-known
  # Ceph requires host IPC for setting up encrypted devices
  hostIPC: true
  hostPID: true
  # hostNetwork can be set to 'false' if host networking isn't used
  hostNetwork: true
  hostPorts:
user@kzz-k8s-master:~$

Isn't this using the host network?
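
Whether the daemons are actually on the host network is easy to see from the pod IPs; if they match the node IPs, hostNetwork is in effect:

# Compare pod IPs with node IPs
kubectl -n rook-ceph get pods -o wide
kubectl get nodes -o wide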

@kitakou0313
Member

user@kzz-k8s-master:~$ cat /home/user/manifest/cluster.yaml | grep host
  # The path on the host where configuration files will be persisted. Must be specified.
  # Important: if you reinstall the cluster, make sure you delete this directory from each host or else the mons will fail to start on the new cluster.
    # toggle to use hostNetwork
    hostNetwork: false
# collocation on the same node. This is a required rule when host network is used
# nodes below will be used as storage resources.  Each node's 'name' field should match their 'kubernetes.io/hostname' label.
    # A duration in minutes that determines how long an entire failureDomain like `region/zone/host` will be held in `noout` (in addition to the
user@kzz-k8s-master:~$

cluster.yaml is configured not to use hostNetwork, but common.yaml is configured to use it.
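
Note that the hostNetwork: true in common.yaml sits in a PodSecurityPolicy, so it only permits host networking; whether the Ceph daemons actually use it is decided by network.hostNetwork in the CephCluster from cluster.yaml. What the live cluster object currently has can be checked directly:

kubectl -n rook-ceph get cephcluster rook-ceph -o jsonpath='{.spec.network}{"\n"}'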

@kitakou0313
Member

---
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: rook-privileged
spec:
  privileged: true
  allowedCapabilities:
    # required by CSI
    - SYS_ADMIN
  # fsGroup - the flexVolume agent has fsGroup capabilities and could potentially be any group
  fsGroup:
    rule: RunAsAny
  # runAsUser, supplementalGroups - Rook needs to run some pods as root
  # Ceph pods could be run as the Ceph user, but that user isn't always known ahead of time
  runAsUser:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  # seLinux - seLinux context is unknown ahead of time; set if this is well-known
  seLinux:
    rule: RunAsAny
  volumes:
    # recommended minimum set
    - configMap
    - downwardAPI
    - emptyDir
    - persistentVolumeClaim
    - secret
    - projected
    # required for Rook
    - hostPath
    - flexVolume
  # allowedHostPaths can be set to Rook's known host volume mount points when they are fully-known
  # directory-based OSDs make this hard to nail down
  # allowedHostPaths:
  #   - pathPrefix: "/run/udev"  # for OSD prep
  #     readOnly: false
  #   - pathPrefix: "/dev"  # for OSD prep
  #     readOnly: false
  #   - pathPrefix: "/var/lib/rook"  # or whatever the dataDirHostPath value is set to
  #     readOnly: false
  # Ceph requires host IPC for setting up encrypted devices
  hostIPC: true
  # Ceph OSDs need to share the same PID namespace
  hostPID: true
  # hostNetwork can be set to 'false' if host networking isn't used
  hostNetwork: true
  hostPorts:
    # Ceph messenger protocol v1
    - min: 6789
      max: 6790 # <- support old default port
    # Ceph messenger protocol v2
    - min: 3300
      max: 3300
    # Ceph RADOS ports for OSDs, MDSes
    - min: 6800
      max: 7300
    # # Ceph dashboard port HTTP (not recommended)
    # - min: 7000
    #   max: 7000
    # Ceph dashboard port HTTPS
    - min: 8443
      max: 8443
    # Ceph mgr Prometheus Metrics
    - min: 9283
      max: 9283
# OLM: END CLUSTER POD SECURITY POLICY
# OLM: BEGIN POD SECURITY POLICY BINDINGS

@kitakou0313
Member

Set hostNetwork in cluster.yaml to true; the down OSDs did not recover.
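
If the change was applied, the operator still has to reconcile it before the OSD pods pick it up, so it's worth confirming the rollout and then re-checking the tree; a sketch, assuming the standard operator deployment name:

kubectl apply -f /home/user/manifest/cluster.yaml
# Did the operator accept the change and redeploy the daemons?
kubectl -n rook-ceph logs deploy/rook-ceph-operator --tail=50
kubectl -n rook-ceph get pods -l app=rook-ceph-osd -w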
