Chapter 8: Fault Detection Methods - Sam #96

Open
samwu4166 opened this issue Aug 2, 2022 · 2 comments
Labels
question Further information is requested

Comments

@samwu4166

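This chapter opens by pointing out that distributed systems do not live in an ideal world and that unexpected errors happen all the time. The text spends a lot of time on the network, but I have often run into node failures that were not caused by a single network problem. Quite often A (assume GKE here) -> B (Pod/container) is fine, yet on a few B instances B -> C (internal service) occasionally has problems or just dies outright. For now I actively detect these unexpected failures with the following hack: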

          livenessProbe:
            exec:
              command:
              - /bin/sh
              - -c
              - "cat `find ./health.json -mmin -1440 | awk -v def=default-cannot-cat-file '{print} END { if (NR==0) {print def} }'`"
            initialDelaySeconds: 60
            periodSeconds: 60
            failureThreshold: 5

Does anyone have other detection methods? How do you all tell whether a system is alive, or is a living zombie (?)
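For readers following along, here is a minimal sketch of the application side that a file-based probe like the one above relies on, assuming the service is a Node.js process. The names `writeHeartbeat` and `isHeartbeatFresh` are hypothetical and not from the thread. The assumption is that the app refreshes `health.json` only after its own check against C succeeds, so a B that is up but cannot reach C lets the file go stale and the probe eventually fails.

```javascript
const fs = require('fs');

// Hypothetical helper: refresh the heartbeat file only when the dependency
// check (for example B -> C) passed. On failure the old file is left alone
// so that its modification time keeps ageing.
function writeHeartbeat(file, dependencyOk) {
  if (!dependencyOk) return false;
  fs.writeFileSync(file, JSON.stringify({ ok: true }));
  return true;
}

// Hypothetical helper: the same question the probe's `find -mmin` asks,
// "was this file modified within the last maxAgeMs milliseconds?"
function isHeartbeatFresh(file, maxAgeMs, now = Date.now()) {
  try {
    return now - fs.statSync(file).mtimeMs <= maxAgeMs;
  } catch {
    return false; // a missing or unreadable file counts as unhealthy
  }
}
```

A shorter freshness window than 1440 minutes would surface a stuck B sooner, at the cost of more false alarms if the heartbeat writer is slow.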

@kylemocode kylemocode added the question Further information is requested label Aug 3, 2022
@kylemocode
Collaborator

kylemocode commented Aug 3, 2022

We do pretty much the same thing, relying on k8s configuration as well.

livenessProbe:
  httpGet:
    path: /.healthcheck
    port: http
  initialDelaySeconds: 10
  periodSeconds: 2
  failureThreshold: 10

// server...
server.get(`/.healthcheck`, (_req, res) => {
  res.send('OK');
});

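If I remember correctly, it looks at the returned status code: 200 <= status < 400 counts as success. Anything outside that range gets the container killed and a new one started.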
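An endpoint that always answers 'OK' only proves the HTTP server is running, which is exactly the "living zombie" case raised in the question. Below is a rough sketch of a handler that reports failure when a dependency check fails, written against Node's built-in `http` module rather than the framework used above. `isProbeSuccess`, `evaluateHealth`, `createHealthServer` and the shape of `checks` are all hypothetical. The checks are assumed to be cheap synchronous functions, for example reading a cached result that a background task refreshes.

```javascript
const http = require('http');

// Mirrors the rule described above: any status in [200, 400) is a success.
function isProbeSuccess(status) {
  return status >= 200 && status < 400;
}

// Hypothetical: `checks` maps a dependency name to a function returning
// true when that dependency looks healthy. A throwing check counts as failed.
function evaluateHealth(checks) {
  const results = {};
  let healthy = true;
  for (const [name, check] of Object.entries(checks)) {
    let ok = false;
    try {
      ok = check() === true;
    } catch {
      ok = false;
    }
    results[name] = ok;
    if (!ok) healthy = false;
  }
  return { status: healthy ? 200 : 503, results };
}

// Serves /.healthcheck with 200 when every check passes, 503 otherwise.
function createHealthServer(checks) {
  return http.createServer((req, res) => {
    if (req.url !== '/.healthcheck') {
      res.writeHead(404);
      res.end();
      return;
    }
    const { status, results } = evaluateHealth(checks);
    res.writeHead(status, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify(results));
  });
}
```

Usage would look like `createHealthServer({ internalService: () => lastCallToCSucceeded }).listen(8080)`, where `lastCallToCSucceeded` and the port are placeholders. Whether a failing dependency should fail a liveness probe (restart the container) or a readiness probe (take the Pod out of rotation) depends on whether a restart actually fixes B -> C.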

@0x171-0

0x171-0 commented Aug 3, 2022

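  • Health check mechanisms
    • Actively write a health check file (but you can still end up with a node that is alive while the service itself is failing)
      • When k8s judges it dead, it kills it and restarts it
    • Hit it over HTTP and look at what status code comes back
    • Spring has several automated mechanisms you can draw on, but customizing them is harder
    • Grafana can monitor all services by periodically hitting every one of them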
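The "probe periodically, then act once it is judged dead" idea in the list above usually comes with some tolerance for a single bad probe. The following is a small sketch of consecutive-failure counting in the spirit of the `failureThreshold` setting shown earlier in the thread. `ProbeTracker` is a hypothetical name, and this is not how Kubernetes or Grafana implement it internally.

```javascript
// Hypothetical tracker: a target is reported 'down' only after
// `failureThreshold` consecutive failed probes. Any success resets the count.
class ProbeTracker {
  constructor(failureThreshold) {
    this.failureThreshold = failureThreshold;
    this.failures = new Map(); // target name -> consecutive failure count
  }

  // Record one probe result and return the target's current state.
  record(target, success) {
    const count = success ? 0 : (this.failures.get(target) || 0) + 1;
    this.failures.set(target, count);
    return count >= this.failureThreshold ? 'down' : 'up';
  }
}
```

A poller would call `record` once per target per interval. A higher threshold trades slower detection for fewer restarts caused by one-off timeouts.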
