Highly scalable and standards based Model Inference Platform on Kubernetes for Trusted AI
- Istio: Simplify observability, traffic management, security, and policy with the leading service mesh.
- Knative: Kubernetes-based platform to deploy and manage modern serverless workloads.
- KServe: Highly scalable and standards based Model Inference Platform on Kubernetes for Trusted AI
- Cert Manager: cert-manager adds certificates and certificate issuers as resource types in Kubernetes clusters, and simplifies the process of obtaining, renewing and using those certificates.
Deployment modes:

- `Serverless` <- used in this QuickStart
- `RawDeployment`
- `ModelMeshDeployment`: designed for high-scale, high-density and frequently-changing model use cases
The default `DeploymentMode` is set in the `inferenceservice-config` ConfigMap:

```bash
kubectl get configmap -n kserve inferenceservice-config -o jsonpath='{.data.deploy}'
{
    "defaultDeploymentMode": "Serverless"
}
```
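The mode can also be selected per `InferenceService`. A minimal sketch, assuming the `serving.kserve.io/deploymentMode` annotation (this annotation is not used elsewhere in this walkthrough):

```yaml
# Hypothetical example: select RawDeployment mode for a single InferenceService
# via the serving.kserve.io/deploymentMode annotation (assumption, not verified here).
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "sklearn-iris-raw"
  annotations:
    serving.kserve.io/deploymentMode: "RawDeployment"
spec:
  predictor:
    sklearn:
      storageUri: "gs://kfserving-samples/models/sklearn/iris"
```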
`ModelMeshDeployment` solves the following scalability problems:

- Overhead resources due to the sidecars injected into each pod
- Maximum number of pods per node
- Each pod in an `InferenceService` requires an independent IP
- Prepare a Kubernetes cluster. If you run it locally, use Kubernetes in Docker Desktop (or minikube; I only confirmed with Docker Desktop). (I couldn't find a way to connect to a `LoadBalancer`-type `Service` from the host machine for a `kind` cluster unless `NodePort` is used instead.)
- Install `cert-manager`, `istio`, `knative`, and `kserve`:

    ```bash
    curl -s "https://raw.githubusercontent.com/kserve/kserve/release-0.7/hack/quick_install.sh" | bash
    ```
The script installs:
- Istio: 1.10.3

    ```bash
    curl -L https://git.io/getLatestIstio | sh -
    cd istio-${ISTIO_VERSION}
    ```

    Installs `IstioOperator` in the `istio-system` namespace with `istioctl`.

- Knative: v0.23.2 (installs the serving CRDs, serving core, and the net-istio release)
- Cert Manager: v1.3.0
- KServe: v0.7.0
※ The installation might fail. Usually rerunning the script succeeds. Failures might be caused by x509 certificate related errors; in that case, restart the `istiod` Pod.

Check the pods:

```bash
kubectl get pod -A
NAMESPACE            NAME                                          READY   STATUS    RESTARTS   AGE
cert-manager         cert-manager-76b7c557d5-rnkgx                 1/1     Running   0          3m39s
cert-manager         cert-manager-cainjector-655d695d74-gxvzn     1/1     Running   0          3m39s
cert-manager         cert-manager-webhook-7955b9bb97-mj89j         1/1     Running   0          3m39s
istio-system         istio-egressgateway-5547fcc8fc-4f5lx          1/1     Running   0          4m5s
istio-system         istio-ingressgateway-8f568d595-9h9kj          1/1     Running   0          4m5s
istio-system         istiod-568d797f55-dxs89                       1/1     Running   0          4m23s
knative-serving      activator-7c4fbc97cf-c7jd8                    1/1     Running   0          3m44s
knative-serving      autoscaler-87c6f49c-zsmpc                     1/1     Running   0          3m44s
knative-serving      controller-78d6897c65-chqzl                   1/1     Running   0          3m44s
knative-serving      istio-webhook-7b4d84887c-85tc9                1/1     Running   0          3m42s
knative-serving      networking-istio-595947b649-x9jrh             1/1     Running   0          3m42s
knative-serving      webhook-6bcf6c6658-qhlj8                      1/1     Running   0          3m44s
kserve               kserve-controller-manager-0                   2/2     Running   0          2m49s
kube-system          coredns-558bd4d5db-6f6nv                      1/1     Running   2          13d
kube-system          coredns-558bd4d5db-l7csh                      1/1     Running   2          13d
kube-system          etcd-kind-control-plane                       1/1     Running   2          13d
kube-system          kindnet-vfxnf                                 1/1     Running   2          13d
kube-system          kube-apiserver-kind-control-plane             1/1     Running   2          13d
kube-system          kube-controller-manager-kind-control-plane    1/1     Running   2          13d
kube-system          kube-proxy-ph9fg                              1/1     Running   2          13d
kube-system          kube-scheduler-kind-control-plane             1/1     Running   2          13d
local-path-storage   local-path-provisioner-547f784dff-5t42j       1/1     Running   3          13d
```
If the cluster doesn't support LoadBalancer

Change the `istio-ingressgateway` Service type if you're running in a Kubernetes cluster that doesn't support `LoadBalancer` (see the sketch below).
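For example, switching the gateway Service to `NodePort` could look like this (a sketch; `NodePort` is one option and is not part of the quick install script):

```bash
# Sketch: change the istio-ingressgateway Service type to NodePort
# (assumption: NodePort access is acceptable for your local setup).
kubectl patch svc istio-ingressgateway -n istio-system -p '{"spec": {"type": "NodePort"}}'
```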
- Check the ingress gateway. (`EXTERNAL-IP` is `localhost`.)

    ```bash
    kubectl get svc istio-ingressgateway -n istio-system
    NAME                   TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)                                                                      AGE
    istio-ingressgateway   LoadBalancer   10.101.34.67   localhost     15021:30144/TCP,80:30440/TCP,443:30663/TCP,31400:31501/TCP,15443:32341/TCP   73s
    ```
※ You might encounter the issue where Kubernetes load balanced Services are stuck in "Pending". The only workaround seems to be resetting Kubernetes and restarting the Docker process.
- Create a test `InferenceService`.

    `sklearn-inference-service.yaml`:

    ```yaml
    apiVersion: "serving.kserve.io/v1beta1"
    kind: "InferenceService"
    metadata:
      name: "sklearn-iris"
    spec:
      predictor:
        sklearn:
          storageUri: "gs://kfserving-samples/models/sklearn/iris"
    ```

    ```bash
    kubectl create ns kserve-test
    kubectl apply -f sklearn-inference-service.yaml -n kserve-test
    ```

    Wait a minute, then check:

    ```bash
    kubectl get inferenceservices sklearn-iris -n kserve-test
    NAME           URL                                           READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION                    AGE
    sklearn-iris   http://sklearn-iris.kserve-test.example.com   True           100                              sklearn-iris-predictor-default-00001   2m26s
    ```
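    Instead of just waiting, you could block until the service is ready. A small sketch, assuming the `Ready` condition reported in the `InferenceService` status:

    ```bash
    # Sketch: wait for the InferenceService to report Ready (assumes the Ready condition name).
    kubectl wait --for=condition=Ready inferenceservice/sklearn-iris -n kserve-test --timeout=300s
    ```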
- Check with `iris-input.json`:

    ```json
    {
      "instances": [
        [6.8, 2.8, 4.8, 1.4],
        [6.0, 3.4, 4.5, 1.6]
      ]
    }
    ```

    ```bash
    # export INGRESS_HOST=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
    export INGRESS_HOST=localhost # if you're using docker-desktop
    export INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].port}')
    SERVICE_HOSTNAME=$(kubectl get inferenceservice sklearn-iris -n kserve-test -o jsonpath='{.status.url}' | cut -d "/" -f 3)
    curl -H "Host: ${SERVICE_HOSTNAME}" http://$INGRESS_HOST:$INGRESS_PORT/v1/models/sklearn-iris:predict -d @./data/iris-input.json
    {"predictions": [1, 1]}
    ```
- Performance test

    ```bash
    kubectl create -f https://raw.githubusercontent.com/kserve/kserve/release-0.7/docs/samples/v1beta1/sklearn/v1/perf.yaml -n kserve-test
    ```

    ```bash
    kubectl logs load-testvbfqw-hfbqj -n kserve-test
    Requests      [total, rate, throughput]         30000, 500.02, 500.00
    Duration      [total, attack, wait]             1m0s, 59.998s, 2.104ms
    Latencies     [min, mean, 50, 90, 95, 99, max]  1.837ms, 3.261ms, 2.819ms, 4.162ms, 5.264ms, 12.196ms, 49.735ms
    Bytes In      [total, mean]                     690000, 23.00
    Bytes Out     [total, mean]                     2460000, 82.00
    Success       [ratio]                           100.00%
    Status Codes  [code:count]                      200:30000
    Error Set:
    ```
- Visualize with `kiali`

    ```bash
    kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.12/samples/addons/prometheus.yaml
    kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.12/samples/addons/kiali.yaml
    bin/istioctl dashboard kiali
    ```
- Cleanup

    ```bash
    export ISTIO_VERSION=1.10.3
    export KNATIVE_VERSION=v0.23.2
    export KSERVE_VERSION=v0.7.0
    export CERT_MANAGER_VERSION=v1.3.0
    ```

    Delete KServe:

    ```bash
    kubectl delete -f https://github.com/kserve/kserve/releases/download/$KSERVE_VERSION/kserve.yaml
    ```

    Delete cert-manager:

    ```bash
    kubectl delete -f https://github.com/jetstack/cert-manager/releases/download/$CERT_MANAGER_VERSION/cert-manager.yaml
    ```

    Delete Knative:

    ```bash
    kubectl delete --filename https://github.com/knative/serving/releases/download/$KNATIVE_VERSION/serving-crds.yaml
    kubectl delete --filename https://github.com/knative/serving/releases/download/$KNATIVE_VERSION/serving-core.yaml
    kubectl delete --filename https://github.com/knative/net-istio/releases/download/$KNATIVE_VERSION/release.yaml
    ```

    Delete Istio if `istioctl` exists:

    ```bash
    bin/istioctl manifest generate --set profile=demo | kubectl delete --ignore-not-found=true -f -
    kubectl delete namespace istio-system
    ```
The `InferenceService` controller's reconcile loop roughly does the following:

- Fetch the `InferenceService`.
- Filter by annotation.
- Skip reconciliation for `ModelMeshDeployment` mode.
- Finalizer logic (add the finalizer if it doesn't exist & call `deleteExternalResources` if the object is being deleted).
- Add predictors (required), transformers (optional), and explainers (optional) to the reconciler.
- Call `Reconcile` for all the reconcilers set above.
- Reconcile ingress.
    - `RawDeployment` -> `NewRawIngressReconciler`
    - `Serverless` -> `NewIngressReconciler`
- Reconcile modelConfig.

An interesting point is that the `InferenceServiceReconciler`'s reconcile function calls the reconcile function of another controller.
To run the SKLearn server locally with your own model:

- Create the model.

    `create_model.py`:

    ```python
    from sklearn import svm
    from sklearn import datasets
    from joblib import dump

    clf = svm.SVC(gamma='scale')
    iris = datasets.load_iris()
    X, y = iris.data, iris.target
    clf.fit(X, y)
    dump(clf, 'model.joblib')
    ```

    `python create_model.py` -> `model.joblib`
- Install sklearn-server following Scikit-Learn Server.

    Install the `kserve` package (https://github.com/kserve/kserve/tree/master/python/kserve):

    ```bash
    git clone https://github.com/kserve/kserve.git
    cd kserve/python/kserve
    pip install -e .
    ```

    Install the `sklearnserver` package:

    ```bash
    cd - && cd kserve/python/sklearnserver
    pip install -e .
    ```
- Run the SKLearn Server

    ```bash
    python -m sklearnserver --model_dir ./ --model_name svm
    [I 220129 09:23:24 model_server:150] Registering model: svm
    [I 220129 09:23:24 model_server:123] Listening on port 8080
    [I 220129 09:23:24 model_server:125] Will fork 1 workers
    ```
- Send a request from a client.

    ```bash
    python check_sklearn_server.py
    <Response [200]>
    {"predictions": [0]}
    ```
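    The content of `check_sklearn_server.py` isn't shown above; a minimal sketch of such a client, assuming the V1 predict endpoint on port 8080 and the model name `svm` used when starting the server, could look like this:

    ```python
    # Hypothetical sketch of check_sklearn_server.py (not the original script).
    # Sends one iris sample to the locally running SKLearn server and prints the response.
    import requests

    # Port 8080 and model name "svm" come from the `python -m sklearnserver` command above.
    URL = "http://localhost:8080/v1/models/svm:predict"

    data = {"instances": [[5.1, 3.5, 1.4, 0.2]]}  # one iris sample (hypothetical values)

    response = requests.post(URL, json=data)
    print(response)       # e.g. <Response [200]>
    print(response.text)  # JSON with the predictions
    ```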