FYI: Steps to Run Non-System Tenant Perf Test
Use your knowledge to set the configuration for node size, machine type, NUM_NODES, KUBEMARK_NUM_NODES, etc. The settings below are just for your reference.
First, build the release binaries:
make quick-release
For a 100-node test, I use:
export RUN_PREFIX=[some-prefix-you-prefer]
export MASTER_SIZE=n1-highmem-32 NUM_NODES=2 KUBEMARK_NUM_NODES=100
export SCALEOUT_TP_COUNT=2
export MASTER_DISK_SIZE=200GB MASTER_ROOT_DISK_SIZE=200GB KUBE_GCE_ZONE=us-central1-b NODE_SIZE=n1-highmem-16 NODE_DISK_SIZE=200GB GOPATH=$HOME/go KUBE_GCE_ENABLE_IP_ALIASES=true KUBE_GCE_PRIVATE_CLUSTER=true CREATE_CUSTOM_NETWORK=true KUBE_GCE_INSTANCE_PREFIX=${RUN_PREFIX} KUBE_GCE_NETWORK=${RUN_PREFIX} ENABLE_KCM_LEADER_ELECT=false SHARE_PARTITIONSERVER=false LOGROTATE_FILES_MAX_COUNT=10 LOGROTATE_MAX_SIZE=200M TEST_CLUSTER_LOG_LEVEL=--v=2 APISERVERS_EXTRA_NUM=0 WORKLOADCONTROLLER_EXTRA_NUM=0 ETCD_EXTRA_NUM=0 KUBE_ENABLE_APISERVER_INSECURE_PORT=true KUBE_ENABLE_PROMETHEUS_DEBUG=true KUBE_ENABLE_PPROF_DEBUG=true SCALEOUT_CLUSTER=true
For a 1k-node run, these are the settings I used:
export RUN_PREFIX=[some-prefix-you-prefer]
export SCALEOUT_TP_COUNT=2
export MASTER_SIZE=n1-highmem-32 NUM_NODES=12 KUBEMARK_NUM_NODES=1000
export MASTER_ROOT_DISK_SIZE=500GB MASTER_DISK_SIZE=500GB KUBE_GCE_ZONE=us-west1-b NODE_SIZE=n1-highmem-16 NODE_DISK_SIZE=300GB KUBE_GCE_NETWORK=${RUN_PREFIX} GOPATH=$HOME/go KUBE_GCE_ENABLE_IP_ALIASES=true KUBE_GCE_PRIVATE_CLUSTER=true CREATE_CUSTOM_NETWORK=true KUBE_GCE_INSTANCE_PREFIX=${RUN_PREFIX} APISERVERS_EXTRA_NUM=0 KUBE_ENABLE_APISERVER_INSECURE_PORT=true KUBE_ENABLE_PROMETHEUS_DEBUG=true KUBE_ENABLE_PPROF_DEBUG=true SCALEOUT_CLUSTER=true
For a 5k-node test, I use:
export RUN_PREFIX=[some-prefix-you-prefer]
export SCALEOUT_TP_COUNT=2
export MASTER_SIZE=n1-highmem-96 NUM_NODES=55 KUBEMARK_NUM_NODES=5000
export MASTER_DISK_SIZE=500GB MASTER_ROOT_DISK_SIZE=500GB KUBE_GCE_ZONE=us-central1-b NODE_SIZE=n1-highmem-16 NODE_DISK_SIZE=500GB GOPATH=$HOME/go KUBE_GCE_ENABLE_IP_ALIASES=true KUBE_GCE_PRIVATE_CLUSTER=true CREATE_CUSTOM_NETWORK=true ETCD_QUOTA_BACKEND_BYTES=8589934592 TEST_CLUSTER_LOG_LEVEL=--v=2 ENABLE_KCM_LEADER_ELECT=false ENABLE_SCHEDULER_LEADER_ELECT=false SHARE_PARTITIONSERVER=false APISERVERS_EXTRA_NUM=0 WORKLOADCONTROLLER_EXTRA_NUM=0 ETCD_EXTRA_NUM=0 LOGROTATE_FILES_MAX_COUNT=50 LOGROTATE_MAX_SIZE=200M KUBE_GCE_INSTANCE_PREFIX=${RUN_PREFIX} KUBE_GCE_NETWORK=${RUN_PREFIX} KUBE_ENABLE_APISERVER_INSECURE_PORT=true KUBE_ENABLE_PROMETHEUS_DEBUG=true KUBE_ENABLE_PPROF_DEBUG=true SCALEOUT_CLUSTER=true
For a 10k-node (2 x 5k) test, I use:
export RUN_PREFIX=[some-prefix-you-prefer]
export SCALEOUT_TP_COUNT=2
export MASTER_SIZE=n1-highmem-96 NUM_NODES=100 KUBEMARK_NUM_NODES=10000
export MASTER_DISK_SIZE=1000GB MASTER_ROOT_DISK_SIZE=1000GB KUBE_GCE_ZONE=us-central1-a NODE_SIZE=n1-highmem-16 NODE_DISK_SIZE=1000GB GOPATH=$HOME/go KUBE_GCE_ENABLE_IP_ALIASES=true KUBE_GCE_PRIVATE_CLUSTER=true CREATE_CUSTOM_NETWORK=true ETCD_QUOTA_BACKEND_BYTES=8589934592 TEST_CLUSTER_LOG_LEVEL=--v=2 ENABLE_KCM_LEADER_ELECT=false ENABLE_SCHEDULER_LEADER_ELECT=false SHARE_PARTITIONSERVER=false APISERVERS_EXTRA_NUM=0 WORKLOADCONTROLLER_EXTRA_NUM=0 ETCD_EXTRA_NUM=0 LOGROTATE_FILES_MAX_COUNT=50 LOGROTATE_MAX_SIZE=200M KUBE_GCE_INSTANCE_PREFIX=${RUN_PREFIX} KUBE_GCE_NETWORK=${RUN_PREFIX} KUBE_ENABLE_APISERVER_INSECURE_PORT=true KUBE_ENABLE_PROMETHEUS_DEBUG=true KUBE_ENABLE_PPROF_DEBUG=true SCALEOUT_CLUSTER=true
For a 20K-node test, I use:
export RUN_PREFIX=[some-prefix-you-prefer]
export KUBEMARK_NUM_NODES=20000 NUM_NODES=210
export MASTER_DISK_SIZE=1000GB MASTER_ROOT_DISK_SIZE=1000GB KUBE_GCE_ZONE=us-central1-b MASTER_SIZE=n1-highmem-96 NODE_SIZE=n1-highmem-16 NODE_DISK_SIZE=1000GB GOPATH=$HOME/go KUBE_GCE_ENABLE_IP_ALIASES=true KUBE_GCE_PRIVATE_CLUSTER=true CREATE_CUSTOM_NETWORK=true KUBE_GCE_INSTANCE_PREFIX=${RUN_PREFIX} KUBE_GCE_NETWORK=${RUN_PREFIX} ENABLE_KCM_LEADER_ELECT=false SHARE_PARTITIONSERVER=false LOGROTATE_FILES_MAX_COUNT=10 LOGROTATE_MAX_SIZE=200M APISERVERS_EXTRA_NUM=0 WORKLOADCONTROLLER_EXTRA_NUM=0 ETCD_EXTRA_NUM=0 SCALEOUT_CLUSTER=true KUBE_ENABLE_APISERVER_INSECURE_PORT=true KUBE_ENABLE_PROMETHEUS_DEBUG=true KUBE_ENABLE_PPROF_DEBUG=true SCALEOUT_TP_COUNT=2
export TEST_CLUSTER_LOG_LEVEL=--v=2 HOLLOW_KUBELET_TEST_LOG_LEVEL=--v=2
For a 30K-node test, I use:
export RUN_PREFIX=[some-prefix-you-prefer]
export KUBEMARK_NUM_NODES=30000 NUM_NODES=315
export MASTER_DISK_SIZE=1000GB MASTER_ROOT_DISK_SIZE=1000GB KUBE_GCE_ZONE=us-central1-b MASTER_SIZE=n1-highmem-96 NODE_SIZE=n1-highmem-16 NODE_DISK_SIZE=1000GB GOPATH=$HOME/go KUBE_GCE_ENABLE_IP_ALIASES=true KUBE_GCE_PRIVATE_CLUSTER=true CREATE_CUSTOM_NETWORK=true KUBE_GCE_INSTANCE_PREFIX=${RUN_PREFIX} KUBE_GCE_NETWORK=${RUN_PREFIX} ENABLE_KCM_LEADER_ELECT=false SHARE_PARTITIONSERVER=false LOGROTATE_FILES_MAX_COUNT=10 LOGROTATE_MAX_SIZE=200M APISERVERS_EXTRA_NUM=0 WORKLOADCONTROLLER_EXTRA_NUM=0 ETCD_EXTRA_NUM=0 SCALEOUT_CLUSTER=true KUBE_ENABLE_APISERVER_INSECURE_PORT=true KUBE_ENABLE_PROMETHEUS_DEBUG=true KUBE_ENABLE_PPROF_DEBUG=true SCALEOUT_TP_COUNT=2
export TEST_CLUSTER_LOG_LEVEL=--v=2 HOLLOW_KUBELET_TEST_LOG_LEVEL=--v=2
Then bring up the cluster and start kubemark:
./cluster/kube-up.sh
./test/kubemark/start-kubemark.sh
Check the number of hollow nodes. The output of the following command should be Expected_node_num + 2:
./_output/dockerized/bin/linux/amd64/kubectl --kubeconfig=./test/kubemark/resources/kubeconfig.kubemark-tp get nodes | wc -l
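If you prefer to script this check, here is a minimal sketch (assuming a bash shell, that KUBEMARK_NUM_NODES is still exported, and the same kubectl/kubeconfig paths as above):
# Sketch: compare the observed line count against KUBEMARK_NUM_NODES + 2.
EXPECTED=$((KUBEMARK_NUM_NODES + 2))
ACTUAL=$(./_output/dockerized/bin/linux/amd64/kubectl --kubeconfig=./test/kubemark/resources/kubeconfig.kubemark-tp get nodes | wc -l)
[ "$ACTUAL" -eq "$EXPECTED" ] && echo "hollow-node count OK ($ACTUAL)" || echo "mismatch: got $ACTUAL, expected $EXPECTED"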
If the number of hollow nodes differs from the desired count, run the following command and wait for the hollow nodes to become ready:
./_output/dockerized/bin/linux/amd64/kubectl scale replicationcontroller hollow-node -n kubemark --replicas=[desired_hollow_node_num]
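After rescaling, one way to wait is to poll until enough nodes report Ready (a sketch, assuming a bash shell and the same kubeconfig; substitute the desired count):
# Sketch: poll every 30s until at least [desired_hollow_node_num] nodes are Ready.
until [ "$(./_output/dockerized/bin/linux/amd64/kubectl --kubeconfig=./test/kubemark/resources/kubeconfig.kubemark-tp get nodes | grep -cw Ready)" -ge [desired_hollow_node_num] ]; do sleep 30; done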
Then create the test tenant, following this example:
./_output/dockerized/bin/linux/amd64/kubectl --kubeconfig=./test/kubemark/resources/kubeconfig.kubemark.tp-1.direct create tenant arktos
Create more test tenants if needed. Note that the kubeconfig name may need to be adjusted if SCALEOUT_TP_COUNT is not 2.
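For example, with SCALEOUT_TP_COUNT=2 a second tenant could be created against the second tenant partition; the tp-2 kubeconfig name below is an assumption extrapolated from the tp-1 example above:
# Assumed kubeconfig name for the second tenant partition; adjust to match your setup.
./_output/dockerized/bin/linux/amd64/kubectl --kubeconfig=./test/kubemark/resources/kubeconfig.kubemark.tp-2.direct create tenant zeta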
Here is what I usually run:
./_output/dockerized/bin/linux/amd64/kubectl --kubeconfig=./test/kubemark/resources/kubeconfig.kubemark-tp run sanitytest --image=nginx --tenant arktos
./_output/dockerized/bin/linux/amd64/kubectl --kubeconfig=./test/kubemark/resources/kubeconfig.kubemark-tp get namespaces --tenant arktos
./_output/dockerized/bin/linux/amd64/kubectl --kubeconfig=./test/kubemark/resources/kubeconfig.kubemark-tp get deployments --all-namespaces --tenant arktos
./_output/dockerized/bin/linux/amd64/kubectl --kubeconfig=./test/kubemark/resources/kubeconfig.kubemark-tp scale deployment sanitytest --replicas=3 --tenant arktos
./_output/dockerized/bin/linux/amd64/kubectl --kubeconfig=./test/kubemark/resources/kubeconfig.kubemark-tp get pods --all-namespaces --tenant arktos
./_output/dockerized/bin/linux/amd64/kubectl --kubeconfig=./test/kubemark/resources/kubeconfig.kubemark-tp delete deployment sanitytest --tenant arktos
./_output/dockerized/bin/linux/amd64/kubectl --kubeconfig=./test/kubemark/resources/kubeconfig.kubemark-tp get pods --all-namespaces --tenant arktos
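As an optional pass/fail signal during the sequence above (run it between the scale and delete steps), the following counts sanitytest pods that reached Running; it simply pipes the same get-pods command already used:
# Sketch: count sanitytest pods in Running state under the arktos tenant.
./_output/dockerized/bin/linux/amd64/kubectl --kubeconfig=./test/kubemark/resources/kubeconfig.kubemark-tp get pods --all-namespaces --tenant arktos | grep sanitytest | grep -c Running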
SCALEOUT_TEST_TENANT=[arktos] perf-tests/clusterloader2/run-e2e.sh --nodes=[Kubemark_Node_Num] --provider=kubemark --kubeconfig=~/go/src/k8s.io/arktos/test/kubemark/resources/kubeconfig.kubemark-tp --report-dir=~/perf-logs --testconfig=testing/density/config.yaml --testoverrides=./testing/experiments/use_simple_latency_query.yaml 2>&1 | tee ~/perf-logs/perf-run-$(date +"%m-%d-%T").log
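While the run is in progress, the log written by the tee above can be followed, for example:
# Follow the perf-run log produced by the command above.
tail -f ~/perf-logs/perf-run-*.log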
To run two tenants in parallel:
SCALEOUT_TEST_TENANT=arktos ./perf-tests/clusterloader2/run-e2e.sh --nodes=100 --provider=kubemark --kubeconfig=/home/cloudshare/go/src/k8s.io/kubernetes/test/kubemark/resources/kubeconfig.kubemark.proxy.saved --report-dir=/home/cloudshare/logs/testarktos-run40 --testconfig=testing/density/config.yaml > testArktos-run40.log 2>&1 &
SCALEOUT_TEST_TENANT=zeta ./perf-tests/clusterloader2/run-e2e.sh --nodes=100 --provider=kubemark --kubeconfig=/home/cloudshare/go/src/k8s.io/kubernetes/test/kubemark/resources/kubeconfig.kubemark.proxy.saved --report-dir=/home/cloudshare/logs/testzeta-run20 --testconfig=testing/density/config.yaml > testZeta-run20.log 2>&1 &
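Both runs are backgrounded; a sketch for waiting on them and surfacing their exit codes (assuming they are the only background jobs in this shell):
# Wait for the two backgrounded clusterloader2 runs; %1/%2 assume no other background jobs.
wait %1; echo "arktos run exit code: $?"
wait %2; echo "zeta run exit code: $?"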
Build: Arktos scaleoutpoc branch with two commits reverted:
cloudshare@ybtest-11:~/go/src/k8s.io/kubernetes$ git log
commit 63d44700cd311b4f8789aa7f45a3685b49c3aa7b
Author: Yunwen Bai <[email protected]>
Date: Fri Dec 25 22:13:11 2020 +0000
Revert "expose haproxy prometheus (#886)"
This reverts commit e0e81cec9ef8bdb0e3a1deb754fc4ef2cc983761.
commit 3b29eb9ef887a4e8d5c36cea3b2c8a2a917b7e8c
Author: Yunwen Bai <[email protected]>
Date: Fri Dec 25 22:12:54 2020 +0000
Revert "fix haproxy failure due to restarting too quickly (#889)"
This reverts commit cb344b50309f363724b3068f12c26c43713ba9df.
commit 0b020ea37558e4fd5c2144e91bd04f531e6c8823
Author: chenqianfzh <[email protected]>
Date: Thu Dec 24 15:51:09 2020 -0800
fix mizar-controller name typo (#894)
Co-authored-by: Ubuntu <[email protected]>
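To reproduce this build state, the two haproxy commits can be reverted on top of the scaleoutpoc branch (a sketch, assuming the commit hashes above are reachable in your checkout):
# Revert the two haproxy commits noted in the git log above, then rebuild.
git checkout scaleoutpoc
git revert cb344b50309f363724b3068f12c26c43713ba9df   # "fix haproxy failure due to restarting too quickly (#889)"
git revert e0e81cec9ef8bdb0e3a1deb754fc4ef2cc983761   # "expose haproxy prometheus (#886)"
make quick-release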
Test executor machines:
Ybtest-11
Ybtest-2
Test env exports and setup:
export RUN_PREFIX=new-yb01-k8s-scaleout
export MASTER_SIZE=n1-highmem-96 NUM_NODES=100 KUBEMARK_NUM_NODES=10000
export MASTER_DISK_SIZE=1000GB MASTER_ROOT_DISK_SIZE=1000GB KUBE_GCE_ZONE=us-central1-a NODE_SIZE=n1-highmem-16 NODE_DISK_SIZE=1000GB KUBE_GCE_ENABLE_IP_ALIASES=true KUBE_GCE_PRIVATE_CLUSTER=true CREATE_CUSTOM_NETWORK=true ETCD_QUOTA_BACKEND_BYTES=8589934592 TEST_CLUSTER_LOG_LEVEL=--v=2 ENABLE_KCM_LEADER_ELECT=false ENABLE_SCHEDULER_LEADER_ELECT=false SHARE_PARTITIONSERVER=false APISERVERS_EXTRA_NUM=0 WORKLOADCONTROLLER_EXTRA_NUM=0 ETCD_EXTRA_NUM=0 LOGROTATE_FILES_MAX_COUNT=50 LOGROTATE_MAX_SIZE=200M KUBE_GCE_INSTANCE_PREFIX=${RUN_PREFIX} KUBE_GCE_NETWORK=${RUN_PREFIX} CREATE_TEST_TENANTS=true SCALEOUT_CLUSTER=true SCALEOUT_CLUSTER_TWO_TPS=true
./cluster/kube-up.sh > up.log 2>&1 &
./test/kubemark/start-kubemark.sh > start.log 2>&1 &
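Both scripts are backgrounded and write to up.log and start.log, so their progress can be followed with, for example:
tail -f up.log start.log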
Sanity test the clusters:
cloudshare@ybtest-2:~/go/src/k8s.io/kubernetes$ kubectl --kubeconfig=test/kubemark/resources/kubeconfig.kubemark.proxy.saved get nodes | wc -l
10002
cloudshare@ybtest-2:~/go/src/k8s.io/kubernetes$ kubectl --kubeconfig=test/kubemark/resources/kubeconfig.kubemark.proxy.saved get tenants
NAME STORAGEID STATUS AGE
system 0 Active 30m
zeta 0 Active 14m
cloudshare@ybtest-2:~/go/src/k8s.io/kubernetes$
cloudshare@ybtest-11:~/go/src/k8s.io/kubernetes$ kubectl --kubeconfig=test/kubemark/resources/kubeconfig.kubemark.proxy.saved get nodes | wc -l
10002
cloudshare@ybtest-11:~/go/src/k8s.io/kubernetes$ kubectl --kubeconfig=test/kubemark/resources/kubeconfig.kubemark.proxy.saved get tenants
NAME STORAGEID STATUS AGE
arktos 0 Active 15m
system 0 Active 34m
cloudshare@ybtest-11:~/go/src/k8s.io/kubernetes$
Start the tests:
On test executor machine 1:
SCALEOUT_TEST_TENANT=arktos ./perf-tests/clusterloader2/run-e2e.sh --nodes=10000 --provider=kubemark --kubeconfig=/home/cloudshare/go/src/k8s.io/kubernetes/test/kubemark/resources/kubeconfig.kubemark.proxy.saved --report-dir=/home/cloudshare/logs/testarktos-2x10k --testconfig=testing/density/config.yaml > testArktos-2x10k.log 2>&1 &
On test executor machine 2:
SCALEOUT_TEST_TENANT=zeta ./perf-tests/clusterloader2/run-e2e.sh --nodes=10000 --provider=kubemark --kubeconfig=/home/cloudshare/go/src/k8s.io/kubernetes/test/kubemark/resources/kubeconfig.kubemark.proxy.saved --report-dir=/home/cloudshare/logs/testzeta-2x10k --testconfig=testing/density/config.yaml > testZeta-2x10k.log 2>&1 &
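On each executor machine, the backgrounded run can be checked and followed via its log file (a sketch using the log names above; use testArktos-2x10k.log on machine 1):
jobs -l
tail -f testZeta-2x10k.log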