
FYI: Steps to Run a Non-System Tenant Perf Test


Use your own judgment to set the configuration for node size, machine type, NUM_NODES, KUBEMARK_NUM_NODES, etc. The settings below are just for your reference.

0. Build

make quick-release
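As a quick sanity check (not part of the original steps), confirm the build produced the dockerized kubectl binary that the commands in steps 4-6 invoke:

# The path below is the one used throughout this page
ls -lh ./_output/dockerized/bin/linux/amd64/kubectl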

1. Env var setup

For a 100-node run, I use:

export RUN_PREFIX=[some-prefix-you-prefer]

export MASTER_SIZE=n1-highmem-32  NUM_NODES=2 KUBEMARK_NUM_NODES=100 

export SCALEOUT_TP_COUNT=2

export MASTER_DISK_SIZE=200GB MASTER_ROOT_DISK_SIZE=200GB KUBE_GCE_ZONE=us-central1-b NODE_SIZE=n1-highmem-16 NODE_DISK_SIZE=200GB GOPATH=$HOME/go KUBE_GCE_ENABLE_IP_ALIASES=true KUBE_GCE_PRIVATE_CLUSTER=true CREATE_CUSTOM_NETWORK=true KUBE_GCE_INSTANCE_PREFIX=${RUN_PREFIX} KUBE_GCE_NETWORK=${RUN_PREFIX} ENABLE_KCM_LEADER_ELECT=false SHARE_PARTITIONSERVER=false LOGROTATE_FILES_MAX_COUNT=10 LOGROTATE_MAX_SIZE=200M TEST_CLUSTER_LOG_LEVEL=--v=2 APISERVERS_EXTRA_NUM=0 WORKLOADCONTROLLER_EXTRA_NUM=0 ETCD_EXTRA_NUM=0  KUBE_ENABLE_APISERVER_INSECURE_PORT=true KUBE_ENABLE_PROMETHEUS_DEBUG=true KUBE_ENABLE_PPROF_DEBUG=true  SCALEOUT_CLUSTER=true 
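Before running the cluster scripts, it can help to spot-check that the key exports are visible in the current shell; the same check applies to the larger configurations below (a minimal sketch using the variable names from this page):

# List the variables that kube-up.sh and start-kubemark.sh will read
env | grep -E 'RUN_PREFIX|NUM_NODES|MASTER_SIZE|SCALEOUT'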

For a 1k-node run, this is the setting I used:

export RUN_PREFIX=[some-prefix-you-prefer]  

export SCALEOUT_TP_COUNT=2

export MASTER_SIZE=n1-highmem-32 NUM_NODES=12 KUBEMARK_NUM_NODES=1000 

export MASTER_ROOT_DISK_SIZE=500GB MASTER_DISK_SIZE=500GB KUBE_GCE_ZONE=us-west1-b NODE_SIZE=n1-highmem-16 NODE_DISK_SIZE=300GB KUBE_GCE_NETWORK=${RUN_PREFIX} GOPATH=$HOME/go KUBE_GCE_ENABLE_IP_ALIASES=true KUBE_GCE_PRIVATE_CLUSTER=true CREATE_CUSTOM_NETWORK=true KUBE_GCE_INSTANCE_PREFIX=${RUN_PREFIX} APISERVERS_EXTRA_NUM=0   KUBE_ENABLE_APISERVER_INSECURE_PORT=true KUBE_ENABLE_PROMETHEUS_DEBUG=true KUBE_ENABLE_PPROF_DEBUG=true  SCALEOUT_CLUSTER=true 

For a 5k-node test, I use:

export RUN_PREFIX=[some-prefix-you-prefer]

export SCALEOUT_TP_COUNT=2

export MASTER_SIZE=n1-highmem-96 NUM_NODES=55 KUBEMARK_NUM_NODES=5000 

export MASTER_DISK_SIZE=500GB MASTER_ROOT_DISK_SIZE=500GB KUBE_GCE_ZONE=us-central1-b NODE_SIZE=n1-highmem-16 NODE_DISK_SIZE=500GB GOPATH=$HOME/go KUBE_GCE_ENABLE_IP_ALIASES=true KUBE_GCE_PRIVATE_CLUSTER=true CREATE_CUSTOM_NETWORK=true ETCD_QUOTA_BACKEND_BYTES=8589934592 TEST_CLUSTER_LOG_LEVEL=--v=2 ENABLE_KCM_LEADER_ELECT=false ENABLE_SCHEDULER_LEADER_ELECT=false SHARE_PARTITIONSERVER=false APISERVERS_EXTRA_NUM=0 WORKLOADCONTROLLER_EXTRA_NUM=0 ETCD_EXTRA_NUM=0 LOGROTATE_FILES_MAX_COUNT=50 LOGROTATE_MAX_SIZE=200M KUBE_GCE_INSTANCE_PREFIX=${RUN_PREFIX} KUBE_GCE_NETWORK=${RUN_PREFIX}   KUBE_ENABLE_APISERVER_INSECURE_PORT=true KUBE_ENABLE_PROMETHEUS_DEBUG=true KUBE_ENABLE_PPROF_DEBUG=true  SCALEOUT_CLUSTER=true

For a 10k-node or 2 x 5k-node test, I use:

export RUN_PREFIX=[some-prefix-you-prefer]

export SCALEOUT_TP_COUNT=2

export MASTER_SIZE=n1-highmem-96 NUM_NODES=100 KUBEMARK_NUM_NODES=10000 

export MASTER_DISK_SIZE=1000GB MASTER_ROOT_DISK_SIZE=1000GB KUBE_GCE_ZONE=us-central1-a NODE_SIZE=n1-highmem-16 NODE_DISK_SIZE=1000GB GOPATH=$HOME/go KUBE_GCE_ENABLE_IP_ALIASES=true KUBE_GCE_PRIVATE_CLUSTER=true CREATE_CUSTOM_NETWORK=true ETCD_QUOTA_BACKEND_BYTES=8589934592 TEST_CLUSTER_LOG_LEVEL=--v=2 ENABLE_KCM_LEADER_ELECT=false ENABLE_SCHEDULER_LEADER_ELECT=false SHARE_PARTITIONSERVER=false APISERVERS_EXTRA_NUM=0 WORKLOADCONTROLLER_EXTRA_NUM=0 ETCD_EXTRA_NUM=0 LOGROTATE_FILES_MAX_COUNT=50 LOGROTATE_MAX_SIZE=200M KUBE_GCE_INSTANCE_PREFIX=${RUN_PREFIX} KUBE_GCE_NETWORK=${RUN_PREFIX}   KUBE_ENABLE_APISERVER_INSECURE_PORT=true KUBE_ENABLE_PROMETHEUS_DEBUG=true KUBE_ENABLE_PPROF_DEBUG=true  SCALEOUT_CLUSTER=true

For a 20k-node test, I use:

export RUN_PREFIX=[some-prefix-you-prefer]

export KUBEMARK_NUM_NODES=20000 NUM_NODES=210

export MASTER_DISK_SIZE=1000GB MASTER_ROOT_DISK_SIZE=1000GB KUBE_GCE_ZONE=us-central1-b MASTER_SIZE=n1-highmem-96 NODE_SIZE=n1-highmem-16  NODE_DISK_SIZE=1000GB GOPATH=$HOME/go KUBE_GCE_ENABLE_IP_ALIASES=true KUBE_GCE_PRIVATE_CLUSTER=true CREATE_CUSTOM_NETWORK=true KUBE_GCE_INSTANCE_PREFIX=${RUN_PREFIX} KUBE_GCE_NETWORK=${RUN_PREFIX} ENABLE_KCM_LEADER_ELECT=false SHARE_PARTITIONSERVER=false LOGROTATE_FILES_MAX_COUNT=10 LOGROTATE_MAX_SIZE=200M APISERVERS_EXTRA_NUM=0 WORKLOADCONTROLLER_EXTRA_NUM=0 ETCD_EXTRA_NUM=0  SCALEOUT_CLUSTER=true   KUBE_ENABLE_APISERVER_INSECURE_PORT=true KUBE_ENABLE_PROMETHEUS_DEBUG=true KUBE_ENABLE_PPROF_DEBUG=true  SCALEOUT_TP_COUNT=2

export  TEST_CLUSTER_LOG_LEVEL=--v=2 HOLLOW_KUBELET_TEST_LOG_LEVEL=--v=2

For a 30k-node test, I use:

export RUN_PREFIX=[some-prefix-you-prefer]

export KUBEMARK_NUM_NODES=30000 NUM_NODES=315

export MASTER_DISK_SIZE=1000GB MASTER_ROOT_DISK_SIZE=1000GB KUBE_GCE_ZONE=us-central1-b MASTER_SIZE=n1-highmem-96 NODE_SIZE=n1-highmem-16  NODE_DISK_SIZE=1000GB GOPATH=$HOME/go KUBE_GCE_ENABLE_IP_ALIASES=true KUBE_GCE_PRIVATE_CLUSTER=true CREATE_CUSTOM_NETWORK=true KUBE_GCE_INSTANCE_PREFIX=${RUN_PREFIX} KUBE_GCE_NETWORK=${RUN_PREFIX} ENABLE_KCM_LEADER_ELECT=false SHARE_PARTITIONSERVER=false LOGROTATE_FILES_MAX_COUNT=10 LOGROTATE_MAX_SIZE=200M APISERVERS_EXTRA_NUM=0 WORKLOADCONTROLLER_EXTRA_NUM=0 ETCD_EXTRA_NUM=0  SCALEOUT_CLUSTER=true   KUBE_ENABLE_APISERVER_INSECURE_PORT=true KUBE_ENABLE_PROMETHEUS_DEBUG=true KUBE_ENABLE_PPROF_DEBUG=true  SCALEOUT_TP_COUNT=2

export  TEST_CLUSTER_LOG_LEVEL=--v=2 HOLLOW_KUBELET_TEST_LOG_LEVEL=--v=2

2. Start the admin cluster

./cluster/kube-up.sh
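Once kube-up.sh finishes, a quick check of the admin cluster (assuming kube-up.sh configured your default kubectl context, as it normally does on GCE) should show roughly NUM_NODES worker nodes plus the master instances; the hollow-node pods of the next step are scheduled onto these nodes:

kubectl get nodes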

3. Start the kubemark cluster

./test/kubemark/start-kubemark.sh
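Both scripts can take a while at the larger node counts, so it is convenient to redirect their output to log files and background them, as in the 2x5k example at the end of this page (wait for kube-up.sh to finish before launching start-kubemark.sh):

./cluster/kube-up.sh > up.log 2>&1 &

./test/kubemark/start-kubemark.sh > start.log 2>&1 &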

4. Check that the kubemark cluster is up

Check the number of hollow nodes. The output of the following command should be the expected node count (KUBEMARK_NUM_NODES) plus 2; for example, the 2 x 5k sample output near the end of this page shows 10002.

./_output/dockerized/bin/linux/amd64/kubectl --kubeconfig=./test/kubemark/resources/kubeconfig.kubemark-tp get nodes | wc -l 

If the number of hollow nodes differs from the desired count, run the following command and wait for the hollow nodes to become ready:

./_output/dockerized/bin/linux/amd64/kubectl scale replicationcontroller hollow-node -n kubemark --replicas=[desired_hollow_node_num]
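To wait for the hollow nodes, you can simply re-run the node count from step 4 until it reaches the expected value (a minimal polling sketch; the 30-second interval is arbitrary):

watch -n 30 "./_output/dockerized/bin/linux/amd64/kubectl --kubeconfig=./test/kubemark/resources/kubeconfig.kubemark-tp get nodes | wc -l"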

Then create the test tenant, following this example:

./_output/dockerized/bin/linux/amd64/kubectl --kubeconfig=./test/kubemark/resources/kubeconfig.kubemark.tp-1.direct create tenant arktos

Create more test tenants if needed. Note that the kubeconfig file name may need to change if SCALEOUT_TP_COUNT is not 2; see the example below.
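For example, with two tenant partitions the second tenant used by the parallel runs in step 6 (zeta) can be created against the second partition's direct kubeconfig (an assumption based on the tp-<N>.direct naming above; adjust to your setup):

./_output/dockerized/bin/linux/amd64/kubectl --kubeconfig=./test/kubemark/resources/kubeconfig.kubemark.tp-2.direct create tenant zeta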

5. Run some sanity tests

Here is what I usually run:

./_output/dockerized/bin/linux/amd64/kubectl --kubeconfig=./test/kubemark/resources/kubeconfig.kubemark-tp run sanitytest --image=nginx --tenant arktos

./_output/dockerized/bin/linux/amd64/kubectl --kubeconfig=./test/kubemark/resources/kubeconfig.kubemark-tp get namespaces --tenant arktos

./_output/dockerized/bin/linux/amd64/kubectl --kubeconfig=./test/kubemark/resources/kubeconfig.kubemark-tp get deployments --all-namespaces  --tenant arktos

./_output/dockerized/bin/linux/amd64/kubectl --kubeconfig=./test/kubemark/resources/kubeconfig.kubemark-tp scale  deployment sanitytest --replicas=3 --tenant arktos

./_output/dockerized/bin/linux/amd64/kubectl --kubeconfig=./test/kubemark/resources/kubeconfig.kubemark-tp get pods --all-namespaces  --tenant arktos

./_output/dockerized/bin/linux/amd64/kubectl --kubeconfig=./test/kubemark/resources/kubeconfig.kubemark-tp delete deployment sanitytest  --tenant arktos

./_output/dockerized/bin/linux/amd64/kubectl --kubeconfig=./test/kubemark/resources/kubeconfig.kubemark-tp get pods --all-namespaces  --tenant arktos

6. Run the perf test

SCALEOUT_TEST_TENANT=[arktos] perf-tests/clusterloader2/run-e2e.sh --nodes=[Kubemark_Node_Num] --provider=kubemark --kubeconfig=~/go/src/k8s.io/arktos/test/kubemark/resources/kubeconfig.kubemark-tp --report-dir=~/perf-logs --testconfig=testing/density/config.yaml --testoverrides=./testing/experiments/use_simple_latency_query.yaml 2>&1 | tee ~/perf-logs/perf-run-$(date +"%m-%d-%T").log
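The run above writes its report and the tee'd log under ~/perf-logs, so create that directory before the first run (a trivial prerequisite, not called out above):

mkdir -p ~/perf-logs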

To run two tenants in parallel:

SCALEOUT_TEST_TENANT=arktos ./perf-tests/clusterloader2/run-e2e.sh --nodes=100 --provider=kubemark --kubeconfig=/home/cloudshare/go/src/k8s.io/kubernetes/test/kubemark/resources/kubeconfig.kubemark.proxy.saved --report-dir=/home/cloudshare/logs/testarktos-run40  --testconfig=testing/density/config.yaml  > testArktos-run40.log  2>&1 &

SCALEOUT_TEST_TENANT=zeta ./perf-tests/clusterloader2/run-e2e.sh --nodes=100 --provider=kubemark --kubeconfig=/home/cloudshare/go/src/k8s.io/kubernetes/test/kubemark/resources/kubeconfig.kubemark.proxy.saved --report-dir=/home/cloudshare/logs/testzeta-run20 --testconfig=testing/density/config.yaml  > testZeta-run20.log  2>&1 &
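Both runs are backgrounded, so progress is easiest to follow from the redirected logs (file names taken from the two commands above):

tail -f testArktos-run40.log testZeta-run20.log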

Example for a 2x5k test run:

Build: Arktos scaleoutpoc branch with two commits reverted:
cloudshare@ybtest-11:~/go/src/k8s.io/kubernetes$ git log
commit 63d44700cd311b4f8789aa7f45a3685b49c3aa7b
Author: Yunwen Bai <[email protected]>
Date:   Fri Dec 25 22:13:11 2020 +0000

    Revert "expose haproxy prometheus (#886)"
    
    This reverts commit e0e81cec9ef8bdb0e3a1deb754fc4ef2cc983761.

commit 3b29eb9ef887a4e8d5c36cea3b2c8a2a917b7e8c
Author: Yunwen Bai <[email protected]>
Date:   Fri Dec 25 22:12:54 2020 +0000

    Revert "fix haproxy failure due to restarting too quickly (#889)"
    
    This reverts commit cb344b50309f363724b3068f12c26c43713ba9df.

commit 0b020ea37558e4fd5c2144e91bd04f531e6c8823
Author: chenqianfzh <[email protected]>
Date:   Thu Dec 24 15:51:09 2020 -0800

    fix mizar-controller name typo (#894)
    
    Co-authored-by: Ubuntu <[email protected]>

Test executor machines:
Ybtest-11
Ybtest-2

Test env exports and setup:

export RUN_PREFIX=new-yb01-k8s-scaleout

export MASTER_SIZE=n1-highmem-96 NUM_NODES=100 KUBEMARK_NUM_NODES=10000

export MASTER_DISK_SIZE=1000GB MASTER_ROOT_DISK_SIZE=1000GB KUBE_GCE_ZONE=us-central1-a NODE_SIZE=n1-highmem-16 NODE_DISK_SIZE=1000GB KUBE_GCE_ENABLE_IP_ALIASES=true KUBE_GCE_PRIVATE_CLUSTER=true CREATE_CUSTOM_NETWORK=true ETCD_QUOTA_BACKEND_BYTES=8589934592 TEST_CLUSTER_LOG_LEVEL=--v=2 ENABLE_KCM_LEADER_ELECT=false ENABLE_SCHEDULER_LEADER_ELECT=false SHARE_PARTITIONSERVER=false APISERVERS_EXTRA_NUM=0 WORKLOADCONTROLLER_EXTRA_NUM=0 ETCD_EXTRA_NUM=0 LOGROTATE_FILES_MAX_COUNT=50 LOGROTATE_MAX_SIZE=200M KUBE_GCE_INSTANCE_PREFIX=${RUN_PREFIX} KUBE_GCE_NETWORK=${RUN_PREFIX}  CREATE_TEST_TENANTS=true SCALEOUT_CLUSTER=true SCALEOUT_CLUSTER_TWO_TPS=true 

./cluster/kube-up.sh > up.log 2>&1 &
./test/kubemark/start-kubemark.sh > start.log 2>&1 &

Sanity test the clusters:
cloudshare@ybtest-2:~/go/src/k8s.io/kubernetes$ kubectl --kubeconfig=test/kubemark/resources/kubeconfig.kubemark.proxy.saved get nodes | wc -l
10002
cloudshare@ybtest-2:~/go/src/k8s.io/kubernetes$ kubectl --kubeconfig=test/kubemark/resources/kubeconfig.kubemark.proxy.saved get tenants
NAME     STORAGEID   STATUS   AGE
system   0           Active   30m
zeta     0           Active   14m
cloudshare@ybtest-2:~/go/src/k8s.io/kubernetes$

cloudshare@ybtest-11:~/go/src/k8s.io/kubernetes$ kubectl --kubeconfig=test/kubemark/resources/kubeconfig.kubemark.proxy.saved get nodes | wc -l
10002
cloudshare@ybtest-11:~/go/src/k8s.io/kubernetes$ kubectl --kubeconfig=test/kubemark/resources/kubeconfig.kubemark.proxy.saved get tenants
NAME     STORAGEID   STATUS   AGE
arktos   0           Active   15m
system   0           Active   34m
cloudshare@ybtest-11:~/go/src/k8s.io/kubernetes$ 

Start the tests:
On test executor machine 1:
SCALEOUT_TEST_TENANT=arktos ./perf-tests/clusterloader2/run-e2e.sh --nodes=10000 --provider=kubemark --kubeconfig=/home/cloudshare/go/src/k8s.io/kubernetes/test/kubemark/resources/kubeconfig.kubemark.proxy.saved --report-dir=/home/cloudshare/logs/testarktos-2x10k  --testconfig=testing/density/config.yaml  > testArktos-2x10k.log  2>&1 &


On test executor machine 2:
SCALEOUT_TEST_TENANT=zeta ./perf-tests/clusterloader2/run-e2e.sh --nodes=10000 --provider=kubemark --kubeconfig=/home/cloudshare/go/src/k8s.io/kubernetes/test/kubemark/resources/kubeconfig.kubemark.proxy.saved --report-dir=/home/cloudshare/logs/testzeta-2x10k --testconfig=testing/density/config.yaml  > testZeta-2x10k.log  2>&1 &