Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add cpu_pinning and isolate_emulator_thread to vm resource #105

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

brandboat
Copy link
Member

@brandboat brandboat commented Sep 24, 2024

related to harvester/harvester#6551, include ACC test case for cpu_pinning and isolate_emulator_thread.

Test Plan

Prerequisite: Prepare a harvester cluster with v1.4 dev version (e.g. v1.4.0-dev-20240918)

  1. run ACC test
    • Download kubeconfig from harvester, rename it to kubeconfig_test.yaml and then move the config under terraform-provider-harvester/
    • Execute ./scripts/test
  2. terraform apply

@brandboat brandboat self-assigned this Sep 24, 2024
@brandboat brandboat force-pushed the HARV-6551 branch 3 times, most recently from 2d3d359 to c46ed10 Compare September 26, 2024 15:34
@brandboat brandboat changed the title [WIP] Suport vm cpu pinning Suport vm cpu pinning Sep 27, 2024
@brandboat brandboat marked this pull request as ready for review September 27, 2024 02:16
@m-ildefons
Copy link
Contributor

Good job. Looks pretty good to me.

But one thing to consider is that a feature that goes hand-in-hand with CPU pinning is isolating the Qemu threads: https://kubevirt.io/user-guide/compute/dedicated_cpu_resources/#requesting-dedicated-cpu-for-qemu-emulator

This is an additional flag that can be set to allocate an additional physical CPU just for the purpose of running the emulator and (depending on the ioThreadsPolicy) the io threads. This can have an impact on the latency of the remaining CPUs, that are assigned to the VM.

Perhaps this flag should also be made available, to fully support workloads that benefit from CPU pinning?

@brandboat
Copy link
Member Author

But one thing to consider is that a feature that goes hand-in-hand with CPU pinning is isolating the Qemu threads: https://kubevirt.io/user-guide/compute/dedicated_cpu_resources/#requesting-dedicated-cpu-for-qemu-emulator

This is an additional flag that can be set to allocate an additional physical CPU just for the purpose of running the emulator and (depending on the ioThreadsPolicy) the io threads. This can have an impact on the latency of the remaining CPUs, that are assigned to the VM.

Perhaps this flag should also be made available, to fully support workloads that benefit from CPU pinning?

Thank you for the review! The isolateEmulatorThread configuration isn't currently covered in Harvester's CPU pinning implementation, but it seems like a valuable addition. We should be aware that enabling this attribute in VirtualMachine CR will result in an additional dedicated CPU being allocated. For example:

spec:
  domain:
    cpu:
      dedicatedCpuPlacement: true
      isolateEmulatorThread: true
    resources:
      limits:
        cpu: 2

In this case, the virt-launcher pod would actually acquire 3 CPUs.

c.c. @bk201, do you think we should consider this as a follow-up in the CPU pinning implementation?

@bk201
Copy link
Member

bk201 commented Sep 30, 2024

@brandboat We can create a new enhancement issue if it's not in the current implementation scope.

@brandboat
Copy link
Member Author

@brandboat We can create a new enhancement issue if it's not in the current implementation scope.

filed harvester/harvester#6672
I'll also include the isolateEmulatorThread config in terraform as part of this PR. Thanks!

@brandboat brandboat changed the title Suport vm cpu pinning Add cpu_pinning and isolate_emulator_thread to vm resource Oct 1, 2024
Copy link
Member

@FrankYang0529 FrankYang0529 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the PR. Test both with / without dedicatedCpuPlacement and isolateEmulatorThread cases. All cases work as expected.

@brandboat
Copy link
Member Author

Rebased PR to fix conflicts and refactor acc tests.

Copy link
Contributor

@m-ildefons m-ildefons left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests fail for me:

Running tests:
        github.com/harvester/terraform-provider-harvester               coverage: 0.0% of statements
        github.com/harvester/terraform-provider-harvester/internal/provider/network             coverage: 0.0% of statements
        github.com/harvester/terraform-provider-harvester/internal/provider/storageclass                coverage: 0.0% of statements
        github.com/harvester/terraform-provider-harvester/internal/provider/keypair             coverage: 0.0% of statements
        github.com/harvester/terraform-provider-harvester/internal/provider             coverage: 0.0% of statements
        github.com/harvester/terraform-provider-harvester/internal/provider/clusternetwork              coverage: 0.0% of statements
        github.com/harvester/terraform-provider-harvester/internal/provider/image               coverage: 0.0% of statements
        github.com/harvester/terraform-provider-harvester/internal/provider/cloudinitsecret             coverage: 0.0% of statements
        github.com/harvester/terraform-provider-harvester/internal/provider/vlanconfig          coverage: 0.0% of statements
        github.com/harvester/terraform-provider-harvester/internal/provider/volume              coverage: 0.0% of statements
=== RUN   Test_checkKeyPairsInUserData
=== RUN   Test_checkKeyPairsInUserData/correct_ssh_authorized_keys_in_root
=== RUN   Test_checkKeyPairsInUserData/wrong_ssh_authorized_keys_in_root
=== RUN   Test_checkKeyPairsInUserData/correct_ssh_authorized_keys_in_users
=== RUN   Test_checkKeyPairsInUserData/wrong_ssh_authorized_keys_in_users
=== RUN   Test_checkKeyPairsInUserData/no_ssh_authorized_keys
=== RUN   Test_checkKeyPairsInUserData/empty_content
--- PASS: Test_checkKeyPairsInUserData (0.00s)
    --- PASS: Test_checkKeyPairsInUserData/correct_ssh_authorized_keys_in_root (0.00s)
    --- PASS: Test_checkKeyPairsInUserData/wrong_ssh_authorized_keys_in_root (0.00s)
    --- PASS: Test_checkKeyPairsInUserData/correct_ssh_authorized_keys_in_users (0.00s)
    --- PASS: Test_checkKeyPairsInUserData/wrong_ssh_authorized_keys_in_users (0.00s)
    --- PASS: Test_checkKeyPairsInUserData/no_ssh_authorized_keys (0.00s)
    --- PASS: Test_checkKeyPairsInUserData/empty_content (0.00s)
PASS
coverage: 2.8% of statements
ok      github.com/harvester/terraform-provider-harvester/internal/provider/virtualmachine      0.031s  coverage: 2.8% of statements
?       github.com/harvester/terraform-provider-harvester/pkg/constants [no test files]
        github.com/harvester/terraform-provider-harvester/pkg/helper            coverage: 0.0% of statements
        github.com/harvester/terraform-provider-harvester/pkg/importer          coverage: 0.0% of statements
        github.com/harvester/terraform-provider-harvester/internal/util         coverage: 0.0% of statements
        github.com/harvester/terraform-provider-harvester/pkg/client            coverage: 0.0% of statements
=== RUN   TestAccImage_basic
    provider_test.go:48: [{0 Get "https://192.168.0.131/k8s/clusters/local/apis/harvesterhci.io/v1beta1/settings/server-version": dial tcp 192.168.0.131:443: connect: connection refused  []}]
--- FAIL: TestAccImage_basic (0.00s)
=== RUN   TestAccKeyPair_basic
    resource_keypair_test.go:48: Step 2/3 error: Error running pre-apply plan: exit status 1
        
        Error: Get "https://192.168.0.131/k8s/clusters/local/apis/harvesterhci.io/v1beta1/settings/server-version": dial tcp 192.168.0.131:443: connect: connection refused
        
          on <empty> line 0:
          (source code not available)
        
        
--- FAIL: TestAccKeyPair_basic (1.45s)
=== RUN   TestAccNetwork_basic
    resource_network_test.go:50: Step 2/2 error: Error running pre-apply plan: exit status 1
        
        Error: Get "https://192.168.0.131/k8s/clusters/local/apis/harvesterhci.io/v1beta1/settings/server-version": dial tcp 192.168.0.131:443: connect: connection refused
        
          on <empty> line 0:
          (source code not available)
        
        
--- FAIL: TestAccNetwork_basic (0.25s)
=== RUN   TestAccStorageClass_basic
    resource_storageclass_test.go:45: Step 1/1 error: Error running pre-apply plan: exit status 1
        
        Error: Get "https://192.168.0.131/k8s/clusters/local/apis/harvesterhci.io/v1beta1/settings/server-version": dial tcp 192.168.0.131:443: connect: connection refused
        
          on <empty> line 0:
          (source code not available)
        
        
--- FAIL: TestAccStorageClass_basic (0.19s)
=== RUN   TestAccVirtualMachine_basic
    resource_virtualmachine_test.go:157: Step 1/2 error: Error running pre-apply plan: exit status 1
        
        Error: Get "https://192.168.0.131/k8s/clusters/local/apis/harvesterhci.io/v1beta1/settings/server-version": dial tcp 192.168.0.131:443: connect: connection refused
        
          on <empty> line 0:
          (source code not available)
        
        
--- FAIL: TestAccVirtualMachine_basic (0.19s)
=== RUN   TestAccVirtualMachine_cpu_pinning
--- FAIL: TestAccVirtualMachine_cpu_pinning (0.00s)
panic: interface conversion: interface {} is nil, not *client.Client [recovered]
        panic: interface conversion: interface {} is nil, not *client.Client

goroutine 834 [running]:
testing.tRunner.func1.2({0x25f1ec0, 0xc0010cef30})
        /usr/lib64/go/1.22/src/testing/testing.go:1631 +0x24a
testing.tRunner.func1()
        /usr/lib64/go/1.22/src/testing/testing.go:1634 +0x377
panic({0x25f1ec0?, 0xc0010cef30?})
        /usr/lib64/go/1.22/src/runtime/panic.go:770 +0x132
github.com/harvester/terraform-provider-harvester/internal/tests.TestAccVirtualMachine_cpu_pinning(0xc001093ba0)
        /go/src/github.com/harvester/terraform-provider-harvester/internal/tests/resource_virtualmachine_test.go:203 +0x14f7
testing.tRunner(0xc001093ba0, 0x2b9a500)
        /usr/lib64/go/1.22/src/testing/testing.go:1689 +0xfb
created by testing.(*T).Run in goroutine 1
        /usr/lib64/go/1.22/src/testing/testing.go:1742 +0x390
FAIL    github.com/harvester/terraform-provider-harvester/internal/tests        2.119s
FAIL
FATA[0039] exit status 1                                
make: *** [Makefile:11: test] Error 1

I'm not exactly sure what's up with the connection refused errors, since I definitively have a Harvester cluster available at that address, but the panic needs to be fixed regardless.

@brandboat
Copy link
Member Author

brandboat commented Nov 26, 2024

Actually I got the same error after make test finished, while I got AC by using ./scripts/test. And after I checkout to master branch, I got same result. (make test failed while ./scripts/tests AC).

I believe it's not related to this PR but I'll look into this issue, thanks for the comment @m-ildefons

@brandboat
Copy link
Member Author

After deb9126, now make test works

`make test` output
./.dapper test
[+] Building 1.9s (14/14) FINISHED                                                                                                                                      docker:default
 => [internal] load build definition from Dockerfile.dapper369921304                                                                                                              0.0s
 => => transferring dockerfile: 1.44kB                                                                                                                                            0.0s
 => [internal] load metadata for registry.suse.com/bci/golang:1.22                                                                                                                1.8s
 => [internal] load .dockerignore                                                                                                                                                 0.0s
 => => transferring context: 2B                                                                                                                                                   0.0s
 => [1/9] FROM registry.suse.com/bci/golang:1.22@sha256:670d86a82103520f105a97320d3075e5b3d72ccff0b17467ac4277d909ef18cb                                                          0.0s
 => [internal] load build context                                                                                                                                                 0.0s
 => => transferring context: 55B                                                                                                                                                  0.0s
 => CACHED [2/9] RUN zypper -n rm container-suseconnect &&     zypper -n install curl docker gzip tar wget awk zip unzip                                                          0.0s
 => CACHED [3/9] RUN GO111MODULE=on go install golang.org/x/tools/cmd/[email protected]                                                                                           0.0s
 => CACHED [4/9] RUN curl -sSfL https://raw.githubusercontent.com/golangci/golangci-lint/master/install.sh | sh -s v1.57.1                                                        0.0s
 => CACHED [5/9] RUN wget --quiet https://github.com/docker/buildx/releases/download/v0.13.1/buildx-v0.13.1.linux-amd64 &&     wget --quiet https://releases.hashicorp.com/terra  0.0s
 => CACHED [6/9] COPY go.mod /go/src/github.com/harvester/terraform-provider-harvester/go.mod                                                                                     0.0s
 => CACHED [7/9] COPY go.sum /go/src/github.com/harvester/terraform-provider-harvester/go.sum                                                                                     0.0s
 => CACHED [8/9] WORKDIR /go/src/github.com/harvester/terraform-provider-harvester                                                                                                0.0s
 => CACHED [9/9] RUN go mod download                                                                                                                                              0.0s
 => exporting to image                                                                                                                                                            0.0s
 => => exporting layers                                                                                                                                                           0.0s
 => => writing image sha256:a1047c2860e630d30e61cfc4c2623682020f990aff070d7d831f2495bf22ac09                                                                                      0.0s
 => => naming to docker.io/library/terraform-provider-harvester:HARV-6551                                                                                                         0.0s
[+] Building 2.0s (7/7) FINISHED                                                                                                                                        docker:default
 => [internal] load build definition from Dockerfile.dapper1924461373                                                                                                             0.0s
 => => transferring dockerfile: 163B                                                                                                                                              0.0s
 => [internal] load metadata for docker.io/library/terraform-provider-harvester:HARV-6551                                                                                         0.0s
 => [internal] load .dockerignore                                                                                                                                                 0.0s
 => => transferring context: 2B                                                                                                                                                   0.0s
 => [internal] load build context                                                                                                                                                 0.6s
 => => transferring context: 285.47MB                                                                                                                                             0.6s
 => CACHED [1/2] FROM docker.io/library/terraform-provider-harvester:HARV-6551                                                                                                    0.0s
 => [2/2] COPY . /go/src/github.com/harvester/terraform-provider-harvester/                                                                                                       0.4s
 => exporting to image                                                                                                                                                            0.8s
 => => exporting layers                                                                                                                                                           0.8s
 => => writing image sha256:c942e3de693c3f9fb71316fcd065d204d33e20a2e31f5ff91ad1538ccd66f1c8                                                                                      0.0s
 => => naming to docker.io/library/terraform-provider-harvester:HARV-6551                                                                                                         0.0s
Running tests:
	github.com/harvester/terraform-provider-harvester		coverage: 0.0% of statements
	github.com/harvester/terraform-provider-harvester/internal/provider/clusternetwork		coverage: 0.0% of statements
	github.com/harvester/terraform-provider-harvester/internal/provider		coverage: 0.0% of statements
	github.com/harvester/terraform-provider-harvester/internal/provider/storageclass		coverage: 0.0% of statements
	github.com/harvester/terraform-provider-harvester/internal/provider/cloudinitsecret		coverage: 0.0% of statements
	github.com/harvester/terraform-provider-harvester/internal/provider/network		coverage: 0.0% of statements
	github.com/harvester/terraform-provider-harvester/internal/provider/keypair		coverage: 0.0% of statements
	github.com/harvester/terraform-provider-harvester/internal/provider/image		coverage: 0.0% of statements
	github.com/harvester/terraform-provider-harvester/internal/provider/vlanconfig		coverage: 0.0% of statements
	github.com/harvester/terraform-provider-harvester/internal/provider/volume		coverage: 0.0% of statements
=== RUN   Test_checkKeyPairsInUserData
=== RUN   Test_checkKeyPairsInUserData/correct_ssh_authorized_keys_in_root
=== RUN   Test_checkKeyPairsInUserData/wrong_ssh_authorized_keys_in_root
=== RUN   Test_checkKeyPairsInUserData/correct_ssh_authorized_keys_in_users
=== RUN   Test_checkKeyPairsInUserData/wrong_ssh_authorized_keys_in_users
=== RUN   Test_checkKeyPairsInUserData/no_ssh_authorized_keys
=== RUN   Test_checkKeyPairsInUserData/empty_content
--- PASS: Test_checkKeyPairsInUserData (0.00s)
    --- PASS: Test_checkKeyPairsInUserData/correct_ssh_authorized_keys_in_root (0.00s)
    --- PASS: Test_checkKeyPairsInUserData/wrong_ssh_authorized_keys_in_root (0.00s)
    --- PASS: Test_checkKeyPairsInUserData/correct_ssh_authorized_keys_in_users (0.00s)
    --- PASS: Test_checkKeyPairsInUserData/wrong_ssh_authorized_keys_in_users (0.00s)
    --- PASS: Test_checkKeyPairsInUserData/no_ssh_authorized_keys (0.00s)
    --- PASS: Test_checkKeyPairsInUserData/empty_content (0.00s)
PASS
coverage: 2.8% of statements
ok  	github.com/harvester/terraform-provider-harvester/internal/provider/virtualmachine	0.030s	coverage: 2.8% of statements
?   	github.com/harvester/terraform-provider-harvester/pkg/constants	[no test files]
	github.com/harvester/terraform-provider-harvester/pkg/helper		coverage: 0.0% of statements
	github.com/harvester/terraform-provider-harvester/internal/util		coverage: 0.0% of statements
	github.com/harvester/terraform-provider-harvester/pkg/client		coverage: 0.0% of statements
	github.com/harvester/terraform-provider-harvester/pkg/importer		coverage: 0.0% of statements
=== RUN   TestAccImage_basic
--- PASS: TestAccImage_basic (82.69s)
=== RUN   TestAccKeyPair_basic
--- PASS: TestAccKeyPair_basic (11.63s)
=== RUN   TestAccNetwork_basic
--- PASS: TestAccNetwork_basic (12.90s)
=== RUN   TestAccStorageClass_basic
--- PASS: TestAccStorageClass_basic (10.89s)
=== RUN   TestAccVirtualMachine_basic
--- PASS: TestAccVirtualMachine_basic (41.30s)
=== RUN   TestAccVirtualMachine_cpu_pinning
    resource_virtualmachine_test.go:212: enable cpu manager on node harvester-node-0
    resource_virtualmachine_test.go:265: disable cpu manager on node harvester-node-0
--- PASS: TestAccVirtualMachine_cpu_pinning (131.90s)
=== RUN   TestAccVirtualMachine_input
--- PASS: TestAccVirtualMachine_input (30.89s)
=== RUN   TestAccVolume_basic
--- PASS: TestAccVolume_basic (11.88s)
PASS
coverage: 46.3% of statements
ok  	github.com/harvester/terraform-provider-harvester/internal/tests	334.103s	coverage: 46.3% of statements
INFO[0358] docker cp /go/src/github.com/harvester/terraform-provider-harvester/bin .
Successfully copied 139MB to /home/brandboat/SUSE/harvester/terraform-provider-harvester/.
INFO[0358] docker cp /go/src/github.com/harvester/terraform-provider-harvester/dist .
Successfully copied 139MB to /home/brandboat/SUSE/harvester/terraform-provider-harvester/.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants