[Bug]: error="Error:DirExist:Operation not permitted" extra="get local used size failed" #38169

Closed
1 task done
sunwsh opened this issue Dec 3, 2024 · 3 comments
Labels
kind/bug Issues or changes related a bug triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@sunwsh

sunwsh commented Dec 3, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: milvus-v2.4.14
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka): kafka
- SDK version(e.g. pymilvus v2.0.0rc2): null
- OS(Ubuntu or CentOS): Ubuntu-20.04
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

docker run --net=host -it -v /tmp/wsh:/work -w /milvus milvusdb/milvus-gpu:ubuntu-v2.4.14 /bin/bash

./bin/milvus run querynode &

The program then reports an error.

Expected Behavior

Judging from the code, this is the function that reports the error:

CStatus
GetLocalUsedSize(const char* c_dir, int64_t* size) {
    try {
        auto local_chunk_manager =
            milvus::storage::LocalChunkManagerSingleton::GetInstance()
                .GetChunkManager();
        std::string dir(c_dir);
        if (local_chunk_manager->DirExist(dir)) {
            *size = local_chunk_manager->GetSizeOfDir(dir);
        } else {
            *size = 0;
        }
        return milvus::SuccessCStatus();
    } catch (std::exception& e) {
        return milvus::FailureCStatus(&e);
    }
}

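I found a problem: on the querynode, the local_chunk_manager object should be nullptr at this point, because according to the code, func InitLocalChunkManager(path string) is only called afterwards.

The code above should check that local_chunk_manager is not null, or this check should not be placed here at all: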

		localUsedSize, err := segments.GetLocalUsedSize(node.ctx, localRootPath)
		if err != nil {
			log.Warn("get local used size failed", zap.Error(err))
			initError = err
			return
		}
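
The null check suggested above could look roughly like the following. This is only a sketch under stated assumptions: `ChunkManager`, `FixedSizeManager`, `Status`, and `GetLocalUsedSizeChecked` are hypothetical stand-ins, not the real Milvus `LocalChunkManager`, `CStatus`, or `GetLocalUsedSize`. The manager is passed in explicitly here so that the uninitialized case can be shown without the singleton.

```cpp
#include <cassert>
#include <cstdint>
#include <exception>
#include <memory>
#include <string>

// Hypothetical stand-in for the chunk manager interface (not the Milvus class).
struct ChunkManager {
    virtual ~ChunkManager() = default;
    virtual bool DirExist(const std::string& dir) = 0;
    virtual int64_t GetSizeOfDir(const std::string& dir) = 0;
};

// Hypothetical stub that reports a fixed directory size, for illustration only.
struct FixedSizeManager : ChunkManager {
    FixedSizeManager(bool exists, int64_t size)
        : exists_(exists), size_(size) {
    }
    bool DirExist(const std::string&) override {
        return exists_;
    }
    int64_t GetSizeOfDir(const std::string&) override {
        return size_;
    }
    bool exists_;
    int64_t size_;
};

// Hypothetical stand-in for CStatus: code 0 means success.
struct Status {
    int code;
    std::string msg;
};

Status
GetLocalUsedSizeChecked(const std::shared_ptr<ChunkManager>& manager,
                        const char* c_dir,
                        int64_t* size) {
    assert(size != nullptr);  // the caller must supply an output slot
    if (manager == nullptr) {
        // Report the actual cause instead of dereferencing a null manager.
        return {1, "local chunk manager is not initialized"};
    }
    if (c_dir == nullptr) {
        return {1, "directory path is null"};
    }
    try {
        std::string dir(c_dir);
        *size = manager->DirExist(dir) ? manager->GetSizeOfDir(dir) : 0;
        return {0, ""};
    } catch (const std::exception& e) {
        return {1, e.what()};
    }
}
```

With a guard like this, an uninitialized manager would surface as an explicit "not initialized" status rather than as a crash or a misleading filesystem error. Whether the manager really is null at this point in querynode startup is still an open question in this report.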

Steps To Reproduce

On some machines it reproduces every time; on others it cannot be reproduced. The cause has not been found yet.

Milvus Log

[2024/12/02 18:39:09.883 +08:00] [WARN] [segments/cgo_util.go:86] ["CStatus returns err"] [error="Error:DirExist:Operation not permitted"] 
																				[extra="get local used size failed"]
[2024/12/02 18:39:09.883 +08:00] [WARN] [querynodev2/server.go:301] ["get local used size failed"] [error="Error:DirExist:Operation not permitted"]
[2024/12/02 18:39:09.883 +08:00] [ERROR] [querynode/service.go:144] ["QueryNode init error: "] [error="Error:DirExist:Operation not permitted"] 
					[stack="github.com/milvus-io/milvus/internal/distributed/querynode.(*Server).init\n
					\t/work/milvus/internal/distributed/querynode/service.go:144\n
					github.com/milvus-io/milvus/internal/distributed/querynode.(*Server).Run\n
					\t/work/milvus/internal/distributed/querynode/service.go:222\n
					github.com/milvus-io/milvus/cmd/components.(*QueryNode).Run\n
					\t/work/milvus/cmd/components/query_node.go:59\n
					github.com/milvus-io/milvus/cmd/roles.runComponent[...].func1\n
					\t/work/milvus/cmd/roles/roles.go:126"]
[2024/12/02 18:39:09.883 +08:00] [ERROR] [components/query_node.go:60] ["QueryNode starts error"] [error="Error:DirExist:Operation not permitted"] [stack="github.com/milvus-io/milvus/cmd/components.(*QueryNode).Run\n\t/work/milvus/cmd/components/query_node.go:60\ngithub.com/milvus-io/milvus/cmd/roles.runComponent[...].func1\n\t/work/milvus/cmd/roles/roles.go:126"]

Anything else?

@sunwsh sunwsh added kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Dec 3, 2024
@yanliang567
Contributor

/assign @czs007
/unassign

@sre-ci-robot sre-ci-robot assigned czs007 and unassigned yanliang567 Dec 3, 2024
@yanliang567 yanliang567 added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Dec 3, 2024
@xiaofan-luan
Collaborator

Check the access permissions on this directory.
"Operation not permitted" means Milvus does not have access to this directory.
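
One way to run that check from inside the container is a small probe that separates "the directory is missing" from "the OS refused the call". The sketch below assumes a Linux/POSIX environment and uses plain `stat(2)`; `DirState`, `DirProbe`, and `ProbeDir` are hypothetical names for illustration and are not part of Milvus. On Linux, "Operation not permitted" is the `strerror` text for `EPERM`, while a file-mode denial is normally reported as `EACCES` ("Permission denied"), so the probe returns the raw errno value as well as the message.

```cpp
#include <cerrno>
#include <cstring>
#include <string>
#include <sys/stat.h>

// Hypothetical result type for the probe.
enum class DirState { kDirectory, kMissing, kNotDirectory, kError };

struct DirProbe {
    DirState state;
    int err;              // errno value when state == kError, otherwise 0
    std::string message;  // strerror text for err, otherwise empty
};

// Probe a path with stat(2) and classify the outcome.
DirProbe
ProbeDir(const std::string& dir) {
    struct stat st;
    if (::stat(dir.c_str(), &st) != 0) {
        const int err = errno;  // save errno before any other call
        if (err == ENOENT || err == ENOTDIR) {
            // The path, or one of its parent components, does not exist.
            return {DirState::kMissing, 0, ""};
        }
        // Anything else (EPERM, EACCES, ...) is a real failure to report.
        return {DirState::kError, err, std::strerror(err)};
    }
    if (S_ISDIR(st.st_mode)) {
        return {DirState::kDirectory, 0, ""};
    }
    return {DirState::kNotDirectory, 0, ""};
}
```

Calling `ProbeDir` on the configured local storage path, as the same user that runs `./bin/milvus run querynode`, would show whether the path is simply absent or whether `stat` itself fails, and with which errno.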

@xiaofan-luan
Collaborator

/assign @sunwsh
