
[Bug]: S3 Milvus - can insert entities but failing to create index (Error:GetObjectSize: No response body.) #25939

Closed
cyrusvahidi opened this issue Jul 26, 2023 · 25 comments
Labels
kind/bug Issues or changes related to a bug · stale Indicates no updates for 30 days · triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments


cyrusvahidi commented Jul 26, 2023

Is there an existing issue for this?

  • I have searched the existing issues
  • Yes, the issue has been reported before, but without a clear solution.

Environment

- Milvus version: 2.2.8
- Deployment mode(standalone or cluster): standalone
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2): pymilvus
- OS(Ubuntu or CentOS): Amazon Linux
- CPU/Memory: 16GB
- GPU: 
- Others:

Current Behavior

I have deployed Milvus standalone to my EC2 instance, connected to S3 for storage.

The connection seems to be successful. Insertions are successful, and files appear in my S3 bucket.

However, when I create an index, the command hangs and I get the following error logs:

```
milvus-standalone  | [2023/07/25 21:19:03.009 +00:00] [INFO] [indexnode/task.go:317] ["index params are ready"] [buildID=443101205607962167] ["index params"="{\"dim\":\"512\",\"index_type\":\"IVF_FLAT\",\"metric_type\":\"IP\",\"nlist\":\"4096\"}"]
milvus-standalone  | [2023/07/25 21:19:03.009 +00:00] [INFO] [gc/gc_tuner.go:84] ["GC Tune done"] ["previous GOGC"=200] ["heapuse "=138] ["total memory"=663] ["next GC"=302] ["new GOGC"=200] [gc-pause=54.667µs] [gc-pause-end=1690319943000578687]
milvus-standalone  | 2023-07-25 21:19:03,010 | INFO | default | [SEGCORE][N6milvus7storage17MinioChunkManagerE::MinioChunkManager][milvus] init MinioChunkManager with parameter[endpoint: 's3.eu-north-1.amazonaws.com:443', default_bucket_name:'sp-milvus', use_secure:'true']
milvus-standalone  | [2023/07/25 21:19:03.026 +00:00] [WARN] [indexcgowrapper/helper.go:76] ["failed to create index, C Runtime Exception: [UnexpectedError] Error:GetObjectSize:  No response body.\n"]
milvus-standalone  | [2023/07/25 21:19:03.026 +00:00] [ERROR] [indexnode/task.go:340] ["failed to build index"] [error="[UnexpectedError] Error:GetObjectSize:  No response body."] [stack="github.com/milvus-io/milvus/internal/indexnode.(*indexBuildTask).BuildIndex\n\t/go/src/github.com/milvus-io/milvus/internal/indexnode/task.go:340\ngithub.com/milvus-io/milvus/internal/indexnode.(*TaskScheduler).processTask.func1\n\t/go/src/github.com/milvus-io/milvus/internal/indexnode/task_scheduler.go:207\ngithub.com/milvus-io/milvus/internal/indexnode.(*TaskScheduler).processTask\n\t/go/src/github.com/milvus-io/milvus/internal/indexnode/task_scheduler.go:220\ngithub.com/milvus-io/milvus/internal/indexnode.(*TaskScheduler).indexBuildLoop.func1\n\t/go/src/github.com/milvus-io/milvus/internal/indexnode/task_scheduler.go:253"]
milvus-standalone  | [2023/07/25 21:19:03.026 +00:00] [INFO] [indexnode/taskinfo_ops.go:42] ["IndexNode store task state"] [clusterID=by-dev] [buildID=443101205607962167] [state=Retry] ["fail reason"="[UnexpectedError] Error:GetObjectSize:  No response body."]
```

Here is my Milvus standalone config, kindly provided by @locustbaby:

<details>
<summary>docker-compose.yml</summary>

```yaml
services:
  etcd:
    container_name: milvus-etcd
    image: quay.io/coreos/etcd:v3.5.5
    environment:
      - ETCD_AUTO_COMPACTION_MODE=revision
      - ETCD_AUTO_COMPACTION_RETENTION=1000
      - ETCD_QUOTA_BACKEND_BYTES=4294967296
      - ETCD_SNAPSHOT_COUNT=50000
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/etcd:/etcd
    command: etcd -advertise-client-urls=http://127.0.0.1:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd

  standalone:
    container_name: milvus-standalone
    image: milvusdb/milvus:v2.2.8
    command: ["milvus", "run", "standalone"]
    environment:
      ETCD_ENDPOINTS: etcd:2379
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/milvus:/var/lib/milvus
      - ./milvus.yaml:/milvus/configs/milvus.yaml
    ports:
      - "19530:19530"
      - "9091:9091"
    depends_on:
      - "etcd"

networks:
  default:
    name: milvus
```
</details>


<details>
<summary>milvus.yaml</summary>
    # Licensed to the LF AI & Data foundation under one
    # or more contributor license agreements. See the NOTICE file
    # distributed with this work for additional information
    # regarding copyright ownership. The ASF licenses this file
    # to you under the Apache License, Version 2.0 (the
    # "License"); you may not use this file except in compliance
    # with the License. You may obtain a copy of the License at
    #
    #     http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.
    
    # Related configuration of etcd, used to store Milvus metadata & service discovery.
    etcd:
      endpoints:
        - localhost:2379
      rootPath: by-dev # The root path where data is stored in etcd
      metaSubPath: meta # metaRootPath = rootPath + '/' + metaSubPath
      kvSubPath: kv # kvRootPath = rootPath + '/' + kvSubPath
      log:
        # path is one of:
        #  - "default" as os.Stderr,
        #  - "stderr" as os.Stderr,
        #  - "stdout" as os.Stdout,
        #  - file path to append server logs to.
        # please adjust in embedded Milvus: /tmp/milvus/logs/etcd.log
        path: stdout
        level: info # Only supports debug, info, warn, error, panic, or fatal. Default 'info'.
      use:
        # please adjust in embedded Milvus: true
        embed: false # Whether to enable embedded Etcd (an in-process EtcdServer).
      data:
        # Embedded Etcd only.
        # please adjust in embedded Milvus: /tmp/milvus/etcdData/
        dir: default.etcd
      ssl:
        enabled: false # Whether to support ETCD secure connection mode
        tlsCert: /path/to/etcd-client.pem # path to your cert file
        tlsKey: /path/to/etcd-client-key.pem # path to your key file
        tlsCACert: /path/to/ca.pem # path to your CACert file
        # TLS min version
        # Optional values: 1.0, 1.1, 1.2, 1.3.
        # We recommend using version 1.2 and above
        tlsMinVersion: 1.3
    
    # Default value: etcd
    # Valid values: [etcd, mysql]
    metastore:
      type: etcd
    
    # Related configuration of mysql, used to store Milvus metadata.
    mysql:
      username: root
      password: 123456
      address: localhost
      port: 3306
      dbName: milvus_meta
      driverName: mysql
      maxOpenConns: 20
      maxIdleConns: 5
    
    # please adjust in embedded Milvus: /tmp/milvus/data/
    localStorage:
      path: /var/lib/milvus/data/
    
    # Related configuration of MinIO/S3/GCS or any other service supports S3 API, which is responsible for data persistence for Milvus.
    # We refer to the storage service as MinIO/S3 in the following description for simplicity.
    
    # Milvus supports three MQs: rocksmq (based on RocksDB), Pulsar, and Kafka. Keep only the one you use in this config.
    # If multiple MQs are configured, the priority is:
    # 1. standalone (local) mode: rocksmq (default) > Pulsar > Kafka
    # 2. cluster mode: Pulsar (default) > Kafka (rocksmq is unsupported)
    
    # Related configuration of pulsar, used to manage Milvus logs of recent mutation operations, output streaming log, and provide log publish-subscribe services.
    pulsar:
      address: localhost # Address of pulsar
      port: 6650 # Port of pulsar
      webport: 80 # Web port of pulsar; if you connect directly without a proxy, use 8080
      maxMessageSize: 5242880 # 5 * 1024 * 1024 Bytes, Maximum size of each message in pulsar.
      tenant: public
      namespace: default
    
    # If you want to enable kafka, needs to comment the pulsar configs
    kafka:
      producer:
        client.id: dc
      consumer:
        client.id: dc1
    #  brokerList: localhost1:9092,localhost2:9092,localhost3:9092
    #  saslUsername: username
    #  saslPassword: password
    #  saslMechanisms: PLAIN
    #  securityProtocol: SASL_SSL
    
    rocksmq:
      # please adjust in embedded Milvus: /tmp/milvus/rdb_data
      path: /var/lib/milvus/rdb_data # The path where the message is stored in rocksmq
      rocksmqPageSize: 67108864 # 64 MB, 64 * 1024 * 1024 bytes, The size of each page of messages in rocksmq
      retentionTimeInMinutes: 4320 # 3 days, 3 * 24 * 60 minutes, The retention time of the message in rocksmq.
      retentionSizeInMB: 8192 # 8 GB, 8 * 1024 MB, The retention size of the message in rocksmq.
      compactionInterval: 86400 # 1 day, trigger rocksdb compaction every day to remove deleted data
      lrucacheratio: 0.06 # rocksdb cache memory ratio
    
    # Related configuration of rootCoord, used to handle data definition language (DDL) and data control language (DCL) requests
    rootCoord:
      address: localhost
      port: 53100
      enableActiveStandby: false  # Enable active-standby
    
      dmlChannelNum: 16 # The number of dml channels created at system startup
      maxDatabaseNum: 64 # Maximum number of database
      maxPartitionNum: 4096 # Maximum number of partitions in a collection
      minSegmentSizeToEnableIndex: 1024 # It's a threshold. When the segment size is less than this value, the segment will not be indexed
    
      # (in seconds) Duration after which an import task will expire (be killed). Default 900 seconds (15 minutes).
      # Note: If default value is to be changed, change also the default in: internal/util/paramtable/component_param.go
      importTaskExpiration: 900
      # (in seconds) Milvus will keep the record of import tasks for at least `importTaskRetention` seconds. Default 86400
      # seconds (24 hours).
      # Note: If default value is to be changed, change also the default in: internal/util/paramtable/component_param.go
      importTaskRetention: 86400
    
    # Related configuration of proxy, used to validate client requests and reduce the returned results.
    proxy:
      port: 19530
      internalPort: 19529
      http:
        enabled: true # Whether to enable the http server
        debug_mode: false # Whether to enable http server debug mode
    
      timeTickInterval: 200 # ms, the interval that proxy synchronize the time tick
      msgStream:
        timeTick:
          bufSize: 512
      maxNameLength: 255  # Maximum length of name for a collection or alias
      maxFieldNum: 64     # Maximum number of fields in a collection.
      # As of today (2.2.0 and after) it is strongly DISCOURAGED to set maxFieldNum >= 64.
      # So adjust at your risk!
      maxDimension: 32768 # Maximum dimension of a vector
      # It's strongly DISCOURAGED to set `maxShardNum` > 64.
      maxShardNum: 16 # Maximum number of shards in a collection
      maxTaskNum: 1024 # max task number of proxy task queue
      # please adjust in embedded Milvus: false
      ginLogging: true # Whether to produce gin logs.
      grpc:
        serverMaxRecvSize: 67108864 # 64M
        serverMaxSendSize: 67108864 # 64M
        clientMaxRecvSize: 104857600 # 100 MB, 100 * 1024 * 1024
        clientMaxSendSize: 104857600 # 100 MB, 100 * 1024 * 1024
    
    
    
    # Related configuration of queryCoord, used to manage topology and load balancing for the query nodes, and handoff from growing segments to sealed segments.
    queryCoord:
      address: localhost
      port: 19531
      autoHandoff: true # Enable auto handoff
      autoBalance: true # Enable auto balance
      balancer: ScoreBasedBalancer # Balancer to use
      globalRowCountFactor: 0.1 # expert parameters, only used by scoreBasedBalancer
      scoreUnbalanceTolerationFactor: 0.05 # expert parameters, only used by scoreBasedBalancer
      reverseUnBalanceTolerationFactor: 1.3 #expert parameters, only used by scoreBasedBalancer
      overloadedMemoryThresholdPercentage: 90 # The threshold percentage that memory overload
      balanceIntervalSeconds: 60
      memoryUsageMaxDifferencePercentage: 30
      checkInterval: 10000
      channelTaskTimeout: 60000 # 1 minute
      segmentTaskTimeout: 120000 # 2 minute
      distPullInterval: 500
      loadTimeoutSeconds: 1800
      checkHandoffInterval: 5000
      taskMergeCap: 8
      taskExecutionCap: 256
      enableActiveStandby: false  # Enable active-standby
      refreshTargetsIntervalSeconds: 300
    
    # Related configuration of queryNode, used to run hybrid search between vector and scalar data.
    queryNode:
      cacheSize: 32 # GB, default 32 GB, `cacheSize` is the memory used for caching data for faster query. The `cacheSize` must be less than system memory size.
      port: 21123
      loadMemoryUsageFactor: 3 # The multiply factor of calculating the memory usage while loading segments
      enableDisk: true # enable querynode load disk index, and search on disk index
      maxDiskUsagePercentage: 95
      gracefulStopTimeout: 30
    
      stats:
        publishInterval: 1000 # Interval for querynode to report node information (milliseconds)
      dataSync:
        flowGraph:
          maxQueueLength: 1024 # Maximum length of task queue in flowgraph
          maxParallelism: 1024 # Maximum number of tasks executed in parallel in the flowgraph
      # Segcore will divide a segment into multiple chunks to enable small index
      segcore:
        chunkRows: 1024 # The number of vectors in a chunk.
        knowhereThreadPoolNumRatio: 4 # Use more threads to make good use of SSD throughput
        # Note: we have disabled segment small index since @2022.05.12. So below related configurations won't work.
        # We won't create small index for growing segments and search on these segments will directly use bruteforce scan.
        smallIndex:
          nlist: 128 # small index nlist, recommend to set sqrt(chunkRows), must smaller than chunkRows/8
          nprobe: 16 # nprobe to search small index, based on your accuracy requirement, must smaller than nlist
      cache:
        enabled: true
        memoryLimit: 2147483648 # 2 GB, 2 * 1024 *1024 *1024
    
      scheduler:
        receiveChanSize: 10240
        unsolvedQueueSize: 10240
        # maxReadConcurrentRatio is the concurrency ratio of read task (search task and query task).
        # Max read concurrency would be the value of `runtime.NumCPU * maxReadConcurrentRatio`.
        # It defaults to 2.0, which means max read concurrency would be the value of runtime.NumCPU * 2.
        # Max read concurrency must greater than or equal to 1, and less than or equal to runtime.NumCPU * 100.
        maxReadConcurrentRatio: 2.0 # (0, 100]
        cpuRatio: 10.0 # ratio used to estimate read task cpu usage.
        # maxTimestampLag is the max ts lag between serviceable and guarantee timestamp.
        # if the lag is larger than this config, scheduler will return error without waiting.
        # the valid value is [3600, infinite)
        maxTimestampLag: 86400
        # read task schedule policy: fifo(by default), user-task-polling.
        scheduleReadPolicy: 
          # fifo: A FIFO queue support the schedule.
          # user-task-polling: 
          #     The user's tasks will be polled one by one and scheduled. 
          #     Scheduling is fair on task granularity.
          #     The policy is based on the username for authentication.
          #     And an empty username is considered the same user. 
          #     When there are no multi-users, the policy decay into FIFO
          name: fifo
          # user-task-polling configure:
          taskQueueExpire: 60 # 1 min by default, expire time of inner user task queue since queue is empty.
    
      grouping:
        enabled: true
        maxNQ: 50000
        topKMergeRatio: 10.0
    
    indexCoord:
      address: localhost
      port: 31000
      enableActiveStandby: false  # Enable active-standby
    
      minSegmentNumRowsToEnableIndex: 1024 # It's a threshold. When the segment num rows is less than this value, the segment will not be indexed
    
      bindIndexNodeMode:
        enable: false
        address: "localhost:22930"
        withCred: false
        nodeID: 0
    
      gc:
        interval: 600 # gc interval in seconds
    
      scheduler:
        interval: 1000 # scheduler interval in Millisecond
    
    indexNode:
      port: 21121
      enableDisk: true # enable index node build disk vector index
      maxDiskUsagePercentage: 95
      gracefulStopTimeout: 30
    
      scheduler:
        buildParallel: 1
    
    dataCoord:
      address: localhost
      port: 13333
      enableCompaction: true # Enable data segment compaction
      enableGarbageCollection: true
      enableActiveStandby: false  # Enable active-standby
    
      channel:
        watchTimeoutInterval: 120 # Timeout on watching channels (in seconds). Datanode tickler update watch progress will reset timeout timer.
        balanceSilentDuration: 300 # The duration before the channelBalancer on datacoord to run
        balanceInterval: 360 #The interval for the channelBalancer on datacoord to check balance status
    
      segment:
        maxSize: 512 # Maximum size of a segment in MB
        diskSegmentMaxSize: 2048 # Maximum size in MB of a segment for a collection that has a disk index
        # Minimum proportion for a segment which can be sealed.
        # Sealing early can prevent producing large growing segments in case these segments might slow down our search/query.
        # Segments that sealed early will be compacted into a larger segment (within maxSize) eventually.
        sealProportion: 0.23
        assignmentExpiration: 2000 # The time of the assignment expiration in ms
        maxLife: 86400 # The max lifetime of segment in seconds, 24*60*60
        # If a segment didn't accept dml records in `maxIdleTime` and the size of segment is greater than
        # `minSizeFromIdleToSealed`, Milvus will automatically seal it.
        maxIdleTime: 600 # The max idle time of segment in seconds, 10*60.
        minSizeFromIdleToSealed: 16 # The min size in MB of segment which can be idle from sealed.
        # The max number of binlog file for one segment, the segment will be sealed if
        # the number of binlog file reaches to max value.
        maxBinlogFileNumber: 32
        smallProportion: 0.5 # The segment is considered as "small segment" when its # of rows is smaller than
        # (smallProportion * segment max # of rows).
        compactableProportion: 0.85 # A compaction will happen on small segments if the segment after compaction will have
        # over (compactableProportion * segment max # of rows) rows.
        # MUST BE GREATER THAN OR EQUAL TO <smallProportion>!!!
        expansionRate: 1.25 # During compaction, the size of segment # of rows is able to exceed segment max # of rows by (expansionRate-1) * 100%.
    
      compaction:
        enableAutoCompaction: true
    
      gc:
        interval: 3600 # gc interval in seconds
        missingTolerance: 86400 # file meta missing tolerance duration in seconds, 60*24
        dropTolerance: 3600 # file belongs to dropped entity tolerance duration in seconds
    
    
    dataNode:
      port: 21124
    
      dataSync:
        flowGraph:
          maxQueueLength: 1024 # Maximum length of task queue in flowgraph
          maxParallelism: 1024 # Maximum number of tasks executed in parallel in the flowgraph
        maxParallelSyncTaskNum: 2 # Maximum number of sync tasks executed in parallel in each flush manager
      segment:
        # Max buffer size to flush for a single segment.
        insertBufSize: 16777216 # Bytes, 16 MB
        # Max buffer size to flush del for a single channel
        deleteBufBytes: 67108864 # Bytes, 64MB
        # The period to sync segments if buffer is not empty.
        syncPeriod: 600 # Seconds, 10min
    
      memory:
        forceSyncEnable: true # `true` to force sync if memory usage is too high
        forceSyncSegmentNum: 1 # number of segments to sync, segments with top largest buffer will be synced.
        watermarkStandalone: 0.2 # memory watermark for standalone, upon reaching this watermark, segments will be synced.
        watermarkCluster: 0.5 # memory watermark for cluster, upon reaching this watermark, segments will be synced.
    
    # Configures the system log output.
    log:
      level: debug # Only supports debug, info, warn, error, panic, or fatal. Default 'info'.
      stdout: "true" # default true, print log to stdout
      file:
        # please adjust in embedded Milvus: /tmp/milvus/logs
        rootPath: "" # root dir path to put logs, default "" means no log file will print
        maxSize: 300 # MB
        maxAge: 10 # Maximum time for log retention in day.
        maxBackups: 20
      format: text # text/json
    
    grpc:
      log:
        level: WARNING
    
      serverMaxRecvSize: 536870912 # 512MB
      serverMaxSendSize: 536870912 # 512MB
      clientMaxRecvSize: 104857600 # 100 MB, 100 * 1024 * 1024
      clientMaxSendSize: 104857600 # 100 MB, 100 * 1024 * 1024
    
      client:
        dialTimeout: 200
        keepAliveTime: 10000
        keepAliveTimeout: 20000
        maxMaxAttempts: 5
        initialBackOff: 1.0
        maxBackoff: 60.0
        backoffMultiplier: 2.0
      server:
        retryTimes: 5 # retry times when receiving a grpc return value with a failure and retryable state code
    
    # Configure the proxy tls enable.
    tls:
      serverPemPath: configs/cert/server.pem
      serverKeyPath: configs/cert/server.key
      caPemPath: configs/cert/ca.pem
    
    
    common:
      # Channel name generation rule: ${namePrefix}-${ChannelIdx}
      chanNamePrefix:
        cluster: "by-dev"
        rootCoordTimeTick: "rootcoord-timetick"
        rootCoordStatistics: "rootcoord-statistics"
        rootCoordDml: "rootcoord-dml"
        rootCoordDelta: "rootcoord-delta"
        search: "search"
        searchResult: "searchResult"
        queryTimeTick: "queryTimeTick"
        queryNodeStats: "query-node-stats"
        # Cmd for loadIndex, flush, etc...
        cmd: "cmd"
        dataCoordStatistic: "datacoord-statistics-channel"
        dataCoordTimeTick: "datacoord-timetick-channel"
        dataCoordSegmentInfo: "segment-info-channel"
    
      # Sub name generation rule: ${subNamePrefix}-${NodeID}
      subNamePrefix:
        rootCoordSubNamePrefix: "rootCoord"
        proxySubNamePrefix: "proxy"
        queryNodeSubNamePrefix: "queryNode"
        dataNodeSubNamePrefix: "dataNode"
        dataCoordSubNamePrefix: "dataCoord"
    
      defaultPartitionName: "_default"  # default partition name for a collection
      defaultIndexName: "_default_idx"  # default index name
      retentionDuration: 0     # time travel reserved time, insert/delete will not be cleaned in this period. disable it by default
      entityExpiration: -1     # Entity expiration in seconds, CAUTION make sure entityExpiration >= retentionDuration and -1 means never expire
    
      gracefulTime: 5000 # milliseconds. it represents the interval (in ms) by which the request arrival time needs to be subtracted in the case of Bounded Consistency.
      gracefulStopTimeout: 30 # seconds. it will force quit the server if the graceful stop process is not completed during this time.
    
      # Default value: auto
      # Valid values: [auto, avx512, avx2, avx, sse4_2]
      # This configuration is only used by querynode and indexnode, it selects CPU instruction set for Searching and Index-building.
      simdType: auto
      indexSliceSize: 16 # MB
      DiskIndex:
        MaxDegree: 56
        SearchListSize: 100
        PQCodeBudgetGBRatio: 0.125
        BuildNumThreadsRatio: 1.0
        SearchCacheBudgetGBRatio: 0.10
        LoadNumThreadRatio: 8.0
        BeamWidthRatio: 4.0
      # This parameter specify how many times the number of threads is the number of cores
      threadCoreCoefficient : 10
    
      # please adjust in embedded Milvus: local
      storageType: minio
    
      security:
        authorizationEnabled: false
        # The superusers will ignore some system check processes,
        # like the old password verification when updating the credential
        # superUsers:
        #  - "root"
        # tls mode values [0, 1, 2]
        # 0 is close, 1 is one-way authentication, 2 is two-way authentication.
        tlsMode: 0
    
      session:
        ttl: 20 # ttl value when session granting a lease to register service
        retryTimes: 30 # retry times when session sending etcd requests
    
      ImportMaxFileSize: 17179869184  # 16 * 1024 * 1024 * 1024
      # max file size to import for bulkInsert
    
    # QuotaConfig, configurations of Milvus quota and limits.
    # By default, we enable:
    #   1. TT protection;
    #   2. Memory protection.
    #   3. Disk quota protection.
    # You can enable:
    #   1. DML throughput limitation;
    #   2. DDL, DQL qps/rps limitation;
    #   3. DQL Queue length/latency protection;
    #   4. DQL result rate protection;
    # If necessary, you can also manually force to deny RW requests.
    quotaAndLimits:
      enabled: true # `true` to enable quota and limits, `false` to disable.
      limits:
        maxCollectionNum: 65536
        maxCollectionNumPerDB: 65536
      # quotaCenterCollectInterval is the time interval that quotaCenter
      # collects metrics from Proxies, Query cluster and Data cluster.
      quotaCenterCollectInterval: 3 # seconds, (0 ~ 65536)
    
      ddl: # ddl limit rates, default no limit.
        enabled: false
        collectionRate: -1 # qps, default no limit, rate for CreateCollection, DropCollection, LoadCollection, ReleaseCollection
        partitionRate: -1 # qps, default no limit, rate for CreatePartition, DropPartition, LoadPartition, ReleasePartition
    
      indexRate:
        enabled: false
        max: -1 # qps, default no limit, rate for CreateIndex, DropIndex
      flushRate:
        enabled: false
        max: -1 # qps, default no limit, rate for flush
      compactionRate:
        enabled: false
        max: -1 # qps, default no limit, rate for manualCompaction
    
      # dml limit rates, default no limit.
      # The maximum rate will not be greater than `max`.
      dml:
        enabled: false
        insertRate:
          collection:
            max: -1 # MB/s, default no limit
          max: -1 # MB/s, default no limit
        deleteRate:
          collection:
            max: -1 # MB/s, default no limit
          max: -1 # MB/s, default no limit
        bulkLoadRate: # not support yet. TODO: limit bulkLoad rate
          collection:
            max: -1 # MB/s, default no limit
          max: -1 # MB/s, default no limit
    
      # dql limit rates, default no limit.
      # The maximum rate will not be greater than `max`.
      dql:
        enabled: false
        searchRate:
          collection:
            max: -1 # vps (vectors per second), default no limit
          max: -1 # vps (vectors per second), default no limit
        queryRate:
          collection:
            max: -1 # qps, default no limit
          max: -1 # qps, default no limit
    
      # limitWriting decides whether dml requests are allowed.
      limitWriting:
        # forceDeny `false` means dml requests are allowed (except for some
        # specific conditions, such as memory of nodes to water marker), `true` means always reject all dml requests.
        forceDeny: false
        ttProtection:
          enabled: false
          # maxTimeTickDelay indicates the backpressure for DML Operations.
          # DML rates would be reduced according to the ratio of time tick delay to maxTimeTickDelay,
          # if time tick delay is greater than maxTimeTickDelay, all DML requests would be rejected.
          maxTimeTickDelay: 300 # in seconds
        memProtection:
          enabled: true
          # When memory usage > memoryHighWaterLevel, all dml requests would be rejected;
          # When memoryLowWaterLevel < memory usage < memoryHighWaterLevel, reduce the dml rate;
          # When memory usage < memoryLowWaterLevel, no action.
          # memoryLowWaterLevel should be less than memoryHighWaterLevel.
          dataNodeMemoryLowWaterLevel: 0.85 # (0, 1], memoryLowWaterLevel in DataNodes
          dataNodeMemoryHighWaterLevel: 0.95 # (0, 1], memoryHighWaterLevel in DataNodes
          queryNodeMemoryLowWaterLevel: 0.85 # (0, 1], memoryLowWaterLevel in QueryNodes
          queryNodeMemoryHighWaterLevel: 0.95 # (0, 1], memoryHighWaterLevel in QueryNodes
        growingSegmentsSizeProtection:
          # 1. No action will be taken if the ratio of growing segments size is less than the low water level.
          # 2. The DML rate will be reduced if the ratio of growing segments size is greater than the low water level and less than the high water level.
          # 3. All DML requests will be rejected if the ratio of growing segments size is greater than the high water level.
          enabled: false
          lowWaterLevel: 0.2
          highWaterLevel: 0.4
        diskProtection:
          # When the total file size of object storage is greater than `diskQuota`, all dml requests would be rejected;
          enabled: true
          diskQuota: -1 # MB, (0, +inf), default no limit
          diskQuotaPerCollection: -1 # MB, (0, +inf), default no limit
    
      # limitReading decides whether dql requests are allowed.
      limitReading:
        # forceDeny `false` means dql requests are allowed (except for some
        # specific conditions, such as collection has been dropped), `true` means always reject all dql requests.
        forceDeny: false
        queueProtection:
          enabled: false
          # nqInQueueThreshold indicated that the system was under backpressure for Search/Query path.
          # If NQ in any QueryNode's queue is greater than nqInQueueThreshold, search&query rates would gradually cool off
          # until the NQ in queue no longer exceeds nqInQueueThreshold. We think of the NQ of query request as 1.
          nqInQueueThreshold: -1 # int, default no limit
    
          # queueLatencyThreshold indicated that the system was under backpressure for Search/Query path.
          # If dql latency of queuing is greater than queueLatencyThreshold, search&query rates would gradually cool off
          # until the latency of queuing no longer exceeds queueLatencyThreshold.
          # The latency here refers to the averaged latency over a period of time.
          queueLatencyThreshold: -1 # milliseconds, default no limit
        resultProtection:
          enabled: false
          # maxReadResultRate indicated that the system was under backpressure for Search/Query path.
          # If dql result rate is greater than maxReadResultRate, search&query rates would gradually cool off
          # until the read result rate no longer exceeds maxReadResultRate.
          maxReadResultRate: -1 # MB/s, default no limit
        # coolOffSpeed is the speed of search&query rates cool off.
        coolOffSpeed: 0.9 # (0, 1]
    
    autoIndex:
      params:
        build: '{"M": 18,"efConstruction": 240,"index_type": "HNSW", "metric_type": "IP"}'
    
    minio:
      address: s3.us-west-2.amazonaws.com # localhost # Address of MinIO/S3
      port: 443 # 9000   # Port of MinIO/S3
      accessKeyID: <ak> # minioadmin # accessKeyID of MinIO/S3
      secretAccessKey: <sk> # minioadmin # MinIO/S3 encryption string
      useSSL: true # Access to MinIO/S3 with SSL
      bucketName: "bucketname" # Bucket name in MinIO/S3
      rootPath: test # The root path where the message is stored in MinIO/S3
      # Whether to use IAM role to access S3/GCS instead of access/secret keys
      # For more information, refer to
      # aws: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use.html
      # gcp: https://cloud.google.com/storage/docs/access-control/iam
      useIAM: false
      cloudProvider: aws
</details>

### Expected Behavior

_No response_

### Steps To Reproduce
1. Deploy Milvus standalone

2. Create collection

```python
from pymilvus import CollectionSchema, FieldSchema, DataType, Collection, connections

connections.connect(alias=<alias>, host=<host>, port=<port>)

book_id = FieldSchema(
    name="book_id",
    dtype=DataType.INT64,
    is_primary=True,
)
book_name = FieldSchema(
    name="book_name",
    dtype=DataType.VARCHAR,
    max_length=200,
)
word_count = FieldSchema(
    name="word_count",
    dtype=DataType.INT64,
)
book_intro = FieldSchema(
    name="book_intro",
    dtype=DataType.FLOAT_VECTOR,
    dim=2
)
schema = CollectionSchema(
    fields=[book_id, book_name, word_count, book_intro],
    description="Test book search",
    enable_dynamic_field=True
)
collection_name = "book"

collection = Collection(
    collection_name, schema, consistency_level="Strong", using="default2"
)
```

3. Insert entities

```python
import numpy as np

entities = [
    [i for i in range(1000)],
    ["book_name" for i in range(1000)],
    [100 for i in range(1000)],
    [np.zeros(2) for i in range(1000)],
]

insert_result = collection.insert(entities)
```

4. Create index (ERROR)

```python
def create_index(
    collection,
    index_field="embeddings",
    index_params={
        "index_type": "IVF_FLAT",
        "params": {"nlist": 4096},
        "metric_type": "IP",
    },
):
    collection.create_index(field_name=index_field, index_params=index_params)

create_index(collection, "book_intro")
```


### Milvus Log

milvus-standalone | [2023/07/25 21:19:03.009 +00:00] [INFO] [indexnode/task.go:317] ["index params are ready"] [buildID=443101205607962167] ["index params"="{\"dim\":\"512\",\"index_type\":\"IVF_FLAT\",\"metric_type\":\"IP\",\"nlist\":\"4096\"}"]
milvus-standalone | [2023/07/25 21:19:03.009 +00:00] [INFO] [gc/gc_tuner.go:84] ["GC Tune done"] ["previous GOGC"=200] ["heapuse "=138] ["total memory"=663] ["next GC"=302] ["new GOGC"=200] [gc-pause=54.667µs] [gc-pause-end=1690319943000578687]
milvus-standalone | 2023-07-25 21:19:03,010 | INFO | default | [SEGCORE][N6milvus7storage17MinioChunkManagerE::MinioChunkManager][milvus] init MinioChunkManager with parameter[endpoint: 's3.eu-north-1.amazonaws.com:443', default_bucket_name:'sp-milvus', use_secure:'true']
milvus-standalone | [2023/07/25 21:19:03.026 +00:00] [WARN] [indexcgowrapper/helper.go:76] ["failed to create index, C Runtime Exception: [UnexpectedError] Error:GetObjectSize: No response body.\n"]
milvus-standalone | [2023/07/25 21:19:03.026 +00:00] [ERROR] [indexnode/task.go:340] ["failed to build index"] [error="[UnexpectedError] Error:GetObjectSize: No response body."] [stack="github.com/milvus-io/milvus/internal/indexnode.(*indexBuildTask).BuildIndex\n\t/go/src/github.com/milvus-io/milvus/internal/indexnode/task.go:340\ngithub.com/milvus-io/milvus/internal/indexnode.(*TaskScheduler).processTask.func1\n\t/go/src/github.com/milvus-io/milvus/internal/indexnode/task_scheduler.go:207\ngithub.com/milvus-io/milvus/internal/indexnode.(*TaskScheduler).processTask\n\t/go/src/github.com/milvus-io/milvus/internal/indexnode/task_scheduler.go:220\ngithub.com/milvus-io/milvus/internal/indexnode.(*TaskScheduler).indexBuildLoop.func1\n\t/go/src/github.com/milvus-io/milvus/internal/indexnode/task_scheduler.go:253"]
milvus-standalone | [2023/07/25 21:19:03.026 +00:00] [INFO] [indexnode/taskinfo_ops.go:42] ["IndexNode store task state"] [clusterID=by-dev] [buildID=443101205607962167] [state=Retry] ["fail reason"="[UnexpectedError] Error:GetObjectSize: No response body."]


### Anything else?

I have ensured that my EC2 instance and S3 bucket are in the same region. I have given all S3 privileges to my IAM user.
@cyrusvahidi cyrusvahidi added kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 26, 2023
@yanliang567
Copy link
Contributor

/assign @jiaoew1991
sounds like a known issue to you?

/unassign

@yanliang567 yanliang567 added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 27, 2023
@jiaoew1991
Copy link
Contributor

/assign @zhagnlu
please take a look at the s3 problem

/unassign

@sre-ci-robot sre-ci-robot assigned zhagnlu and unassigned jiaoew1991 Jul 27, 2023
@zhagnlu
Copy link
Contributor

zhagnlu commented Jul 27, 2023

Could you upgrade your Milvus version to 2.2.12, which is able to produce more detailed key logs?

@zhagnlu
Copy link
Contributor

zhagnlu commented Jul 27, 2023

After upgrading, set minio.logLevel to debug in milvus.yaml:
image

@xiaofan-luan
Copy link
Collaborator


I think that to solve this problem, you will need to specify the region of your S3 bucket in your milvus.yaml.
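
For example, a minimal sketch of the relevant minio section in milvus.yaml (values are placeholders taken from this thread; the region must match the bucket's actual region):

```yaml
minio:
  address: s3.eu-north-1.amazonaws.com   # regional S3 endpoint
  port: 443
  useSSL: true
  bucketName: sp-milvus
  cloudProvider: aws
  region: eu-north-1   # assumption: the bucket lives in eu-north-1
```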

@xiaofan-luan
Copy link
Collaborator

it would also work if you specify on your system env
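
For instance, a sketch assuming the standard AWS SDK region environment variables, exported in the environment Milvus is started from:

```shell
# Standard AWS SDK region variables (eu-north-1 is a placeholder from this thread);
# export them before starting Milvus so the S3 client can resolve the bucket region.
export AWS_REGION=eu-north-1
export AWS_DEFAULT_REGION=eu-north-1
```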

@cyrusvahidi
Copy link
Author


I thought to solve this problem, you will need to specify region of your S3 in your milvus.yaml

yes I have set minio.address = s3.eu-north-1.amazonaws.com

@cyrusvahidi
Copy link
Author

cyrusvahidi commented Jul 27, 2023

it would also work if you specify on your system env

What are the names of the environment variables that should be set, and what values should they have?

@cyrusvahidi
Copy link
Author

cyrusvahidi commented Jul 27, 2023

after upgrade, update milvus.yaml minio.logLevel to debug

Here is the log I get after upgrading and setting logLevel: error:

milvus-standalone  | [2023/07/27 13:29:50.462 +00:00] [WARN] [indexcgowrapper/helper.go:76] ["failed to create index, C Runtime Exception: [UnexpectedError] Error:GetObjectSize[errcode:400, exception:, errmessage:No response body.]\n"]
milvus-standalone  | [2023/07/27 13:29:50.462 +00:00] [ERROR] [indexnode/task.go:340] ["failed to build index"] [error="[UnexpectedError] Error:GetObjectSize[errcode:400, exception:, errmessage:No response body.]"] [stack="github.com/milvus-io/milvus/internal/indexnode.(*indexBuildTask).BuildIndex\n\t/go/src/github.com/milvus-io/milvus/internal/indexnode/task.go:340\ngithub.com/milvus-io/milvus/internal/indexnode.(*TaskScheduler).processTask.func1\n\t/go/src/github.com/milvus-io/milvus/internal/indexnode/task_scheduler.go:207\ngithub.com/milvus-io/milvus/internal/indexnode.(*TaskScheduler).processTask\n\t/go/src/github.com/milvus-io/milvus/internal/indexnode/task_scheduler.go:220\ngithub.com/milvus-io/milvus/internal/indexnode.(*TaskScheduler).indexBuildLoop.func1\n\t/go/src/github.com/milvus-io/milvus/internal/indexnode/task_scheduler.go:253"]
milvus-standalone  | [2023/07/27 13:29:50.462 +00:00] [INFO] [indexnode/taskinfo_ops.go:42] ["IndexNode store task state"] [clusterID=by-dev] [buildID=443101205607962167] [state=Retry] ["fail reason"="[UnexpectedError] Error:GetObjectSize[errcode:400, exception:, errmessage:No response body.]"]
milvus-standalone  | 2023-07-27 13:30:35,444 | INFO | default | [SEGCORE][N6milvus7storage17MinioChunkManagerE::MinioChunkManager][milvus] init MinioChunkManager with parameter[endpoint: 's3.eu-north-1.amazonaws.com:443', default_bucket_name:'sp-milvus', use_secure:'true']
milvus-standalone  | 2023-07-27 13:30:35,455 | INFO | default | [SEGCORE][ProcessFormattedStatement][milvus] [AWS LOG] [ERROR] 2023-07-27 13:30:35.455 AWSClient [140358008452864] HTTP response code: 400
milvus-standalone  | Resolved remote host IP address: 16.12.10.17
milvus-standalone  | Request ID:
milvus-standalone  | Exception name:
milvus-standalone  | Error message: No response body.
milvus-standalone  | 6 response headers:
milvus-standalone  | connection : close
milvus-standalone  | content-type : application/xml
milvus-standalone  | date : Thu, 27 Jul 2023 13:30:34 GMT
milvus-standalone  | server : AmazonS3
milvus-standalone  | x-amz-request-id : 7XH2J3G55JSZ6W99
milvus-standalone  |
milvus-standalone  |
milvus-standalone  | [2023/07/27 13:30:35.455 +00:00] [WARN] [indexcgowrapper/helper.go:76] ["failed to create index, C Runtime Exception: [UnexpectedError] Error:GetObjectSize[errcode:400, exception:, errmessage:No response body.]\n"]

@xiaofan-luan
Copy link
Collaborator


I think you will need to set up the region config, especially if you are using S3 while running Milvus on macOS or Windows. Try changing the region field in the milvus.yaml file?

@xiaofan-luan
Copy link
Collaborator

but the aws S3 user experience really sucks

@cyrusvahidi
Copy link
Author

cyrusvahidi commented Jul 27, 2023

I think you will need to setup the region config, especially if you are trying to use S3 and running milvus on macos or windows. try to change the milvus.yaml file region field?

I have set region: "eu-north-1" field in minio.

I am running on AWS EC2 Linux, in the same service region (eu-north-1) as the S3 bucket. My IAM credentials have FULL S3 access.

I have configured AWS from the CLI - I can query the target bucket successfully with the same credentials:

[ec2-user milvus]$ aws s3 ls sp-milvus
                           PRE test/

I have set export AWS_REGION=eu-north-1.

but the aws S3 user experience really sucks

I agree. In general it's very counterintuitive.

It's strange that I can successfully insert entities, but making the index fails.

Here's my latest minio config in milvus.yaml:

minio:
  address: s3.eu-north-1.amazonaws.com # localhost # Address of MinIO/S3
  port: 443 # 9000   # Port of MinIO/S3
  accessKeyID: <id> # minioadmin # accessKeyID of MinIO/S3
  secretAccessKey: <key> # minioadmin # MinIO/S3 encryption string
  useSSL: true # Access to MinIO/S3 with SSL
  bucketName: "sp-milvus" # Bucket name in MinIO/S3
  rootPath: test # The root path where the message is stored in MinIO/S3
  # Whether to use IAM role to access S3/GCS instead of access/secret keys
  # For more information, refer to
  # aws: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use.html
  # gcp: https://cloud.google.com/storage/docs/access-control/iam
  useIAM: false
  cloudProvider: aws
  logLevel: error
  region: "eu-north-1"

Is anything missing?

@cyrusvahidi
Copy link
Author

cyrusvahidi commented Jul 27, 2023

I deleted the local /volumes. Now I get the following log:

milvus-standalone  | [2023/07/27 13:59:00.799 +00:00] [INFO] [sessionutil/session_util.go:358] ["service begin to register to etcd"] [serverName=datacoord] [ServerID=18]
milvus-standalone  | 2023-07-27 13:59:00,799 | INFO | default | [SEGCORE][ProcessFormattedStatement][milvus] [AWS LOG] [ERROR] 2023-07-27 13:59:00.799 EC2MetadataClient [140408944322304] Http request to retrieve credentials failed
milvus-standalone  |
milvus-standalone  | 2023-07-27 13:59:00,799 | INFO | default | [SEGCORE][ProcessFormattedStatement][milvus] [AWS LOG] [ERROR] 2023-07-27 13:59:00.799 EC2MetadataClient [140408944322304] Can not retrieve resource from http://169.254.169.254/latest/meta-data/placement/availability-zone
milvus-standalone  |
milvus-standalone  |
milvus-standalone  | 2023-07-27 13:59:00,799 | INFO | default | [SEGCORE][N6milvus7storage17MinioChunkManagerE::MinioChunkManager][milvus] init MinioChunkManager with parameter[endpoint: 's3.eu-north-1.amazonaws.com:443', default_bucket_name:'sp-milvus', use_secure:'true']

@xiaofan-luan
Copy link
Collaborator

after upgrade, update milvus.yaml minio.logLevel to debug image

Here is the log I get after upgrading and setting logLevel: error:

milvus-standalone  | [2023/07/27 13:29:50.462 +00:00] [WARN] [indexcgowrapper/helper.go:76] ["failed to create index, C Runtime Exception: [UnexpectedError] Error:GetObjectSize[errcode:400, exception:, errmessage:No response body.]\n"]
milvus-standalone  | [2023/07/27 13:29:50.462 +00:00] [ERROR] [indexnode/task.go:340] ["failed to build index"] [error="[UnexpectedError] Error:GetObjectSize[errcode:400, exception:, errmessage:No response body.]"] [stack="github.com/milvus-io/milvus/internal/indexnode.(*indexBuildTask).BuildIndex\n\t/go/src/github.com/milvus-io/milvus/internal/indexnode/task.go:340\ngithub.com/milvus-io/milvus/internal/indexnode.(*TaskScheduler).processTask.func1\n\t/go/src/github.com/milvus-io/milvus/internal/indexnode/task_scheduler.go:207\ngithub.com/milvus-io/milvus/internal/indexnode.(*TaskScheduler).processTask\n\t/go/src/github.com/milvus-io/milvus/internal/indexnode/task_scheduler.go:220\ngithub.com/milvus-io/milvus/internal/indexnode.(*TaskScheduler).indexBuildLoop.func1\n\t/go/src/github.com/milvus-io/milvus/internal/indexnode/task_scheduler.go:253"]
milvus-standalone  | [2023/07/27 13:29:50.462 +00:00] [INFO] [indexnode/taskinfo_ops.go:42] ["IndexNode store task state"] [clusterID=by-dev] [buildID=443101205607962167] [state=Retry] ["fail reason"="[UnexpectedError] Error:GetObjectSize[errcode:400, exception:, errmessage:No response body.]"]
milvus-standalone  | 2023-07-27 13:30:35,444 | INFO | default | [SEGCORE][N6milvus7storage17MinioChunkManagerE::MinioChunkManager][milvus] init MinioChunkManager with parameter[endpoint: 's3.eu-north-1.amazonaws.com:443', default_bucket_name:'sp-milvus', use_secure:'true']
milvus-standalone  | 2023-07-27 13:30:35,455 | INFO | default | [SEGCORE][ProcessFormattedStatement][milvus] [AWS LOG] [ERROR] 2023-07-27 13:30:35.455 AWSClient [140358008452864] HTTP response code: 400
milvus-standalone  | Resolved remote host IP address: 16.12.10.17
milvus-standalone  | Request ID:
milvus-standalone  | Exception name:
milvus-standalone  | Error message: No response body.
milvus-standalone  | 6 response headers:
milvus-standalone  | connection : close
milvus-standalone  | content-type : application/xml
milvus-standalone  | date : Thu, 27 Jul 2023 13:30:34 GMT
milvus-standalone  | server : AmazonS3
milvus-standalone  | x-amz-request-id : 7XH2J3G55JSZ6W99
milvus-standalone  |
milvus-standalone  |
milvus-standalone  | [2023/07/27 13:30:35.455 +00:00] [WARN] [indexcgowrapper/helper.go:76] ["failed to create index, C Runtime Exception: [UnexpectedError] Error:GetObjectSize[errcode:400, exception:, errmessage:No response body.]\n"]

I think you will need to set up the region config, especially if you are trying to use S3 while running Milvus on macOS or Windows. Try changing the region field in milvus.yaml?

I have set the region: "eu-north-1" field in the minio section.

I am running on AWS EC2 Linux, in the same region (eu-north-1) as the S3 bucket. My IAM credentials have full S3 access.

I have configured AWS from the CLI, and I can query the target bucket successfully with the same credentials:

[ec2-user milvus]$ aws s3 ls sp-milvus
                           PRE test/

I have set export AWS_REGION=eu-north-1.

but the aws S3 user experience really sucks

I agree. In general it's very counterintuitive.

It's strange that I can successfully insert entities, but making the index fails.

Here's my latest minio config in milvus.yaml:

minio:
  address: s3.eu-north-1.amazonaws.com # Address of MinIO/S3
  port: 443 # Port of MinIO/S3
  accessKeyID: <id> # accessKeyID of MinIO/S3
  secretAccessKey: <key> # MinIO/S3 encryption string
  useSSL: true # Access to MinIO/S3 with SSL
  bucketName: "sp-milvus" # Bucket name in MinIO/S3
  rootPath: test # The root path where the message is stored in MinIO/S3
  # Whether to use IAM role to access S3/GCS instead of access/secret keys
  # For more information, refer to
  # aws: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use.html
  # gcp: https://cloud.google.com/storage/docs/access-control/iam
  useIAM: false
  cloudProvider: aws
  logLevel: error
  region: "eu-north-1"

Is anything missing?
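For reference, the failing call in the logs is GetObjectSize, which I understand corresponds to an S3 HeadObject request. A quick way to check whether the same credentials and region can perform that call outside Milvus is the AWS CLI (the object key below is a placeholder; substitute any object Milvus actually wrote under the rootPath):

```shell
# Confirm the region S3 reports for the bucket
aws s3api get-bucket-location --bucket sp-milvus

# HeadObject is roughly what GetObjectSize performs; replace the key with
# an object that Milvus actually wrote under the test/ root path
aws s3api head-object --bucket sp-milvus --key "test/insert_log/placeholder"
```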

Not every entity is flushed directly to S3.
Streaming insertions are first written to Kafka/Pulsar and asynchronously flushed to S3.
So even if your S3 is not set up correctly, writes can still succeed, and the failure only shows up after some time.
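If you want the storage error to surface at write time instead, one option is to force a flush right after inserting. A minimal pymilvus sketch, assuming a standalone instance on localhost and a hypothetical collection name:

```python
from pymilvus import connections, Collection

# hypothetical connection details and collection name
connections.connect(host="localhost", port="19530")
collection = Collection("my_collection")

# flush() seals the growing segment and writes it to object storage,
# so a misconfigured S3 backend should fail here instead of much
# later when the index is built
collection.flush()
```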

@xiaofan-luan
Collaborator

milvus-standalone | 2023-07-27 13:59:00,799 | INFO | default | [SEGCORE][ProcessFormattedStatement][milvus] [AWS LOG] [ERROR] 2023-07-27 13:59:00.799 EC2MetadataClient [140408944322304] Http request to retrieve credentials failed
milvus-standalone |
milvus-standalone | 2023-07-27 13:59:00,799 | INFO | default | [SEGCORE][ProcessFormattedStatement][milvus] [AWS LOG] [ERROR] 2023-07-27 13:59:00.799 EC2MetadataClient [140408944322304] Can not retrieve resource from http://169.254.169.254/latest/meta-data/placement/availability-zone
milvus-standalone |

I think that's the major reason, but I'm not sure what is happening. It seems to still be related to IAM: https://stackoverflow.com/questions/50826093/failed-to-retrieve-credentials-from-ec2-instance-metadata-service


@cyrusvahidi
Author


Thanks. I will take a look.

Is there no step-by-step guide on how to set up S3 with Milvus? Something that includes the IAM specification, bucket setup, etc.?

@cyrusvahidi
Author

cyrusvahidi commented Jul 28, 2023


Upon running

TOKEN="$(curl -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")" && curl -H "X-aws-ec2-metadata-token: $TOKEN" -v http://169.254.169.254/latest/meta-data/placement/availability-zone

I can successfully retrieve the EC2 instance metadata, and I no longer get 401 Unauthorized when sending an HTTP request to http://169.254.169.254/latest/meta-data/placement/availability-zone.

$ export METADATA_TOKEN="$(curl -s -X PUT 'http://169.254.169.254/latest/api/token' -H 'X-aws-ec2-metadata-token-ttl-seconds: 21600')"
$ AVAILABILITY_ZONE="$(curl -s -H "X-aws-ec2-metadata-token: $METADATA_TOKEN" http://169.254.169.254/latest/meta-data/placement/availability-zone)"
$ echo $AVAILABILITY_ZONE
eu-north-1b

Could this be the problem?

How can I make Milvus request this metadata token?
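One thing I found while digging: when Milvus runs inside Docker on EC2, IMDSv2 token requests from the container can fail because the default metadata response hop limit is 1, and the extra hop through the Docker bridge exceeds it. Raising the hop limit looks like a possible workaround (the instance ID below is a placeholder):

```shell
# Allow one extra network hop (e.g. the Docker bridge) for IMDSv2
# responses; i-0123456789abcdef0 is a placeholder instance ID
aws ec2 modify-instance-metadata-options \
    --instance-id i-0123456789abcdef0 \
    --http-put-response-hop-limit 2 \
    --http-tokens required
```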

@cyrusvahidi
Author


Is this an error in my configuration or how Milvus connects to the AWS metadata API?

@cyrusvahidi
Author

Are there any specific ACLs or permissions that must be set up on the S3 bucket?

@cyrusvahidi
Author

UPDATE: after switching to Google Cloud Storage, I successfully created the index with no friction.

It would be great if Google Cloud Storage setup were added to the documentation.

For anyone who is interested, here is my minio config:

minio:
  address: storage.googleapis.com
  port: 443 # Port of MinIO/S3
  accessKeyID: "key" # accessKeyID of MinIO/S3
  secretAccessKey: "key" # MinIO/S3 encryption string
  useSSL: true # Access to MinIO/S3 with SSL
  bucketName: "bucket-name" # Bucket name in MinIO/S3
  rootPath: test # The root path where the message is stored in MinIO/S3
  # Whether to use IAM role to access S3/GCS instead of access/secret keys
  # For more information, refer to
  # aws: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use.html
  # gcp: https://cloud.google.com/storage/docs/access-control/iam
  useIAM: false
  cloudProvider: gcp
  logLevel: error
  iamEndpoint: ""

I set up HMAC keys on GCP and granted Cloud Storage admin access.

@zhagnlu
Contributor

zhagnlu commented Aug 15, 2023

Sorry for the late reply. According to your reply, your IAM credentials have full S3 access, so you can use IAM mode to access the bucket; in IAM mode the access key and secret key are not used.
Change useIAM to true and try it with S3.
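Roughly, the change looks like this (keeping your endpoint and bucket; in IAM mode the keys are ignored, and the EC2 instance needs the S3-access role attached):

```yaml
minio:
  address: s3.eu-north-1.amazonaws.com
  port: 443
  useSSL: true
  bucketName: "sp-milvus"
  rootPath: test
  useIAM: true # use the instance role instead of access/secret keys
  cloudProvider: aws
  region: "eu-north-1"
```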

@wgzesg

wgzesg commented Aug 22, 2023

  enabled: True
  host: minio.ucare.local
  port: 443
  accessKey: xxxxxx
  secretKey: xxxxx
  useSSL: True
  bucketName: milvus-data
  rootPath: ""
  useIAM: false
  cloudProvider: "aws"
  iamEndpoint: ""

Hi, I was trying to use a local MinIO and ran into the same problem. The DataNode works fine when inserting, but when creating an index, it gives the following. What cloudProvider name should I use instead?

2023-08-22 05:24:47,012 | INFO | default | [SEGCORE][InitSDKAPI][milvus] init aws with log level:error
2023-08-22 05:24:48,015 | INFO | default | [SEGCORE][ProcessFormattedStatement][milvus] [AWS LOG] [ERROR] 2023-08-22 05:24:48.015 CurlHttpClient [139835375351552] Curl returned error code 28 - Timeout was reached

2023-08-22 05:24:48,015 | INFO | default | [SEGCORE][ProcessFormattedStatement][milvus] [AWS LOG] [ERROR] 2023-08-22 05:24:48.015 EC2MetadataClient [139835375351552] Http request to retrieve credentials failed

2023-08-22 05:24:49,016 | INFO | default | [SEGCORE][ProcessFormattedStatement][milvus] [AWS LOG] [ERROR] 2023-08-22 05:24:49.016 CurlHttpClient [139835375351552] Curl returned error code 28 - Timeout was reached

2023-08-22 05:24:49,016 | INFO | default | [SEGCORE][ProcessFormattedStatement][milvus] [AWS LOG] [ERROR] 2023-08-22 05:24:49.016 EC2MetadataClient [139835375351552] Http request to retrieve credentials failed

2023-08-22 05:24:49,016 | INFO | default | [SEGCORE][ProcessFormattedStatement][milvus] [AWS LOG] [ERROR] 2023-08-22 05:24:49.016 EC2MetadataClient [139835375351552] Can not retrieve resource from http://169.254.169.254/latest/meta-data/placement/availability-zone

2023-08-22 05:24:49,017 | INFO | default | [SEGCORE][N6milvus7storage17MinioChunkManagerE::MinioChunkManager][milvus] init MinioChunkManager with parameter[endpoint: 'minio.ucare.local:443', default_bucket_name:'milvus-data', use_secure:'true']
2023-08-22 05:24:49,017 | WARNING | default | [KNOWHERE][GetGlobalThreadPool][milvus] Global ThreadPool has not been inialized yet, init it now with threads num: 8
2023-08-22 05:24:49,017 | INFO | default | [SEGCORE][N6milvus10ThreadPoolE::ThreadPool][milvus] Thread pool's worker num:80
2023-08-22 05:24:49,036 | INFO | default | [SEGCORE][ProcessFormattedStatement][milvus] [AWS LOG] [ERROR] 2023-08-22 05:24:49.036 CurlHttpClient [139835293529856] Curl returned error code 60 - SSL peer certificate or SSH remote key was not OK

2023-08-22 05:24:49,036 | INFO | default | [SEGCORE][ProcessFormattedStatement][milvus] [AWS LOG] [ERROR] 2023-08-22 05:24:49.036 AWSClient [139835293529856] HTTP response code: -1
Resolved remote host IP address: 192.168.8.20
Request ID: 
Exception name: 
Error message: curlCode: 60, SSL peer certificate or SSH remote key was not OK
0 response headers:

@xiaofan-luan
Collaborator


did you follow any of these documents https://milvus.io/docs/eks.md https://milvus.io/docs/aws.md

@wgzesg

wgzesg commented Aug 23, 2023


OH, I found a fix from #25432:
Downgrading 2.2.13 --> 2.2.4 fixed it.

The problem should not be with the AWS SDK; versions 2.2.13 and 2.2.4 use the same AWS SDK version.
I think the issue is with the SSL connection to MinIO. I tested turning useSSL to false, and it works.
But the interesting thing is that on 2.2.4 the SSL connection to MinIO works fine. Is SSL connection to MinIO supported officially? I wonder why it fails on 2.2.13.
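To narrow down the curl error 60 (the client not trusting the peer certificate), it can help to look at what certificate MinIO actually serves:

```shell
# Print the certificate chain MinIO presents; a self-signed or
# hostname-mismatched certificate here would explain curl error 60
openssl s_client -connect minio.ucare.local:443 \
    -servername minio.ucare.local -showcerts </dev/null | head -40
```

If the chain is self-signed, the CA certificate likely needs to be added to the trust store inside the Milvus containers.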

@stale

stale bot commented Sep 27, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.
