roadmap: Cortex supports /files #1786

Open
dan-homebrew opened this issue Dec 10, 2024 · 6 comments

@dan-homebrew
Contributor

dan-homebrew commented Dec 10, 2024

Architecture for files:

Filesystem

~/cortexcpp
├── cortex.db
├── engines
├── files
│   └── Enterprise_Application_Infrastructure_v2_20140903_toCTC_v1.0.pdf
├── logs
├── models
└── threads

Database

  • File metadata will be stored in a files table in cortex.db.

Backward compatibility

Provide backward compatibility with previous versions of Jan, which store files inside each thread's folder.

./jan_1732370027
├── files
│   └── 01JDMG617BHMPW859VE18BPQ7Y.pdf
├── memory
│   ├── args.json
│   ├── docstore.json
│   └── hnswlib.index
├── messages.jsonl
└── thread.json

Pros and Cons

  • Users can search for their files by file name; we no longer rename files to a UUID as before.
  • Files are easy to share with assistants, tools, etc.
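Searching by file name could be served directly from the files table in cortex.db. A minimal sketch with Python's built-in sqlite3 module, assuming a hypothetical schema whose columns mirror the metadata.json fields shown later in this thread; the table layout and the search_by_filename helper are illustrative, not Cortex's actual code.

```python
import sqlite3

# Hypothetical schema: columns mirror the sample metadata.json fields
# (id, object, purpose, filename, created_at, bytes); not Cortex's real schema.
SCHEMA = """
CREATE TABLE IF NOT EXISTS files (
    id         TEXT PRIMARY KEY,
    object     TEXT NOT NULL DEFAULT 'file',
    purpose    TEXT NOT NULL,
    filename   TEXT NOT NULL,
    created_at INTEGER NOT NULL,
    bytes      INTEGER NOT NULL
)
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database and make sure the files table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def search_by_filename(conn: sqlite3.Connection, term: str) -> list:
    """Return (id, filename) rows whose filename contains term (ASCII case-insensitive)."""
    # Escape LIKE wildcards so characters such as "_" and "%" match literally.
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    rows = conn.execute(
        "SELECT id, filename FROM files WHERE filename LIKE ? ESCAPE '\\' ORDER BY created_at",
        (f"%{escaped}%",),
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = open_db()
    conn.execute(
        "INSERT INTO files (id, purpose, filename, created_at, bytes) VALUES (?, ?, ?, ?, ?)",
        ("0001KNE06900GW0X476W5TVBFE", "assistant", "AAPB2016_UIImplementation.pdf", 1733755081, 2099009),
    )
    print(search_by_filename(conn, "uiimplementation"))
```

Because the original name is kept as a column, a lookup like this needs no extra index of UUID-to-name mappings.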

API References

https://c6f778ad.cortex-docs.pages.dev/api-reference

dan-homebrew converted this from a draft issue Dec 10, 2024
@namchuai
Contributor

namchuai commented Dec 10, 2024

Architecture for files:

Filesystem

~/cortexcpp
├── cortex.db
├── engines
├── files
│   ├── 0001KNE06900GW0X476W5TVBFE
│   │   ├── AAPB2016_UIImplementation.pdf
│   │   └── metadata.json
│   ├── 0001KNE3E500GW0X476W5TVBFE
│   │   ├── AAPB2016_UIImplementation.pdf
│   │   └── metadata.json
│   └── 0001KNE3EBC62D620DGYNG2R8H
│       ├── AAPB2016_UIImplementation.pdf
│       └── metadata.json
├── logs
├── models
└── threads

Sample metadata.json

{
  "bytes": 2099009,
  "created_at": 1733755081,
  "filename": "AAPB2016_UIImplementation.pdf",
  "id": "0001KNE06900GW0X476W5TVBFE",
  "object": "file",
  "purpose": "assistant"
}
  • Users can search for their files by file name; we no longer rename files to a UUID as before.
  • Files are easy to share with assistants, tools, etc.

Backward compatibility

We need to provide backward compatibility with previous versions of Jan, which store files inside each thread's folder.

./jan_1732370027
├── files
│   └── 01JDMG617BHMPW859VE18BPQ7Y.pdf
├── memory
│   ├── args.json
│   ├── docstore.json
│   └── hnswlib.index
├── messages.jsonl
└── thread.json

Cortex will expose the same API paths as OpenAI, but will require an additional thread_id property so that the Cortex API stays backward compatible with Jan's per-thread file storage.
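A sketch of how the extra thread_id could keep old Jan files reachable: look up the ID in the new files/<id>/ folder first, then fall back to the legacy threads/<thread_id>/files/<id>.<ext> layout. Placing legacy thread folders under the Cortex root's threads/ directory, and the resolve_file helper itself, are assumptions for illustration.

```python
from pathlib import Path
from typing import Optional


def _is_plain_name(name: str) -> bool:
    """Reject empty names, '.', '..' and anything containing a path separator."""
    return bool(name) and name not in (".", "..") and "/" not in name and "\\" not in name


def resolve_file(root: Path, file_id: str, thread_id: Optional[str] = None) -> Optional[Path]:
    """Find a file by ID in the global store, falling back to a legacy Jan thread folder.

    Hypothetical lookup order:
      1. <root>/files/<file_id>/<original filename>    (new layout, beside metadata.json)
      2. <root>/threads/<thread_id>/files/<file_id>.*  (legacy layout, needs thread_id)
    """
    if not _is_plain_name(file_id):
        return None
    folder = root / "files" / file_id
    if folder.is_dir():
        for candidate in sorted(folder.iterdir()):
            if candidate.is_file() and candidate.name != "metadata.json":
                return candidate
    if thread_id is not None and _is_plain_name(thread_id):
        legacy_dir = root / "threads" / thread_id / "files"
        if legacy_dir.is_dir():
            # Legacy files are named <id>.<ext>, e.g. 01JDMG617BHMPW859VE18BPQ7Y.pdf
            for candidate in sorted(legacy_dir.glob(f"{file_id}.*")):
                if candidate.is_file():
                    return candidate
    return None
```

Without a thread_id, a legacy file cannot be located at all, which is why the extra property is needed for old Jan threads.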

@namchuai
Contributor

Request for comment: move the metadata.json contents into cortex.db.
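If the metadata.json contents move into cortex.db, existing ID folders need a one-off migration. A minimal sketch, assuming a hypothetical files table whose columns match the sample metadata.json; INSERT OR IGNORE on the id primary key makes the migration safe to re-run. The function name and schema are illustrative only.

```python
import json
import sqlite3
from pathlib import Path

# Hypothetical column set, taken from the sample metadata.json.
COLUMNS = ("id", "object", "purpose", "filename", "created_at", "bytes")


def migrate_metadata(root: Path, conn: sqlite3.Connection) -> int:
    """Copy every <root>/files/<id>/metadata.json into a files table; return rows inserted."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        "id TEXT PRIMARY KEY, object TEXT, purpose TEXT, "
        "filename TEXT, created_at INTEGER, bytes INTEGER)"
    )
    before = conn.total_changes
    placeholders = ", ".join("?" * len(COLUMNS))
    with conn:  # commit on success, roll back if any file fails to parse
        for meta_path in sorted((root / "files").glob("*/metadata.json")):
            meta = json.loads(meta_path.read_text())
            # Rows whose id already exists are skipped, so re-runs are harmless.
            conn.execute(
                f"INSERT OR IGNORE INTO files ({', '.join(COLUMNS)}) VALUES ({placeholders})",
                tuple(meta.get(col) for col in COLUMNS),
            )
    return conn.total_changes - before
```

Once the rows are in the database, the metadata.json files could be kept as a backup or deleted in a later release.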

@dan-homebrew
Contributor Author

dan-homebrew commented Dec 12, 2024

12 Dec

  • Docs
  • Migration? -> Files table
  • Update Spec for this issue (for future reference)

dan-homebrew moved this from Planning to In Progress in Jan & Cortex Dec 12, 2024
namchuai moved this from In Progress to QA in Jan & Cortex Dec 12, 2024
@louis-jan
Contributor

louis-jan commented Dec 16, 2024

Hi @dan-homebrew @namchuai, now that we've flattened the files under /files, which acts more like global storage, it's not practical to keep the original file name. Since file_search can be scoped to a thread or an assistant, keeping the file name means files with duplicate names are ignored, which leads to incorrect behavior in most applications.

Case 1:

  • Users can have different files with the same name, but they can't attach both to a thread.

Case 2:

  • Users may upload the same file to a thread again to update its embeddings after a model change.

Case 3:

  • Reinforcing or weighting specific data: could re-uploading a file increase its weight in search results when users have already ingested multiple files in the same thread?

Update 1: Attempting to add a suffix to duplicate file names.
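One way to implement the suffix approach, assuming the counter is inserted before the last extension (report.pdf, report_1.pdf, report_2.pdf, ...). dedupe_filename is a hypothetical helper; the set of existing names would come from whatever is already stored under /files.

```python
from pathlib import PurePath


def dedupe_filename(filename: str, existing: set) -> str:
    """Return filename unchanged if free, else the first free name_N.ext variant (N = 1, 2, ...)."""
    if filename not in existing:
        return filename
    path = PurePath(filename)
    # stem/suffix split on the last dot: "archive.tar.gz" -> "archive.tar" + ".gz"
    stem, suffix = path.stem, path.suffix
    n = 1
    while f"{stem}_{n}{suffix}" in existing:
        n += 1
    return f"{stem}_{n}{suffix}"
```

Suffixing keeps every stored name unique while leaving it recognizable, so users can still search by the original name.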

TC117 added this to the v1.0.5 milestone Dec 18, 2024
@TC117

TC117 commented Dec 18, 2024

  • Can upload files
  • Can get the file list
  • Can get a file by ID
  • Can get file content
  • Can delete a file by ID
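The checked operations correspond to the OpenAI Files API paths that Cortex mirrors: POST /v1/files, GET /v1/files, GET /v1/files/{file_id}, GET /v1/files/{file_id}/content and DELETE /v1/files/{file_id}. A standard-library sketch that only builds the requests; the base URL (including port 39281) is an assumption about a local Cortex install, and the extra thread_id property is left out because its exact placement isn't specified here.

```python
import urllib.request
import uuid

# Assumption: a local Cortex server; adjust host and port to your setup.
BASE_URL = "http://127.0.0.1:39281/v1"


def upload_request(filename: str, data: bytes, purpose: str) -> urllib.request.Request:
    """POST /files as multipart/form-data with "purpose" and "file" fields."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="purpose"\r\n\r\n'
        f"{purpose}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        f"{BASE_URL}/files",
        data=head + data + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def list_request() -> urllib.request.Request:
    return urllib.request.Request(f"{BASE_URL}/files", method="GET")


def get_request(file_id: str) -> urllib.request.Request:
    return urllib.request.Request(f"{BASE_URL}/files/{file_id}", method="GET")


def content_request(file_id: str) -> urllib.request.Request:
    return urllib.request.Request(f"{BASE_URL}/files/{file_id}/content", method="GET")


def delete_request(file_id: str) -> urllib.request.Request:
    return urllib.request.Request(f"{BASE_URL}/files/{file_id}", method="DELETE")


# Usage against a running server (not executed here):
#   import json
#   with urllib.request.urlopen(list_request()) as resp:
#       print(json.load(resp))
```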

Still need to check backward compatibility with Jan.

@TC117

TC117 commented Dec 19, 2024

Working with Jan

However, it appends _1 to files with the same name.

TC117 moved this from QA to Completed in Jan & Cortex Dec 19, 2024
Projects
Status: Completed
Development

No branches or pull requests

4 participants