Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Considerably slow responses obtaining threads at launch #718

Open
UTMark opened this issue Dec 13, 2023 · 9 comments
Open

Considerably slow responses obtaining threads at launch #718

UTMark opened this issue Dec 13, 2023 · 9 comments
Assignees
Labels
memory Issues or questions related to memory scale Issues related to support for higher scale solutions triage

Comments

@UTMark
Copy link

UTMark commented Dec 13, 2023

Good morning everyone!

I would like to report a bug that could make the application feel unusable if it were deployed in a production environment. The issue is in two parts but overall what it amounts to is it can take a considerable amount of time to wait for the chat application to load due to the extreme slow responses for each chat thread as they are loaded in.

Example on how long chat messages take to load on first entry: (4-9s)
image

Now imagine if you had 10 threads or 100 threads:
image


Here are some details about the server:

Copilot Build Version: v0.9
SKU: PremiumV3 (P2v3)
CosmosDB: 10K RU /each

  • Index usage 24.4kb (monitor)
  • 384MB Size
  • 10K RU allocation per table, but a tiny fraction is being used right now
    Memory: AzureCognitiveSearch

Cosmos Index Policy:
{ "indexingMode": "consistent", "automatic": true, "includedPaths": [ { "path": "/*" } ], "excludedPaths": [ { "path": "/\"_etag\"/?" } ] }

Thoughts:

I'm not a CosmosDB expert so I'm not sure why the little turtle is showing between 4s - 9s per request for each thread. I am more than willing to investigate with a bit of guidance on what to look into. If this were SQL, I would have imagined it would have been an indexing issue which is why I provided the indexing policy.

But in hindsight, I was wondering that if this is a bug and has a resolution to it, then why not obtain the threads in bulk and limit it to a certain maximum number and paginate as the user scrolls down the historical items? I will continue to look into this issue myself but some of these technologies are new to me. Thank you for your time!

@UTMark
Copy link
Author

UTMark commented Dec 15, 2023

I think what may be happening is that the cosmos database may need to have hierarchical partitions for the containers. For example:

chatmessages: id /chatId /userId
chatparticipants: id /userId /chatId

My reasoning behind this is because the slow database is about 400Mb in size, and the fast one is only 2mb. This probably means there is some indexing or odd querying going on. I tried just adding the new hierarchical partitions and set it in the config but it broke as it doesn't understand this. Meaning, the webapi might also need to be modified. I'm not sure how to proceed. I'm new to both C# and cosmos so I'm probably wrong. :)

@glahaye
Copy link
Contributor

glahaye commented Jan 4, 2024

So I looked at this and saw a few things (and there might be more).

The frontend loads ALL the messages from ALL of the chats of the user at log in. This is really the heart of the problem.

The frontend first calls the "chats" API to get a list of all of the chats for the logged in user. Then, instead of getting the messages for only the selected chat, the frontend gets all the messages for all of the user's chats.

To alleviate this problem, the frontend could get the messages for only the selected chat and fetch the appropriate messages for other chats as they are selected. Just this change would make a great difference.

In addition to the above frontend idiosyncrasies, I discovered another issue with the "chats/[chatIdGuid]/messages" API. The API takes in an optional maximum count of messages to fetch for a given chat. However, behind the scenes, the server always fetches ALL the messages for a chat from Cosmos DB and then returns only up to the max number requested. As chats grow longer and longer, this could eventually lead to more data than necessary being fetched from Cosmos DB. Even now, we "limit" the number of messages to 100, which is probably more than required.

More concerning is the fact that the messages returned are not the latest ones, but rather the earliest ones! What this means is that once you reach 100 messages in a chat, if you leave the chat and come back to it, you'll get the first 100 messages then all later messages won't appear in the interface although they exist in Cosmos DB. This will mess up chat continuity!

For now, though, to address your problem, limiting the frontend to only fetch messages for the selected chat should alleviate the problem. If you are comfortable with React, you could do this yourself. Otherwise, I'll see when someone can address this internally.

@UTMark
Copy link
Author

UTMark commented Jan 5, 2024

Thank you for the comprehensive audit! To clarify what probably needs to happen:

  • Initial app load should not load more than the first x threads from the most recent
  • The chat history (thread list) should paginate
  • When clicking on a thread, it should load the chat messages from the thread and start with the most recent, it should be limited to the amount of messages
  • While scrolling through the chat messages inside the thread, it should paginate appropriately
  • Search filter may need to search through the titles found in the database and cached results due to pagination limitations?

I am learning reactJS and C# but I'm mainly skilled in sql/php/javascript/css. The processes should be familiar but working with cosmos and C# will be the bottleneck. For instance, paginating is usually done by either sorting by ID or Time that is indexed in the database, but I think cosmos/nosql uses something called partitions to "join" information? I think today is a good day to start studying these things :) However, it's evident that something so important should be left to seasoned experts but we will look into it regardless and see what we can come up with.

PS. We learned about something called Azure cosmos dedicated gateway which might alleviate some of the burden for production use. However, it doesn't really resolve the existing issues but good to know it exists.

@anoopt
Copy link

anoopt commented Feb 3, 2024

On top of the above list, I think, the below information is related as well: The QueryEntitiesAsync method in CosmosDbContext.cs uses GetItemLinqQueryable to query the data synchronously. Changing that to get the data asynchronously using the ToFeedIterator() method should be helpful.

public async Task<IQueryable<T>> QueryEntitiesAsync(Expression<Func<T, bool>> predicate)
{
    using var feedIterator = this._container.GetItemLinqQueryable<T>().Where(predicate).ToFeedIterator();
    var results = new List<T>();
    while (feedIterator.HasMoreResults)
    {
        var response = await feedIterator.ReadNextAsync();
        results.AddRange(response);
    }
    return results.AsQueryable();
}

The return type is IQueryable (which is better for database operations). The methods which call QueryEntitiesAsync should be updated accordingly.


Also, the interval of 3000 in the BackendProbe component contributes to the initial load time. I do not know the reason behind setting it to 3000. Any input would be appreciated.

</2cents>

@evchaki evchaki closed this as completed Mar 20, 2024
@glahaye
Copy link
Contributor

glahaye commented Mar 20, 2024

This has not been addressed yet and is still being reported. Re-opening with an undefined timeline to address this.

@glahaye glahaye reopened this Mar 20, 2024
@evchaki evchaki added memory Issues or questions related to memory triage scale Issues related to support for higher scale solutions labels Mar 20, 2024
@glahaye
Copy link
Contributor

glahaye commented Mar 27, 2024

Partial fix: #902

github-merge-queue bot pushed a commit that referenced this issue Mar 27, 2024
…underlying DB level rather than higher (#902)

### Motivation and Context
As described in #718, we load ALL the messages from ALL of the chats of
the user in the frontend at log in.

Also, no matter how many messages we want, we read them ALL at the DB
level and then throw some away if the number read is too higher at the
service level.

### Description
Now, with this change, we actually respect the order, the number to skip
and the number to take of chat messages at the underlying DB level
rather than at higher service level.

This enables us to save some DB activity and make our queries from the
frontend eventually a lot tighter,

This change was made in a simple manner which doesn't change the
underlying architecture.

### Contribution Checklist
- [ ] The code builds clean without any errors or warnings
- [ ] The PR follows the [Contribution
Guidelines](https://github.com/microsoft/chat-copilot/blob/main/CONTRIBUTING.md)
and the [pre-submission formatting
script](https://github.com/microsoft/chat-copilot/blob/main/CONTRIBUTING.md#development-scripts)
raises no violations
- [ ] All unit tests pass, and I have added new tests where possible
- [ ] I didn't break anyone 😄
@glahaye
Copy link
Contributor

glahaye commented Mar 28, 2024

#902 is now in.

With it, we stop fetching ALL entries and then throwing all those over the number of messages we truly want. Additionally, we now get the LATEST messages when requesting fewer than exist in an entire chat so that we can populate the "meta-prompt" with the latest data instead of suddenly having a hole in our chat where we only have old messages and newer ones are not returned.

The "proper" way of designing this endpoint would probably be to have it support OData syntax but given that the team working on Chat Copilot has moved on to other tasks, this will do the trick.

#902 was the backend portion of the fix. A change or two are coming for the front end. First, the chat list on the left will be populated without having to fetch all the messages for all the chats when starting the app. Then, chat messages will be loaded when selecting a specific chat.

Coming soon...

@KishanJ98
Copy link

@glahaye Any updates on the front end part?

@ctolkien
Copy link

This section will also cause N queries to be fired to Cosmos for each ChatParticipant that a user has. 20 chats = 20 queries fired sequentially.

public async Task<IActionResult> GetAllChatSessionsAsync()

teamleader-dev pushed a commit to vlink-group/chat-copilot that referenced this issue Oct 7, 2024
…underlying DB level rather than higher (microsoft#902)

### Motivation and Context
As described in microsoft#718, we load ALL the messages from ALL of the chats of
the user in the frontend at log in.

Also, no matter how many messages we want, we read them ALL at the DB
level and then throw some away if the number read is too higher at the
service level.

### Description
Now, with this change, we actually respect the order, the number to skip
and the number to take of chat messages at the underlying DB level
rather than at higher service level.

This enables us to save some DB activity and make our queries from the
frontend eventually a lot tighter,

This change was made in a simple manner which doesn't change the
underlying architecture.

### Contribution Checklist
- [ ] The code builds clean without any errors or warnings
- [ ] The PR follows the [Contribution
Guidelines](https://github.com/microsoft/chat-copilot/blob/main/CONTRIBUTING.md)
and the [pre-submission formatting
script](https://github.com/microsoft/chat-copilot/blob/main/CONTRIBUTING.md#development-scripts)
raises no violations
- [ ] All unit tests pass, and I have added new tests where possible
- [ ] I didn't break anyone 😄
kb0039 pushed a commit to aaronba/chat-copilot that referenced this issue Jan 8, 2025
…underlying DB level rather than higher (microsoft#902)

### Motivation and Context
As described in microsoft#718, we load ALL the messages from ALL of the chats of
the user in the frontend at log in.

Also, no matter how many messages we want, we read them ALL at the DB
level and then throw some away if the number read is too higher at the
service level.

### Description
Now, with this change, we actually respect the order, the number to skip
and the number to take of chat messages at the underlying DB level
rather than at higher service level.

This enables us to save some DB activity and make our queries from the
frontend eventually a lot tighter,

This change was made in a simple manner which doesn't change the
underlying architecture.

### Contribution Checklist
- [ ] The code builds clean without any errors or warnings
- [ ] The PR follows the [Contribution
Guidelines](https://github.com/microsoft/chat-copilot/blob/main/CONTRIBUTING.md)
and the [pre-submission formatting
script](https://github.com/microsoft/chat-copilot/blob/main/CONTRIBUTING.md#development-scripts)
raises no violations
- [ ] All unit tests pass, and I have added new tests where possible
- [ ] I didn't break anyone 😄
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
memory Issues or questions related to memory scale Issues related to support for higher scale solutions triage
Projects
None yet
Development

No branches or pull requests

6 participants