What would you like to see?
Hi all,
I have been testing AnythingLLM for a few days now and had problems finding context in my files...
I played around with the settings but couldn't get it working the way I wanted. It delivered some information but was missing many parts.
A look at the citations showed that the chunks of my office files looked like this:
Information....
10 empty lines
some information...
8 empty lines
Footer
All in all, there are many empty lines, possibly because of style elements in the document, and redundant information because of the footer on every page.
So I tried to "compress" the information a little by changing the document collectors/converters, adding:
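// Removes empty lines and every repeated line, keeping the first occurrence.
function deduplicateContent(content) {
  const seen = new Set();
  return content
    .split("\n")
    .filter((line) => {
      // Drop empty and whitespace-only lines.
      if (line.trim() === "") return false;
      // Drop lines already seen (exact match, including whitespace).
      if (seen.has(line)) return false;
      seen.add(line);
      return true;
    })
    .join("\n");
}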
And, a little deeper in the code:
const content = deduplicateContent(pageContent.join("\n"));
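To make the effect concrete, here is a small self-contained run of the function above. The two page strings are made up for illustration and only mimic the shape of the chunks described earlier (text, runs of empty lines, a footer on every page); they are not taken from a real document.

```javascript
// Same function as in the post, repeated so the snippet runs on its own.
function deduplicateContent(content) {
  const seen = new Set();
  return content
    .split("\n")
    .filter((line) => {
      if (line.trim() === "") return false;
      if (seen.has(line)) return false;
      seen.add(line);
      return true;
    })
    .join("\n");
}

// Hypothetical page texts: content, empty lines, and a repeated footer.
const pageContent = [
  "Information....\n\n\nsome information...\n\nFooter",
  "More information\n\n\n\nFooter",
];

const content = deduplicateContent(pageContent.join("\n"));
console.log(content);
```

The empty lines disappear and the footer survives only once, at its first position. Note that the comparison is an exact string match, so two lines that differ only in leading or trailing whitespace both stay.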
Here is an example file:
asDocx.txt
The result is that all redundant lines are removed, and the empty lines too (which are surely redundant as well ;-) ).
I don't know if this is the best way to do it, but it works and helps me a lot, so AnythingLLM can send better context to the LLM...
Maybe something like this could be implemented by someone who is able to make it better ;-)
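As one possible starting point for such an improvement, here is a rough sketch of a more conservative variant. It is only an assumption about what "better" could mean, not anything from AnythingLLM itself: the name `compactContent` and the `minRepeats` threshold are hypothetical. Instead of dropping every repeated line, it collapses runs of empty lines to a single one and removes only lines that occur at least `minRepeats` times (the typical footer case), keeping their first occurrence.

```javascript
// Hypothetical helper, not part of AnythingLLM.
// Collapses blank-line runs and drops only frequently repeated lines.
function compactContent(content, minRepeats = 3) {
  const lines = content.split("\n").map((line) => line.trimEnd());

  // Count how often each non-empty line occurs (ignoring surrounding whitespace).
  const counts = new Map();
  for (const line of lines) {
    const key = line.trim();
    if (key === "") continue;
    counts.set(key, (counts.get(key) || 0) + 1);
  }

  const seenRepeated = new Set();
  const out = [];
  for (const line of lines) {
    const key = line.trim();
    if (key === "") {
      // Keep at most one blank line in a row, and none at the very start.
      if (out.length > 0 && out[out.length - 1] !== "") out.push("");
      continue;
    }
    if (counts.get(key) >= minRepeats) {
      // Frequent line (assumed header/footer): keep the first occurrence only.
      if (seenRepeated.has(key)) continue;
      seenRepeated.add(key);
    }
    out.push(line);
  }

  // Remove a trailing blank line left over from the collapsing step.
  while (out.length > 0 && out[out.length - 1] === "") out.pop();
  return out.join("\n");
}

// Made-up sample: "Footer" appears three times, "B" only twice.
const sample = "A\n\n\nFooter\nB\n\nFooter\nB\n\n\nFooter";
console.log(compactContent(sample));
```

The trade-off is that lines which legitimately appear twice (for example a repeated value in a list) are kept, while single blank lines still mark paragraph breaks for the chunker. Whether that helps retrieval in practice would need testing on real documents.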