selfoss only adds an item from a feed when it is not already present for that source. However, newspapers often have separate feeds for different topics. When you subscribe to multiple feeds, you can end up with the same article from multiple feeds/sources.
So it would be nice if selfoss could check whether the article is present regardless of source. This is usually ok since the ID is the URL to the article, which should be unique across sources.
I have implemented this change in behavior here, controlled by an ini parameter: mrichtarsky@f31bf4f
Would this be interesting for others as well?
Thanks and best regards,
Martin
This is a very nice idea. What are you using as the identifier to deduplicate? The URL?
What if the two feeds return different content? That should not be an issue if you're using the full text recovery, though.
> What are you using as the identifier to deduplicate? The URL?
The UID. Most commonly this is the post URL, but that is not required. For example, blogger.com uses something like tag:blogger.com,1999:blog-6112936277054198647.post-403878284366003238.
> What if the two feeds return different content? That should not be an issue if you're using the full text recovery, though.
We could have findAll return the source id in addition to the item id. When the source id differs, we would then check whether the content and URL match, and deduplicate only in that case.
That would also probably resolve the uid collisions.
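To make the proposed check concrete, here is a minimal sketch in Python rather than selfoss's actual PHP code. The `Item` record and the `should_add` helper are hypothetical names invented for illustration. The sketch assumes the existing items, including their source ids, have already been fetched.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    # Hypothetical record; field names are assumptions, not the selfoss schema.
    uid: str
    source_id: int
    url: str
    content: str


def should_add(new: Item, existing: list[Item]) -> bool:
    """Decide whether a freshly fetched item should be stored."""
    for old in existing:
        if old.uid != new.uid:
            continue
        if old.source_id == new.source_id:
            # Same uid within the same source: already present.
            return False
        if old.url == new.url and old.content == new.content:
            # Same uid from another source with matching URL and content:
            # treat it as a cross-source duplicate.
            return False
        # Same uid but different URL or content: likely a uid collision,
        # so keep looking and store the item if nothing else matches.
    return True
```

With this rule, an article repeated across two topic feeds of the same newspaper is stored once, while two unrelated posts that happen to share a uid are both kept.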
The issue that items will be missing from some of the sources will still remain, though. That is why I would like to test the performance impact of putting the sources table in an m:n relation to items.
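As a rough illustration of the m:n idea, the following sketch uses Python's built-in `sqlite3` module. The table and column names (`items`, `sources`, `item_sources`) and the helper functions are assumptions made for the example, not the real selfoss schema. For brevity it deduplicates on the uid alone, so it does not handle uid collisions.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # In-memory database with a hypothetical schema: each item is stored
    # once, and a link table records every source it appeared in.
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE sources (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
        CREATE TABLE items (
            id INTEGER PRIMARY KEY,
            uid TEXT NOT NULL UNIQUE,
            url TEXT,
            content TEXT
        );
        CREATE TABLE item_sources (
            item_id INTEGER NOT NULL REFERENCES items(id),
            source_id INTEGER NOT NULL REFERENCES sources(id),
            PRIMARY KEY (item_id, source_id)
        );
    """)
    return db


def add_item(db, source_id, uid, url, content):
    # Store the item only if its uid is new, then link it to the source.
    db.execute(
        "INSERT OR IGNORE INTO items (uid, url, content) VALUES (?, ?, ?)",
        (uid, url, content),
    )
    (item_id,) = db.execute(
        "SELECT id FROM items WHERE uid = ?", (uid,)
    ).fetchone()
    db.execute(
        "INSERT OR IGNORE INTO item_sources (item_id, source_id) VALUES (?, ?)",
        (item_id, source_id),
    )
    return item_id


def items_for_source(db, source_id):
    # Listing a source's items now needs a join through the link table.
    rows = db.execute(
        "SELECT i.uid FROM items i "
        "JOIN item_sources s ON s.item_id = i.id "
        "WHERE s.source_id = ? ORDER BY i.id",
        (source_id,),
    )
    return [row[0] for row in rows]
```

In this layout a shared article shows up under every source that published it while being stored only once. The cost is the extra join on each per-source listing, which is the performance impact that would need measuring.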