include download stats #14

Open
wants to merge 7 commits into
base: master

Conversation

TChukwuleta
Collaborator

Include a download stat column in the version table.

The count increases on install and decreases on uninstall of any plugin. This can be used for analytics and to rank plugins by number of downloads.

Cc. @dennisreimann

Member

@rockstardev rockstardev left a comment

I've reviewed the code and run local tests. There are a few issues:

  1. download_stat already envisioned:
    @NicolasDorier has already implemented download_stat by recording install events:

    await conn.InsertEvent("Download", new JObject

  2. Methods are [AllowAnonymous]:
    Both methods are [AllowAnonymous], making it easy to pump up download stats artificially.

Proposed changes:

  1. Add ip field to evts:
    Record only one install per IP by adding an ip field to the evts table.

  2. Modify evts.type field:
    Change evts.type to varchar(16); the current field being text is a poor choice. We can eventually even switch to smallint for faster queries, but we can leave it as is for now.

  3. Enhance evts with plugin_slug and build_id:
    Include plugin_slug and build_id in evts to improve event indexing. While we can keep the blob for now (although I’m not a fan), we should reduce the data we write into it.

  4. Add index on plugin_slug, type, build_id:
    This will improve query performance.

  5. Add primary key on evts table:
    We can go with auto-incrementing bigint for the primary key.
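
As a rough illustration, the five proposed changes could be collected into one new PostgreSQL migration. This is only a sketch: the thread does not show the evts schema, so the column types for ip, plugin_slug and build_id, the id column name, and the index name are all assumptions.

```sql
-- Hypothetical migration sketch; every type and name below is an assumption.
ALTER TABLE evts
    ADD COLUMN id BIGSERIAL PRIMARY KEY,   -- change 5: auto-incrementing bigint key
    ADD COLUMN ip TEXT,                    -- change 1: one install per IP
    ADD COLUMN plugin_slug TEXT,           -- change 3: indexable event attributes
    ADD COLUMN build_id BIGINT,            -- change 3
    ALTER COLUMN type TYPE VARCHAR(16);    -- change 2: errors if an existing value exceeds 16 chars

-- change 4: a single composite index instead of one index per column
CREATE INDEX idx_evts_slug_type_build
    ON evts (plugin_slug, type, build_id);
```

A composite b-tree index like this one mainly helps queries that constrain plugin_slug, or plugin_slug together with type; queries that filter only on type or build_id would benefit far less.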

Implementation Steps:

  • When the plugins/{pluginSlug}/versions/{version}/download endpoint is hit, log the download in the evts table. If an entry with the same ip, plugin_slug, build_id, and type exists, update the timestamp.
  • Only increment download_stat in versions.download_stat if it’s a fresh insert.
  • Instead of adding another JOIN versions in the Plugins API query, update get_latest_versions and get_all_versions SQL functions to return download_stat. get_latest_versions should return the sum of downloads across versions.
  • Implement an UninstallPlugin method in ApiController to log plugin uninstalls in evts and decrement download_stat for version.
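
The first two steps above can be sketched as one database call. This is a minimal sketch under stated assumptions, not the PR's implementation: the function name record_download, the created_at column on evts, the plugin_slug and build_id columns on versions, and the unique index are all hypothetical.

```sql
-- Assumed columns: evts(ip, plugin_slug, build_id, type, created_at)
-- and versions(plugin_slug, build_id, download_stat). Not confirmed by the PR.
CREATE UNIQUE INDEX IF NOT EXISTS uq_evts_ip_slug_build_type
    ON evts (ip, plugin_slug, build_id, type);

CREATE OR REPLACE FUNCTION record_download(p_ip TEXT, p_slug TEXT, p_build BIGINT)
RETURNS BOOLEAN
LANGUAGE plpgsql AS $$
BEGIN
    -- Try to insert; a duplicate (ip, plugin_slug, build_id, type) is skipped.
    INSERT INTO evts (ip, plugin_slug, build_id, type, created_at)
    VALUES (p_ip, p_slug, p_build, 'Download', now())
    ON CONFLICT (ip, plugin_slug, build_id, type) DO NOTHING;

    IF FOUND THEN
        -- Fresh insert: this is the only case that increments the counter.
        UPDATE versions
           SET download_stat = download_stat + 1
         WHERE plugin_slug = p_slug AND build_id = p_build;
        RETURN TRUE;
    END IF;

    -- Repeat download from the same IP: only refresh the timestamp.
    UPDATE evts
       SET created_at = now()
     WHERE ip = p_ip AND plugin_slug = p_slug
       AND build_id = p_build AND type = 'Download';
    RETURN FALSE;
END;
$$;
```

The download endpoint would then issue a single `SELECT record_download(...)` per request. The p_ip argument does not have to be a raw address; any stable per-client key would work the same way.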

@TChukwuleta
Collaborator Author

@rockstardev I have pushed the requested changes for the "Proposed changes" section. Could you take another look?

}
else if (lastEventType?.ToLower() == "uninstall")
{
await conn.RecordPluginDownloadStatistics(pluginSlug, "install", version.VersionParts);
Member

@NicolasDorier NicolasDorier Oct 1, 2024

This is no different from the other branch of the if. What is the intent here?

Collaborator Author

Based on Rockstar proposed changes:

Only increment download_stat in versions.download_stat if it’s a fresh insert.

The scenario I had in mind was:

"If a user installs a plugin, the download_stat increases, and if the user uninstalls the same plugin, the stat decreases.

If the user reinstalls the same plugin, the download stat should go back up."

So what I am trying to achieve in this endpoint is to check whether the last event the user (IP) performed on that plugin was an uninstall, which would mean the user is reinstalling the plugin. In that case I increase the download_stat by one.

}
else if (lastEventType?.ToLower() == "download")
{
await conn.RecordPluginDownloadStatistics(pluginSlug, "delete", version.VersionParts);
Member

Both branches are the same?

Collaborator Author

I have merged the else with the original 'if' for both scenarios.


CREATE INDEX idx_evts_plugin_slug ON evts(plugin_slug);
CREATE INDEX idx_evts_type ON evts(type);
CREATE INDEX idx_evts_build_id ON evts(build_id);
Member

Why are idx_evts_build_id and idx_evts_type necessary?

Collaborator Author

@TChukwuleta TChukwuleta Oct 1, 2024

They might not be essential for now, but in case we need to run queries or filter on the event type, they would be good to have.

What do you think?

@NicolasDorier
Member

NicolasDorier commented Oct 1, 2024

About data collection

  1. I think we shouldn't record IPs. It's private data. An alternative could be a salted hash of the IPs.
  2. We are using a reverse proxy, so I think that, as is, the remote IP will always be the internal reverse proxy IP. We probably need UseForwardedHeaders in the startup. (You can take a look at how we did it in btcpay.)
  3. I think we should remove the Uninstall route. It has no use outside data collection, which is not cool.
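
To make the salted-hash alternative in point 1 concrete, here is a minimal sketch. The function name hash_ip and the salt value are hypothetical, and whether hashing happens in the database or in the application is an open design choice. It relies on the built-in sha256(bytea), which is available from PostgreSQL 11.

```sql
-- Hypothetical helper: stores a digest instead of the raw address.
CREATE OR REPLACE FUNCTION hash_ip(p_ip TEXT, p_salt TEXT)
RETURNS TEXT
LANGUAGE sql IMMUTABLE AS $$
    -- Concatenate salt and address, hash the UTF-8 bytes, return 64 hex characters.
    SELECT encode(sha256(convert_to(p_salt || p_ip, 'UTF8')), 'hex');
$$;

-- Example call; the salt here is a placeholder and must be kept secret in practice.
SELECT hash_ip('203.0.113.7', 'replace-with-a-secret-salt');
```

Because the IPv4 address space is small, a digest like this only protects the address as long as the salt stays secret; with a known salt, every address can be enumerated and matched.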

About queries

I am far from convinced that all those requests won't end up blowing up the server once there is enough data. The indices seem to have been added in a "spray and pray" manner.

Alternative proposed

Counting stats seems like an easy problem, but it is surprisingly hard to do correctly.
You can look at this blog post to understand why this is very hard.

The simplest solution for counting would be a trigger on events (the blog post shows how). This would be fine in our case. However, your intention is to avoid counting the same IP several times when it downloads the plugin.

If that's the case, I suspect you can't achieve this with your current approach, and you need to use the HLL extension in postgres. But this also opens another can of worms, because we would need to install this extension in our postgres image, which isn't easy at all...
Maybe the "Index-Only Scan" approach of the blog post is easier, but that remains to be seen.
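
For context, this is roughly what an HLL-based distinct count could look like. It is a hedged sketch that assumes the postgresql-hll extension is installed (the hurdle mentioned above) and that evts carries ip, plugin_slug and type columns, which the thread proposes but does not confirm.

```sql
-- Requires the postgresql-hll extension to be available in the postgres image.
CREATE EXTENSION IF NOT EXISTS hll;

-- Approximate number of distinct IPs per plugin.
SELECT plugin_slug,
       hll_cardinality(hll_add_agg(hll_hash_text(ip))) AS approx_unique_downloads
FROM evts
WHERE type = 'Download'
GROUP BY plugin_slug;

-- Exact baseline for comparison: no extension needed, but it must visit every matching row.
SELECT plugin_slug,
       count(DISTINCT ip) AS unique_downloads
FROM evts
WHERE type = 'Download'
GROUP BY plugin_slug;
```

The HLL variant trades a small estimation error for a compact, mergeable sketch, which is why it is usually considered only once exact counting becomes too slow.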

All in all, I think we don't need a perfect solution for now. I would just:

  • Remove the collection of IP (or hash of it)
  • Use a trigger based approach on the evts table to update the download count column of the plugin
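
A trigger-based counter along these lines might look as follows. This is a sketch under assumptions, not the agreed design: it presumes evts has plugin_slug, build_id and type columns and that the counter lives in versions.download_stat, and the function and trigger names are hypothetical.

```sql
-- Hypothetical trigger function: bumps the counter for the downloaded version.
CREATE OR REPLACE FUNCTION bump_download_stat()
RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    UPDATE versions
       SET download_stat = download_stat + 1
     WHERE plugin_slug = NEW.plugin_slug
       AND build_id = NEW.build_id;
    RETURN NULL;  -- the return value of an AFTER row trigger is ignored
END;
$$;

-- Fire only for download events; other event types leave the counter alone.
-- EXECUTE FUNCTION needs PostgreSQL 11+; older versions use EXECUTE PROCEDURE.
CREATE TRIGGER evts_count_download
AFTER INSERT ON evts
FOR EACH ROW
WHEN (NEW.type = 'Download')
EXECUTE FUNCTION bump_download_stat();
```

With this in place the application only inserts events and never touches the counter directly. One trade-off is that concurrent downloads of the same version serialize on that version's row lock, which is likely acceptable at modest traffic.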

Anything more than this would require way too much time, when there are probably other lower-hanging fruits with more impact in front of us.

@rockstardev rockstardev self-requested a review October 1, 2024 02:40
Member

@rockstardev rockstardev left a comment

I did a quick glance and can already spot a couple of issues:

  • You can't modify old migrations; you need to create a new migration to make alterations to the table and those two functions.
  • GetLastEventTypeForIpAsync should also check the version, and the way you're handling the insert can be optimized. You could move it below the if check, or even consolidate everything into a single database call (though that's not a requirement). There definitely shouldn't be separate RecordPluginStatistics calls... all that goes into one DB query.

There is probably more, I need to look at it with fresh eyes in the morning.

I am also thinking that, since this isn't urgent, @TChukwuleta, maybe we can schedule a call to go over the issues and figure out which ones you can work on. I'll finish my review passes before we loop in @NicolasDorier to take a look at the changes.

@rockstardev
Member

rockstardev commented Oct 1, 2024

OK, I am only now seeing that @NicolasDorier posted his own review. I propose we put work on this PR on ice until we meet on a call to reach a consensus on how to move forward.

4 participants