Is the tags count correct? #7

Closed
albertocottica opened this issue Feb 27, 2017 · 4 comments
@albertocottica

albertocottica commented Feb 27, 2017

I see @jason-vallet has added a list of stats as per #3. However, I am worried that the number of tags may not be correct. My own script returns very different numbers: 3,915 annotations (not computed by the dashboard) with 867 codes (dashboard says 1,614).

There are also small differences in the numbers of comments (my script says 1,688, GraphRyders says 1,633) and users (my script says 316, GraphRyders says 318). The number of posts is the same: 352.

My script for annotations:

import requests

# Fetch the annotations view; the JSON payload has a top-level 'nodes' list.
# Equivalent one-liner:
# annotations = requests.get('https://edgeryders.eu/opencare/annotations').json()
response = requests.get('https://edgeryders.eu/opencare/annotations')
annotations = response.json()

# Checklists of entities already encountered. Sets ignore repeated ids,
# so duplicate rows returned by the view are counted only once.
checklistAnnotations = set()
checklistCodes = set()

latest = ''

for item in annotations['nodes']:
    node = item['node']
    if node['user_name'] == 'Amelia':
        # Track the most recent annotation date (plain string comparison).
        created = node['date_standardized']
        if created > latest:
            latest = created
        checklistAnnotations.add(node['annotation_id'])
        checklistCodes.add(node['tag_id'])

print('annotations: ' + str(len(checklistAnnotations)))
print('latest on: ' + latest)
print('codes: ' + str(len(checklistCodes)))

The logic is that the MySQL view returns a lot of duplicate rows, and even DISTINCT does not always remove them. So I build checklists of the entities I have already encountered and, for each entity returned by the view, check that it is not already there.
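
The same checklist idea can be sketched as a small function that works on an already-downloaded payload, which makes it easy to check against a known input. This is only an illustration: the function name `count_distinct` and the sample payload are made up here, and the field names are assumed to match the ones used in the script above.

```python
def count_distinct(payload, user_name):
    """Count distinct annotation and tag ids for one user, ignoring duplicate rows."""
    annotation_ids = set()
    tag_ids = set()
    for item in payload['nodes']:
        node = item['node']
        if node['user_name'] != user_name:
            continue
        # Adding an id that is already in the set has no effect.
        annotation_ids.add(node['annotation_id'])
        tag_ids.add(node['tag_id'])
    return len(annotation_ids), len(tag_ids)


# Hypothetical sample: the first row is repeated, as the view might return it.
sample = {'nodes': [
    {'node': {'user_name': 'Amelia', 'annotation_id': '1', 'tag_id': '10'}},
    {'node': {'user_name': 'Amelia', 'annotation_id': '1', 'tag_id': '10'}},
    {'node': {'user_name': 'Amelia', 'annotation_id': '2', 'tag_id': '10'}},
    {'node': {'user_name': 'someone_else', 'annotation_id': '3', 'tag_id': '11'}},
]}

print(count_distinct(sample, 'Amelia'))
```

With this sample, the repeated row is counted once, so the function reports two annotations and one code for 'Amelia'.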

@jason-vallet

I used a simple (not to say dumb) request to get the number of tags (and users) and did not check whether they were used, so 1414 is actually the total number of tags existing in the db (as imported from ER), not just the ones used. I will correct this shortly.
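
A rough sketch of the difference between the two counts, assuming the tags and annotations are available as plain Python structures (the function name `tag_counts` and the sample data are hypothetical, not the actual GraphRyder query):

```python
def tag_counts(all_tag_ids, annotations):
    """Return (total tags in the db, tags referenced by at least one annotation)."""
    all_tags = set(all_tag_ids)
    # Only tags that an annotation points to count as "used".
    used_tags = {a['tag_id'] for a in annotations} & all_tags
    return len(all_tags), len(used_tags)


# Hypothetical data: four tags exist, but only two are attached to annotations.
tags = ['10', '11', '12', '13']
annotation_rows = [
    {'annotation_id': '1', 'tag_id': '10'},
    {'annotation_id': '2', 'tag_id': '10'},
    {'annotation_id': '3', 'tag_id': '12'},
]

total, used = tag_counts(tags, annotation_rows)
print('total tags:', total, 'used tags:', used)
```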

I can explain the difference in the number of comments: some comments appear to have no existing parent (post or comment) when imported. Those elements are therefore ignored to avoid inconsistencies in the database.

Do you want to have the number of annotations available with the stats?

@albertocottica

Re tags: OK, that explains it.

Re number of comments: can you give an example (I need the comment id)?

Re annotations: yes please.

@albertocottica

Got it. So:

  • The sub-comments of these comments are legitimate. We reused some old (but relevant) content in order to seed the conversation.
  • They do have their own comment_ids and post_ids, but they also have parent_comment_ids that do not correspond to anything in the database.
  • This happened when we migrated the first Edgeryders website (Drupal 6) onto the present one (Drupal 7), in spring 2013. It does not happen anywhere else in the db: it's a one-off.
  • It COULD happen again in the future. Under Edgeryders TOCs, each user owns their content. Suppose I comment on your comment, and then you delete your comment: what happens then? Drupal has a solution: it displays my comment (previously a reply to your comment) as a comment on the post that originated the thread. In terms of the social interaction network, the edge is redirected: it used to go from me to you, but now it goes from me to the author of the post.
  • Still, I think these comments should be counted. They have annotations, so they induce edges in the co-occurrences graph, etc.
  • They account for 50 of the 53 missing. The remaining 3 can be explained by the time delay between your harvesting into the graph db and the Drupal db I am looking into. Good!
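
To make the redirect rule concrete, here is a minimal sketch of how comments with a dangling parent could be kept and attached to the post author instead of being dropped. It is an assumption about one possible approach, not the GraphRyder import code: the function name `resolve_edges`, the `author` field, and the sample data are hypothetical, while `comment_id`, `post_id`, and `parent_comment_id` follow the field names mentioned above.

```python
def resolve_edges(comments, post_authors):
    """Build (source, target) interaction edges, redirecting orphaned replies.

    A comment whose parent_comment_id matches no known comment is treated
    like a direct comment on the post, so its edge goes to the post author.
    """
    by_id = {c['comment_id']: c for c in comments}
    edges = []
    orphans = []
    for c in comments:
        parent_id = c['parent_comment_id']
        parent = by_id.get(parent_id) if parent_id is not None else None
        if parent is not None:
            target = parent['author']
        else:
            if parent_id is not None:
                # Parent id is set but points at nothing in the data.
                orphans.append(c['comment_id'])
            target = post_authors[c['post_id']]
        edges.append((c['author'], target))
    return edges, orphans


# Hypothetical thread: comment 3 replies to a comment (999) that no longer exists.
comments = [
    {'comment_id': 1, 'post_id': 100, 'author': 'bob', 'parent_comment_id': None},
    {'comment_id': 2, 'post_id': 100, 'author': 'carol', 'parent_comment_id': 1},
    {'comment_id': 3, 'post_id': 100, 'author': 'dave', 'parent_comment_id': 999},
]
post_authors = {100: 'alice'}

edges, orphans = resolve_edges(comments, post_authors)
print(edges)
print(orphans)
```

In this sample, the orphaned comment 3 still produces an edge, from 'dave' to the post author 'alice', so it stays in the comment count and in the interaction network.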
