fix: Fix bugs in the "get tags" REST API, make it more consistent (#119)

* fix: Fix bugs in the "get tags" REST API, make it more consistent * fix: shallow search was not returning parent tags with children that match * docs: update the ADR that describes the "get tags" endpoint * fix: don't throw an error when retrieving tags below a leaf tag * feat: return the number of matching children only with ?search_term=... * chore: Version bump
openedx · Nov 23, 2023 · b98f091 · b98f091
1 parent 30260cb
commit b98f091
Show file tree

Hide file tree

Showing 9 changed files with 305 additions and 185 deletions.
diff --git a/docs/decisions/0014-single-taxonomy-view-api.rst b/docs/decisions/0014-single-taxonomy-view-api.rst
@@ -4,10 +4,7 @@
 Context
 --------
 
-This view returns tags of a closed taxonomy (for MVP has not been implemented yet
-for open taxonomies). It is necessary to make a decision about what structure the tags are going 
-to have, how the pagination is going to work and how will the search for tags be implemented.
-It was taken into account that taxonomies commonly have the following characteristics:
+This view returns tags of a taxonomy (works with closed, open, and system-defined). It is necessary to make a decision about what structure the tags are going to have, how the pagination is going to work and how will the search for tags be implemented. It was taken into account that taxonomies commonly have the following characteristics:
 
 - It has few root tags.
 - It may a very large number of children for each tag.
@@ -17,16 +14,18 @@ For the decisions, the following use cases were taken into account:
 
 - As a taxonomy administrator, I want to see all the tags available for use with a closed taxonomy,
   so that I can see the taxonomy's structure in the management interface.
-    - As a taxonomy administrator, I want to see the available tags as a lits of root tags
-      that can be expanded to show children tags.
-    - As a taxonomy administrator, I want to sort the list of root tags alphabetically: A-Z (default) and Z-A.
-    - As a taxonomy administrator, I want to expand all root tags to see all children tags.
-    - As a taxonomy administrator, I want to search for tags, so I can find root and children tags more easily.
+
+  - As a taxonomy administrator, I want to see the available tags as a list of root tags
+    that can be expanded to show children tags.
+  - As a taxonomy administrator, I want to sort the list of root tags alphabetically: A-Z (default) and Z-A.
+  - As a taxonomy administrator, I want to expand all root tags to see all children tags.
+  - As a taxonomy administrator, I want to search for tags, so I can find root and children tags more easily.
 - As a course author, when I am editing the tags of a component, I want to see all the tags available
   from a particular taxonomy that I can use.
-    - As a course author, I want to see the available tags as a lits of root tags
-      that can be expanded to show children tags.
-    - As a course author, I want to search for tags, so I can find root and children tags more easily.
+
+  - As a course author, I want to see the available tags as a list of root tags
+    that can be expanded to show children tags.
+  - As a course author, I want to search for tags, so I can find root and children tags more easily.
 
 Excluded use cases:
 
@@ -41,107 +40,59 @@ Decision
 Views & Pagination
 ~~~~~~~~~~~~~~~~~~~
 
-Make one view:
+We will have one REST API endpoint that can handle these use cases:
 
-**get_matching_tags(parent_tag_id: str = None, search_term: str = None)**
+**/tagging/rest_api/v1/taxonomies/:id/tags/?parent_tag=...**
 
 that can handle this cases:
 
-- Get the root tags of the taxonomy. If ``parent_tag_id`` is ``None``.
+- Get the root tags of the taxonomy. If ``parent_tag`` is omitted.
 - Get the children of a tag. Called each time the user expands a parent tag to see its children.
-  If ``parent_tag_id`` is not ``None``.
+  In this case, ``parent_tag`` is set to the value of the parent tag.
 
 In both cases the results are paginated. In addition to the common pagination metadata, it is necessary to return:
 
 - Total number of pages.
-- Total number of root/children tags.
+- Total number of tags in the result.
 - Range index of current page, Ex. Page 1: 1-12, Page 2: 13-24.
 - Total number of children of each tag.
 
 The pagination of root tags and child tags are independent.
-In order to be able to fulfill the functionality of "Expand-all" in a scalable way,
-the following has been agreed:
 
-- Create a ``TAGS_THRESHOLD`` (default: 1000).
-- If ``taxonomy.tags.count < TAGS_THRESHOLD``, then ``get_matching_tags()`` will return all tags on the taxonomy,
-  roots and children.
-- Otherwise, ``get_matching_tags()`` will only return paginated root tags, and it will be necessary
-  to use ``get_matching_tags()`` to return paginated children. Also the "Expand-all" functionality will be disabled.
+**Optional full-depth response**
 
-For search you can see the next section (Search tags)
+In order to be able to fulfill the functionality of "Expand-all" in a scalable way, and allow users to quickly browse taxonomies that have lots of small branches, the API will accept an optional parameter ``?full_depth_threshold``. If specified (e.g. ``?full_depth_threshold=1000``) and there are fewer results than this threshold, the full tree of tags will be returned a a single giant page, including all tags up to three levels deep.
 
 **Pros**
 
-- It is the simplest way.
+- This approach is simple and flexible.
 - Paging both root tags and children mitigates the huge number of tags that can be found in large taxonomies.
 
 Search tags
 ~~~~~~~~~~~~
 
-Support tag search on the backend. Return a subset of matching tags.
-We will use the same view to perform a search with the same logic:
-
-**get_matching_tags(parent_tag_id: str = None, search_term: str = None)**
-
-We can use ``search_term`` to perform a search on all taxonomy tags or children tags depending of ``parent_tag_id``.
-The result will be a pruned tree with the necessary tags to be able to reach the results from a root tag. 
-Ex. if in the result there may be a child tag of ``depth=2``, but the parents are not found in the result.
-In this case, it is necessary to add the parent and the parent of the parent (root tag) to be able to show
-the child tag that is in the result.
-
-For the search, ``SEARCH_TAGS_THRESHOLD`` will be used. (It is recommended that it be 20% of ``TAGS_THRESHOLD``).
-It will work in the following way:
-
-- If ``search_result.count() < SEARCH_TAGS_THRESHOLD``, then it will return all tags on the result tree without pagination.
-- Otherwise, it will return the roots of the result tree with pagination. Each root will have the entire pruned branch.
+The same API endpoint will support an optional ``?search_term=...`` parameter to support searching/filtering tags by keyword.
 
-It will work in the same way of ``TAGS_THRESHOLD`` (see Views & Pagination)
-
-**Pros**
-
-- It is the most scalable way.
+The API endpoint will work exactly as described above (returning a single level of tags by default, paginated, optionally listing only the tags below a specific parent tag, optionally returning a deep tree if the results are small enough) - BUT only tags that match the keyword OR their ancestor tags will be returned. We return their ancestor tags (even if the ancestors themselves don't match the keyword) so that the tree of tags that do match can be displayed correctly. This also allows the UI to load the matching tags one layer at a time, paginated, if desired.
 
 Tag representation
 ~~~~~~~~~~~~~~~~~~~
 
-Return a list of root tags and within a link to obtain the children tags
-or the complete list of children tags depending of ``TAGS_THRESHOLD`` or ``SEARCH_TAGS_THRESHOLD``. 
-The list of root tags will be ordered alphabetically. If it has child tags, they must also
-be ordered alphabetically.
-
-**(taxonomy.tags.count < *_THRESHOLD)**::
-
-  {
-    "count": 100,
-    "tags": [
-        {
-            "id": "tag_1",
-            "value": "Tag 1",
-            "taxonomy_id": "1",
-            "sub_tags": [
-                {
-                    "id": "tag_2",
-                    "value": "Tag 2",
-                    "taxonomy_id": "1",
-                    "sub_tags": [
-                        (....)
-                    ]
-                },
-                (....)
-            ]
-  }
+Return a list of root tags and within a link to obtain the children tags or the complete list of children tags depending on the requested ``?full_depth_threshold`` and the number of results.
+The list of tags will be ordered in tree order (and alphabetically). If it has child tags, they must also be ordered alphabetically.
 
-
-**Otherwise**::
+**Example**::
 
   {
     "count": 100,
     "tags": [
         {
-            "id": "tag_1",
             "value": "Tag 1",
-            "taxonomy_id": "1",
-            "sub_tags_link": "http//api-call-to-get-children.com"
+            "depth": 0,
+            "external_id": None,
+            "child_count": 15,
+            "parent_value": None,
+            "sub_tags_url": "http//api-call-to-get-children.com"
         },
         (....)
     ]
@@ -155,6 +106,12 @@ be ordered alphabetically.
 - It is kept as a simple implementation.
 
 
+Backend Python API
+~~~~~~~~~~~~~~~~~~
+
+On the backend, a very flexible API is available as ``Taxonomy.get_filtered_tags()`` which can cover all of the same use cases as the REST API endpoint (listing tags, shallow or deep, filtering on search term). However, the Python API returns a ``QuerySet`` of tag data dictionaries, rather than a JSON response.
+
+
 Rejected Options
 -----------------
 
@@ -199,4 +156,4 @@ can return all the tags in one page. So we can perform the tag search on the fro
 **Cons:**
 
 - It is not scalable.
-- Sets limits of tags that can be created in the taxonomy.
+- Sets limits of tags that can be created in the taxonomy.
diff --git a/openedx_learning/__init__.py b/openedx_learning/__init__.py
@@ -1,4 +1,4 @@
 """
 Open edX Learning ("Learning Core").
 """
-__version__ = "0.3.5"
+__version__ = "0.3.6"
diff --git a/openedx_tagging/core/tagging/models/base.py b/openedx_tagging/core/tagging/models/base.py
@@ -493,13 +493,15 @@ def _get_filtered_tags_deep(
                     if pk is not None:
                         matching_ids.append(pk)
             qs = qs.filter(pk__in=matching_ids)
+            qs = qs.annotate(child_count=models.Count("children", filter=Q(children__pk__in=matching_ids)))
         elif excluded_values:
             raise NotImplementedError("Using excluded_values without search_term is not currently supported.")
             # We could implement this in the future but I'd prefer to get rid of the "excluded_values" API altogether.
             # It remains to be seen if it's useful to do that on the backend, or if we can do it better/simpler on the
             # frontend.
+        else:
+            qs = qs.annotate(child_count=models.Count("children"))
 
-        qs = qs.annotate(child_count=models.Count("children"))
         # Add the "depth" to each tag:
         qs = Tag.annotate_depth(qs)
         # Add the "lineage" as a field called "sort_key" to sort them in order correctly:

diff --git a/openedx_tagging/core/tagging/rest_api/paginators.py b/openedx_tagging/core/tagging/rest_api/paginators.py
@@ -5,10 +5,7 @@
 from edx_rest_framework_extensions.paginators import DefaultPagination  # type: ignore[import]
 
 # From this point, the tags begin to be paginated
-TAGS_THRESHOLD = 1000
-
-# From this point, search tags begin to be paginated
-SEARCH_TAGS_THRESHOLD = 200
+MAX_FULL_DEPTH_THRESHOLD = 10_000
 
 
 class TagsPagination(DefaultPagination):
@@ -29,5 +26,5 @@ class DisabledTagsPagination(DefaultPagination):
     It should be used if the number of tags within
     the taxonomy does not exceed `TAGS_THRESHOLD`.
     """
-    page_size = TAGS_THRESHOLD
-    max_page_size = TAGS_THRESHOLD + 1
+    page_size = MAX_FULL_DEPTH_THRESHOLD
+    max_page_size = MAX_FULL_DEPTH_THRESHOLD + 1
diff --git a/openedx_tagging/core/tagging/rest_api/v1/serializers.py b/openedx_tagging/core/tagging/rest_api/v1/serializers.py
@@ -4,6 +4,7 @@
 from __future__ import annotations
 
 from typing import Any
+from urllib.parse import urlencode
 
 from rest_framework import serializers
 from rest_framework.reverse import reverse
@@ -162,12 +163,17 @@ def get_sub_tags_url(self, obj: TagData | Tag):
         child_count = obj.child_count if isinstance(obj, Tag) else obj["child_count"]
         if child_count > 0 and "taxonomy_id" in self.context:
             value = obj.value if isinstance(obj, Tag) else obj["value"]
-            query_params = f"?parent_tag={value}"
             request = self.context["request"]
+            query_params = request.query_params
+            new_query_params = {"parent_tag": value}
+            if "full_depth_threshold" in query_params:
+                new_query_params["full_depth_threshold"] = query_params["full_depth_threshold"]
+            if "search_term" in query_params:
+                new_query_params["search_term"] = query_params["search_term"]
             url_namespace = request.resolver_match.namespace  # get the namespace, usually "oel_tagging"
             url = (
                 reverse(f"{url_namespace}:taxonomy-tags", args=[str(self.context["taxonomy_id"])])
-                + query_params
+                + "?" + urlencode(new_query_params)
             )
             return request.build_absolute_uri(url)
         return None

diff --git a/openedx_tagging/core/tagging/rest_api/v1/views.py b/openedx_tagging/core/tagging/rest_api/v1/views.py
@@ -30,7 +30,7 @@
 from ...import_export.parsers import ParserFormat
 from ...models import Taxonomy
 from ...rules import ObjectTagPermissionItem
-from ..paginators import TAGS_THRESHOLD, DisabledTagsPagination, TagsPagination
+from ..paginators import MAX_FULL_DEPTH_THRESHOLD, DisabledTagsPagination, TagsPagination
 from .permissions import ObjectTagObjectPermissions, TagObjectPermissions, TaxonomyObjectPermissions
 from .serializers import (
     ObjectTagListQueryParamsSerializer,
@@ -528,24 +528,28 @@ class TaxonomyTagsView(ListAPIView, RetrieveUpdateDestroyAPIView):
     """
     View to list/create/update/delete tags of a taxonomy.
 
-    If you specify ?root_only or ?parent_tag_value=..., only one "level" of the
-    hierachy will be returned. Otherwise, several levels will be returned, in
-    tree order, up to the maximum supported depth. Additional levels/depth can
-    be retrieved by using ?parent_tag_value to load more data.
+    **List Request Details**
 
-    Note: If the taxonomy is particularly large (> 1,000 tags), ?root_only is
-    automatically set true by default and cannot be disabled. This way, users
-    can more easily select which tags they want to expand in the tree, and load
-    just that subset of the tree as needed. This may be changed in the future.
+    By default, only one "level" of the hierachy will be returned. But for more efficient handling of small taxonomies,
+    you can set ?full_depth_threshold=1000 which means that if there are fewer than 1,000 tags in the result set, they
+    will be returned in a single page of results that has all the tags in tree order up to the maximum supported depth.
+
+    Additional levels/depth can be retrieved recursively by using the "sub_tags_url" field of any returned tag, or
+    using the ?parent_tag=value parameter manually.
 
     **List Query Parameters**
         * id (required) - The ID of the taxonomy to retrieve tags.
+        * search_term (optional) - Only return tags matching this term, plus their ancestors. Case insensitive.
         * parent_tag (optional) - Retrieve children of the tag with this value.
-        * root_only (optional) - If specified, only root tags are returned.
-        * include_counts (optional) - Include the count of how many times each
-          tag has been used.
+        * full_depth_threshold (optional) - If there are fewer than this many results, return the full (sub)tree of
+          results, up to maximum supported depth, in a single giant page of results. By default this is disabled
+          (equivalent to full_depth_threshold=0), which provides a more consistent and predictable response.
+          Using full_depth_threshold=1000 is recommended in general, but use lower values during development to ensure
+          compatibility with both large and small result sizes.
+        * include_counts (optional) - Include the count of how many times each tag has been used.
         * page (optional) - Page number (default: 1)
-        * page_size (optional) - Number of items per page (default: 10)
+        * page_size (optional) - Number of items per page (default: 30). Ignored when there are fewer tags than
+          specified by ?full_depth_threshold.
 
     **List Example Requests**
         GET api/tagging/v1/taxonomy/:id/tags                                        - Get tags of taxonomy
@@ -655,34 +659,42 @@ def get_queryset(self) -> TagDataQuerySet:
         taxonomy_id = int(self.kwargs.get("pk"))
         taxonomy = self.get_taxonomy(taxonomy_id)
         parent_tag_value = self.request.query_params.get("parent_tag", None)
-        root_only = "root_only" in self.request.query_params
         include_counts = "include_counts" in self.request.query_params
         search_term = self.request.query_params.get("search_term", None)
-
-        if parent_tag_value:
-            # Fetching tags below a certain parent is always paginated and only returns the direct children
-            depth = 1
-            if root_only:
-                raise ValidationError("?root_only and ?parent_tag cannot be used together")
+        try:
+            full_depth_threshold = int(self.request.query_params.get("full_depth_threshold", 0))
+        except ValueError:
+            full_depth_threshold = -1
+        if full_depth_threshold < 0 or full_depth_threshold > MAX_FULL_DEPTH_THRESHOLD:
+            raise ValidationError("Invalid full_depth_threshold")
+
+        if full_depth_threshold or search_term:
+            # If full_depth_threshold is set, default to maximum depth and later prune to depth=1 if needed.
+            # We also need to do a deep search if there is a search term, to ensure we return parent items with children
+            # that match.
+            depth = None
         else:
-            if root_only:
-                depth = 1  # User Explicitly requested to load only the root tags for now
-            elif search_term:
-                depth = None  # For search, default to maximum depth but use normal pagination
-            elif taxonomy.tag_set.count() > TAGS_THRESHOLD:
-                # This is a very large taxonomy. Only load the root tags at first, so users can choose what to load.
-                depth = 1
-            else:
-                # We can load and display all the tags in the taxonomy at once:
-                self.pagination_class = DisabledTagsPagination
-                depth = None  # Maximum depth
+            depth = 1  # Otherwise (default), load a single level of results only.
 
-        return taxonomy.get_filtered_tags(
+        results = taxonomy.get_filtered_tags(
             parent_tag_value=parent_tag_value,
             search_term=search_term,
             depth=depth,
             include_counts=include_counts,
         )
+        if depth == 1:
+            # We're already returning just a single level. It will be paginated normally.
+            return results
+        elif full_depth_threshold and len(results) < full_depth_threshold:
+            # We can load and display all the tags in this (sub)tree at once:
+            self.pagination_class = DisabledTagsPagination
+            return results
+        else:
+            # We had to do a deep query, but we will only return one level of results.
+            # This is because the user did not request a deep response (via full_depth_threshold) or the result was too
+            # large (larger than the threshold).
+            # It will be paginated normally.
+            return results.filter(parent_value=parent_tag_value)
 
     def post(self, request, *args, **kwargs):
         """