Skip to content

Commit

Permalink
fix: Fix bugs in the "get tags" REST API, make it more consistent (#119)
Browse files Browse the repository at this point in the history
* fix: Fix bugs in the "get tags" REST API, make it more consistent
* fix: shallow search was not returning parent tags with children that match
* docs: update the ADR that describes the "get tags" endpoint
* fix: don't throw an error when retrieving tags below a leaf tag
* feat: return the number of matching children only with ?search_term=...
* chore: Version bump
  • Loading branch information
bradenmacdonald authored Nov 23, 2023
1 parent 30260cb commit b98f091
Show file tree
Hide file tree
Showing 9 changed files with 305 additions and 185 deletions.
115 changes: 36 additions & 79 deletions docs/decisions/0014-single-taxonomy-view-api.rst
Original file line number Diff line number Diff line change
Expand Up @@ -4,10 +4,7 @@
Context
--------

This view returns tags of a closed taxonomy (for MVP has not been implemented yet
for open taxonomies). It is necessary to make a decision about what structure the tags are going
to have, how the pagination is going to work and how will the search for tags be implemented.
It was taken into account that taxonomies commonly have the following characteristics:
This view returns tags of a taxonomy (works with closed, open, and system-defined). It is necessary to make a decision about what structure the tags are going to have, how the pagination is going to work and how will the search for tags be implemented. It was taken into account that taxonomies commonly have the following characteristics:

- It has few root tags.
- It may a very large number of children for each tag.
Expand All @@ -17,16 +14,18 @@ For the decisions, the following use cases were taken into account:

- As a taxonomy administrator, I want to see all the tags available for use with a closed taxonomy,
so that I can see the taxonomy's structure in the management interface.
- As a taxonomy administrator, I want to see the available tags as a lits of root tags
that can be expanded to show children tags.
- As a taxonomy administrator, I want to sort the list of root tags alphabetically: A-Z (default) and Z-A.
- As a taxonomy administrator, I want to expand all root tags to see all children tags.
- As a taxonomy administrator, I want to search for tags, so I can find root and children tags more easily.

- As a taxonomy administrator, I want to see the available tags as a list of root tags
that can be expanded to show children tags.
- As a taxonomy administrator, I want to sort the list of root tags alphabetically: A-Z (default) and Z-A.
- As a taxonomy administrator, I want to expand all root tags to see all children tags.
- As a taxonomy administrator, I want to search for tags, so I can find root and children tags more easily.
- As a course author, when I am editing the tags of a component, I want to see all the tags available
from a particular taxonomy that I can use.
- As a course author, I want to see the available tags as a lits of root tags
that can be expanded to show children tags.
- As a course author, I want to search for tags, so I can find root and children tags more easily.

- As a course author, I want to see the available tags as a list of root tags
that can be expanded to show children tags.
- As a course author, I want to search for tags, so I can find root and children tags more easily.

Excluded use cases:

Expand All @@ -41,107 +40,59 @@ Decision
Views & Pagination
~~~~~~~~~~~~~~~~~~~

Make one view:
We will have one REST API endpoint that can handle these use cases:

**get_matching_tags(parent_tag_id: str = None, search_term: str = None)**
**/tagging/rest_api/v1/taxonomies/:id/tags/?parent_tag=...**

that can handle this cases:

- Get the root tags of the taxonomy. If ``parent_tag_id`` is ``None``.
- Get the root tags of the taxonomy. If ``parent_tag`` is omitted.
- Get the children of a tag. Called each time the user expands a parent tag to see its children.
If ``parent_tag_id`` is not ``None``.
In this case, ``parent_tag`` is set to the value of the parent tag.

In both cases the results are paginated. In addition to the common pagination metadata, it is necessary to return:

- Total number of pages.
- Total number of root/children tags.
- Total number of tags in the result.
- Range index of current page, Ex. Page 1: 1-12, Page 2: 13-24.
- Total number of children of each tag.

The pagination of root tags and child tags are independent.
In order to be able to fulfill the functionality of "Expand-all" in a scalable way,
the following has been agreed:

- Create a ``TAGS_THRESHOLD`` (default: 1000).
- If ``taxonomy.tags.count < TAGS_THRESHOLD``, then ``get_matching_tags()`` will return all tags on the taxonomy,
roots and children.
- Otherwise, ``get_matching_tags()`` will only return paginated root tags, and it will be necessary
to use ``get_matching_tags()`` to return paginated children. Also the "Expand-all" functionality will be disabled.
**Optional full-depth response**

For search you can see the next section (Search tags)
In order to be able to fulfill the functionality of "Expand-all" in a scalable way, and allow users to quickly browse taxonomies that have lots of small branches, the API will accept an optional parameter ``?full_depth_threshold``. If specified (e.g. ``?full_depth_threshold=1000``) and there are fewer results than this threshold, the full tree of tags will be returned a a single giant page, including all tags up to three levels deep.

**Pros**

- It is the simplest way.
- This approach is simple and flexible.
- Paging both root tags and children mitigates the huge number of tags that can be found in large taxonomies.

Search tags
~~~~~~~~~~~~

Support tag search on the backend. Return a subset of matching tags.
We will use the same view to perform a search with the same logic:

**get_matching_tags(parent_tag_id: str = None, search_term: str = None)**

We can use ``search_term`` to perform a search on all taxonomy tags or children tags depending of ``parent_tag_id``.
The result will be a pruned tree with the necessary tags to be able to reach the results from a root tag.
Ex. if in the result there may be a child tag of ``depth=2``, but the parents are not found in the result.
In this case, it is necessary to add the parent and the parent of the parent (root tag) to be able to show
the child tag that is in the result.

For the search, ``SEARCH_TAGS_THRESHOLD`` will be used. (It is recommended that it be 20% of ``TAGS_THRESHOLD``).
It will work in the following way:

- If ``search_result.count() < SEARCH_TAGS_THRESHOLD``, then it will return all tags on the result tree without pagination.
- Otherwise, it will return the roots of the result tree with pagination. Each root will have the entire pruned branch.
The same API endpoint will support an optional ``?search_term=...`` parameter to support searching/filtering tags by keyword.

It will work in the same way of ``TAGS_THRESHOLD`` (see Views & Pagination)

**Pros**

- It is the most scalable way.
The API endpoint will work exactly as described above (returning a single level of tags by default, paginated, optionally listing only the tags below a specific parent tag, optionally returning a deep tree if the results are small enough) - BUT only tags that match the keyword OR their ancestor tags will be returned. We return their ancestor tags (even if the ancestors themselves don't match the keyword) so that the tree of tags that do match can be displayed correctly. This also allows the UI to load the matching tags one layer at a time, paginated, if desired.

Tag representation
~~~~~~~~~~~~~~~~~~~

Return a list of root tags and within a link to obtain the children tags
or the complete list of children tags depending of ``TAGS_THRESHOLD`` or ``SEARCH_TAGS_THRESHOLD``.
The list of root tags will be ordered alphabetically. If it has child tags, they must also
be ordered alphabetically.

**(taxonomy.tags.count < *_THRESHOLD)**::

{
"count": 100,
"tags": [
{
"id": "tag_1",
"value": "Tag 1",
"taxonomy_id": "1",
"sub_tags": [
{
"id": "tag_2",
"value": "Tag 2",
"taxonomy_id": "1",
"sub_tags": [
(....)
]
},
(....)
]
}
Return a list of root tags and within a link to obtain the children tags or the complete list of children tags depending on the requested ``?full_depth_threshold`` and the number of results.
The list of tags will be ordered in tree order (and alphabetically). If it has child tags, they must also be ordered alphabetically.


**Otherwise**::
**Example**::

{
"count": 100,
"tags": [
{
"id": "tag_1",
"value": "Tag 1",
"taxonomy_id": "1",
"sub_tags_link": "http//api-call-to-get-children.com"
"depth": 0,
"external_id": None,
"child_count": 15,
"parent_value": None,
"sub_tags_url": "http//api-call-to-get-children.com"
},
(....)
]
Expand All @@ -155,6 +106,12 @@ be ordered alphabetically.
- It is kept as a simple implementation.


Backend Python API
~~~~~~~~~~~~~~~~~~

On the backend, a very flexible API is available as ``Taxonomy.get_filtered_tags()`` which can cover all of the same use cases as the REST API endpoint (listing tags, shallow or deep, filtering on search term). However, the Python API returns a ``QuerySet`` of tag data dictionaries, rather than a JSON response.


Rejected Options
-----------------

Expand Down Expand Up @@ -199,4 +156,4 @@ can return all the tags in one page. So we can perform the tag search on the fro
**Cons:**

- It is not scalable.
- Sets limits of tags that can be created in the taxonomy.
- Sets limits of tags that can be created in the taxonomy.
2 changes: 1 addition & 1 deletion openedx_learning/__init__.py
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
"""
Open edX Learning ("Learning Core").
"""
__version__ = "0.3.5"
__version__ = "0.3.6"
4 changes: 3 additions & 1 deletion openedx_tagging/core/tagging/models/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -493,13 +493,15 @@ def _get_filtered_tags_deep(
if pk is not None:
matching_ids.append(pk)
qs = qs.filter(pk__in=matching_ids)
qs = qs.annotate(child_count=models.Count("children", filter=Q(children__pk__in=matching_ids)))
elif excluded_values:
raise NotImplementedError("Using excluded_values without search_term is not currently supported.")
# We could implement this in the future but I'd prefer to get rid of the "excluded_values" API altogether.
# It remains to be seen if it's useful to do that on the backend, or if we can do it better/simpler on the
# frontend.
else:
qs = qs.annotate(child_count=models.Count("children"))

qs = qs.annotate(child_count=models.Count("children"))
# Add the "depth" to each tag:
qs = Tag.annotate_depth(qs)
# Add the "lineage" as a field called "sort_key" to sort them in order correctly:
Expand Down
9 changes: 3 additions & 6 deletions openedx_tagging/core/tagging/rest_api/paginators.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,10 +5,7 @@
from edx_rest_framework_extensions.paginators import DefaultPagination # type: ignore[import]

# From this point, the tags begin to be paginated
TAGS_THRESHOLD = 1000

# From this point, search tags begin to be paginated
SEARCH_TAGS_THRESHOLD = 200
MAX_FULL_DEPTH_THRESHOLD = 10_000


class TagsPagination(DefaultPagination):
Expand All @@ -29,5 +26,5 @@ class DisabledTagsPagination(DefaultPagination):
It should be used if the number of tags within
the taxonomy does not exceed `TAGS_THRESHOLD`.
"""
page_size = TAGS_THRESHOLD
max_page_size = TAGS_THRESHOLD + 1
page_size = MAX_FULL_DEPTH_THRESHOLD
max_page_size = MAX_FULL_DEPTH_THRESHOLD + 1
10 changes: 8 additions & 2 deletions openedx_tagging/core/tagging/rest_api/v1/serializers.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@
from __future__ import annotations

from typing import Any
from urllib.parse import urlencode

from rest_framework import serializers
from rest_framework.reverse import reverse
Expand Down Expand Up @@ -162,12 +163,17 @@ def get_sub_tags_url(self, obj: TagData | Tag):
child_count = obj.child_count if isinstance(obj, Tag) else obj["child_count"]
if child_count > 0 and "taxonomy_id" in self.context:
value = obj.value if isinstance(obj, Tag) else obj["value"]
query_params = f"?parent_tag={value}"
request = self.context["request"]
query_params = request.query_params
new_query_params = {"parent_tag": value}
if "full_depth_threshold" in query_params:
new_query_params["full_depth_threshold"] = query_params["full_depth_threshold"]
if "search_term" in query_params:
new_query_params["search_term"] = query_params["search_term"]
url_namespace = request.resolver_match.namespace # get the namespace, usually "oel_tagging"
url = (
reverse(f"{url_namespace}:taxonomy-tags", args=[str(self.context["taxonomy_id"])])
+ query_params
+ "?" + urlencode(new_query_params)
)
return request.build_absolute_uri(url)
return None
Expand Down
76 changes: 44 additions & 32 deletions openedx_tagging/core/tagging/rest_api/v1/views.py
Original file line number Diff line number Diff line change
Expand Up @@ -30,7 +30,7 @@
from ...import_export.parsers import ParserFormat
from ...models import Taxonomy
from ...rules import ObjectTagPermissionItem
from ..paginators import TAGS_THRESHOLD, DisabledTagsPagination, TagsPagination
from ..paginators import MAX_FULL_DEPTH_THRESHOLD, DisabledTagsPagination, TagsPagination
from .permissions import ObjectTagObjectPermissions, TagObjectPermissions, TaxonomyObjectPermissions
from .serializers import (
ObjectTagListQueryParamsSerializer,
Expand Down Expand Up @@ -528,24 +528,28 @@ class TaxonomyTagsView(ListAPIView, RetrieveUpdateDestroyAPIView):
"""
View to list/create/update/delete tags of a taxonomy.
If you specify ?root_only or ?parent_tag_value=..., only one "level" of the
hierachy will be returned. Otherwise, several levels will be returned, in
tree order, up to the maximum supported depth. Additional levels/depth can
be retrieved by using ?parent_tag_value to load more data.
**List Request Details**
Note: If the taxonomy is particularly large (> 1,000 tags), ?root_only is
automatically set true by default and cannot be disabled. This way, users
can more easily select which tags they want to expand in the tree, and load
just that subset of the tree as needed. This may be changed in the future.
By default, only one "level" of the hierachy will be returned. But for more efficient handling of small taxonomies,
you can set ?full_depth_threshold=1000 which means that if there are fewer than 1,000 tags in the result set, they
will be returned in a single page of results that has all the tags in tree order up to the maximum supported depth.
Additional levels/depth can be retrieved recursively by using the "sub_tags_url" field of any returned tag, or
using the ?parent_tag=value parameter manually.
**List Query Parameters**
* id (required) - The ID of the taxonomy to retrieve tags.
* search_term (optional) - Only return tags matching this term, plus their ancestors. Case insensitive.
* parent_tag (optional) - Retrieve children of the tag with this value.
* root_only (optional) - If specified, only root tags are returned.
* include_counts (optional) - Include the count of how many times each
tag has been used.
* full_depth_threshold (optional) - If there are fewer than this many results, return the full (sub)tree of
results, up to maximum supported depth, in a single giant page of results. By default this is disabled
(equivalent to full_depth_threshold=0), which provides a more consistent and predictable response.
Using full_depth_threshold=1000 is recommended in general, but use lower values during development to ensure
compatibility with both large and small result sizes.
* include_counts (optional) - Include the count of how many times each tag has been used.
* page (optional) - Page number (default: 1)
* page_size (optional) - Number of items per page (default: 10)
* page_size (optional) - Number of items per page (default: 30). Ignored when there are fewer tags than
specified by ?full_depth_threshold.
**List Example Requests**
GET api/tagging/v1/taxonomy/:id/tags - Get tags of taxonomy
Expand Down Expand Up @@ -655,34 +659,42 @@ def get_queryset(self) -> TagDataQuerySet:
taxonomy_id = int(self.kwargs.get("pk"))
taxonomy = self.get_taxonomy(taxonomy_id)
parent_tag_value = self.request.query_params.get("parent_tag", None)
root_only = "root_only" in self.request.query_params
include_counts = "include_counts" in self.request.query_params
search_term = self.request.query_params.get("search_term", None)

if parent_tag_value:
# Fetching tags below a certain parent is always paginated and only returns the direct children
depth = 1
if root_only:
raise ValidationError("?root_only and ?parent_tag cannot be used together")
try:
full_depth_threshold = int(self.request.query_params.get("full_depth_threshold", 0))
except ValueError:
full_depth_threshold = -1
if full_depth_threshold < 0 or full_depth_threshold > MAX_FULL_DEPTH_THRESHOLD:
raise ValidationError("Invalid full_depth_threshold")

if full_depth_threshold or search_term:
# If full_depth_threshold is set, default to maximum depth and later prune to depth=1 if needed.
# We also need to do a deep search if there is a search term, to ensure we return parent items with children
# that match.
depth = None
else:
if root_only:
depth = 1 # User Explicitly requested to load only the root tags for now
elif search_term:
depth = None # For search, default to maximum depth but use normal pagination
elif taxonomy.tag_set.count() > TAGS_THRESHOLD:
# This is a very large taxonomy. Only load the root tags at first, so users can choose what to load.
depth = 1
else:
# We can load and display all the tags in the taxonomy at once:
self.pagination_class = DisabledTagsPagination
depth = None # Maximum depth
depth = 1 # Otherwise (default), load a single level of results only.

return taxonomy.get_filtered_tags(
results = taxonomy.get_filtered_tags(
parent_tag_value=parent_tag_value,
search_term=search_term,
depth=depth,
include_counts=include_counts,
)
if depth == 1:
# We're already returning just a single level. It will be paginated normally.
return results
elif full_depth_threshold and len(results) < full_depth_threshold:
# We can load and display all the tags in this (sub)tree at once:
self.pagination_class = DisabledTagsPagination
return results
else:
# We had to do a deep query, but we will only return one level of results.
# This is because the user did not request a deep response (via full_depth_threshold) or the result was too
# large (larger than the threshold).
# It will be paginated normally.
return results.filter(parent_value=parent_tag_value)

def post(self, request, *args, **kwargs):
"""
Expand Down
Loading

0 comments on commit b98f091

Please sign in to comment.