Fix paired aggregation behaviour #676

paul-butcher · 2023-07-17T13:53:57Z

paul-butcher · 2023-07-17T13:55:49Z

search/src/test/scala/weco/api/search/works/filtering/FilterAndAggregateByContributorTest.scala

@@ -59,8 +59,8 @@ class FilterAndAggregateByContributorTest
      "contributors.agent.label" -> Seq(
        (2, "Bath, Patricia"),
        (2, "Darwin \\\"Jones\\\", Charles"),
-        (1, "Jake Paul"),
-        (1, "Karl Marx")
+        (1, "Karl Marx"),


Now that order is enforced, Karl comes before Jake

paul-butcher · 2023-07-17T13:56:46Z

search/src/test/scala/weco/api/search/services/AggregationsTest.scala

@@ -148,35 +148,19 @@ class AggregationsTest
            List(WorkAggregationRequest.Format, WorkAggregationRequest.Subject),
          filters = List(
            FormatFilter(List("a")),
-            SubjectLabelFilter(Seq("pGkJTZWwn4"))


This subject does not appear to exist in the test dataset. I'm a bit surprised that this test has been passing. I suspect it's something to do with returning 0-count buckets.

paul-butcher · 2023-07-17T13:58:29Z

search/src/test/scala/weco/api/search/services/AggregationsTest.scala

          )
        )
        whenReady(aggregationQuery(index, searchOptions)) { aggs =>
          val buckets = aggs.format.get.buckets
-          buckets.length shouldBe works.length
+          buckets.length shouldBe 7


There was no subject in the existing dataset that was present in all works. However, this is an improvement in that it shows the filter being applied.

paul-butcher · 2023-07-17T14:06:14Z

search/src/main/scala/weco/api/search/services/FiltersAndAggregationsBuilder.scala

+      case agg => agg
+    }
+
+  def requestToOrderedAggregation(


This was another issue I was seeing at the same time - entries in the languages list would jump around when filters are applied.

Because it also makes the main problem a bit easier to test, I have fixed it at the same time.

jamieparkinson

Is there work to be done here to delete old code or will that come later?

jamieparkinson · 2023-07-17T14:13:15Z

search/src/main/scala/weco/api/search/models/Aggregation.scala

+    allBuckets flatMap {
+      _.asArray.get
+    } map (k => k.as[Bucket].right.get) distinct


are these gets both 100% safe?

Good point. I think they are both 99% safe. To break it, we would have to be passing in either a Json with a "buckets" property that does not contain a list, or that members of that list are not buckets.

I think it's guaranteed by the general context that the object being passed in conforms, but I've done nothing in here to ensure it.

It is my personal preference to be pedantic (quelle surprise) and only to do type-safe getting of things out of containers but up to you really

jamieparkinson · 2023-07-17T14:14:11Z

search/src/main/scala/weco/api/search/services/FiltersAndAggregationsBuilder.scala

+        val filterTerm = filterQuery match {
+          case TermsQuery(_, values, _, _, _, _, _) =>
+            values.map(value => s"""\\"$value\\"""").mkString("|")
+          case _ => ""
+        }
+        terms.minDocCount(0).includeRegex(s".*($filterTerm).*")


Think this all deserves a comment about what it does and why it does it

paul-butcher · 2023-07-17T14:47:23Z

Is there work to be done here to delete old code or will that come later?

There's some old code that is still in use by countWorkTypes. Once that is moved over to the new way, I'll be able to cut quite a few lines.

This reverts commit 7004f60.

* remove AnyFunSpec from base class * remove AnyFunSpec from base class * commit missed file * Do not return unwanted empty buckets * Do not return unwanted empty buckets * extend no-empties rule to images * avoid naming collision * better naming, as per review * Fix paired aggregation behaviour (#676) * Fix paired aggregation behaviour * improve safety * fix sorting even more * improve commentary * Karl and Jake * remove redundant test

* start gherkining the facets * start filling out the faceting features * Apply auto-formatting rules * messing about with portability of the features file * only return populated buckets * finish new faceting feature tests for works. * add todo about order * a bit more faceting test finessing * WorkFacet tests pass (todo: spread the filtering business to other fields) * aggregate properly on language * workType filtering and aggregation * subjects and contributors * availabilities * tidy up bucket matching * improve filter test output * license works properly * skip tests that are not yet ready * filters now only return the filtered value in aggregations * remove AnyFunSpec from base class * remove AnyFunSpec from base class * commit missed file * Do not return unwanted empty buckets * Do not return unwanted empty buckets * extend no-empties rule to images * avoid naming collision * new faceting paradigm * enforce order in aggregations * remove unnecessary ignored scenarios * fix test to correspond ot actual filtering * fix low-level tests that expected the old-style query * fix the "all other" test * improve commentary * tidying * revert irrelevant change * improve commentary * revert irrelevant change * revert irrelevant change * Fix paired aggregation behaviour * add test to ensure all filtered values are returned * improve safety * include parens in a test * fix sorting even more * improve commentary * better rule name * Karl and Jake * remove redundant test * start adding Image faceting * finish adding Image faceting tests * better naming, as per review * some fiddling to get round E4S Template limitations * add example to work towards * Apply auto-formatting rules * Inline templates work, now to put all the other bits in * still more messing about * don't query if no query term * Fix paired aggregation behaviour (#676) * Fix paired aggregation behaviour * improve safety * fix sorting even more * improve commentary * Karl and Jake * remove redundant test * It Works! * Apply auto-formatting rules * tidy whitespace * Apply auto-formatting rules * don't use E4s Indexes * Apply auto-formatting rules * tidying * Apply auto-formatting rules * tidy * Apply auto-formatting rules * more tidying * Apply auto-formatting rules * autoformat * Apply auto-formatting rules * remove redundant note * match ImageFilter search to main * Apply auto-formatting rules * minimise diff * Apply auto-formatting rules * switch off intellij autoformat and try again * Apply auto-formatting rules * bit more tidying * better template params * improve commentary * Update search/src/main/scala/weco/api/search/services/WorksRequestBuilder.scala Co-authored-by: Jamie Parkinson <[email protected]> * improve commentary --------- Co-authored-by: Buildkite on behalf of Wellcome Collection <[email protected]> Co-authored-by: Jamie Parkinson <[email protected]>

* Genre concept ids (#639) * add genre concept ids to image filters * Apply auto-formatting rules * update test docs * update test docs * add works filter test * new test docs * genre concept id filter on works * add genre concept ids to image filters * Apply auto-formatting rules * use new availability data in aggregation test * update images includes tests * update availabilities test data * update test for new data * update test for new data * update test for new data * remove .id from genre concept filters --------- Co-authored-by: Buildkite on behalf of Wellcome Collection <[email protected]> * Apply auto-formatting rules * Copy across the new example documents for images * Add filtering by production date on images * Add sorting by source date on images * Apply auto-formatting rules * Annotate diff tool (#651) * first stab * annotate on nano * Apply auto-formatting rules * try without the colours * term wrapping * annotation styling * annotation styling * double dollar * put it in the right pipeline * add artifact path --------- Co-authored-by: Buildkite on behalf of Wellcome Collection <[email protected]> * update the rank tests to use a changed work id * delete rogue m * corrected [search_docs] and [snapshots] URLs * mark test which searches for multiple reference numbers as a knownFailure * Remove references to _queryType * Apply auto-formatting rules * Bump scala-libs to v32.25.0 * Bump scala-libs to v32.26.0 * 5695-use new 2023-06-01 pipeline, now with added Accession Numbers * These are not index tests * Allow failures of e2e tests * new pipeline (#663) * update id for failing reference number test * Reduce the cost of running the catalogue API * Use Fargate Spot for running stage services * Turn off the stage APIs outside office hours Note: this check is deliberately a conservative check for the `stage` environment (opting in to cost-saving behaviour) rather than a loose check that we're not in the `prod` environment (opting out). This means we're less likely to implement cost-saving in an environment which is part of a production service. * Remove AnyFunSpec from base trait (#673) * remove AnyFunSpec from base class * remove AnyFunSpec from base class * commit missed file * removes mustQueries and QueryConfig * Do not return unwanted empty buckets (#674) * remove AnyFunSpec from base class * remove AnyFunSpec from base class * commit missed file * Do not return unwanted empty buckets * Do not return unwanted empty buckets * extend no-empties rule to images * avoid naming collision * better naming, as per review * Bump scala-libs to v32.27.0 * updates imagesParams to use RBGColor decoder + new ColorQuery * Fix paired aggregation (#679) * remove AnyFunSpec from base class * remove AnyFunSpec from base class * commit missed file * Do not return unwanted empty buckets * Do not return unwanted empty buckets * extend no-empties rule to images * avoid naming collision * better naming, as per review * Fix paired aggregation behaviour (#676) * Fix paired aggregation behaviour * improve safety * fix sorting even more * improve commentary * Karl and Jake * remove redundant test * bumps akka to match scala-libs * image filter test fails * Apply auto-formatting rules * Aggregation tests (#667) * start gherkining the facets * start filling out the faceting features * Apply auto-formatting rules * messing about with portability of the features file * only return populated buckets * finish new faceting feature tests for works. * add todo about order * a bit more faceting test finessing * WorkFacet tests pass (todo: spread the filtering business to other fields) * aggregate properly on language * workType filtering and aggregation * subjects and contributors * availabilities * tidy up bucket matching * improve filter test output * license works properly * skip tests that are not yet ready * filters now only return the filtered value in aggregations * remove AnyFunSpec from base class * remove AnyFunSpec from base class * commit missed file * Do not return unwanted empty buckets * Do not return unwanted empty buckets * extend no-empties rule to images * avoid naming collision * new faceting paradigm * enforce order in aggregations * remove unnecessary ignored scenarios * fix test to correspond ot actual filtering * fix low-level tests that expected the old-style query * fix the "all other" test * improve commentary * tidying * revert irrelevant change * improve commentary * revert irrelevant change * revert irrelevant change * Fix paired aggregation behaviour * add test to ensure all filtered values are returned * improve safety * include parens in a test * fix sorting even more * improve commentary * better rule name * Karl and Jake * remove redundant test * start adding Image faceting * finish adding Image faceting tests * better naming, as per review --------- Co-authored-by: Buildkite on behalf of Wellcome Collection <[email protected]> * Define the main Works search query in plain Json (#682) * start gherkining the facets * start filling out the faceting features * Apply auto-formatting rules * messing about with portability of the features file * only return populated buckets * finish new faceting feature tests for works. * add todo about order * a bit more faceting test finessing * WorkFacet tests pass (todo: spread the filtering business to other fields) * aggregate properly on language * workType filtering and aggregation * subjects and contributors * availabilities * tidy up bucket matching * improve filter test output * license works properly * skip tests that are not yet ready * filters now only return the filtered value in aggregations * remove AnyFunSpec from base class * remove AnyFunSpec from base class * commit missed file * Do not return unwanted empty buckets * Do not return unwanted empty buckets * extend no-empties rule to images * avoid naming collision * new faceting paradigm * enforce order in aggregations * remove unnecessary ignored scenarios * fix test to correspond ot actual filtering * fix low-level tests that expected the old-style query * fix the "all other" test * improve commentary * tidying * revert irrelevant change * improve commentary * revert irrelevant change * revert irrelevant change * Fix paired aggregation behaviour * add test to ensure all filtered values are returned * improve safety * include parens in a test * fix sorting even more * improve commentary * better rule name * Karl and Jake * remove redundant test * start adding Image faceting * finish adding Image faceting tests * better naming, as per review * some fiddling to get round E4S Template limitations * add example to work towards * Apply auto-formatting rules * Inline templates work, now to put all the other bits in * still more messing about * don't query if no query term * Fix paired aggregation behaviour (#676) * Fix paired aggregation behaviour * improve safety * fix sorting even more * improve commentary * Karl and Jake * remove redundant test * It Works! * Apply auto-formatting rules * tidy whitespace * Apply auto-formatting rules * don't use E4s Indexes * Apply auto-formatting rules * tidying * Apply auto-formatting rules * tidy * Apply auto-formatting rules * more tidying * Apply auto-formatting rules * autoformat * Apply auto-formatting rules * remove redundant note * match ImageFilter search to main * Apply auto-formatting rules * minimise diff * Apply auto-formatting rules * switch off intellij autoformat and try again * Apply auto-formatting rules * bit more tidying * better template params * improve commentary * Update search/src/main/scala/weco/api/search/services/WorksRequestBuilder.scala Co-authored-by: Jamie Parkinson <[email protected]> * improve commentary --------- Co-authored-by: Buildkite on behalf of Wellcome Collection <[email protected]> Co-authored-by: Jamie Parkinson <[email protected]> * Bump scala-libs to v32.28.0 * add new required parameter (#684) * knn color query works but only if defined * Apply auto-formatting rules * Bump scala-libs to v32.29.0 * optionally calls knn when building images request * Bump scala-libs to v32.30.0 * Bump scala-libs to v32.31.0 * Bump scala-libs to v32.32.0 * Bump scala-libs to v32.33.0 * Bump scala-libs to v32.34.0 * Fix the paths in the storage imports * Bump scala-libs to v32.35.0 * Bump scala-libs to v32.36.0 * Bump scala-libs to v32.37.0 * updated test-documents with paletteEmbedding * updated test-documents with paletteEmbedding take2 * updated expected_responses to match updated test-documents * updated test-documents with paletteEmbedding take3 * removes request with blended and colour similarities * lil test fixes * Apply auto-formatting rules * removes stray comment/unused import * removed unused test_documents * use the right pipeline * Revert "removed unused test_documents" This reverts commit 5b29ac2. * fixes decluttering gone too far * Revert "Image query with vector" * fixes works whose id has changed * Remove a couple of unused terraform pieces * Remove a now-unused terraform module * Point at the new location of the remote account state * Apply auto-formatting rules * fixes rank test: use reference number that is in sync in both pipelines * Bump scala-libs to v32.38.0 * Bump scala-libs to v32.39.0 * Revert "Revert "Image query with vector"" This reverts commit 27760ca. * fixes the rounding error when generating queryVector * adds test for combined query and knn colour search * fix the 10k max results problem (#705) * fix the 10k max results problem * remove debug code * Switch to using the official Black Docker image for Python formatting * sets number of bins to 10 and adds blurring to queryVector * code style and test fixes * start converting image search to a template * Bump to the newest Elasticsearch index * revert irrelevant changes * remove commented code * make template configs protected * better whitespace normalisation * better whitespace normalisation * remove KNN boost * sort out SearchTemplateResponse for images * reinstate change merged from main * Roll back to 06-09 * Bump to 06-10 * copies test data from catalogue-pipeline * updates colour filter tests + suitable test-documents * uses latest pipeline * remove extraneous dots from filter terms * remove extraneous dots from filter terms * fixes trailing . in test-docs for aggs * Run new rank in branches * We don't want a shell entrypoint * How can we run a command * Relative path * What if we put the command here * Break up command * Role needs to be assumed for everything * Propagate environment * Run new rank in branches * We don't want a shell entrypoint * How can we run a command * Relative path * What if we put the command here * Break up command * Role needs to be assumed for everything * Propagate environment * Test images too * Paired aggregation clarity (#714) * limit self aggregation to filter value list size * Improve clarity in the weird aggregation stuff * unmock tests to cope with stricter FAAB * join knn filter terms with AND (#716) * Use new rank for scheduled tests * New rank for post-deploy tests * Remove old rank * Don't need volumes when not using local query * Prefiltering in template queries (#717) * start prepping for prefiltering * join knn filter terms with AND * Distinguish between pairable and unpairable filters * escape optional regex characters (#719) * Bump scala-libs to v32.40.0 * Construct the client using an API key * new pipeline (#724) * Use the pipeline API key in the snapshot reporter * Apply auto-formatting rules * Bump lambdas to Python 3.10 * Changes for Python 3.10 * Apply auto-formatting rules * new pipeline (#728) * Bump pipeline date * Update test documents * New queries * Update index config for tests * Deal with moved feature vector * Use filterableValues * Test fixes/additions * Fix filter field names * Somewhat more appropriate knn filter for new query * partOf no longer does title as well * Create appropriate test documents in-memory too * Apply auto-formatting rules * Use correct file for rank tests * Fix search tests * Snapshot test data needed updating * Can we choose the cluster dynamically? * Bad copypaste * Remove query _names for actual application queries * Make the query respect all terms * Update index config etc in tests * Reinstate some query names * Add a tie breaker to the image query * Case insensitive identifiers filter * AWS CLI ECR BBQ login commmand has changed * Don't need eval now * I love bash scripting * Bump pipeline date to 2024-01-03 * Bump pipeline date to 2024-01-09 * Add option for HTTP healthcheck, enable for works * Apply auto-formatting rules * bool tcp_healthcheck test is backwards, fix it * matcher is required for http healthcheck * Bump scala-libs to v32.40.1 * Bump the SBT version from 1.4.1 to 1.9.8 The 1.4.1 version of SBT has an issue with the version of the JNA library is uses which prevents it from loading on M1 macs. See: sbt/io#320 The most recent version (1.9.8) of sbt works with this project, so upgrading. - [ ] Build and test locally, does it succeed? - [ ] Does this PR get a green tick when running in CI? * Bump scala-libs to v32.40.2 * Add addDependencyTreePlugin plugin This change adds a plugin to generate and search dependency graphs with sbt. This is useful when we see errors caused by transitive dependencies, so we can work out which libraries we directly specify pull these in. See https://github.com/sbt/sbt-dependency-graph for the history of this plugin (new built into sbt after version 1.4. * Bump scala-libs to v32.40.3 * Bump scala-libs to v32.40.4 * Removes soft fails for e2es as not flaky no mo * Adds a simple healthcheck endpoint for the items service This change follows #736, and adds an HTTP healthcheck to the items API to ensure the scala service has started before it is registered healthy at the NLB and starts serving requests. * enable the http healthcheck for items api * Apply auto-formatting rules * Add default code owners * Add wider net of approvers for changes to codeowners and buildkite config * add requests api healthcheck endpoint * Add healthcheck controller & endpoint This change adds a healthcheck endpoint to provide to a loadbalancer for the purposes of checking application health at deployment and under normal service. * remove optional tcp healthcheck, no longer used * Update docs/requesting_flow.md Co-authored-by: Paul Butcher <[email protected]> * remove some bad tf around healthchecks [skip ci] * Add @wellcomecollection/developers to /terraform folder in CODEOWNERS ## What is this change? This change allows reviews from all developers to be sufficient for terraform changes. At present an approval to merge a PR for a terraform change must come from someone in the developers group, we probably want to widen that. ## How to test? Merge it and see if developers outside the scala-reviewers group can approve terraform changes. * Update requesting_flow.md Mermaid on Github doesn't like it when an arrow is unlabelled in a sequence diagram, even when it's just a response to the previous message in the opposite direction. I have added an "OK" to each of these and turned them all into dashed arrows to signify they are responses. * Add a diagram showing what the request flow looks like * Update docs/requesting_flow.md Co-authored-by: Paul Butcher <[email protected]> * Update requesting_flow.md Mermaid on Github doesn't like it when an arrow is unlabelled in a sequence diagram, even when it's just a response to the previous message in the opposite direction. I have added an "OK" to each of these and turned them all into dashed arrows to signify they are responses. --------- Co-authored-by: Paul Butcher <[email protected]> Co-authored-by: Buildkite on behalf of Wellcome Collection <[email protected]> Co-authored-by: Harrison Pim <[email protected]> Co-authored-by: harrisonpim <[email protected]> Co-authored-by: georgiaewhitney <[email protected]> Co-authored-by: Raphaelle Cantin <[email protected]> Co-authored-by: Raphaelle Cantin <[email protected]> Co-authored-by: Agnes Garoux <[email protected]> Co-authored-by: Jamie Parkinson <[email protected]> Co-authored-by: agnesgaroux <[email protected]> Co-authored-by: jamie <[email protected]> Co-authored-by: Robert Kenny <[email protected]> Co-authored-by: Robert Kenny <[email protected]>

Fix paired aggregation behaviour

c66cdac

paul-butcher commented Jul 17, 2023

View reviewed changes

jamieparkinson approved these changes Jul 17, 2023

View reviewed changes

paul-butcher added 6 commits July 18, 2023 11:12

improve safety

d424b9f

Merge branch 'no-empty-buckets' into fix-paired-filtering

9ffd6ac

fix sorting even more

feed3d4

improve commentary

ecb7d65

Karl and Jake

0d83ab3

remove redundant test

0afb25d

paul-butcher merged commit 7004f60 into no-empty-buckets Jul 26, 2023

paul-butcher added a commit that referenced this pull request Jul 26, 2023

Revert "Fix paired aggregation behaviour (#676)"

d00e21e

This reverts commit 7004f60.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix paired aggregation behaviour #676

Fix paired aggregation behaviour #676

paul-butcher commented Jul 17, 2023

paul-butcher Jul 17, 2023

paul-butcher Jul 17, 2023

paul-butcher Jul 17, 2023

paul-butcher Jul 17, 2023

jamieparkinson left a comment

jamieparkinson Jul 17, 2023

paul-butcher Jul 17, 2023

jamieparkinson Jul 18, 2023

jamieparkinson Jul 17, 2023

paul-butcher commented Jul 17, 2023

Fix paired aggregation behaviour #676

Fix paired aggregation behaviour #676

Conversation

paul-butcher commented Jul 17, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jamieparkinson left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

paul-butcher commented Jul 17, 2023