Pattern De-Duplication based on Subsequence Detection #1031

miles-grant-ibigroup · 2023-10-12T21:56:43Z

Description:

Possibly a slightly overbuilt way to use subsequence detection to remove patterns that are simply a subsequence of another, larger pattern.

There are definitely a few optimizations possible here. I am looking for feedback on these algorithms and on whether there's a way to make them more efficient.

I believe this should be possible in O(n) time. Right now it's O(n^2), although reversing the sorted array should help skip the most useless of these comparisons.
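For readers following along, the subsequence check this approach relies on can be sketched as a two-pointer scan. This is only an illustration: the `isValidSubsequence` name and its (longer array, candidate) argument order are taken from the calls in the diff, but the body below is an assumption, not the implementation in this PR.

```javascript
// Hypothetical sketch: returns true if every element of `sequence` appears
// in `array` in the same relative order (not necessarily contiguously).
function isValidSubsequence(array, sequence) {
  let seqIndex = 0
  for (const value of array) {
    // Stop early once the whole candidate has been matched
    if (seqIndex === sequence.length) break
    if (value === sequence[seqIndex]) seqIndex++
  }
  return seqIndex === sequence.length
}

// Example stop ids are made up for illustration
const fullPattern = ['A', 'B', 'C', 'D']
const shortTurn = ['B', 'D']
const reversed = ['D', 'B']

console.log(isValidSubsequence(fullPattern, shortTurn))
console.log(isValidSubsequence(fullPattern, reversed))
```

A single check like this is linear in the length of the longer stop list; the quadratic cost discussed above comes from running it for every pair of patterns.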

amy-corson-ibigroup

What do we think of this possible cleanup? Approving because this does work as-is!!

amy-corson-ibigroup · 2023-10-17T22:00:28Z

lib/actions/apiV2.js

+              .filter((pattern) => {
+                // Compare to all other patterns TODO: make this beat O(n^2)
+                return !patternsSortedByLength.find((p) => {
+                  // Don't compare against ourself
+                  if (p.id === pattern.id) return false
+
+                  // If our pattern is longer, it's not a subset
+                  if (p.stops.length < pattern.stops.length) return false
+
+                  return isValidSubsequence(
+                    p.stops.map((s) => s.id),
+                    pattern.stops.map((s) => s.id)
+                  )
+                })
+              })


What about

Suggested change

-              .filter((pattern) => {
-                // Compare to all other patterns TODO: make this beat O(n^2)
-                return !patternsSortedByLength.find((p) => {
-                  // Don't compare against ourself
-                  if (p.id === pattern.id) return false
-
-                  // If our pattern is longer, it's not a subset
-                  if (p.stops.length < pattern.stops.length) return false
-
-                  return isValidSubsequence(
-                    p.stops.map((s) => s.id),
-                    pattern.stops.map((s) => s.id)
-                  )
-                })
-              })
+              .filter((pattern, patternIndex) => {
+                // Compare to all other patterns larger than the current pattern
+                return !patternsSortedByLength.find((p, pIndex) => {
+                  if (pIndex >= patternIndex) return false
+
+                  return isValidSubsequence(
+                    p.stops.map((s) => s.id),
+                    pattern.stops.map((s) => s.id)
+                  )
+                })
+              })

You're already sorting by length so checking anything above a certain index number feels like a waste of time, and you could remove the if (p.id === pattern.id) block because p and pattern should have the same index. This gets the same result for 1-line patterns but might need more testing!!

miles-grant-ibigroup · Oct 18, 2023

I think we need to check the pattern id because the indexes don't always line up. I agree the length check is not required, but to drop it we need to start checking the second array at the right index, and that logic is currently not present.
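One way the index-based idea from this thread could be made to line up is to filter the sorted array itself, so the callback index and the comparison index refer to the same list. The sketch below is a hedged illustration, not the merged code: `removeSubsequencePatterns` is a hypothetical name, the `{ id, stops: [{ id }] }` shape is assumed from the diff, and `isValidSubsequence` is re-declared here with an assumed body so the example runs on its own.

```javascript
// Hypothetical helper, assumed to match the (array, candidate) calls in the diff
function isValidSubsequence(array, sequence) {
  let i = 0
  for (const value of array) {
    if (i < sequence.length && value === sequence[i]) i++
  }
  return i === sequence.length
}

// Hypothetical wrapper: drops patterns whose stops are a subsequence of an
// earlier (longer or equal-length) pattern
function removeSubsequencePatterns(patterns) {
  // Map stop ids once per pattern, then sort longest first on the new array
  const sorted = patterns
    .map((pattern) => ({ pattern, stopIds: pattern.stops.map((s) => s.id) }))
    .sort((a, b) => b.stopIds.length - a.stopIds.length)

  // Filtering `sorted` itself keeps the callback index aligned with it,
  // so only entries before the current one need to be checked
  return sorted
    .filter(
      (entry, index) =>
        !sorted
          .slice(0, index)
          .some((other) => isValidSubsequence(other.stopIds, entry.stopIds))
    )
    .map((entry) => entry.pattern)
}

// Made-up patterns for illustration
const toStops = (ids) => ids.map((id) => ({ id }))
const examplePatterns = [
  { id: 'short', stops: toStops(['A', 'C']) },
  { id: 'long', stops: toStops(['A', 'B', 'C']) },
  { id: 'other', stops: toStops(['X', 'Y']) },
  { id: 'dup', stops: toStops(['A', 'B', 'C']) }
]
const keptIds = removeSubsequencePatterns(examplePatterns).map((p) => p.id)
console.log(keptIds)
```

This is still quadratic in the number of patterns in the worst case. It also changes one edge case: when two patterns have identical stop lists, the first one in sorted order is kept, whereas a check against every other pattern of equal or greater length would drop both.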

daniel-heppner-ibigroup

Neat algorithm, I don't see a way to make it faster immediately. Good start!

initial attempt at pattern-de-duplication based on subsequence detection

4244d41

miles-grant-ibigroup assigned amy-corson-ibigroup Oct 12, 2023

amy-corson-ibigroup approved these changes Oct 17, 2023

amy-corson-ibigroup assigned miles-grant-ibigroup and unassigned amy-corson-ibigroup Oct 17, 2023

miles-grant-ibigroup assigned daniel-heppner-ibigroup and unassigned miles-grant-ibigroup Oct 18, 2023

Merge branch 'dev' into pattern-de-duplication

725a32e

daniel-heppner-ibigroup approved these changes Oct 23, 2023

daniel-heppner-ibigroup assigned miles-grant-ibigroup and unassigned daniel-heppner-ibigroup Oct 23, 2023

Merge branch 'dev' into pattern-de-duplication

2fdb46b

miles-grant-ibigroup merged commit c628631 into dev Oct 23, 2023
7 checks passed

miles-grant-ibigroup deleted the pattern-de-duplication branch October 23, 2023 20:01
