-
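@timbray I tried replacing sort.Slice with a sort.Interface implementation:

```go
import (
	"bytes"
	"sort"
)

// FieldsSort implements sort.Interface for a []Field, ordering by Path.
type FieldsSort []Field

func (a FieldsSort) Len() int {
	return len(a)
}

func (a FieldsSort) Less(i, j int) bool {
	return bytes.Compare(a[i].Path, a[j].Path) < 0
}

func (a FieldsSort) Swap(i, j int) {
	a[i], a[j] = a[j], a[i]
}

// At the call site, in place of sort.Slice:
//   sort.Sort(FieldsSort(fields))
```

And now there are no calls to the runtime. I ran the cl2 tests and city-lots to compare whether it actually improves the situation: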
Baseline: #167 (33d2255)
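To check that the swap preserves behavior, here is a minimal, self-contained sketch that sorts the same data both ways and prints allocations per call via testing.AllocsPerRun, a quicker signal than a full benchmark. Assumptions: Field is a hypothetical, simplified stand-in for Quamina's type (only Path is modeled), and sortWithSlice, sortWithSort, and the sample paths are made up for illustration. The allocation numbers it prints will vary with the Go version.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
	"testing"
)

// Field is a hypothetical, simplified stand-in for Quamina's Field type:
// only the Path member that the comparison needs is modeled.
type Field struct {
	Path []byte
}

// FieldsSort implements sort.Interface, ordering fields by Path.
type FieldsSort []Field

func (a FieldsSort) Len() int           { return len(a) }
func (a FieldsSort) Less(i, j int) bool { return bytes.Compare(a[i].Path, a[j].Path) < 0 }
func (a FieldsSort) Swap(i, j int)      { a[i], a[j] = a[j], a[i] }

// sortWithSlice is the "before" version; sort.Slice receives the slice as an any.
func sortWithSlice(fields []Field) {
	sort.Slice(fields, func(i, j int) bool {
		return bytes.Compare(fields[i].Path, fields[j].Path) < 0
	})
}

// sortWithSort is the "after" version, going through FieldsSort.
func sortWithSort(fields []Field) {
	sort.Sort(FieldsSort(fields))
}

// sampleFields returns made-up paths; the "." separator is arbitrary here.
func sampleFields() []Field {
	return []Field{
		{Path: []byte("properties.STREET")},
		{Path: []byte("geometry.type")},
		{Path: []byte("type")},
		{Path: []byte("properties.BLKLOT")},
	}
}

func main() {
	a, b := sampleFields(), sampleFields()
	sortWithSlice(a)
	sortWithSort(b)
	for i := range a {
		fmt.Printf("%-20s %s\n", a[i].Path, b[i].Path)
	}

	// Allocation counts depend on the Go version and on escape analysis,
	// so they are measured and printed rather than assumed.
	work := sampleFields()
	fmt.Println("sort.Slice allocs/call:", testing.AllocsPerRun(1000, func() { sortWithSlice(work) }))
	fmt.Println("sort.Sort  allocs/call:", testing.AllocsPerRun(1000, func() { sortWithSort(work) }))
}
```

Both are unstable sorts, so the two outputs only have to agree here because every Path is distinct; if duplicate paths were possible and their relative order mattered, sort.SliceStable and sort.Stable would be the pair to compare.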
-
OK, I replaced sort.Slice with sort.Sort as in your code, and cl2_test went from 153K/second to 159K/second. Not huge, but every little bit helps. Now to look at the profiler again…
-
!!!!!!! OK, that does it, I must get an M1. I have a 2019 Intel Mac. BTW, which configuration is yours?
-
Also, it's really weird that you're getting much faster times on EXACT than on ANYTHING-BUT. Is this on upstream or on my PR branch?
-
OK, pardon me, I was being stupid. I used to run TestCityLots all the time and was used to seeing numbers like 150K-180K there, so I read 15xxxxx as 15xxxx. So we are in the same region, although the M1 is considerably faster. TestCityLots is a lot slower because it matches on the numeric fields, so it has to read all those big floating-point arrays. So the very nice performance of TestCL2 is probably due mostly to your work on the flattener, which lets it skip most of the data.
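To make "skip most of the data" concrete, here is a toy sketch, not Quamina's actual flattener. It assumes well-formed JSON, looks only at top-level members, and does not decode escapes in keys; skipSpace, skipString, skipValue, and pickTopLevel are hypothetical names. The idea it shows is that an unwanted member, such as a big coordinates array, is stepped over by tracking only brackets and string quotes, so none of its numbers are ever parsed.

```go
package main

import (
	"fmt"
	"strings"
)

// skipSpace returns the index of the first non-whitespace byte at or after i.
func skipSpace(data []byte, i int) int {
	for i < len(data) && (data[i] == ' ' || data[i] == '\t' || data[i] == '\n' || data[i] == '\r') {
		i++
	}
	return i
}

// skipString returns the index just past the JSON string whose opening quote is at data[i].
func skipString(data []byte, i int) int {
	for j := i + 1; j < len(data); j++ {
		switch data[j] {
		case '\\':
			j++ // skip the escaped character
		case '"':
			return j + 1
		}
	}
	return len(data)
}

// skipValue returns the index just past the JSON value starting at data[i].
// It tracks only nesting depth and string boundaries; numbers are never parsed.
func skipValue(data []byte, i int) int {
	depth := 0
	for i < len(data) {
		switch data[i] {
		case '"':
			i = skipString(data, i)
			if depth == 0 {
				return i
			}
			continue
		case '[', '{':
			depth++
		case ']', '}':
			if depth == 0 {
				return i // closer of the enclosing container: a scalar just ended
			}
			depth--
			if depth == 0 {
				return i + 1
			}
		case ',':
			if depth == 0 {
				return i
			}
		}
		i++
	}
	return i
}

// pickTopLevel returns the raw text of the wanted top-level members of a
// well-formed JSON object, skipping every other member without parsing it.
func pickTopLevel(data []byte, wanted map[string]bool) map[string]string {
	out := map[string]string{}
	i := skipSpace(data, 0)
	if i >= len(data) || data[i] != '{' {
		return out
	}
	i++
	for {
		i = skipSpace(data, i)
		if i >= len(data) || data[i] == '}' {
			return out
		}
		end := skipString(data, i)       // the key, including its quotes
		key := string(data[i+1 : end-1]) // escapes are not decoded in this sketch
		i = skipSpace(data, end)         // now at ':'
		i = skipSpace(data, i+1)         // now at the start of the value
		start := i
		i = skipValue(data, i)
		if wanted[key] {
			out[key] = strings.TrimSpace(string(data[start:i]))
		}
		i = skipSpace(data, i)
		if i < len(data) && data[i] == ',' {
			i++
		}
	}
}

func main() {
	doc := []byte(`{"type":"Feature","geometry":{"type":"Polygon","coordinates":[[-122.42,37.80],[-122.41,37.79]]},"properties":{"STREET":"MAIN"}}`)
	fmt.Println(pickTopLevel(doc, map[string]bool{"type": true}))
}
```

A matcher that needs numeric fields, as TestCityLots does, cannot take this shortcut for those members and has to read the arrays, which would be consistent with the gap between the two tests.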
-
I'm building a big PR in support of #153, and one of the things I decided to bring over from Event Ruler is the CL2 benchmark; see https://github.com/aws/event-ruler/blob/main/src/test/software/amazon/event/ruler/Benchmarks.java#L469.
I just pulled in the first test (look for EXACT_RULES and EXACT_MATCHES) and noticed that Ruler was quite a bit faster than Quamina: 200K matches/sec as opposed to 150K or so. So I profiled it, and the majority of the time is in `CoreMatcher.matchesForFields()`, as one would expect, but more or less 100% of the elapsed time in that func seems to be `runtime.convTslice()`. I looked at the source for that (https://go.dev/src/runtime/iface.go) and found it fairly puzzling, so I guess the next step is to look at the generated code and see where it's being called from. If the profiler is right and we could remove this routine (it looks like housekeeping), Quamina could become much faster. At the back of my mind I'm wondering whether it's related to the `X` type being `any`, and whether that forces some silly reprocessing of slices into a more specific type.
Anyhow, just posting this to capture my state of mind as I knock off for the day.
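For context on `runtime.convTslice()`: it is the runtime helper the gc compiler calls when a non-nil slice value is converted to an interface type such as `any` and the result escapes. A slice header is three words, which doesn't fit in the interface's single data word, so it gets copied into a fresh heap allocation. `sort.Slice` is declared as `func Slice(x any, less func(i, j int) bool)`, so each call boxes its slice argument this way, which would be consistent with the replies above finding that moving off `sort.Slice` helped. The minimal sketch below shows the allocation with `testing.AllocsPerRun`; `sink`, `fields`, `boxSlice`, and `boxPointer` are hypothetical names, and the counts are what current gc compilers should produce rather than a language guarantee.

```go
package main

import (
	"fmt"
	"testing"
)

// sink is a package-level interface variable; storing into it forces the
// boxed value to escape to the heap.
var sink any

var fields = []int{1, 2, 3}

// boxSlice converts a slice to an interface. The three-word slice header
// does not fit in the interface's data word, so the compiler emits a call
// to runtime.convTslice, which copies the header to the heap.
func boxSlice(s []int) { sink = s }

// boxPointer stores a pointer instead; a pointer fits directly in the
// interface's data word, so no copy or allocation is needed.
func boxPointer(p *[]int) { sink = p }

func main() {
	fmt.Println("[]int  -> any, allocs/op:", testing.AllocsPerRun(1000, func() { boxSlice(fields) }))
	fmt.Println("*[]int -> any, allocs/op:", testing.AllocsPerRun(1000, func() { boxPointer(&fields) }))
}
```

Running the Quamina benchmark with `go test -cpuprofile` and then `go tool pprof` with `-peek convTslice` should show which callers account for the calls.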