-
Yes, HammerDB is scripted, so you can change the workloads to do anything you wish. Vector databases in general are becoming popular, and I agree that things appear to be heading towards vector queries in regular RDBMS databases rather than standalone databases, as there is a lot of existing data already in PostgreSQL and MySQL/MariaDB (e.g. https://mariadb.org/projects/mariadb-vector/). For recall metrics, it would be best if you could provide an example of what you are doing and what the metrics would look like, such as the percentage of relevant results returned and the latency. For latency we already have the xtprof time profiler, which overloads proc. If you look in the module xtprof-1.0.tm you can see that we limit this overload to the TPROC-C stored procedures (neword, payment etc.). However, if you modified this and added e.g. a proc called "recall" to the list in the xtprof module and to the driver script, then HammerDB would report the latencies for this proc as well. There is also an original, more basic timing module called etprof that can be explored. It is worth noting that we still have an open issue, #123, to add the TPROC-CH workload, a mix of OLTP and OLAP, which it sounds like this would work well with.
-
@sm-shaw As part of implementing the above for Postgres, I'm trying to find a good value for the ramp-up time. Since this is a mixed workload, the two workloads naturally contend for memory (shared_buffers in the case of Postgres). What criteria should I use to select the ramp-up time in such a scenario?
-
Hello. As a contributor to this repo and the core contributor to a Postgres extension repo that does ANN, I have a question: are the OLTP/OLAP and vector runs isolated, apart from happening on the same machine? In other words, is one benchmark expected to have an impact on the other?
-
Recent enhancements in the Postgres space have added multiple database extensions that allow for vector search using Postgres. One such example is pgvector. Although there are dedicated benchmarks for semantic workloads only (such as ann-benchmarks), I'm interested in using HammerDB and adding some vector queries to the mix, since this is one of the pros of using PG (i.e. getting both OLTP/OLAP and vector capabilities in a single database). For example, I would use the TPC-C benchmark scripts and add a vector query to the mix.
The above seems doable. However, in addition to TPS there is another metric that is important when benchmarking vector datasets: "recall". Computing recall requires post-processing the results of the SQL SELECT queries and comparing them to precomputed data for each query. This is where I'm unsure whether HammerDB has support (or examples) for something similar, or even the option to add a new metric.
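To make the recall metric concrete, here is a minimal sketch of the post-processing step in Python. It is not HammerDB code: the function names `recall_at_k` and `mean_recall` are hypothetical, and it assumes each vector query returns a list of row IDs and that the true nearest neighbours were precomputed with an exact search.

```python
from typing import Dict, Hashable, Sequence


def recall_at_k(returned_ids: Sequence[Hashable],
                true_ids: Sequence[Hashable],
                k: int) -> float:
    """Fraction of the true top-k neighbours found in the returned top-k.

    returned_ids: IDs from the approximate (indexed) vector query, best first.
    true_ids:     IDs from a precomputed exact search, best first.
    Both inputs are assumptions about how results would be collected.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    truth = set(true_ids[:k])
    if not truth:
        raise ValueError("ground truth is empty")
    hits = len(set(returned_ids[:k]) & truth)
    return hits / len(truth)


def mean_recall(results: Dict[Hashable, Sequence[Hashable]],
                ground_truth: Dict[Hashable, Sequence[Hashable]],
                k: int) -> float:
    """Average recall@k over every query present in `results`."""
    if not results:
        raise ValueError("no query results supplied")
    scores = [recall_at_k(results[q], ground_truth[q], k) for q in results]
    return sum(scores) / len(scores)
```

For instance, if a query returns IDs `[7, 3, 9, 1]` and the exact top 4 are `[3, 7, 4, 1]`, three of the four true neighbours were found, so recall@4 is 0.75. A benchmark run would report the mean over all vector queries alongside TPS and latency.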