Feature: Realtime Query Logging without Database #1433
Comments
Sounds reasonable. I even started implementing it a long time ago but didn't finish it since I stopped using InfluxDB... 🤔
Awesome, happy to test a branch or help however I can. I run InfluxDB (1.8) for a few other things, so I'd prefer to avoid having to run another DB just for query logging. It would be great to not need to run Prometheus either, but one step at a time 😉
I can test with InfluxDB 2.
Ok, I looked into it again and it seems that for each InfluxDB version a different client is required. O.o I'm a bit taken aback by this. The cost vs. benefit of implementing 3 clients for 1 database type is quite unreasonable...
That is annoying (re different clients). I don't know a huge amount about Prometheus vs. Telegraf, but I know blocky currently supports Prometheus; is there no way to expose this information via Prometheus? And on the flip side, what about doing so via Telegraf?
Maybe we should externalize the query log into a separate service via gRPC? We could define a streaming contract and provide implementations for postgres/mariadb/csv. People could create their own implementations; since gRPC is platform agnostic, it could be done in a different language (for example Java)?
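A hypothetical sketch of what such a streaming contract could look like; the service name and message fields here are illustrative only, not an agreed-upon design:

```proto
syntax = "proto3";

package querylog;

// Hypothetical contract: blocky pushes entries, the external service
// (written in Go, Java, ...) persists them wherever it likes.
service QueryLogSink {
  rpc Stream(stream Entry) returns (Ack);
}

message Entry {
  int64  timestamp_unix_ms = 1;
  string client_ip         = 2;
  string question          = 3;
  string response_type     = 4;
  uint32 duration_ms       = 5;
}

message Ack {}
```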
edit: totally overlooked that the request here is for query logging. Leaving this here regardless in case someone searches for metrics. This would require use of telegraf, but not necessarily on the same host as blocky. This basically amounts to a scraper (via telegraf) with an output to influx. https://docs.influxdata.com/influxdb/cloud/write-data/developer-tools/scrape-prometheus-metrics/ https://github.com/influxdata/telegraf/blob/release-1.30/plugins/inputs/prometheus/README.md
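Concretely, that scraper setup boils down to a small telegraf config along these lines; the blocky metrics URL and the InfluxDB connection details are example values to adjust for your setup:

```toml
# Scrape blocky's Prometheus metrics endpoint...
[[inputs.prometheus]]
  urls = ["http://blocky:4000/metrics"]

# ...and write the metrics to InfluxDB 2.x.
[[outputs.influxdb_v2]]
  urls = ["http://influxdb:8086"]
  token = "$INFLUX_TOKEN"
  organization = "example-org"
  bucket = "blocky"
```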
I think doing something like that may be a bit overkill, honestly. From my POV I'd be happy with just a log file that contains all the requests in near real time. The issue for me is really that I'd like to avoid running a database, but using CSV for an output means the data will be stale by at most 1 day.
I don't want to derail this issue too much, but yes, you're correct; this doesn't actually solve the issue of getting query logs into influx. Initially I actually did the same thing you mentioned just to see what it looked like and whether it would work (which it does), but I don't have enough Grafana knowledge to adjust the blocky dashboard to get it working with telegraf instead of Prometheus.
I actually like the idea: since the querylog is one of the main external interfaces, it would enable alternative storage solutions like InfluxDB without polluting the blocky code itself.
Re InfluxDB having one client per version: Also, I'm not against adding some kind of RPC/websocket for query logging, though I think implementing a new backend outside of blocky might be more work than doing it inside, since you're starting from scratch.
Generating a client from proto files (gRPC) is fairly easy and can be done in most modern languages with the help of auto-generated wrappers. 😉 I would have been interested in InfluxDB if the support of an implementation would have covered older backends as well. A similar behavior (entry TTL and Grafana support) could be achieved with Redis, where we already have a client (that would only need a minor extension to store query logs). 🤷♂️
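A minimal sketch of what that Redis extension could look like, written against the go-redis client (whether that matches blocky's actual Redis dependency is an assumption, as are the key layout, field names, and the 7-day TTL):

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/redis/go-redis/v9"
)

// writeQueryLog stores one query-log entry as a Redis hash and lets
// Redis itself expire it, mimicking a querylog retention setting.
func writeQueryLog(ctx context.Context, rdb *redis.Client, clientIP, domain, responseType string) error {
	// Hypothetical key layout: one key per entry, keyed by timestamp.
	key := fmt.Sprintf("blocky:querylog:%d", time.Now().UnixNano())
	if err := rdb.HSet(ctx, key, map[string]interface{}{
		"client":   clientIP,
		"domain":   domain,
		"response": responseType,
	}).Err(); err != nil {
		return err
	}
	// Assumed retention of 7 days.
	return rdb.Expire(ctx, key, 7*24*time.Hour).Err()
}

func main() {
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})
	if err := writeQueryLog(context.Background(), rdb, "192.168.1.2", "example.com", "BLOCKED"); err != nil {
		log.Fatal(err)
	}
}
```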
+1, I don't think supporting just the latest InfluxDB would be ideal. To reiterate my last comment: I think just writing the query log to a file in near real time would be sufficient, and it's the most flexible option since anyone can parse a file. This could even still be a CSV, just written more often than once a day. The issue currently is really just that there is no way to get a near-realtime query log without running a database.
Alright then I think we should:
About the query logging: I'm not sold on gRPC + protobuf: I think HTTP Server-Sent Events (SSE) + JSON would be easier to use for clients since they're even more ubiquitous, don't require a protocol build step, and most importantly IMO, we already use them and could make this "just" another service after #1427. The JSON format could be something very similar to what we use for structured logging in querylog.LoggerWriter, or maybe even the same if we change
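A minimal sketch of what such an SSE endpoint could look like in Go; the Entry type, the channel plumbing, and the route are assumptions for illustration, not blocky's actual code:

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

// Entry stands in for whatever struct the structured querylog would emit.
type Entry struct {
	Client   string `json:"client"`
	Domain   string `json:"domain"`
	Response string `json:"response"`
}

// sseHandler streams entries as Server-Sent Events until the client disconnects.
func sseHandler(entries <-chan Entry) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "text/event-stream")
		w.Header().Set("Cache-Control", "no-cache")
		flusher, ok := w.(http.Flusher)
		if !ok {
			http.Error(w, "streaming unsupported", http.StatusInternalServerError)
			return
		}
		for {
			select {
			case <-r.Context().Done():
				return // client went away
			case e := <-entries:
				data, err := json.Marshal(e)
				if err != nil {
					continue
				}
				// SSE frame format: "data: <payload>\n\n"
				fmt.Fprintf(w, "data: %s\n\n", data)
				flusher.Flush()
			}
		}
	}
}

func main() {
	// In blocky this channel would be fed by the query-log pipeline.
	entries := make(chan Entry)
	http.Handle("/api/querylog/stream", sseHandler(entries))
	log.Fatal(http.ListenAndServe(":4000", nil))
}
```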
I've updated the original post to now be about having a realtime query log without a database. Apologies for originally tying the request too tightly to influx. While having a way to do this over the network would be useful, I still think the lowest-hanging option here is really just a near-real-time CSV file. Another alternative could be just writing to a remote syslog if we want something that can work over the network.
Wouldn't that already be possible if you enable querylog to console and pipe the binary's output to a remote syslog target? 🤔
No worries, I think it's better for everyone to bring up multiple possibilities :) Yeah, I think a Unix socket would be nicer to avoid going through actual storage, especially if you're not trying to save the data. I think the existing CSV option is actually already near real time: the "one per day" in the docs means the file name is suffixed with the date.
I'm not familiar with SSE to be honest. 🫣
Hmm, not sure. How would I do that exactly in a docker container?
I have no issue with a Unix socket, but I assume that whatever reads that socket would need to be in the same docker container? Oh yeah, the CSV file confused me. Sounds like it could work then. The question is just how to handle the file name changing. I'll look into that with telegraf; otherwise maybe we just need an option that simply appends to the same file rather than rotating.
Edit: It does seem that telegraf has a way to read files with a glob. So it may be possible to just say "read any files in a folder". But I'll have to test and see what happens.
It's basically just an HTTP request that the server never closes, writing data in chunks. It's an alternative to WebSockets for when you only need events going one way. The nice thing is it's compatible with lots of clients out of the box because the protocol is basic HTTP.
Docker has options for how to collect logs, and I'm sure there's a way to send a container's logs somewhere else.
Since it's a file, you can put it in a volume that both the host and container have access to!
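For example, Docker's syslog logging driver can forward a container's stdout (where the console query log ends up) to a remote syslog target. A compose sketch, where the syslog address is an example value:

```yaml
services:
  blocky:
    image: spx01/blocky
    logging:
      driver: syslog
      options:
        # Remote syslog target -- example address, adjust to your setup.
        syslog-address: "udp://192.168.1.10:514"
        tag: "blocky"
```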
IMHO, SSE is good for web browser to server communication and gRPC is more universal. Also, from the performance point of view, gRPC should be better (HTTP/2, multiplexing, binary message format).
I don't know if the performance really matters for a querylog stream, but SSE works fine with HTTP/2 and even HTTP/3. You can also multiplex other requests over the same connection if your client supports it. Regarding what you were proposing, to have a gRPC API for blocky to basically have plugins: I'm not necessarily against that and think it makes more sense there, since you'll likely need calls both ways.
In theory we could even let the user configure it by switching between JSON & MessagePack (for example). Both can be serialized from structs pretty easily, and this would enable optional binary compression by sacrificing readability. 🤔
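A sketch of what that configurable switch could look like; the Entry fields and the msgpack dependency are assumptions for illustration:

```go
package main

import (
	"encoding/json"
	"fmt"

	"github.com/vmihailenco/msgpack/v5"
)

// Entry mirrors whatever struct the querylog stream would emit.
type Entry struct {
	Client   string `json:"client" msgpack:"client"`
	Domain   string `json:"domain" msgpack:"domain"`
	Response string `json:"response" msgpack:"response"`
}

// encode picks the wire format based on user configuration:
// readable JSON by default, compact binary MessagePack on request.
func encode(e Entry, format string) ([]byte, error) {
	if format == "msgpack" {
		return msgpack.Marshal(e)
	}
	return json.Marshal(e)
}

func main() {
	e := Entry{Client: "192.168.1.2", Domain: "example.com", Response: "BLOCKED"}
	j, _ := encode(e, "json")
	m, _ := encode(e, "msgpack")
	fmt.Printf("json: %d bytes, msgpack: %d bytes\n", len(j), len(m))
}
```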
To summarize my point of view: I like gRPC because of its clear structure and two-way communication, but I find the idea tempting to let those logs stream through curl (especially for debugging purposes). 😅
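That curl debugging flow would look something like this against an SSE endpoint (the URL path here is purely hypothetical):

```sh
# -N disables output buffering so events show up as they arrive.
curl -N http://localhost:4000/api/querylog/stream
```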
Sending the logs & query logs to a remote syslog target would be awesome, so I could ship them e.g. to my Grafana Loki instance and have the logs in a Grafana dashboard as well. I am running blocky on an ARM host, and currently the Grafana Loki docker driver plugin is still not supported (only linux/amd64) for shipping logs directly from a container to Grafana Loki :(
So I tried playing a bit with telegraf and the CSV in order to get telegraf to send the CSV data into influx. And while I can get it to push the data in, I'm struggling with actually being able to query the data and get useful metrics out of it. Hopefully someone here who is a bit more familiar with telegraf can point me in the right direction, and others can also make use of it. Currently this is what I have:
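For anyone attempting the same, a sketch of what such a telegraf setup could look like; the plugin names are real telegraf plugins, but the file glob, delimiter, column names, and timestamp format are assumptions that need to be checked against blocky's actual CSV output:

```toml
# Tail all rotated query-log files via a glob and parse each line as CSV.
[[inputs.tail]]
  files = ["/logs/*.log"]
  data_format = "csv"
  csv_header_row_count = 0
  # Assumed tab-separated; switch to "," if the files are comma-separated.
  csv_delimiter = "\t"
  # Assumed column layout -- adjust to match the actual files.
  csv_column_names = ["time", "client_ip", "client_name", "duration_ms", "response_type", "question"]
  csv_timestamp_column = "time"
  csv_timestamp_format = "2006-01-02 15:04:05"

[[outputs.influxdb_v2]]
  urls = ["http://influxdb:8086"]
  token = "$INFLUX_TOKEN"
  organization = "example-org"
  bucket = "blocky"
```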
Currently it's possible to push metrics from blocky into Grafana via Prometheus. And while this works fine, it's currently not possible to also have the query log in Grafana without making use of another database such as MySQL or Postgres.
Edit:
It would be nice if there was a way to instead push the query log to InfluxDB, or alternatively, expose it somehow so it can be grabbed by telegraf and pushed into influx. It would also be nice if all metrics could be pushed into InfluxDB natively or via telegraf, so there isn't a need to run Prometheus and/or telegraf and/or Grafana.