Update PyFunc and support publishing log from PyFunc (#489)
**What this PR does / why we need it**:

To onboard to model observability, we need to collect the features and prediction values of each model. This is currently not possible for PyFunc models, because the input and output of a PyFunc model are not necessarily the features and the prediction values. This PR addresses that by adding a new PyFunc model type that identifies which data are features and which are prediction values. Once identified, the data is published to Kafka for later processing.

Modification:

* `python/sdk/merlin/pyfunc.py`
  * Add a new `PyFuncV3Model` to differentiate features and prediction values
  * Introduce `PyFuncOutput` as the single output type for real-time PyFunc models (`PyFuncModel` and `PyFuncV3Model`)
* `python/pyfunc-server/pyfuncserver/config.py`
  * Add configuration for Kafka publishing and the sampling ratio
* `python/pyfunc-server/pyfuncserver/protocol/rest/handler.py`
  * Add async publishing after getting the PyFunc model output
* `python/pyfunc-server/pyfuncserver/protocol/rest/server.py`
  * Create a subclass of the Tornado web application that holds the Kafka producer instance
* `python/pyfunc-server/pyfuncserver/publisher/publisher.py`
  * Add the asyncio publisher code
* `python/pyfunc-server/pyfuncserver/sampler/sampler.py`
  * Add a random sampling method based on the given ratio
* `python/pyfunc-server/pyfuncserver/publisher/kafka.py`
  * Add a Kafka producer implementation that publishes a given `PyFuncOutput`

**Which issue(s) this PR fixes**:

Fixes #

**Does this PR introduce a user-facing change?**:

```release-note
```

**Checklist**

- [x] Added unit test, integration, and/or e2e tests
- [x] Tested locally
- [ ] Updated documentation
- [ ] Update Swagger spec if the PR introduces API changes
- [ ] Regenerated Golang and Python client if the PR introduces API changes
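The description mentions random sampling based on a configured ratio, but `sampler.py` is not shown in this view. The following is a minimal sketch of how such a sampler might work, not the code from the commit. The class name `RatioSampler`, the `ratio` parameter, and the injectable `rng` are assumptions made for illustration; only the `should_sample()` method name is taken from the publisher code in this commit.

```python
import random
from typing import Optional


class RatioSampler:
    """Hypothetical sampler that keeps roughly `ratio` of all calls."""

    def __init__(self, ratio: float, rng: Optional[random.Random] = None) -> None:
        if not 0.0 <= ratio <= 1.0:
            raise ValueError("ratio must be between 0.0 and 1.0")
        self.ratio = ratio
        # An injectable generator makes the behaviour reproducible in tests.
        self.rng = rng if rng is not None else random.Random()

    def should_sample(self) -> bool:
        # random() returns a float in [0.0, 1.0), so ratio=0.0 never samples
        # and ratio=1.0 always samples.
        return self.rng.random() < self.ratio
```

With a ratio of `0.1`, about one in ten prediction outputs would be forwarded to the publisher and the rest dropped before any Kafka work is done.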
1 parent 95a6183, commit 9cc5432

Showing 29 changed files with 904 additions and 110 deletions.
`python/pyfunc-server/pyfuncserver/publisher/kafka.py`

@@ -0,0 +1,25 @@

```python
import uuid

from pyfuncserver.config import Publisher as PublisherConfig, ModelManifest
from pyfuncserver.utils.converter import build_prediction_log

from confluent_kafka import Producer
from merlin.pyfunc import PyFuncOutput


class KafkaProducer(Producer):
    def __init__(self, publisher_config: PublisherConfig, model_manifest: ModelManifest) -> None:
        conf = {
            "bootstrap.servers": publisher_config.kafka.brokers,
            "acks": publisher_config.kafka.acks,
            "linger.ms": publisher_config.kafka.linger_ms
        }
        conf.update(publisher_config.kafka.configuration)
        self.producer = Producer(**conf)
        self.topic = publisher_config.kafka.topic
        self.model_manifest = model_manifest

    def produce(self, data: PyFuncOutput):
        prediction_log = build_prediction_log(pyfunc_output=data, model_manifest=self.model_manifest)
        serialized_data = prediction_log.SerializeToString()
        self.producer.produce(topic=self.topic, value=serialized_data)
```
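The constructor above builds a base configuration and then calls `conf.update(...)` with the user-supplied settings, so a key present in both places ends up with the user-supplied value. A small sketch of that merge in isolation, using plain dictionaries; the function name `build_kafka_conf` and the sample values are hypothetical and not part of the commit:

```python
from typing import Any, Dict


def build_kafka_conf(brokers: str, acks: int, linger_ms: int,
                     extra: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper mirroring the merge order used in the constructor."""
    conf: Dict[str, Any] = {
        "bootstrap.servers": brokers,
        "acks": acks,
        "linger.ms": linger_ms,
    }
    # dict.update overwrites existing keys, so `extra` takes precedence
    # over the three dedicated settings above.
    conf.update(extra)
    return conf


conf = build_kafka_conf(
    brokers="localhost:9092",
    acks=0,
    linger_ms=100,
    extra={"linger.ms": 500, "compression.type": "gzip"},
)
```

Here `conf["linger.ms"]` is `500` rather than `100`, which shows that the free-form configuration can silently override the dedicated fields. Whether that precedence is intended is not stated in the PR.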
`python/pyfunc-server/pyfuncserver/publisher/publisher.py`

@@ -0,0 +1,21 @@

```python
from pyfuncserver.sampler.sampler import Sampler
from merlin.pyfunc import PyFuncOutput
from abc import ABC, abstractmethod
import asyncio

class Producer(ABC):

    @abstractmethod
    def produce(self, data: PyFuncOutput):
        pass

class Publisher:
    def __init__(self, producer: Producer, sampler: Sampler) -> None:
        self.producer = producer
        self.sampler = sampler

    async def publish(self, output: PyFuncOutput):
        if not self.sampler.should_sample():
            return

        self.producer.produce(output)
```
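To see how the publisher gates output on the sampler, the sketch below exercises the same `Publisher` logic with stand-ins. `InMemoryProducer` and `FixedSampler` are hypothetical test doubles, and a plain dictionary replaces `PyFuncOutput` so the example runs without the `merlin` or `pyfuncserver` packages.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, List


class Producer(ABC):
    @abstractmethod
    def produce(self, data: Any) -> None:
        pass


class InMemoryProducer(Producer):
    """Hypothetical producer that records outputs instead of sending them."""

    def __init__(self) -> None:
        self.sent: List[Any] = []

    def produce(self, data: Any) -> None:
        self.sent.append(data)


class FixedSampler:
    """Hypothetical sampler that always returns the same decision."""

    def __init__(self, decision: bool) -> None:
        self.decision = decision

    def should_sample(self) -> bool:
        return self.decision


class Publisher:
    def __init__(self, producer: Producer, sampler: FixedSampler) -> None:
        self.producer = producer
        self.sampler = sampler

    async def publish(self, output: Any) -> None:
        # Skip the producer entirely when the sampler rejects this output.
        if not self.sampler.should_sample():
            return
        self.producer.produce(output)


async def main():
    kept = InMemoryProducer()
    await Publisher(kept, FixedSampler(True)).publish({"prediction": 1})
    dropped = InMemoryProducer()
    await Publisher(dropped, FixedSampler(False)).publish({"prediction": 2})
    return kept.sent, dropped.sent


kept_sent, dropped_sent = asyncio.run(main())
```

Note that `produce` is called synchronously inside the coroutine. With the `confluent_kafka` producer this normally just enqueues the message in a local buffer, so the call is expected to return quickly, but a slow or blocking `Producer` implementation would stall the event loop.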