Description
(A clear and concise description of what the bug is.)
I deployed mistral-7b v0.1 to a SageMaker endpoint with '763104351884.dkr.ecr.us-east-1.amazonaws.com/djl-inference:0.29.0-lmi11.0.0-cu124'. I customized the handle function to return an Output() object, but the endpoint always gives me a java.lang.IllegalStateException: Read chunk timeout. error.
Expected Behavior
(what's the expected behavior?)
I wanted to get my customized result back (e.g. {"predictions": {"positive": 0.8}}).
Error Message
(Paste the complete error message, including stack trace.)
```
[WARN ] InferenceRequestHandler - Chunk reading interrupted
java.lang.IllegalStateException: Read chunk timeout.
    at ai.djl.inference.streaming.ChunkedBytesSupplier.next(ChunkedBytesSupplier.java:79) ~[api-0.29.0.jar:?]
    at ai.djl.inference.streaming.ChunkedBytesSupplier.nextChunk(ChunkedBytesSupplier.java:93) ~[api-0.29.0.jar:?]
    at ai.djl.serving.http.InferenceRequestHandler.sendOutput(InferenceRequestHandler.java:414) ~[serving-0.29.0.jar:?]
    at ai.djl.serving.http.InferenceRequestHandler.lambda$runJob$5(InferenceRequestHandler.java:309) ~[serving-0.29.0.jar:?]
    at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863) [?:?]
    at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841) [?:?]
    at java.base/java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:483) [?:?]
    at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:373) [?:?]
    at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182) [?:?]
    at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655) [?:?]
    at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622) [?:?]
    at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165) [?:?]
```
How to Reproduce?
(If you developed your own code, please provide a short script that reproduces the error. For existing examples, please provide link.)
My inference code (sorry, I put in many print statements for debugging):
```python
import json
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from djl_python import Input, Output

model = None
tokenizer = None


def get_model(model_name):
    # model_name = properties['model_id']
    model = AutoModelForCausalLM.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    return model, tokenizer


def get_test_prompt(sample):
    """
    get prompt for test data to generate template
    """
    text_row = f""""You are given a chat transcript that between customer service and customer: {sample}.
Can you help analyze the customer's sentiment from the transcript? please only answer with Positive or Negative! Answer:"""
    return text_row


def inference(text):
    text_data = get_test_prompt(text)
    input_ids = tokenizer.encode(text_data, return_tensors='pt')
    output = model.generate(input_ids, do_sample=True, top_k=2, max_new_tokens=1, pad_token_id=11)
    next_word_logits = model(input_ids).logits[:, -1, :]
    probs = torch.softmax(next_word_logits, dim=-1)
    top_probs, top_indices = torch.topk(probs, 2)
    mp = {"pos": "positive", 'positive': 'positive', 'posit': "positive",
          'neg': 'negative', 'negative': 'negative', 'negat': "negative"}
    for i, (ix, p) in enumerate(zip(top_indices, top_probs)):
        next_word = tokenizer.decode(ix)
        next_word = next_word.split()
        prob_dict = {}
        for word, p in zip(next_word, p):
            if word.lower() in mp:
                prob_dict[word.lower()] = p.item()
    # prob_dict: {'Pos': 0.6447348594665527, 'Neg': 0.1729944795370102}
    prob_dict = {mp[k]: v for k, v in prob_dict.items()}
    if "positive" in prob_dict and "negative" in prob_dict:
        sentiment = ["positive", prob_dict["positive"]] if prob_dict["positive"] > prob_dict["negative"] \
            else ["negative", prob_dict["negative"]]
    elif "positive" in prob_dict:
        sentiment = ["positive", prob_dict["positive"]]
    elif "negative" in prob_dict:
        sentiment = ["negative", prob_dict["negative"]]
    else:
        sentiment = ["no_sentiment", 0]
    output = Output()
    output.add_as_json([{'predictions': {sentiment[0].capitalize(): f"{sentiment[1]:.2f}"}}])
    print(f"out is: {output}")
    return output


def input_fn(input_data: Input):
    data = input_data.get_as_json()
    text = data["feedbackEvent_customerFeedbackText"]
    return text


def handle(inputs: Input) -> None:
    global model, tokenizer
    if model == None:
        print(f"what we have: {inputs.get_properties()}")
        properties = inputs.get_properties()
        model_name = properties['model_id']
        print(f"model name: {model_name}")
        model, tokenizer = get_model(model_name)
    if inputs.is_empty():
        # Model server makes an empty call to warmup the model on startup
        return None
    print(f"we have inputs: {inputs} and {inputs.get_as_json()}")
    text = input_fn(inputs)
    print(f"we get text: {text}")
    o = Output()
    print(f"out is: {o}")
    o = o.add(inputs.get_as_json(), key="data")
    print(f"out2 is: {o}")
    return o  # Output().add(inputs.get_as_json(), key=)  # inference(text)
```
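For context, this is what I ultimately want handle() to do once the timeout is resolved: parse the JSON request, run inference(), and return one complete Output with the prediction JSON. Below is a stripped-down sketch of that intent, with the debug prints removed; it reuses the get_model/input_fn/inference helpers and the module-level model/tokenizer defined in the script above, and it reflects my intended flow rather than a confirmed working handler.

```python
# Sketch of the intended non-streaming handler. Assumes the helpers
# (get_model, input_fn, inference) and the module-level model/tokenizer
# from inference.py above are available in the same module.
from djl_python import Input, Output


def handle(inputs: Input) -> Output:
    global model, tokenizer
    if model is None:
        # Lazily load the model using the model_id passed via serving.properties.
        model, tokenizer = get_model(inputs.get_properties()["model_id"])
    if inputs.is_empty():
        # Warm-up ping from the model server on startup.
        return None
    text = input_fn(inputs)   # reads "feedbackEvent_customerFeedbackText"
    return inference(text)    # Output with [{"predictions": {...}}] added as JSON
```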
and my serving.properties:
```
engine=Python
option.model_id=s3://tattletale-multistacking-all-prod-execution/cs/mistral-7b-1
option.tensor_parallel_degree=1
option.max_rolling_batch_size=16
option.rolling_batch=auto
option.enable_streaming=true
option.entryPoint=inference.py
```
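One thing I am unsure about is whether option.enable_streaming=true together with option.rolling_batch=auto even makes sense when handle() returns a single complete Output instead of streamed chunks; my unverified guess is that the frontend keeps waiting for chunks that never arrive and then hits the read timeout. Purely as an assumption on my side (I have not confirmed these values fix anything), the variant I was planning to try next is:

```
# Untested variant: plain request/response handler, no streaming or rolling batch.
engine=Python
option.model_id=s3://tattletale-multistacking-all-prod-execution/cs/mistral-7b-1
option.tensor_parallel_degree=1
option.rolling_batch=disable
option.enable_streaming=false
option.entryPoint=inference.py
```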
Steps to reproduce
(Paste the commands you ran that produced the error.)
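Roughly, I invoke the endpoint like this (a sketch: the endpoint name and sample text are placeholders, but the JSON field matches what input_fn() expects):

```python
import json

import boto3

# Placeholder endpoint name; the payload key matches input_fn() in inference.py.
client = boto3.client("sagemaker-runtime", region_name="us-east-1")
response = client.invoke_endpoint(
    EndpointName="my-mistral-sentiment-endpoint",
    ContentType="application/json",
    Body=json.dumps({"feedbackEvent_customerFeedbackText": "The agent fixed my issue quickly, thank you!"}),
)
print(response["Body"].read().decode("utf-8"))
```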
What have you tried to solve it?
I tried all the methods I could find on the internet, but none of them resolved the timeout.