fix: Fix various tests caused by cases byte size validation not handled properly #364

rmccorm4 · 2024-06-04T03:02:31Z

What does the PR do?

- Recent byte size validation checks were failing the L0_http HttpTest.test_byte unit test, which sends a raw binary tensor.
- Recent input byte size validation checks segfault in the CUDA shared memory case, because the check assumes the buffer is in CPU memory and dereferences an invalid pointer.
- The L0_trt_shape_tensors and L0_trt_reformat_free tests are currently failing because TensorRT inference requests should not have their byte size checked.

The raw binary tensor in the failing test case consists of two memory chunks by the time it gets validated:

- The size of the bytes element
- The bytes element

This PR refactors the validation to handle any series of buffers, where each element is not guaranteed to fit within a single buffer, and each buffer is not guaranteed to contain a single element. It also skips byte size validation if the input memory type is GPU or the input platform is TensorRT.
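A minimal sketch of the skip decision described above, not the code in this PR. The `MemoryType` enum and the function name are hypothetical, and using the string "tensorrt_plan" to identify a TensorRT model is an assumption made for illustration.

```cpp
#include <string>

// Hypothetical stand-in for the server's memory type enum.
enum class MemoryType { kCpu, kCpuPinned, kGpu };

// Returns true when byte size validation should be skipped for an input.
// Assumption: the model platform is available as a string, and
// "tensorrt_plan" marks a TensorRT model.
bool
ShouldSkipByteSizeValidation(
    MemoryType memory_type, const std::string& platform)
{
  // GPU buffers cannot be dereferenced from host code, so skip them.
  if (memory_type == MemoryType::kGpu) {
    return true;
  }
  // The byte size of TensorRT inputs (shape tensors, reformat-free I/O)
  // depends on information that only the backend has.
  if (platform == "tensorrt_plan") {
    return true;
  }
  return false;
}
```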

Caveats:

GPU buffers are currently skipped, as they may need some special handling to be dereferenced/checked compared to CPU buffers in the existing code. This will likely cause the element count checks to fail as well.
TensorRT inputs are currently skipped, as format-free IO tensors require communication with the backend to determine the byte size.
A check is made that the 4-byte byte_size indicator for a given element is contained within a single buffer and is not split across buffers. If a byte_size is split across buffers, an error is returned. This was simply easier to implement; I'm open to suggestions if anyone has a slick solution. Technically the buffer APIs should support splitting the indicator this way, but I assume it is unlikely to be done in practice. Example of this currently unsupported case:
- buffer1=[<size1><element1><size2_partial>]
- buffer2=[<size2_partial><element2>...]
- <size2_partial> is split across buffers, and is currently rejected.
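The caveat above can be sketched as follows. This is an illustrative walk over CPU buffers under stated assumptions, not the code in src/infer_request.cc: `BufferView` and `ValidateBytesElements` are hypothetical names, the 4-byte size is read in host byte order, and errors are reported as plain strings.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical view of one CPU buffer; not the Triton buffer API.
struct BufferView {
  const char* data;
  size_t size;
};

// Walks a series of CPU buffers holding <4-byte size><payload> elements.
// A payload may span buffers, but a size prefix may not.
// Returns an empty string on success, otherwise an error message.
std::string
ValidateBytesElements(
    const std::vector<BufferView>& buffers, size_t expected_count)
{
  size_t buffer_idx = 0;
  const char* cursor = nullptr;
  size_t remaining_buffer_size = 0;
  size_t remaining_element_size = 0;
  size_t element_count = 0;

  // Validate elements until all buffers have been fully processed.
  while (remaining_buffer_size > 0 || buffer_idx < buffers.size()) {
    if (remaining_buffer_size == 0) {
      // Current buffer is exhausted; move on to the next one.
      cursor = buffers[buffer_idx].data;
      remaining_buffer_size = buffers[buffer_idx].size;
      ++buffer_idx;
      continue;  // Re-check the loop condition in case it is empty.
    }
    if (remaining_element_size == 0) {
      // A new element starts here, so a whole size prefix is required.
      if (remaining_buffer_size < sizeof(uint32_t)) {
        return "size prefix is split across buffers";
      }
      uint32_t element_size = 0;
      std::memcpy(&element_size, cursor, sizeof(element_size));
      cursor += sizeof(element_size);
      remaining_buffer_size -= sizeof(element_size);
      remaining_element_size = element_size;
      ++element_count;
    }
    // Consume as much of the payload as this buffer holds.
    const size_t take =
        std::min(remaining_element_size, remaining_buffer_size);
    cursor += take;
    remaining_buffer_size -= take;
    remaining_element_size -= take;
  }

  if (remaining_element_size != 0) {
    return "last element is truncated";
  }
  if (element_count != expected_count) {
    return "unexpected element count";
  }
  return "";
}
```

With this shape, a payload that continues into the next buffer is accepted, while the buffer1/buffer2 layout shown above is rejected as soon as fewer than 4 bytes remain where a size prefix is expected.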

Open Items

Verify test plan passes in CI
Decide what to do about GPU tensors - probably need to handle them
Decide what to do about TensorRT, e.g. shape tensors and reformat-free I/O tensors.

Checklist

Commit Type:

Check the conventional commit type
box here and add the label to the github PR.

Related PRs:

triton-inference-server/server#7326

Where should the reviewer start?

N/A

Test plan:

L0_http
L0_input_validation
L0_infer_cudashm
L0_sequence_batcher_cudashm
L0_backend_output_detail
L0_request_cancellation
L0_trt_reformat_free
L0_trt_shape_tensors
Nightly pipeline

CI Pipeline ID: 15635924

Background

None

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

None


GuanLuo · 2024-06-04T18:17:30Z

src/infer_request.cc

+  int64_t buffer_memory_id;
+
+  // Validate elements until all buffers have been fully processed.
+  while (remaining_buffer_size || buffer_idx < buffer_count) {


Why not structure it like below, to keep the variables in scope?

for (buf : buffers) { buffer_size = ...; for (checked_size = 0; checked_size < buffer_size) { .. } }

rmccorm4 · 2024-06-05T01:40:49Z

Re-assigning to @yinggeh as this involves a lot of refactoring to the original validation logic feature he wrote and it seems like he has some ideas for further refactoring.

src/infer_request.cc

GuanLuo

See new comments in opened discussion.

GuanLuo · 2024-06-06T18:22:03Z

Please update the description with the latest information and pipeline ID, I need to see the test status

yinggeh · 2024-06-06T21:04:16Z

> Please update the description with the latest information and pipeline ID, I need to see the test status

I believe I have updated with the latest info and pipeline ID. Is there anything particular you think is missing?

rmccorm4

I can't approve this PR because I originally opened it, but I am verbally approving this if all the affected CI tests in the test plan description are passing. I'd like to get these fixes in ASAP to fix the broken tests in CI and reduce cherry-picks.

I added a follow-up ticket DLIS-6833 to address some of the logical refactoring brought up by Guan after code freeze.

yinggeh · 2024-06-06T23:26:29Z

Please also review related server PR github.com/triton-inference-server/server/pull/7326

Fix byte size handling for raw binary requests where size and data are split across buffers

2f4bfae

rmccorm4 requested review from GuanLuo and yinggeh June 4, 2024 03:12

Be more explicit about unhandled case and just skip validation for GPU buffers

0a79085

yinggeh reviewed Jun 4, 2024

yinggeh reviewed Jun 4, 2024

rmccorm4 mentioned this pull request Jun 4, 2024

test: Update byte size validation test to match core logic triton-inference-server/server#7322

Closed

10 tasks

GuanLuo reviewed Jun 4, 2024

Remove warning log, don't return early

7022001

rmccorm4 assigned yinggeh Jun 5, 2024

Update input calls

ef9daa2

yinggeh marked this pull request as draft June 5, 2024 07:54

update function ValidateBytesInputs

6a171fc

yinggeh requested review from tanmayv25, GuanLuo and yinggeh June 5, 2024 15:23

skip input byte-size checks for TensorRT

e8656fa

yinggeh changed the title ~~fix: Fix byte size validation for string tensors when size and data are split across buffers~~ fix: Fix various failed tested caused by cases not handled by byte size validation Jun 5, 2024

yinggeh changed the title ~~fix: Fix various failed tested caused by cases not handled by byte size validation~~ fix: Fix various failed tests caused by cases not handled by byte size validation Jun 5, 2024

yinggeh marked this pull request as ready for review June 5, 2024 17:04

yinggeh added the bug Something isn't working label Jun 5, 2024

rmccorm4 commented Jun 5, 2024

Rename byte_size_valid to byte_size_invalid

a40429b

GuanLuo reviewed Jun 5, 2024

yinggeh mentioned this pull request Jun 5, 2024

test: Update error messages to comply with core change triton-inference-server/server#7326

Merged

20 tasks

yinggeh changed the title ~~fix: Fix various failed tests caused by cases not handled by byte size validation~~ fix: Fix various tests caused by cases not handled by byte size validation Jun 5, 2024

yinggeh changed the title ~~fix: Fix various tests caused by cases not handled by byte size validation~~ fix: Fix various tests caused by cases not handled by byte size validation properly Jun 5, 2024

yinggeh changed the title ~~fix: Fix various tests caused by cases not handled by byte size validation properly~~ fix: Fix various tests caused by cases byte size validation not handled properly Jun 5, 2024

yinggeh requested a review from GuanLuo June 6, 2024 11:45

GuanLuo reviewed Jun 6, 2024

yinggeh added 2 commits June 6, 2024 11:51

Minor updates

6037ab8

Update header

272ef29

yinggeh requested a review from GuanLuo June 6, 2024 20:45

Remove redundant checks

0e3b63b

rmccorm4 commented Jun 6, 2024

GuanLuo approved these changes Jun 6, 2024

yinggeh merged commit 7fbe13d into main Jun 7, 2024
1 check passed

indrajit96 pushed a commit that referenced this pull request Jun 11, 2024

fix: Fix various tests caused by cases byte size validation not handled properly (#364)

48029ca

This was referenced Jun 25, 2024

test: Add input byte size tests using C APIs triton-inference-server/server#7372

Merged

test: Add input byte size tests using C APIs #374

Merged

pskiran1 mentioned this pull request Jul 14, 2024

ci: Fix shape and reformat free tensor handling in the input byte size check triton-inference-server/server#7444

Merged

20 tasks

yinggeh deleted the rmccormick-L0_http-fix branch August 29, 2024 22:22
