add rx threashold to avoid infinite loop condition #58

AndrewCramp · 2024-06-04T19:22:36Z

Previously it was possible to be stuck in an infinite loop when using full duplex transmission. The read side previously required 1024 bytes before it would continue. This can lead to a situation where both devices are stuck in the read loop as more bytes are required to meet the exit condition. The tx side does not always send 1024 bytes, particularly if the tx-bytes option is used. This can also prevent uni-directional transfer tests from completing if the transmitter stops before the receiver.

The rx_bytes_threash, allows for the condition to be adjusted to avoid entering the infinite loop condition.

Previously it was possible to be stuck in an infinite loop when using full duplex transmission. The read side previously required 1024 bytes before it would continue. This can lead to a situation where both devices are stuck in the read loop as more bytes are required to meet the exit condition. The tx side does not always send 1024 bytes, particularly if the tx-bytes option is used. This can also prevent uni-directional transfer tests from completing if the transmitter stops before the receiver. The rx_bytes_threash, allows for the condition to be adjusted to avoid entering the infinite loop condition. Signed-off-by: Andrew Cramp <[email protected]>

andy-shev · 2024-06-06T13:52:30Z

Hmm... It doesn't sound right (*). What is the returned code by read() in this case? (It seems also we badly handle EINTR.)

*) read() returns 0 for EOF AFAIR (yes, https://www.man7.org/linux/man-pages/man2/read.2.html), and we have exit condition on that. Your second sentence is slightly incorrect, we require up to 1024 bytes.

AndrewCramp · 2024-06-06T14:54:26Z

Hmm... It doesn't sound right (*). What is the returned code by read() in this case? (It seems also we badly handle EINTR.)

*) read() returns 0 for EOF AFAIR (yes, https://www.man7.org/linux/man-pages/man2/read.2.html), and we have exit condition on that. Your second sentence is slightly incorrect, we require up to 1024 bytes.

When in this state the read() returns errno 11 (EAGAIN). The file is opened by the program in non blocking mode (O_NONBLOCK). When in non blocking mode read() returns EAGAIN when no data is available (https://www.kernel.org/doc/html/v4.8/media/uapi/v4l/func-read.html). So when trying to use this program in a full duplex test, if both devices enter the read loop they will be stuck, as neither side will be able transmit again until a sufficient count is reached to satisfy the while condition.

andy-shev · 2024-06-08T14:24:29Z

Hmm... It doesn't sound right (*). What is the returned code by read() in this case? (It seems also we badly handle EINTR.)
*) read() returns 0 for EOF AFAIR (yes, https://www.man7.org/linux/man-pages/man2/read.2.html), and we have exit condition on that. Your second sentence is slightly incorrect, we require up to 1024 bytes.

When in this state the read() returns errno 11 (EAGAIN). The file is opened by the program in non blocking mode (O_NONBLOCK). When in non blocking mode read() returns EAGAIN when no data is available (https://www.kernel.org/doc/html/v4.8/media/uapi/v4l/func-read.html).

How is this URL relevant?

So when trying to use this program in a full duplex test, if both devices enter the read loop they will be stuck, as neither side will be able transmit again until a sufficient count is reached to satisfy the while condition.

Then you need to timeout the whole thingy, or try to transmit in the same loop if there is anything to transmit.

With given threshold there is no guarantee at all that it won’t stuck in the same way.

AndrewCramp · 2024-06-10T21:44:57Z

Hmm... It doesn't sound right (*). What is the returned code by read() in this case? (It seems also we badly handle EINTR.)
*) read() returns 0 for EOF AFAIR (yes, https://www.man7.org/linux/man-pages/man2/read.2.html), and we have exit condition on that. Your second sentence is slightly incorrect, we require up to 1024 bytes.

When in this state the read() returns errno 11 (EAGAIN). The file is opened by the program in non blocking mode (O_NONBLOCK). When in non blocking mode read() returns EAGAIN when no data is available (https://www.kernel.org/doc/html/v4.8/media/uapi/v4l/func-read.html).

How is this URL relevant?

This URL is relevant to provide a reference for the read() EAGAIN error, and that when read is used in nonblocking mode it will return EAGAIN when no data is available.

So when trying to use this program in a full duplex test, if both devices enter the read loop they will be stuck, as neither side will be able transmit again until a sufficient count is reached to satisfy the while condition.

Then you need to timeout the whole thingy, or try to transmit in the same loop if there is anything to transmit.

With given threshold there is no guarantee at all that it won’t stuck in the same way.

I think a timeout would be not optimal, as any timeout would slow the test significantly. Transmitting in the same loop would be a significant change to the structure of the code. The while loop was only implemented in a recent commit. Using the rx_bytes_threash option you can maintain the previous behavior if you set rx_bytes_threash sufficiently low. Although it is possible to have a case where rx_bytes_threash is set to 0, and then it would not enter the loop, but this can be avoided if the condition is modified to be <=. This would also guarantee that you never get stuck in an infinite loop.

Looking at the commit that introduced this additional while condition (#53), I was wondering if there is a bit more clarification on it's purpose, as there is already a while condition (https://github.com/cbrake/linux-serial-test/blob/master/linux-serial-test.c#L861) that will run the read as long as _cl_no_rx is 0. Although I am unfamiliar with the short packages mention in the commit message.

andy-shev · 2024-06-11T15:48:29Z

Hmm... It doesn't sound right (*). What is the returned code by read() in this case? (It seems also we badly handle EINTR.)
*) read() returns 0 for EOF AFAIR (yes, https://www.man7.org/linux/man-pages/man2/read.2.html), and we have exit condition on that. Your second sentence is slightly incorrect, we require up to 1024 bytes.

When in this state the read() returns errno 11 (EAGAIN). The file is opened by the program in non blocking mode (O_NONBLOCK). When in non blocking mode read() returns EAGAIN when no data is available (https://www.kernel.org/doc/html/v4.8/media/uapi/v4l/func-read.html).

How is this URL relevant?

This URL is relevant to provide a reference for the read() EAGAIN error, and that when read is used in nonblocking mode it will return EAGAIN when no data is available.

On which function? I though we are not discussing some V4L2 special APIs...

So when trying to use this program in a full duplex test, if both devices enter the read loop they will be stuck, as neither side will be able transmit again until a sufficient count is reached to satisfy the while condition.

Then you need to timeout the whole thingy, or try to transmit in the same loop if there is anything to transmit.
With given threshold there is no guarantee at all that it won’t stuck in the same way.

I think a timeout would be not optimal, as any timeout would slow the test significantly. Transmitting in the same loop would be a significant change to the structure of the code. The while loop was only implemented in a recent commit. Using the rx_bytes_threash option you can maintain the previous behavior if you set rx_bytes_threash sufficiently low. Although it is possible to have a case where rx_bytes_threash is set to 0, and then it would not enter the loop, but this can be avoided if the condition is modified to be <=. This would also guarantee that you never get stuck in an infinite loop.

I still think that read() is not the best call on itself, we should really poll() / select() on the file descriptor(s) and only then read.

Looking at the commit that introduced this additional while condition (#53), I was wondering if there is a bit more clarification on it's purpose, as there is already a while condition (https://github.com/cbrake/linux-serial-test/blob/master/linux-serial-test.c#L861) that will run the read as long as _cl_no_rx is 0. Although I am unfamiliar with the short packages mention in the commit message.

AndrewCramp force-pushed the add_max_read_size branch from d869d31 to 17be975 Compare June 5, 2024 15:01

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add rx threashold to avoid infinite loop condition #58

add rx threashold to avoid infinite loop condition #58

AndrewCramp commented Jun 4, 2024

andy-shev commented Jun 6, 2024 •

edited

Loading

AndrewCramp commented Jun 6, 2024

andy-shev commented Jun 8, 2024

AndrewCramp commented Jun 10, 2024

andy-shev commented Jun 11, 2024

add rx threashold to avoid infinite loop condition #58

Are you sure you want to change the base?

add rx threashold to avoid infinite loop condition #58

Conversation

AndrewCramp commented Jun 4, 2024

andy-shev commented Jun 6, 2024 • edited Loading

AndrewCramp commented Jun 6, 2024

andy-shev commented Jun 8, 2024

AndrewCramp commented Jun 10, 2024

andy-shev commented Jun 11, 2024

andy-shev commented Jun 6, 2024 •

edited

Loading