Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add rx threashold to avoid infinite loop condition #58

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

AndrewCramp
Copy link

Previously it was possible to be stuck in an infinite loop when using full duplex transmission. The read side previously required 1024 bytes before it would continue. This can lead to a situation where both devices are stuck in the read loop as more bytes are required to meet the exit condition. The tx side does not always send 1024 bytes, particularly if the tx-bytes option is used. This can also prevent uni-directional transfer tests from completing if the transmitter stops before the receiver.

The rx_bytes_threash, allows for the condition to be adjusted to avoid entering the infinite loop condition.

Previously it was possible to be stuck in an infinite loop when using full duplex transmission.
The read side previously required 1024 bytes before it would continue. This can lead to a
situation where both devices are stuck in the read loop as more bytes are required to meet the
exit condition. The tx side does not always send 1024 bytes, particularly if the tx-bytes option
is used. This can also prevent uni-directional transfer tests from completing if the transmitter
stops before the receiver.

The rx_bytes_threash, allows for the condition to be adjusted to avoid entering the infinite
loop condition.

Signed-off-by: Andrew Cramp <[email protected]>
@andy-shev
Copy link
Contributor

andy-shev commented Jun 6, 2024

Hmm... It doesn't sound right (*). What is the returned code by read() in this case? (It seems also we badly handle EINTR.)

*) read() returns 0 for EOF AFAIR (yes, https://www.man7.org/linux/man-pages/man2/read.2.html), and we have exit condition on that. Your second sentence is slightly incorrect, we require up to 1024 bytes.

@AndrewCramp
Copy link
Author

Hmm... It doesn't sound right (*). What is the returned code by read() in this case? (It seems also we badly handle EINTR.)

*) read() returns 0 for EOF AFAIR (yes, https://www.man7.org/linux/man-pages/man2/read.2.html), and we have exit condition on that. Your second sentence is slightly incorrect, we require up to 1024 bytes.

When in this state the read() returns errno 11 (EAGAIN). The file is opened by the program in non blocking mode (O_NONBLOCK). When in non blocking mode read() returns EAGAIN when no data is available (https://www.kernel.org/doc/html/v4.8/media/uapi/v4l/func-read.html). So when trying to use this program in a full duplex test, if both devices enter the read loop they will be stuck, as neither side will be able transmit again until a sufficient count is reached to satisfy the while condition.

@andy-shev
Copy link
Contributor

Hmm... It doesn't sound right (*). What is the returned code by read() in this case? (It seems also we badly handle EINTR.)
*) read() returns 0 for EOF AFAIR (yes, https://www.man7.org/linux/man-pages/man2/read.2.html), and we have exit condition on that. Your second sentence is slightly incorrect, we require up to 1024 bytes.

When in this state the read() returns errno 11 (EAGAIN). The file is opened by the program in non blocking mode (O_NONBLOCK). When in non blocking mode read() returns EAGAIN when no data is available (https://www.kernel.org/doc/html/v4.8/media/uapi/v4l/func-read.html).

How is this URL relevant?

So when trying to use this program in a full duplex test, if both devices enter the read loop they will be stuck, as neither side will be able transmit again until a sufficient count is reached to satisfy the while condition.

Then you need to timeout the whole thingy, or try to transmit in the same loop if there is anything to transmit.

With given threshold there is no guarantee at all that it won’t stuck in the same way.

@AndrewCramp
Copy link
Author

Hmm... It doesn't sound right (*). What is the returned code by read() in this case? (It seems also we badly handle EINTR.)
*) read() returns 0 for EOF AFAIR (yes, https://www.man7.org/linux/man-pages/man2/read.2.html), and we have exit condition on that. Your second sentence is slightly incorrect, we require up to 1024 bytes.

When in this state the read() returns errno 11 (EAGAIN). The file is opened by the program in non blocking mode (O_NONBLOCK). When in non blocking mode read() returns EAGAIN when no data is available (https://www.kernel.org/doc/html/v4.8/media/uapi/v4l/func-read.html).

How is this URL relevant?

This URL is relevant to provide a reference for the read() EAGAIN error, and that when read is used in nonblocking mode it will return EAGAIN when no data is available.

image

So when trying to use this program in a full duplex test, if both devices enter the read loop they will be stuck, as neither side will be able transmit again until a sufficient count is reached to satisfy the while condition.

Then you need to timeout the whole thingy, or try to transmit in the same loop if there is anything to transmit.

With given threshold there is no guarantee at all that it won’t stuck in the same way.

I think a timeout would be not optimal, as any timeout would slow the test significantly. Transmitting in the same loop would be a significant change to the structure of the code. The while loop was only implemented in a recent commit. Using the rx_bytes_threash option you can maintain the previous behavior if you set rx_bytes_threash sufficiently low. Although it is possible to have a case where rx_bytes_threash is set to 0, and then it would not enter the loop, but this can be avoided if the condition is modified to be <=. This would also guarantee that you never get stuck in an infinite loop.

Looking at the commit that introduced this additional while condition (#53), I was wondering if there is a bit more clarification on it's purpose, as there is already a while condition (https://github.com/cbrake/linux-serial-test/blob/master/linux-serial-test.c#L861) that will run the read as long as _cl_no_rx is 0. Although I am unfamiliar with the short packages mention in the commit message.

@andy-shev
Copy link
Contributor

Hmm... It doesn't sound right (*). What is the returned code by read() in this case? (It seems also we badly handle EINTR.)
*) read() returns 0 for EOF AFAIR (yes, https://www.man7.org/linux/man-pages/man2/read.2.html), and we have exit condition on that. Your second sentence is slightly incorrect, we require up to 1024 bytes.

When in this state the read() returns errno 11 (EAGAIN). The file is opened by the program in non blocking mode (O_NONBLOCK). When in non blocking mode read() returns EAGAIN when no data is available (https://www.kernel.org/doc/html/v4.8/media/uapi/v4l/func-read.html).

How is this URL relevant?

This URL is relevant to provide a reference for the read() EAGAIN error, and that when read is used in nonblocking mode it will return EAGAIN when no data is available.

On which function? I though we are not discussing some V4L2 special APIs...

So when trying to use this program in a full duplex test, if both devices enter the read loop they will be stuck, as neither side will be able transmit again until a sufficient count is reached to satisfy the while condition.

Then you need to timeout the whole thingy, or try to transmit in the same loop if there is anything to transmit.
With given threshold there is no guarantee at all that it won’t stuck in the same way.

I think a timeout would be not optimal, as any timeout would slow the test significantly. Transmitting in the same loop would be a significant change to the structure of the code. The while loop was only implemented in a recent commit. Using the rx_bytes_threash option you can maintain the previous behavior if you set rx_bytes_threash sufficiently low. Although it is possible to have a case where rx_bytes_threash is set to 0, and then it would not enter the loop, but this can be avoided if the condition is modified to be <=. This would also guarantee that you never get stuck in an infinite loop.

I still think that read() is not the best call on itself, we should really poll() / select() on the file descriptor(s) and only then read.

Looking at the commit that introduced this additional while condition (#53), I was wondering if there is a bit more clarification on it's purpose, as there is already a while condition (https://github.com/cbrake/linux-serial-test/blob/master/linux-serial-test.c#L861) that will run the read as long as _cl_no_rx is 0. Although I am unfamiliar with the short packages mention in the commit message.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants