Exchange failed: unexpected EOF (AX.25 on Linux) #191

Closed
hwdornbush opened this issue Apr 4, 2020 · 26 comments

@hwdornbush

I am running PAT v0.9.0 on Raspberry Pi Buster with a TT4 TNC and a Yaesu FT1 radio with an external J-Pole antenna.
I am connecting via AX.25 packet to a gateway running PiGate RMS, which runs BPQ, and I checked that it is running the latest release of BPQ.

I am trying to get INQUIRY to work. I can send the INQUIRY fine with Telnet and get the response message.
When I use the radio, the connection is made and the exchange starts receiving the response, but then I get
Exchange failed: unexpected EOF

The syslog file shows:
Apr 3 10:30:45 PiPAT pat[446]: 2020/04/03 10:30:45 Connecting to W6SON-10 (ax25)...
Apr 3 10:30:46 PiPAT pat[446]: 2020/04/03 10:30:46 Connected to W6SON-10 (AX.25)
Apr 3 10:30:47 PiPAT pat[446]: Trying ec2-52-1-178-80.compute-1.amazonaws.com
Apr 3 10:30:49 PiPAT pat[446]: *** AA6BD Connected to CMS
Apr 3 10:30:49 PiPAT pat[446]: [WL2K-5.0-B2FWIHJM$]
Apr 3 10:30:49 PiPAT pat[446]: ;PQ: 80062540
Apr 3 10:30:50 PiPAT pat[446]: CMS via W6SON >
Apr 3 10:30:50 PiPAT pat[446]: >FF
Apr 3 10:30:54 PiPAT pat[446]: ;PM: AA6BD WHQ56V8JN1LZ 2332 [email protected] INQUIRY - ftp://tgftp.nws.noaa.gov/data/raw/fp/fpus66.kmtr.sft.mtr.txt
Apr 3 10:30:56 PiPAT pat[446]: FC EM WHQ56V8JN1LZ 9644 2332 0
Apr 3 10:30:56 PiPAT pat[446]: F> 0F
Apr 3 10:30:56 PiPAT pat[446]: 1 proposal(s) received
Apr 3 10:30:56 PiPAT pat[446]: Accepting WHQ56V8JN1LZ
Apr 3 10:31:01 PiPAT pat[446]: Receiving [INQUIRY - ftp://tgftp.nws.noaa.gov/data/raw/fp/fpus66.kmtr.sft.mtr.txt] [offset 0]
Apr 3 10:31:01 PiPAT CRON[1486]: (pi) CMD (/var/www/html/movetopat)
Apr 3 10:32:01 PiPAT CRON[1500]: (pi) CMD (/var/www/html/movetopat)
Apr 3 10:32:29 PiPAT pat[446]: #015INQUIRY - ftp://tgftp.nws.noaa.gov/data/raw/fp/fpus66.kmtr.sft.mtr.txt: 14%#015INQUIRY - ftp://tgftp.nws.noaa.gov/data/raw/fp/fpus66.kmtr.sft.mtr.txt: 14%#015INQUIRY - ftp://tgftp.nws.noaa.gov/data/raw/fp/fpus66.kmtr.sft.mtr.txt: 31%#015INQUIRY - ftp://tgftp.nws.noaa.gov/data/raw/fp/fpus66.kmtr.sft.mtr.txt: 39%#015INQUIRY - ftp://tgftp.nws.noaa.gov/data/raw/fp/fpus66.kmtr.sft.mtr.txt: 48%#015INQUIRY - ftp://tgftp.nws.noaa.gov/data/raw/fp/fpus66.kmtr.sft.mtr.txt: 56%#015INQUIRY - ftp://tgftp.nws.noaa.gov/data/raw/fp/fpus66.kmtr.sft.mtr.txt: 57%#015INQUIRY - ftp://tgftp.nws.noaa.gov/data/raw/fp/fpus66.kmtr.sft.mtr.txt: 65%
Apr 3 10:32:29 PiPAT pat[446]: 2020/04/03 10:32:29 Exchange failed: unexpected EOF

After that, if I try to connect again, I get

Connecting to W6SON-10 (ax25)...
Unable to establish connection to remote: connection timed out

and I notice that the transmit light on the radio never comes on.
If I try again, I get the same result. The only way to get it going again seems to be to reboot.

I also found that after I reboot to restore normal operation, if I send a small email during the same session where I receive the response to the Inquiry, then both the sending and receiving work successfully. I have been able to repeat this behavior.
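
For anyone comparing several failed attempts, the progress that Pat logged before the drop can be pulled out of syslog lines like the one above. This is only a minimal sketch and not part of Pat: the function name is hypothetical, and it assumes that syslog has escaped each carriage return as `#015` and that every progress update ends in a percentage, as in the excerpt. The sample line uses a placeholder URL and is shortened from the real log.

```python
import re
from typing import List

# Assumption: each progress update ends with "<digits>%", as in the log above.
PERCENT_AT_END = re.compile(r"(\d+)%\s*$")


def progress_percentages(syslog_line: str) -> List[int]:
    """Return the percentages found in one escaped progress line."""
    percentages = []
    # The carriage returns appear in syslog as the literal text "#015".
    for update in syslog_line.split("#015"):
        match = PERCENT_AT_END.search(update)
        if match:
            percentages.append(int(match.group(1)))
    return percentages


# Shortened sample in the same shape as the 10:32:29 line (placeholder URL).
sample = (
    "Apr 3 10:32:29 PiPAT pat[446]: "
    "#015INQUIRY - ftp://example.invalid/file.txt: 14%"
    "#015INQUIRY - ftp://example.invalid/file.txt: 31%"
    "#015INQUIRY - ftp://example.invalid/file.txt: 65%"
)
print(progress_percentages(sample))  # prints [14, 31, 65]
```

The last value in the list is how far the transfer got before the link went away, which makes it easy to see whether the failure always happens at a similar point.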

@martinhpedersen martinhpedersen changed the title Exchange failed: unexpected EOF Exchange failed: unexpected EOF (AX.25 on Linux) Apr 11, 2020
@martinhpedersen
Member

Hi there!

I've been monitoring the thread regarding this on pat-users, but have had limited time to respond unfortunately.

To me, it sounds like you're experiencing some issues with the AX.25 code in the Linux kernel. I've seen similar issues on older versions of the kernel. Could you please compile a list of kernel versions affected by this issue?

The error "Unexpected EOF" simply means that the connection was terminated prematurely. The issue is therefore out of Pat's control, as the link is maintained (and terminated) by the AX.25 stack which is part of the kernel.
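
To illustrate what "terminated prematurely" means in practice, here is a minimal sketch of a reader that expects a fixed number of bytes and gives up when the stream ends early. Pat itself is written in Go, so this Python is only an analogy and not Pat's code; `read_exact` is a hypothetical helper. The figure 2332 is borrowed from the proposal line in the log, and treating it as the number of bytes to expect is an assumption.

```python
import io


def read_exact(stream, size: int) -> bytes:
    """Read exactly `size` bytes, or raise EOFError if the stream ends early."""
    received = bytearray()
    while len(received) < size:
        chunk = stream.read(size - len(received))
        if not chunk:  # an empty read means the other end is gone
            raise EOFError(
                f"unexpected EOF after {len(received)} of {size} bytes"
            )
        received.extend(chunk)
    return bytes(received)


# Illustration only: a 2332-byte transfer whose link drops at roughly 65%.
promised = 2332
link = io.BytesIO(b"\x00" * 1515)
try:
    read_exact(link, promised)
except EOFError as err:
    print(err)  # prints: unexpected EOF after 1515 of 2332 bytes
```

The reader cannot tell why the bytes stopped arriving. It only sees that the connection closed before the promised amount was delivered, which is why the cause has to be found in the layer that owns the link.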

@dranch

dranch commented Apr 11, 2020

Can you monitor the AX.25 traffic using Linux's "listen" program and see what's happening on the AX.25 stack side? Is there anything in the Linux system logs?

@hwdornbush
Author

hwdornbush commented Apr 12, 2020 via email

@hwdornbush
Author

hwdornbush commented Apr 12, 2020 via email

@dranch

dranch commented Apr 13, 2020

I did a little experimentation with this Winlink INQUIRY feature and it creates a MASSIVE text file. Compression helps, but it was still big. Anyway, looking at your listen output, you got into a loop between AA6BD and AA6BD-10, and ultimately AA6BD waved the white flag and quit. I've seen this happen when the remote system is a pretty tough copy and the systems just can't work each other reliably. How far apart are they?

@hwdornbush
Author

hwdornbush commented Apr 13, 2020 via email

@hwdornbush
Author

hwdornbush commented Apr 13, 2020 via email

@dranch

dranch commented Apr 13, 2020

Hmmm... weird! There is something to be said for stations being too close, but obviously something bad is happening here. There are some known bugs in newer kernels; I think kernels up to around 4.1.21 were OK. Running older kernels really isn't an option for Raspberry Pis though.

@dranch

dranch commented Apr 13, 2020

Which system needs the reboot? The Pat/Linux AX.25 station or the PiGate/BPQ system? If it is the Linux system, run this command while it is in the bad state:

netstat -A ax25 -an

It should look like this when there are no connections:

netstat -A ax25 -an
Active AX.25 sockets
Dest       Source     Device  State        Vr/Vs    Send-Q  Recv-Q
*          KI6ZHD-3   ax0     LISTENING    000/000  0       0
*          KI6ZHD-2   ax0     LISTENING    000/000  0       0
*          KI6ZHD-1   ax0     LISTENING    000/000  0       0
*          KI6ZHD-0   ax0     LISTENING    000/000  0       0
*          KI6ZHD-7   ax0     LISTENING    000/000  0       0

This is my station running Linpac and UroNode. If you have an unexpected session on your machine, this is probably a symptom of a buggy kernel. We've been trying to get people to help fix this bug for a while.
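
Spotting an unexpected session by eye is easy with a handful of rows, but a short script can do the same check on saved output. This is a rough sketch, assuming the seven-column layout shown above; `leftover_sessions` is a hypothetical helper, and the ESTABLISHED row in the sample is invented for illustration, not taken from a real capture.

```python
from typing import List, Tuple


def leftover_sessions(netstat_output: str) -> List[Tuple[str, str, str]]:
    """Return (dest, source, state) for every socket that is not LISTENING."""
    leftovers = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip blank lines, the title line, and the column header.
        if len(fields) < 7 or fields[0] in ("Active", "Dest"):
            continue
        dest, source = fields[0], fields[1]
        # In case a state name contains a space, take everything between the
        # device column and the last three columns (Vr/Vs, Send-Q, Recv-Q).
        state = " ".join(fields[3:-3])
        if state != "LISTENING":
            leftovers.append((dest, source, state))
    return leftovers


# One row from the output above plus one hypothetical stuck session.
sample = """\
Active AX.25 sockets
Dest       Source     Device  State        Vr/Vs    Send-Q  Recv-Q
*          KI6ZHD-3   ax0     LISTENING    000/000  0       0
W6SON-10   AA6BD-0    ax0     ESTABLISHED  003/005  0       0
"""
print(leftover_sessions(sample))  # prints [('W6SON-10', 'AA6BD-0', 'ESTABLISHED')]
```

An empty list means only listeners remain. Anything else after a failed exchange is a candidate for the stuck-session symptom described here.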

@hwdornbush
Author

hwdornbush commented Apr 13, 2020 via email

@dranch

dranch commented Apr 13, 2020

When you run "netstat -A ax25 -an", that shows you all the "servers" that remote systems could connect to on your system. Since you don't show any, that's fine as your system is only a client. What will be helpful is to run this command after you run your INQUIRY test and your client is broken. I suspect a leftover session will be shown.

Btw, your point about "send an email energies the receive" probably has to do with the CRC algorithm in the Linux kernel's kiss module. This setting can be changed a few ways, but I'd recommend that in your start up script, where you run kissattach, you add another line that says "kissparms -c 1 -p <your AX.25 port name from /etc/ax25/axports>".
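
If it is unclear which name to pass, the port names are the first field of each non-comment line in /etc/ax25/axports. The sketch below reads text in that format and builds the matching command. It is a hedged illustration: the helper names are hypothetical, the `wl2k` port and `N0CALL` callsign are placeholders, and it assumes kissparms takes the port name after `-p`, as its man page describes.

```python
from typing import List


def port_names(axports_text: str) -> List[str]:
    """Return the port name (first field) of every non-comment axports line."""
    names = []
    for line in axports_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        names.append(line.split()[0])
    return names


def kissparms_command(port: str) -> List[str]:
    """Build the argument list that sets CRC type 1 on the given port."""
    # Assumption: kissparms expects the port name after -p.
    return ["kissparms", "-c", "1", "-p", port]


# Placeholder axports content; a real file will have your own callsign.
sample_axports = """\
# name callsign speed paclen window description
wl2k N0CALL 9600 255 7 Winlink
"""
for name in port_names(sample_axports):
    print(" ".join(kissparms_command(name)))  # prints kissparms -c 1 -p wl2k
```

The script only prints the command. Running it still needs root privileges and an interface that kissattach has already brought up.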

@hwdornbush
Author

hwdornbush commented Apr 13, 2020 via email

@dranch

dranch commented Apr 13, 2020

It's all here:

https://github.com/la5nta/pat/wiki/AX25-Linux

and it depends on how you installed Pat. Since you're not familiar with things starting up, I have to assume things are starting up the Systemd way. If so, check for /usr/share/pat/ax25/install-systemd-ax25-unit.bash and possibly add the command in there.

@hwdornbush
Author

hwdornbush commented Apr 13, 2020 via email

@hwdornbush
Author

hwdornbush commented Apr 13, 2020 via email

@martinhpedersen
Member

Sure! You can run kissattach manually after stopping the systemd service.

@martinhpedersen
Member

@hwdornbush Did you resolve this issue?

@hwdornbush
Author

hwdornbush commented May 27, 2020 via email

@dranch

dranch commented May 27, 2020

If you are using the Linux AX.25 stack, the next question is which Linux kernel version you are using. There are known AX.25 kernel bugs that are being tracked, but there isn't an ETA for fixes.

@martinhpedersen
Member

I am not 100% sure, but I think LinBPQ has its own AX.25 implementation. Maybe someone could verify?

As @dranch mentions, you may be experiencing a bug in the kernel. I've seen some kernels behave in a similar manner. It's hard to tell without being able to reproduce the issue :/

Have you tried different versions of the kernel?

@dranch

dranch commented May 27, 2020

LinBPQ, JNOS, and other NOSes have their own AX.25 stack and I understand they are fine if run standalone. If I remember correctly, kernel versions equal to or older than 4.1.17 are OK for AX.25, but versions after this have various issues (blocking AX.25 sessions after disconnect, one specific NetRom issue, a Rose issue, 6pack spattach issue, etc.)

For those users who can compile and install their own kernel, there are 2-3 kernel patches that evidently resolve most of these issues and have had good reports on a Linux 5.5.4 kernel. I believe some of these patches could be requested from the likes of ftp://ftp.n1uro.com/pub/linux/ax25-patch.tgz , yo2loj, KE6I, etc. There have been attempts to get these fixes upstreamed but they keep getting rejected for whatever reasons. This is ironic since the patches that caused the issues received no scrutiny and were just merged in.
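
A quick way to see which side of the recalled 4.1.17 threshold a machine falls on is to compare its kernel release string against that version. This is a minimal sketch with hypothetical helper names. It treats the threshold as recalled in this thread rather than as an authoritative boundary, and a version number alone cannot tell whether a newer kernel has been patched.

```python
import re
from typing import Tuple

# Threshold as recalled in this thread; treat it as approximate.
LAST_REPORTED_GOOD = (4, 1, 17)


def parse_kernel_release(release: str) -> Tuple[int, int, int]:
    """Turn a release string such as '4.19.97-v7l+' into (4, 19, 97)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def possibly_affected(release: str) -> bool:
    """True if the release is newer than the last version reported as OK."""
    return parse_kernel_release(release) > LAST_REPORTED_GOOD


print(possibly_affected("4.19.97-v7l+"))  # prints True
print(possibly_affected("4.1.17"))        # prints False
```

On the machine itself, the release string can be taken from `platform.release()` in the standard library, which on Linux matches the output of `uname -r`.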

@martinhpedersen
Member

Thank you @dranch! I appreciate the help 👍

@hwdornbush
Author

hwdornbush commented May 27, 2020 via email

@martinhpedersen
Member

Hmm.. sounds to me like you're running one of the affected kernels then. (@dranch?)

I think you might be able to install older kernel versions through apt on Raspbian, but I am unsure. I am fairly certain that I have had some success on Debian Stretch (kernel 4.9).

Kernel bugs are unfortunately out of our control.


PS: I will probably prioritize implementing support for the AGW interface in Direwolf (see la5nta/wl2k-go#57) soon. That would allow us to use Direwolf's internal AX.25 implementation on all platforms. It won't help with hardware TNCs though :/

@dranch

dranch commented May 27, 2020

As I understand it, ANY kernels newer than 4.1.17 will have bugs. Downgrading kernels is sometimes possible but if you're using say a Raspberry Pi 4, you MUST be running a 4.19.x or newer kernel to get the required driver support (see https://en.wikipedia.org/wiki/Raspbian ). The other major downside is that if you run older versions, you're opening up your system to potential security risks which is never a good thing.

Compiling a kernel isn't too difficult and you can read about it here:
https://www.raspberrypi.org/documentation/linux/kernel/

Martin: Providing an AGW interface is a great idea and will simplify users' setups for those who don't need the full power of the Linux in-kernel AX.25 approach.

@martinhpedersen
Member

Thanks for helping out @dranch.

I'm closing this one now. I find it likely that this is due to a kernel bug rather than a bug in Pat or the (very thin) wrapper around the kernel's ax25 socket routines.
