xmpp4r throws the REXML::ParseException on Cyrillic characters #3

Open
dustalov opened this issue Jun 16, 2009 · 22 comments

Comments

@dustalov

When even the simplest Jabber client (like the one at http://dumpz.org/9810/) receives a UTF-8 string that contains Cyrillic (e.g. Russian) characters, xmpp4r fails with REXML::ParseException and stops working.

Backtrace here: http://dumpz.org/9806/

Thanks.

@lnussbaum
Owner

The dumpz links don't work anymore.

@lnussbaum
Owner

Could you try to write a simple test case for that bug, based on one of the existing test cases? According to the exception, REXML receives ASCII-8BIT characters, not UTF-8 characters.

@kidoz

kidoz commented Feb 15, 2010

Same situation. http://dumpz.org/17011/
ruby 1.9.1p378 (2010-01-10 revision 26273) [x86_64-linux]
xmpp4r-0.5
Tested for [email protected]

@lnussbaum
Owner

Yes. I would need a test case.

@vicpo

vicpo commented May 23, 2010

http://megabytov.net/293 - test case.

@dotdoom

dotdoom commented Jan 12, 2011

That could be a REXML problem. The current version of ejabberd opens an XML stream with an XML header that is missing the encoding attribute, so REXML fails to determine the encoding and baseparser sets source.encoding to nil, which falls back to ASCII-8BIT. That is wrong behaviour according to http://www.opentag.com/xfaq_enc.htm, and I'm also unsure about the force_utf8 detection algorithm.
An immediate fix for me was to patch rexml's:

Nothing related has broken so far, but I have only used it in a development environment, not in production. Regards
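
To make the ASCII-8BIT fallback described in this comment concrete, here is a minimal sketch. It is not the REXML patch mentioned above, which is not reproduced in this thread. It assumes Ruby 2.0 or newer (for `String#b`), and the sample word is an arbitrary choice. It suggests that the bytes are identical either way and that only the encoding label differs.

```ruby
# "привет" written with \u escapes so the example does not depend on the file's encoding.
utf8_text = "\u043F\u0440\u0438\u0432\u0435\u0442"

# What a parser sees when the source encoding falls back to ASCII-8BIT:
# the same bytes, labelled as binary.
binary_view = utf8_text.b

same_bytes    = binary_view.bytes == utf8_text.bytes
binary_label  = binary_view.encoding == Encoding::BINARY
equal_strings = binary_view == utf8_text

# Relabelling (not transcoding) the bytes as UTF-8 restores the original string.
relabelled = binary_view.dup.force_encoding(Encoding::UTF_8)
restored   = relabelled == utf8_text && relabelled.valid_encoding?

puts "same bytes:     #{same_bytes}"
puts "binary label:   #{binary_label}"
puts "strings equal:  #{equal_strings}"
puts "restored UTF-8: #{restored}"
```

Since no bytes change, a missing encoding declaration looks like a labelling problem rather than data corruption, which is consistent with the explanation above.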

@ajsharp

ajsharp commented Aug 5, 2011

I'm having this same issue. Is there any support for swapping out the XML parsing backend for something like Nokogiri?

@dotdoom

dotdoom commented Aug 5, 2011

I don't know. The last thing I came up with was

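if RUBY_VERSION < "1.9"
# ...
else
    # Encoding patch
    require 'socket'
    class TCPSocket
        # Always report the socket's external encoding as binary
        def external_encoding
            Encoding::BINARY
        end
    end

    require 'rexml/source'
    class REXML::IOSource
        # Keep the original setter reachable under a new name
        alias_method :encoding_assign, :encoding=
        # Ignore nil, so a missing encoding declaration leaves the current encoding in place
        def encoding=(value)
            encoding_assign(value) if value
        end
    end

    begin
        # OpenSSL is optional and can be missing
        require 'openssl'
        class OpenSSL::SSL::SSLSocket
            def external_encoding
                Encoding::BINARY
            end
        end
    rescue LoadError
        # A bare rescue would not catch LoadError, which is not a StandardError
    end
end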

That is a monkey patch. But then more and more problems appeared (unrelated to encoding, but still), so I decided to rewrite the whole thing.
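
The nil-guarded setter at the centre of the patch above can be tried in isolation. The following is a sketch only: `DemoSource` is a hypothetical stand-in class invented for illustration, not REXML's `IOSource`, and its default encoding is an assumption. It shows the `alias_method` wrapping technique, in which the original setter is kept under a new name and the replacement forwards only non-nil values.

```ruby
# Hypothetical stand-in for a parser source; NOT REXML's real class.
class DemoSource
  attr_reader :encoding

  def initialize
    @encoding = Encoding::UTF_8 # assumed default for this demo
  end

  # Original setter: blindly stores whatever it is given, including nil.
  def encoding=(value)
    @encoding = value
  end
end

# Reopen the class and wrap the setter, mirroring the shape of the patch.
class DemoSource
  alias_method :encoding_assign, :encoding=

  def encoding=(value)
    encoding_assign(value) if value
  end
end

source = DemoSource.new
source.encoding = nil                   # ignored by the guard
after_nil = source.encoding
source.encoding = Encoding::ISO_8859_1  # real values still pass through
after_value = source.encoding

puts after_nil.name
puts after_value.name
```

The guard changes behaviour only for nil, so explicitly declared encodings are still honoured.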

@ajsharp

ajsharp commented Aug 5, 2011

Which part did you rewrite? Is it open source?

@dotdoom

dotdoom commented Aug 5, 2011

I've only just started and don't have much to commit yet. I'll create the repository later.

@bsl

bsl commented Sep 17, 2011

Hi, this seems related. Is there some workaround?

#<Encoding::CompatibilityError: incompatible encoding regexp match (UTF-8 regexp with ASCII-8BIT string)>
/usr/lib/ruby/1.9.1/rexml/source.rb:212:in 'match'
/usr/lib/ruby/1.9.1/rexml/source.rb:212:in 'match'
/usr/lib/ruby/1.9.1/rexml/parsers/baseparser.rb:425:in 'pull'
/usr/lib/ruby/1.9.1/rexml/parsers/sax2parser.rb:92:in 'parse'
/home/brian/.gem/gems/xmpp4r-0.5/lib/xmpp4r/streamparser.rb:79:in 'parse'
/home/brian/.gem/gems/xmpp4r-0.5/lib/xmpp4r/stream.rb:75:in 'block in start'
...
Exception parsing
Line: 
Position: 0
Last 80 unconsumed characters:
范</body>
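
The Encoding::CompatibilityError in this backtrace can be reproduced outside xmpp4r. The sketch below assumes Ruby 2.0 or newer, and the variable names are illustrative. It matches a UTF-8 regexp against bytes labelled ASCII-8BIT, then against the same bytes relabelled as UTF-8. The relabelling only demonstrates the mismatch; it is not the fix adopted in this thread.

```ruby
# A regexp containing a \u escape for a non-ASCII character is fixed to UTF-8.
utf8_pattern = /\u8303/ # U+8303 is the character shown in the error output above

# Bytes as they might arrive from a socket: UTF-8 data labelled ASCII-8BIT.
wire_bytes = "\u8303</body>".b

outcome =
  begin
    wire_bytes =~ utf8_pattern
    :matched
  rescue Encoding::CompatibilityError
    :compatibility_error
  end

# Relabel the same bytes as UTF-8 and match again.
relabelled  = wire_bytes.dup.force_encoding(Encoding::UTF_8)
match_index = relabelled =~ utf8_pattern

puts outcome
puts match_index.inspect
```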

@ajsharp

ajsharp commented Sep 19, 2011

@bsl not to my knowledge, unfortunately. I think what we need at this point is an actively maintained fork of the xmpp4r library.

@dotdoom

dotdoom commented Sep 19, 2011

btw, didn't the patch from #3 (comment) help?

@bsl

bsl commented Sep 19, 2011

@dotdoom It does seem to fix it. Thank you!

@vosechu

vosechu commented Oct 3, 2011

@dotdoom, works great for me. Thanks for the patch.

@gutenye

gutenye commented Feb 29, 2012

@dotdoom it works, thanks. It only needs to be merged in. Is this project dead?

@romanbsd

The latest update was more than one year ago, so yes, it looks like it's dead... Maybe one of the forks is in better shape.

@hoxworth

This recently bit me, so I started working on a fork with better 1.9 compatibility at https://github.com/hoxworth/xmpp4r. Right now I simply replaced the stream parser with a Nokogiri SAX parser, which handles character encodings far better. I'd like to pull out as much REXML as possible, but XMPP4R was pretty tightly coupled to REXML.

@whitehat101

@hoxworth A while back I also started working on modernizing xmpp4r, https://github.com/whitehat101/xmpp4r, I merged most of the pulls and hacked some. I've been using that branch in "production" for months, and the only unresolved issue I'm bothered by is the utf-8 crashes. I'll check your stuff out, when I get a chance, and you might want to see mine.

I missed the monkeypatch in this issue b/c I only looked at pulls.

@hoxworth

Awesome, @whitehat101, I'll definitely take a look. I didn't really like the monkey patch myself, and it only worked for half of my use cases. We've been using my Nokogiri patch for a while now with numerous UTF-8 XMPP sources, and haven't had a crash since.

@csfmeridian

Wow, @dotdoom, your code saved my life, thanks! 👍

deeeki referenced this issue in deeeki/xmpp4r Oct 4, 2013
@sang2087

@dotdoom oh!!! Good!!! you are so generous!! oh you save me and my money. Thank you so much!

kimh pushed a commit to gruis/starbot-xmpp that referenced this issue Jan 23, 2014
REXML fails to parse some XML responses from the XMPP server and raises the following error.

  REXML::ParseException
  #<Encoding::CompatibilityError: incompatible encoding regexp match (UTF-8 regexp with ASCII-8BIT string)>
  .rvm/rubies/ruby-1.9.2-p290/lib/ruby/1.9.1/rexml/source.rb:212:in `match'

This causes the bot to fail to join some rooms.

See lnussbaum/xmpp4r#3