xmpp4r throws the REXML::ParseException on Cyrillic characters #3
Comments
The dumpz links don't work anymore |
Erm, could you try to write a simple test case for that bug, based on one of the existing test cases? According to the exception, REXML receives ASCII-8BIT characters, not UTF-8 characters. |
Same situation. http://dumpz.org/17011/ |
Yes. I would need a test case. |
http://megabytov.net/293 - test case. |
That could be a REXML problem. Since the current version of ejabberd opens an XML stream with an XML header that lacks the encoding attribute, REXML fails to retrieve the encoding and baseparser sets source.encoding to nil, which falls back to ASCII-8BIT. That is wrong behaviour according to http://www.opentag.com/xfaq_enc.htm, and I'm also unsure about the force_utf8 detection algorithm.
Although nothing related seemed to break for now, I only used it in the development environment, not production. Regards |
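The failure described above can be reproduced outside xmpp4r. This is a minimal sketch, assuming the bytes arrive tagged as ASCII-8BIT (as when the source encoding falls back to binary) and are then matched against a UTF-8 regexp. The sample payload and the pattern are illustrative only and are not taken from REXML or xmpp4r.

```ruby
# "Привет" written with \u escapes so the literal is UTF-8 regardless of file encoding.
utf8_xml = "<body>\u041F\u0440\u0438\u0432\u0435\u0442</body>"

# Simulate data read from a socket: same bytes, but tagged as binary (ASCII-8BIT).
raw = utf8_xml.dup.force_encoding(Encoding::BINARY)

# A regexp with a fixed UTF-8 encoding (the /u flag); illustrative pattern only.
pattern = /<body>(.*?)<\/body>/u

error = nil
begin
  raw =~ pattern
rescue Encoding::CompatibilityError => e
  # Same class of error that REXML wraps in REXML::ParseException.
  error = e
end

# Re-tagging the same bytes as UTF-8 lets the match succeed.
fixed = raw.dup.force_encoding(Encoding::UTF_8)
body = fixed.match(pattern)[1]
```

Nothing is transcoded here: `force_encoding` only changes the label on the bytes, which is why the choice of fallback encoding matters so much.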
I'm having this same issue. Is there any support for swapping out the xml parsing backend for something like nokogiri? |
I don't know. The last thing I came up with was this monkey patch:

    if RUBY_VERSION < "1.9"
      # ...
    else
      # Encoding patch
      require 'socket'

      # Report raw socket data as binary
      class TCPSocket
        def external_encoding
          Encoding::BINARY
        end
      end

      require 'rexml/source'

      # Ignore attempts to reset the source encoding to nil
      class REXML::IOSource
        alias_method :encoding_assign, :encoding=

        def encoding=(value)
          encoding_assign(value) if value
        end
      end

      begin
        # OpenSSL is optional and can be missing
        require 'openssl'

        class OpenSSL::SSL::SSLSocket
          def external_encoding
            Encoding::BINARY
          end
        end
      rescue LoadError
        # A failed require raises LoadError, which a bare rescue does not catch
      end
    end

But then more and more problems appeared (not related to encoding, but nonetheless), so I decided to rewrite the whole thing |
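The `encoding=` override in the patch uses `alias_method` to keep the original setter reachable and then drops `nil` assignments. Here is a minimal sketch of that pattern on a stand-in class. `FakeSource` is hypothetical and only mimics the shape of the setter; it is not part of REXML.

```ruby
# FakeSource is a hypothetical stand-in for REXML::IOSource, for illustration only.
class FakeSource
  attr_reader :encoding

  def initialize
    @encoding = Encoding::UTF_8
  end

  def encoding=(value)
    @encoding = value
  end
end

# Reopen the class the same way the patch does.
class FakeSource
  # Keep the original setter available under another name.
  alias_method :encoding_assign, :encoding=

  # Forward only non-nil values, so a missing encoding leaves the current one in place.
  def encoding=(value)
    encoding_assign(value) if value
  end
end

source = FakeSource.new
source.encoding = nil
after_nil = source.encoding        # still the initial encoding

source.encoding = Encoding::ISO_8859_1
after_explicit = source.encoding   # explicit values still go through
```

The guard only stops `nil` from overwriting the encoding. It does not detect or repair a wrong encoding.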
Which part did you rewrite? Is it open source? |
I've just recently started and have not much to commit yet. Will create the repository later. |
Hi, this seems related. Is there some workaround?
|
@bsl not to my knowledge, unfortunately. I think what we need at this point is an actively maintained fork of the xmpp4r library. |
btw, didn't the patch from #3 (comment) help? |
@dotdoom It does seem to fix it. Thank you! |
@dotdoom, works great for me. Thanks for the patch. |
@dotdoom it works, thanks. It only needs to be merged in. Is this project dead? |
The latest update was more than one year ago, so yes, it looks like it's dead... Maybe one of the forks is in better shape. |
This recently bit me, so I started working on a fork with better 1.9 compatibility at https://github.com/hoxworth/xmpp4r. Right now I simply replaced the stream parser with a Nokogiri SAX parser, which handles character encodings far better. I'd like to pull out as much REXML as possible, but XMPP4R was pretty tightly coupled to REXML. |
@hoxworth A while back I also started working on modernizing xmpp4r, https://github.com/whitehat101/xmpp4r, I merged most of the pulls and hacked some. I've been using that branch in "production" for months, and the only unresolved issue I'm bothered by is the utf-8 crashes. I'll check your stuff out, when I get a chance, and you might want to see mine. I missed the monkeypatch in this issue b/c I only looked at pulls. |
Awesome, @whitehat101, I'll definitely take a look. I didn't really like the monkey patch myself, and it only worked for half of my use cases. We've been using my Nokogiri patch for a while now with numerous UTF-8 XMPP sources, and haven't had a crash since. |
Wow, @dotdoom, your code saved my life, thanks! 👍 |
@dotdoom oh!!! Good!!! you are so generous!! oh you save me and my money. Thank you so much! |
REXML fails to parse some XML responses from the XMPP server and raises the following error: REXML::ParseException #<Encoding::CompatibilityError: incompatible encoding regexp match (UTF-8 regexp with ASCII-8BIT string)> .rvm/rubies/ruby-1.9.2-p290/lib/ruby/1.9.1/rexml/source.rb:212:in `match' This causes the bot to fail to join some rooms. See lnussbaum/xmpp4r#3
When the simplest Jabber client (like the one at http://dumpz.org/9810/) receives a UTF-8 string that contains Cyrillic (e.g. Russian) characters, xmpp4r fails with REXML::ParseException and stops working.
Backtrace here: http://dumpz.org/9806/
Thanks.