Skip to content
This repository has been archived by the owner on May 8, 2024. It is now read-only.

Parsing is very slow #25

Open
ferrous26 opened this issue Sep 17, 2011 · 2 comments
Open

Parsing is very slow #25

ferrous26 opened this issue Sep 17, 2011 · 2 comments

Comments

@ferrous26
Copy link

There are a few reasons why:

  1. Handsoap::XmlQueryFront::NokogiriDriver#to_s is very inefficient

The method uses a lot of literal strings that are constant and do not need to be modified. Using literal strings means they need to be #duped every time they are used in the method. Also, there are several #gsub calls where #gsub! could be called instead.

There is also a note about Nokogiri APIs being unstable, I'm not sure if this is the case anymore, but I overrode this method to just call #content on the backing Nokogiri node. I have something like this as a solution:

diff --git a/lib/handsoap/xml_query_front.rb b/lib/handsoap/xml_query_front.rb
index 3df435c..742d7e1 100644
--- a/lib/handsoap/xml_query_front.rb
+++ b/lib/handsoap/xml_query_front.rb
@@ -168,9 +168,8 @@ module Handsoap
       # Returns the underlying native element.
       #
       # You shouldn't need to use this, since doing so would void portability.
-      def native_element
-        @element
-      end
+      attr_reader :native_element
+
       # Returns the node name of the current element.
       def node_name
         raise NotImplementedError.new
@@ -350,13 +349,34 @@ module Handsoap
           element = @element.children.first
         end
         return if element.nil?
+        string = element.content
+
         # This looks messy because it is .. Nokogiri's interface is in a flux
         if element.kind_of?(Nokogiri::XML::CDATA)
-          element.serialize(:encoding => 'UTF-8').gsub(/^<!\[CDATA\[/, "").gsub(/\]\]>$/, "")
+          stirng.gsub!(EBEGIN_CDATA, BLANK_STRING)
+          string.gsub!(EEND_CDATA,   BLANK_STRING)
         else
-          element.serialize(:encoding => 'UTF-8').gsub('&lt;', '<').gsub('&gt;', '>').gsub('&quot;', '"').gsub('&apos;', "'").gsub('&amp;', '&')
+          string.gsub!(ELT,   LT)
+          string.gsub!(EGT,   GT)
+          string.gsub!(EQUOT, QUOT)
+          string.gsub!(EAPOS, APOS)
+          string.gsub!(EAMP,  AMP)
         end
-      end
+        string
+      end
+      EBEGIN_CDATA = /^<!\[CDATA\[/
+      EEND_CDATA   = /\]\]>$/
+      BLANK_STRING = ''
+      ELT          = '&lt;'
+      LT           = '<'
+      EGT          = '&gt;'
+      GT           = '>'
+      EQUOT        = '&quot;'
+      QUOT         = '"'
+      EAPOS        = '&apos;'
+      APOS         = "'"
+      EAMP         = '&amp;'
+      AMP          = '&'
     end
   end
 end
  1. All the data transformers use #to_s

This is expensive since calling #to_s is expensive, but even if #to_s is fixed I do not think the other transformers need to unescape the escape sequences, do they?

I don't really have the time to fix this right now and also make sure I don't break the other drivers. :(

  1. Using XPath is not very efficient for large data structures

Rewalking the XML subtree is expensive for big data structures. I'm not sure if this is a problem for Handsoap, but maybe a notice in the documentation should be added.

I have worked around all of these issues in a gem that uses handsoap: http://github.com/Marketcircle/jiraSOAP.

@troelskn
Copy link
Contributor

Hi,

Thanks for this. Most seems like quite good suggestions. Unfortunately I don't really have the time to maintain Handsoap at the moment, as the issue tracker here shows. If you'll provide a patch and a pull request, I'll get it merged into the main branch.

Or if you're interested in taking over maintenance of the library, let me know.

@ferrous26
Copy link
Author

Hi,

Unfortunately, I'm overcommitted as it is, so I can't take over maintainership. I will try to cleanup fixes for the points I've raised and then open a pull request in the not too distant future.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants