* Improved removal of fragments from URLs (now handles other characters in fragment)
* Changed SQLite3 timeout to allow for other clients to read from database
* Changed for wmonk: allow restarting of spider by not checking starting URLs
* Added Anemone::Resource to provide for spidering of resources other than HTML pages
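As a rough illustration of the fragment removal mentioned above, the following minimal sketch strips everything from the first ‘#’ onward, whatever characters the fragment contains. The helper name strip_fragment is hypothetical and is not part of Anemone's API; the library's actual implementation may differ.

```ruby
# Hypothetical helper (an assumption, not Anemone's own code):
# drop the fragment from a URL string, regardless of which
# characters appear after the '#'.
def strip_fragment(url)
  # Remove the first '#' and everything after it, up to the end of the string.
  url.sub(/#.*\z/m, '')
end

puts strip_fragment('http://www.example.com/page.html#section-2')
puts strip_fragment('http://www.example.com/a?b=1#!/x y%z')
```

Working on the raw string rather than a parsed URI means unusual fragment characters cannot cause a parse error.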
Bug fixes
Fix bug causing anchor links to have ‘#’ converted to ‘%23’
Minor enhancements
Switch from robots gem (which people reported problems with) to new robotex gem
Bug fixes
Fix incorrect default file extension for KyotoCabinet
Major enhancements
Added support for SQLite3 and Kyoto Cabinet storage
Minor enhancements
Added Page#base to use base HTML element
Use bundler for development dependencies
Bug fixes
Encode characters in URLs
Fix specs to run under rake
Fix handling of redirect_to in storage adapters
Bug fixes
Fix a bug preventing SSL connections from working
Major enhancements
Added support for HTTP Basic Auth with URLs containing a username and password
Added support for anonymous HTTP proxies
Minor enhancements
Added read_timeout option to set the HTTP request timeout in seconds
Bug fixes
Don’t raise a fatal error if a page request times out
Fix double encoding of links containing %20
Major enhancements
Added page storage engines for MongoDB and Redis
Minor enhancements
Use xpath for link parsing instead of CSS (faster) (Marc Seeger)
Added skip_query_strings option to skip links with query strings (Joost Baaij)
Bug fixes
Only consider status codes 300..307 as redirects (Marc Seeger)
Canonicalize redirect links (Marc Seeger)
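The redirect check described above can be sketched as a simple range test. The method name redirect? taking a numeric status code is an assumption made for illustration and is not necessarily how Anemone exposes it.

```ruby
# Hypothetical helper (an assumption, not taken from Anemone's source):
# treat only HTTP status codes 300 through 307 as redirects.
def redirect?(code)
  (300..307).include?(code)
end

puts redirect?(301)  # true
puts redirect?(307)  # true
puts redirect?(200)  # false
```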
Major enhancements
Cookies can be accepted and sent with each HTTP request.
Bug fixes
Fixed issue that allowed following redirects off the original domain
Minor enhancements
Added an attr_accessor to Page for the HTTP response body
Bug fixes
Fixed incorrect method calls in CLI scripts
Major enhancements
Option for persistent storage of pages during crawl with TokyoCabinet or PStore
Minor enhancements
Options can be set via methods on the Core object in the crawl block
Minor enhancements
Options are now applied per-crawl, rather than module-wide.
Bug fixes
Fixed a bug which caused deadlock if an exception occurred when crawling the last page in the queue.
Minor enhancements
When the :verbose option is set to true, exception backtraces are printed to aid debugging.
Major enhancements
Added HTTPS support.
Added CLI program ‘anemone’, a frontend for several tasks.
Minor enhancements
HTTP request response time recorded in Page.
Use of persistent HTTP connections.