GitHub - anirvan/html-extractmain: Perl module to extract the main content of a web page

HTML::ExtractMain

HTML::ExtractMain takes HTML content and extracts the HTML section
representing the main body of the page, skipping headers, footers,
navigation, and similar page furniture.

HTML::ExtractMain's Readability algorithm is ported from Arc90's
JavaScript-based Readability application, online at
http://lab.arc90.com/experiments/readability/
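
A minimal usage sketch follows. It assumes the module exports a function
named extract_main_html, as described in the module's perldoc; this README
does not name it, so check the installed documentation before relying on it.
The sample page is made up for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Assumption: extract_main_html is the exported function documented in the
# module's perldoc; the README itself does not name it.
use HTML::ExtractMain qw( extract_main_html );

# Made-up sample page: header, navigation, article body, footer.
my $html = <<'END_HTML';
<html><body>
<div id="header"><h1>Example Site</h1></div>
<div id="nav"><a href="/">Home</a> <a href="/about">About</a></div>
<div id="content">
  <p>This is the first paragraph of the article, long enough to look like
  real body text, with a few commas, clauses, and details.</p>
  <p>This is the second paragraph, which continues the article, adds more
  text, and should stay grouped with the first one.</p>
</div>
<div id="footer">Copyright notice and footer links</div>
</body></html>
END_HTML

# Ask the module for the fragment it judges to be the main content.
my $main_html = extract_main_html($html);

# Guard against an undefined result before using it.
if ( defined $main_html ) {
    print $main_html, "\n";
}
else {
    warn "No main content found\n";
}
```

The defined check is there because extraction may not find a usable block
on every page. The exact interface and any options are listed in
perldoc HTML::ExtractMain.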

INSTALLATION

To install this module, run the following commands:

    perl Build.PL
    ./Build
    ./Build test
    ./Build install

SUPPORT AND DOCUMENTATION

After installing, you can find documentation for this module with the
perldoc command.

    perldoc HTML::ExtractMain

You can also look for information at:

    RT, CPAN's request tracker
        http://rt.cpan.org/NoAuth/Bugs.html?Dist=HTML-ExtractMain

    AnnoCPAN, Annotated CPAN documentation
        http://annocpan.org/dist/HTML-ExtractMain

    CPAN Ratings
        http://cpanratings.perl.org/d/HTML-ExtractMain

    Search CPAN
        http://search.cpan.org/dist/HTML-ExtractMain/


COPYRIGHT AND LICENCE

Copyright (C) 2009-2010 Anirvan Chatterjee http://www.chatterjee.net/
Copyright (C) 2013 Rupert Lane http://www.rupert-lane.org/
Copyright (C) 2013 kryde https://github.com/kryde


This program is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.