PHP-MetaParser

Inspired by Facebook's link sharing flow, this abstractly accessed class attempts to parse an (X)HTML document and retrieve its meta-information. I emphasize "attempts" because (X)HTML documents are exceptionally tough to parse, and data is often lost depending on how the delivered content is structured.

As the example below shows, this class works very well when coupled with the PHP-Curler class.
The following shows how data is returned, using http://www.bbc.com/ as the target document:

Array
(
    [base] => http://www.bbc.com/
    [favicon] => http://www.bbc.co.uk/favicon.ico
    [meta] => Array
        (
            [description] => Breaking news, sport, ...
            [keywords] => Array
                (
                    [0] => BBC
                    [1] => bbc.co.uk
                    ...
                    [6] => BBCi
                )

        )

    [images] => Array
        (
            [0] => http://sa.bbc.co.uk/bbc/bbc/s?name=home.page&amp;geo_edition=us&amp;ml_name=barlesque&amp;app_type=web&amp;language=en-GB&amp;ml_version=0.6.3
            [1] => http://static.bbc.co.uk/frameworks/barlesque/1.21.3/desktop/3/img/blocks/light.png
            [2] => http://static.bbc.co.uk/wwhomepage-3.5/ic/news/432-259/57632000/jpg/_57632639_013603124-1.jpg
            [3] => http://static.bbc.co.uk/wwhomepage-3.5/ic/news/432-259/57626000/jpg/_57626527_57626526.jpg
            ...
            [25] => http://me.effectivemeasure.net/em_image
        )

    [openGraph] => Array
        (
            [title] => BBC - Homepage
            [type] => website
            [image] => http://static.bbc.co.uk/wwhomepage-3.5/1.0.29/img/iphone.png
            [url] => http://www.bbc.co.uk/
        )

    [title] => BBC - Homepage
    [url] => http://www.bbc.com/
)
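Since the returned array mixes plain document data with Open Graph data, a consumer usually has to decide which value to prefer. The following is a minimal sketch of that, assuming an array shaped like the sample output above. The buildLinkPreview function and its fallback order are hypothetical and not part of MetaParser.

```php
<?php

// Hypothetical helper (not part of MetaParser): pick display fields for a
// link preview from an array shaped like the sample output above.
function buildLinkPreview(array $details): array
{
    $openGraph = $details['openGraph'] ?? [];
    $meta = $details['meta'] ?? [];

    return [
        // Prefer Open Graph values, then fall back to the plain document values.
        'title' => $openGraph['title'] ?? $details['title'] ?? '',
        'description' => $meta['description'] ?? '',
        'image' => $openGraph['image'] ?? $details['images'][0] ?? null,
        'url' => $openGraph['url'] ?? $details['url'] ?? '',
    ];
}

// Sample data trimmed from the BBC output above.
$details = [
    'base' => 'http://www.bbc.com/',
    'favicon' => 'http://www.bbc.co.uk/favicon.ico',
    'meta' => [
        'description' => 'Breaking news, sport, ...',
    ],
    'images' => [
        'http://static.bbc.co.uk/frameworks/barlesque/1.21.3/desktop/3/img/blocks/light.png',
    ],
    'openGraph' => [
        'title' => 'BBC - Homepage',
        'type' => 'website',
        'image' => 'http://static.bbc.co.uk/wwhomepage-3.5/1.0.29/img/iphone.png',
        'url' => 'http://www.bbc.co.uk/',
    ],
    'title' => 'BBC - Homepage',
    'url' => 'http://www.bbc.com/',
];

$preview = buildLinkPreview($details);
print_r($preview);
```

With this sample data the title, image, and URL come from the openGraph entries. If a page has no Open Graph tags, the sketch falls back to the document title, the first entry in images, and the document URL.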

Parsing Example

The following code uses the PHP-Curler class to curl the BBC site, store its content, and pass it along to a MetaParser instance. The URL is passed along as well so that any paths (favicons, images) are rewritten relative to the path of the document that was parsed.

<?php

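    // booting (APP is assumed to be a constant holding your application's base path)
    require_once APP . '/vendors/PHP-Curler/Curler.class.php';
    require_once APP . '/vendors/PHP-MetaParser/MetaParser.class.php';

    // curling: fetch the document body
    $url = 'http://www.bbc.com/';
    $curler = new Curler();
    $body = $curler->get($url);

    // parsing: pass the URL along so relative paths can be rewritten
    $parser = new MetaParser($body, $url);
    print_r($parser->getDetails());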
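
To illustrate why the document URL matters, here is a rough sketch of how a path found in the markup could be rewritten against it. This is not MetaParser's actual implementation: the resolveUrl function is hypothetical, the paths in the usage lines are made up for illustration, and the sketch does not collapse ../ or ./ segments.

```php
<?php

// Hypothetical helper, not MetaParser's actual implementation: rewrite a path
// found in a document so that it becomes absolute, using the document's URL.
function resolveUrl(string $path, string $documentUrl): string
{
    // Already absolute (has a scheme such as http: or https:).
    if (preg_match('#^[a-z][a-z0-9+.\-]*:#i', $path) === 1) {
        return $path;
    }

    $parts = parse_url($documentUrl);
    $scheme = $parts['scheme'] ?? 'http';
    $host = $parts['host'] ?? '';
    $origin = $scheme . '://' . $host;
    if (isset($parts['port'])) {
        $origin .= ':' . $parts['port'];
    }

    // Protocol-relative path: reuse the document's scheme.
    if (strpos($path, '//') === 0) {
        return $scheme . ':' . $path;
    }

    // Root-relative path: append to the document's origin.
    if (strpos($path, '/') === 0) {
        return $origin . $path;
    }

    // Document-relative path: resolve against the document's directory.
    $documentPath = $parts['path'] ?? '/';
    $directory = substr($documentPath, 0, strrpos($documentPath, '/') + 1);

    return $origin . $directory . $path;
}

// Illustrative paths only; they are not taken from the BBC markup.
echo resolveUrl('/favicon.ico', 'http://www.bbc.com/news/world'), "\n";
echo resolveUrl('img/logo.png', 'http://www.bbc.com/news/world'), "\n";
echo resolveUrl('//static.bbc.co.uk/logo.png', 'http://www.bbc.com/'), "\n";
```

Without the document URL, a value such as /favicon.ico could not be turned into something a client can request, which is why the parser takes the URL as its second argument.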
