emluque / MD_Extract

A Microdata Extractor for PHP 5

Home Page:http://www.metonymie.com

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

To test this:

1. Make sure you have PHP 5 with Tidy ( http://www.php.net/manual/en/tidy.installation.php ) and MB_String ( http://ar.php.net/manual/en/mbstring.installation.php ) support.
2. Move the folder to an apache dir and access the /examples folder with your browser.

This has been tested so far in PHP 5.3.4.

Known Issues:

. There might be problems with non-ASCII-extending Characted Encodings. If you find this issue report it with a clear explanation of how the problem encoding works or even better yet, some code.
. Base treating of http://www.example.com/dir/ and http://www.example.com/dir is the same. Meaning href="a.jpg" will be translated to http://www.example.com/dir/a.jpg in both cases. I don't know if this is correct or not.

Future:

. Add a construct_by_uri() .. (I need some live pages with microdata to test this).
. Add a vocabulary validator framework on top of this.
. I might (I said MIGHT not WILL) port it to C or C++.

Emiliano MartĂ­nez Luque
http://www.metonymie.com

PS: This is way clearer than the Microformat parser I did a couple of years ago, that's because Microdata has way clearer syntax than microformats. I think that Microformats was a great idea but there were some design ideas that were overly complex and it took a lot of code gymnastics to implement them, and I really believe that Microdata is a better spec.   

PS2: If you need to contact me for whatever reason use the contact form in http://www.metonymie.com 

About

A Microdata Extractor for PHP 5

http://www.metonymie.com


Languages

Language:PHP 68.8%Language:HTML 31.0%Language:CSS 0.2%