WebOrganics / TransFormr

The Microformat Transformer Toolkit

Home Page:http://microform.at/

Microformat Fragment by id

williamka opened this issue · comments

Hi,

Great piece of code. I have been using it for a long, long time (from the very beginning), but decided to upgrade as it is no longer dependent on HTML Tidy and is therefore a little more tolerant.

Whilst it works like a dream for converting a page to the various formats, I was intrigued by the option to convert a fragment by passing an HTML id for a given HTML snippet.

This does not appear to work, whichever way you try to submit it, even though passing the same snippet directly does.

Can you have a look and make sure it works as expected?

On a separate note: on the assumption that search engines will at some point be capable of reading and interpreting the information converted by transformr, we should be cautious about canonical URLs, e.g. /referer or /id. My site uses this functionality on almost every page, and to avoid what appears to be the same URL delivering something different on each call, we should look to include the URL in every call to transformr, so that each URL remains unique. (Just a thought.)

Great work, it definitely makes a big difference to me.

William

Hello, thanks for the detailed info on your issue.
The first thing I should ask is: are you escaping the '#' hash character? You should escape it as '%23' when using direct URL input.
One thing you could try, to confirm that fragments are not working for you, is to use the detect bookmarklet: go to the URL you want to parse, e.g. http://weborganics.co.uk/#first-post, and hit detect. The URL being parsed should appear in the source link, including the fragment.
The next thing to try is to enter the fragment URL directly into your transformr using the URL input, e.g. http://weborganics.co.uk/#first-post
Lastly, you can use JSON as your input, e.g. index.php?q={ "url" : "http://weborganics.co.uk/%23first-post", "type" : "detect" } (note the escaped '#').
If none of the above work, I can then look at how to fix it for you.

Thanks.
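
For anyone scripting the checks above, here is a minimal sketch of the '#' escaping in Python. The helper names `escape_fragment` and `json_query` are made up for illustration and are not part of transformr. The `index.php?q=` shape is taken from the example above, and it is an assumption that the endpoint accepts a fully percent-encoded JSON payload.

```python
import json
from urllib.parse import quote


def escape_fragment(url: str) -> str:
    """Replace '#' with '%23' so the fragment survives direct URL input."""
    return url.replace("#", "%23")


def json_query(endpoint: str, page_url: str, output_type: str = "detect") -> str:
    """Build an index.php?q={...} style request (hypothetical helper)."""
    payload = json.dumps({"url": page_url, "type": output_type})
    # quote() percent-encodes '#', quotes, braces and spaces;
    # ':' and '/' are kept readable via the safe argument.
    return f"{endpoint}?q={quote(payload, safe=':/')}"


page = "http://weborganics.co.uk/#first-post"
print(escape_fragment(page))
print(json_query("index.php", page))
```

In both cases the '#' never reaches the server as a raw fragment separator, which is the point of the escaping advice: an unescaped fragment is normally dropped by the client before the request is sent.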

I can confirm that parsing by fragment id does not work when using DOM parsing on XHTML documents. It seems to work fine on HTML, though. Working on a fix.

Hi

OK, so let me expand, as it must be my site specifically, or maybe hCalendar.
Below is a page: the snippet passes in direct mode, but if you try using
the URL it simply fails, even though you can pass the page as a whole.

Does the url below pass for you?

http://www.dancefederation.com/uk/salsa-dancing/clubs-classes/midlands-b60-salsa.php#vevent1

Actual HTML snippet (hCalendar):

Wednesday - B60 Salsa @ Stoke Prior Sports & Country Club - Stoke Prior Bromsgrove

    <ul class="pillboxes button group">


                                                <li class="gold

scriptEnabled" id="scriptEnabledEdit1">Edit

  • Copy
  •         <!--li class="gold"><a href="
    

    http://www.google.com/calendar/event?action=TEMPLATE&amp;text=B60%20Salsa%20Wednesday%20%40%20Stoke%20Prior%20Sports%20%26%20Country%20Club%20-%20Stoke%20Prior%20Bromsgrove&amp;dates=20110518%2F20110518&amp;details=Classes%20resume%20on%20Wednesday%2012%20January%202011%0D%0A%0D%0AWeekly%20salsa%20classes%20with%20experienced%20and%20qualified%20tuition%0D%0A%0D%0A7.15pm%20-%20Arrival%20and%20registration%0D%0A7.30pm%20-%20Beginners%20Class%0D%0A8.30pm%20-%20Practice%20%28for%20both%20classes%29%0D%0A8.45pm%20-%20Improvers%20Class%0D%0A9.45pm%20-%20dance%20until%20closing%0D%0A%0D%0AFull%20details%20are%20available%20on%20our%20website%2C%20or%20please%20call%200779%20400%2081456%3Cbr%2F%3E%3Cbr%2F%3E%3Ch4%3ETimes%3A%20%3C%2Fh4%3E%3Cbr%2F%3E7.15pm%20-%2011.00pm%3Cbr%2F%3E%3Cbr%2F%3E%3Ch4%3EPrice%3A%20%3C%2Fh4%3E%3Cbr%2F%3E%C2%A36%3Cbr%2F%3E%3Cbr%2F%3EProvided%20by%20The%20Dance%20Federation%20-%20www.dancefederation.com&amp;location=Stoke%20Prior%20Sports%20%26%20Country%20Club%20Stoke%20Prior%20Bromsgrove%2C%20Worcestershire%2C%20B60%204AL&amp;trp=false&amp;sprop=www.dancefederation.com&amp;sprop=name:DanceFederation"
    target="_blank" title="Add this B60 Salsa lesson to your Google
    Calendar">+ Google Calendar


                                        <a style="display:none"
    

    href="https://github.com/uk/salsa-dancing/clubs-classes/midlands-b60-salsa.php" target="_blank" rel="nofollow"
    class="url">Dance Federation


    Weekly


    20110518

                                        <!--span class="dtend"
    

    style="display:none">20110518

    Classes resume on Wednesday 12 January 2011

    Weekly salsa classes with experienced and qualified tuition

    7.15pm - Arrival and registration
    7.30pm - Beginners Class
    8.30pm - Practice (for both classes)
    8.45pm - Improvers Class
    9.45pm - dance until closing

    Full details are available on our website, or please call 0779 400 81456

    Times: 7.15pm - 11.00pm

    Price: £6



                                        <!-- google_ad_section_start -->
                                        <span
    

    class="vcard_highlight">Location (

                                                          ) :
                                        </span>
                                        <br>
                                        <span class="location vcard">
                                          <span class="url fn org">
                    Stoke Prior Sports &amp; Country Club
                      </span>
                                          <span style="display:inline"
    

    class="adr">

    Weston Hall Road,
    Stoke Prior Bromsgrove
    Worcestershire
    B60 4AL
    UK

    52.295, -2.08142





                                    </div>
    
                                </div>
    

    William

    OK, so that explains the reason. I think the best solution is to add a
    parameter to my URL to strip these out, something like b60.php?xhtml=clean

    This can be part of the URL passed to transformr; hopefully it won't do
    something funny with it.

    Thanks for your help. One last point: in your form post you have the post
    location set to "/". You may want to set it to the same page in case, like
    me, the transformr and the root directory are not the same.

    Thanks again, keep up the good work, it is appreciated.

    William
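
As a rough illustration of what a "clean" mode like the proposed `xhtml=clean` parameter might do, here is a minimal Python sketch that strips the `addthis:` and `tw:` prefixed attributes named in the quoted reply. The function name, the regular expression and the sample markup are assumptions made for this example. They are not taken from the site or from transformr, and a regex is only a crude stand-in for a real HTML filter.

```python
import re

# Matches attributes such as tw:count="none" or addthis:templates='twitter',
# including the whitespace that precedes them.
PREFIXED_ATTR = re.compile(
    r"""\s+(?:addthis|tw):[\w-]+\s*=\s*(?:"[^"]*"|'[^']*')"""
)


def strip_prefixed_attributes(markup: str) -> str:
    """Remove addthis:* and tw:* attributes, leaving other markup untouched."""
    return PREFIXED_ATTR.sub("", markup)


sample = '<a class="share" tw:count="none" addthis:templates="twitter">share</a>'
print(strip_prefixed_attributes(sample))
```

The page would then serve the stripped markup only when the extra parameter is present, so normal visitors still get the sharing widgets.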

    On 15 May 2011 17:03, WebOrganics <
    reply@reply.github.com> wrote:

    No, sorry, it doesn't pass I'm afraid. The specific reason is a bug that I
    have encountered before: it is caused by the AddThis code on line 258.

    Apart from the fact that there are no such attributes as addthis:templates
    and tw:count in either HTML or XHTML, DOM parsing of XML/XHTML is very
    strict. In XML the DOM parser expects namespaces to be set up for both
    addthis and tw, for example xmlns:tw="http://somenamespace/". If the
    namespaces do not exist, the parser will simply choke and deliver a blank
    result. The easiest way to fix this is to remove the code and use
    JavaScript to add the AddThis markup to your document; transformr cannot as
    yet read the results of JavaScript, only the hard-coded markup.

    Hope that all helps.

    Best wishes.
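
The namespace behaviour described above can be reproduced with any strict XML parser. The sketch below uses Python's `xml.etree.ElementTree` as a stand-in for the DOM parser transformr uses, so it shows the general XML rule and not transformr's actual code. The attribute values and the `addthis` namespace URI are placeholders; the `tw` URI is the example one from the message.

```python
import xml.etree.ElementTree as ET

# Prefixed attributes with no matching xmlns declarations.
WITHOUT_NS = '<div><a tw:count="none" addthis:templates="twitter">share</a></div>'

# Same markup, with both prefixes bound to (placeholder) namespace URIs.
WITH_NS = (
    '<div xmlns:tw="http://somenamespace/" '
    'xmlns:addthis="http://example.org/addthis">'
    '<a tw:count="none" addthis:templates="twitter">share</a></div>'
)


def parses_as_xml(markup: str) -> bool:
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


print(parses_as_xml(WITHOUT_NS))  # False: the prefixes are unbound
print(parses_as_xml(WITH_NS))     # True: both prefixes are declared
```

A lenient HTML parser would accept both strings, which is consistent with the page parsing fine as HTML while failing as XHTML.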


    Well, truthfully, I don't think that will solve your problem :( I have a
    copy of the page in question saved locally on my dev server and, even with
    all the scripts and invalid attributes stripped out, I still couldn't
    parse fragments. I will take a more in-depth look at your code sometime
    today and see if I can come up with something.

    Best wishes.

    Martin.

    No more reports on this issue, closed for now.