yagopv / AzureCrawler

Take HTML Snapshots for your Angular, Ember, Durandal or any JavaScript applications


Am I missing something, implementing this and attempting to index my ajax pages

jmwnoble opened this issue · comments

Hi

Not an issue with your code, but a question about implementation that I can't solve:

I've implemented this as you've recommended, and tested that 'Fetch as Google' returns the HTML snapshots of pages such as: http://hermitagepartners.com.au/home/frameworks/#!growth-share-matrix

I've submitted a sitemap with all the URLs, but Google refuses to index any of the ajax links, although there are no crawl errors. Can you see something obvious that I've missed in terms of setting up the URLs for Google to find them?

Thanks for sharing your work with others - I've found this and DurandalAuth immensely helpful - gracias!!

http://stackoverflow.com/questions/26547058/google-not-indexing-my-ajax-hashbang-urls
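For background on what 'Fetch as Google' does with those hashbang URLs, here is a minimal sketch of the URL mapping defined by Google's AJAX crawling scheme (the function name is mine): the crawler rewrites each pretty #! URL into an "ugly" _escaped_fragment_ URL, and the snapshot server must answer that form.

```javascript
// Sketch of the hashbang -> _escaped_fragment_ mapping from Google's
// AJAX crawling scheme. The crawler requests the rewritten URL and expects
// the server to return the pre-rendered HTML snapshot for it.
function toEscapedFragmentUrl(prettyUrl) {
  var parts = prettyUrl.split("#!");
  if (parts.length < 2) return prettyUrl; // no hashbang, nothing to rewrite
  var separator = parts[0].indexOf("?") === -1 ? "?" : "&";
  return parts[0] + separator + "_escaped_fragment_=" + encodeURIComponent(parts[1]);
}

console.log(toEscapedFragmentUrl(
  "http://hermitagepartners.com.au/home/frameworks/#!growth-share-matrix"
));
// → http://hermitagepartners.com.au/home/frameworks/?_escaped_fragment_=growth-share-matrix
```

Both forms identify the same page: the sitemap lists the pretty form, while the server (here, AzureCrawler's snapshots) answers the ugly form.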

Hi,

If I check your site links indexed by Google, I can see the one you are talking about, among others.

Check your indexed links by entering site:http://hermitagepartners.com.au in Chrome.

Did you solve the issue yourself?

I can also see a console error in Chrome DevTools related to jQuery ...

Hi,

It seems to have indexed some pages, but I can't see why it would have indexed a handful of the dynamic hashbang pages and not the others (there are over 100 in total). It's definitely not indexing them from the sitemap submission I made (where I added all the dynamic hashbang pages), because Webmaster Tools says 5 of 111 pages are indexed.

Any thoughts?

Appreciate any help you can give

Jeremy

PS If there isn't an obvious solution that I can fix quickly, would you or anyone you know be willing to undertake a few hours work at consulting rates to help me solve it?


Hi,

Where is your sitemap published?

I can't see one at http://hermitagepartners.com.au/sitemap.xml

It's at hermitagepartners.com.au\sitemap\sitemap

Jeremy
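As an aside, under the AJAX crawling scheme a sitemap normally lists the pretty #! URLs directly, one <url> entry per ajax page. A sketch, using one of this site's URLs:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://hermitagepartners.com.au/home/frameworks/#!growth-share-matrix</loc>
  </url>
  <!-- ... one <url> entry per ajax page ... -->
</urlset>
```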


Hi,

First of all, I think it would be a good idea to have a robots.txt in the root of your site allowing all user agents and pointing to the sitemap URL:

User-agent: *
Disallow:

Sitemap: https://hermitagepartners.com.au/sitemap/sitemap

More ideas ...

When I navigate to the ajax links in the sitemap, the left sidebar is disabled. Why? That can stop crawlers from following links when they access the URLs you submitted.

Even with that bar enabled, for example when accessing http://hermitagepartners.com.au/home/frameworks, there are a lot of links without an href attribute, so crawlers will not follow them.

I think the missing href attributes could be a problem.

Try fixing this and let me know.

Hi

Took me a while to get round to implementing your suggestions - have done now, including altering the snapshots so they don't include any of the nav links or duplicate content. Fetch as Google returns the correct content every time. However, the problem remains: Google is only indexing non-ajax pages, and giving no indication of what the problem is. Do you have any further ideas? As mentioned before, I'm happy to contract out the work to fix this...

Thanks
Jeremy


Hi,

When I told you about the disabled bar, I meant that your links should always include the href attribute so that crawlers can navigate through them.

If you remove the links from your ajax page snapshots, then the crawlers won't follow them, even if you included those links in your sitemap.

If I simply run $("a") on one of your ajax pages, I can see that most of the links have no href. That really can be a problem: a link must always point to a valid URL through its href attribute.
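That $("a") check can be reduced to a small, framework-free helper. A sketch (the function name and the link records are mine; in the browser you could build the records from $("a").toArray()):

```javascript
// Given an array of link records shaped like { href: "..." }, return the ones
// a crawler cannot follow: no href at all, or a javascript: pseudo-URL.
function unfollowableLinks(links) {
  return links.filter(function (link) {
    return !link.href || link.href.indexOf("javascript:") === 0;
  });
}

// Example: only the first link is crawlable.
var links = [
  { href: "/home/frameworks/#!growth-share-matrix" },
  { href: "javascript:void(0)" },
  {}
];
console.log(unfollowableLinks(links).length); // → 2
```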


When I navigate to http://hermitagepartners.com.au/home/frameworks I can see a list of frameworks and can navigate to each concrete framework. What I can't see is a link to that framework, so I suppose you are navigating with JavaScript code.

You have to expose your links so crawlers are able to follow them. The sitemap is a complement for the crawler, but the crawler should also be able to navigate your app through your page links.

I recommend you read https://developers.google.com/webmasters/ajax-crawling/ because all the information you need is there.

I would like to help you more in depth, but at the moment it is impossible for me for work reasons.

Bye!!