Web scraper for HTML and sitemap.xml content analysis.
This Node.js web app is a small SEO tool with two features. You can try them on this Heroku app, which runs on free dynos and may therefore be down at times if overused:
This tab parses the content of a given URL (http://www.lucsorel.com/ for example) and displays its words in decreasing order of importance, according to weights assigned to the different HTML tags (being displayed in an h1 tag gives more weight to a word than being displayed in an h2 tag, and so on). The weights are (rather arbitrarily) defined on the front-end side.
In the result of the analysis of the http://www.lucsorel.com/ page, you can interpret:
virtual: 33
a: 3 h2: 2 b: 1
as:
- a total weight of 33 for the word virtual
- which appears 3 times in an <a> tag, twice in an <h2> tag and once in a <b> tag
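The weighting described above can be sketched as follows. This is only an illustration, not the tool's actual code: the `scoreWords` function, the `{ word, tag }` input shape and the values in `TAG_WEIGHTS` are all assumptions (the real weights live in the front-end and are not listed here), so the totals will not match the tool's output.

```javascript
// Hypothetical weights per HTML tag; the tool's real values are not documented here.
const TAG_WEIGHTS = { h1: 10, h2: 8, a: 3, b: 2 };

// occurrences: array of { word, tag } pairs, one per appearance of a word in a tag.
function scoreWords(occurrences, weights = TAG_WEIGHTS) {
  const byWord = new Map();
  for (const { word, tag } of occurrences) {
    const entry = byWord.get(word) || { word, total: 0, tags: {} };
    // Count the appearances per tag and accumulate the weight of that tag
    // (tags without a configured weight contribute 0).
    entry.tags[tag] = (entry.tags[tag] || 0) + 1;
    entry.total += weights[tag] || 0;
    byWord.set(word, entry);
  }
  // Highest total weight first, as in the result screen.
  return [...byWord.values()].sort((a, b) => b.total - a.total);
}

// Example input: "virtual" found 3 times in <a>, twice in <h2>, once in <b>.
const ranking = scoreWords([
  { word: 'reality', tag: 'h1' },
  { word: 'virtual', tag: 'a' },
  { word: 'virtual', tag: 'a' },
  { word: 'virtual', tag: 'a' },
  { word: 'virtual', tag: 'h2' },
  { word: 'virtual', tag: 'h2' },
  { word: 'virtual', tag: 'b' },
]);
console.log(ranking);
```

Keeping the per-tag counts next to the total is what makes the detailed line (`a: 3 h2: 2 b: 1`) available under each word.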
A sitemap is an XML file, often located at the root of a website alongside the robots.txt file, which lists the URLs of the website to ease the work of indexing engines. Its format is explained on sitemaps.org. Each URL can optionally be characterized with:
- a priority describing the importance of the page within the site
- an update frequency to let indexing engines know how often the content is updated
- a last modification date

For example, www.sitemaps.org/sitemap.xml only describes the URLs and their last modification date (at the time this doc was written).
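To make the format concrete, here is a minimal sketch that reads the `<url>` entries of a sitemap and their optional `<lastmod>`, `<changefreq>` and `<priority>` children (the element names come from the sitemaps.org protocol). The `parseSitemap` and `childText` helpers are hypothetical and use plain regular expressions for brevity. They do not decode XML entities or handle CDATA, and they are not necessarily how this tool parses sitemaps.

```javascript
// Text of the first <tag>...</tag> child found in an XML fragment, or null if absent.
function childText(fragment, tag) {
  const match = new RegExp(`<${tag}>\\s*([^<]*?)\\s*</${tag}>`).exec(fragment);
  return match ? match[1] : null;
}

// Returns one object per <url> element; optional characteristics are null when missing.
function parseSitemap(xml) {
  const entries = [];
  const urlBlock = /<url>([\s\S]*?)<\/url>/g;
  let match;
  while ((match = urlBlock.exec(xml)) !== null) {
    const block = match[1];
    const priority = childText(block, 'priority');
    entries.push({
      loc: childText(block, 'loc'),
      lastmod: childText(block, 'lastmod'),
      changefreq: childText(block, 'changefreq'),
      priority: priority === null ? null : Number(priority),
    });
  }
  return entries;
}

// Made-up sample with one fully described URL and one bare URL.
const sampleSitemap = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2020-01-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
  <url><loc> https://example.com/about </loc></url>
</urlset>`;
console.log(parseSitemap(sampleSitemap));
```

A real XML parser would be a safer choice for production use. The regular expressions only keep the sketch free of dependencies.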
The sitemap analysis is done in two steps (see the example of the www.sitemaps.org/sitemap.xml analysis):
- the first step lists the URLs along with their optional characteristics and highlights duplicated URLs
- on this result screen, you can select the URLs to check their existence, HTML title, HTTP status and possible redirection
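The two steps above could look roughly like this. It is a sketch under assumptions, not the app's implementation: `flagDuplicates` and `checkUrl` are hypothetical names, the check relies on the global `fetch` available in Node.js 18+, and the title is extracted with a simple regular expression.

```javascript
// Step 1: flag the URLs that appear more than once in the sitemap.
function flagDuplicates(urls) {
  const counts = new Map();
  for (const url of urls) counts.set(url, (counts.get(url) || 0) + 1);
  return urls.map((url) => ({ url, duplicated: counts.get(url) > 1 }));
}

// Step 2: check one selected URL. `fetchFn` defaults to the global fetch and
// can be replaced by a stub, so the function can be tried without network access.
async function checkUrl(url, fetchFn = fetch) {
  try {
    // redirect: 'manual' asks fetch not to follow redirections, so that the
    // 3xx status and its Location header can be reported.
    const response = await fetchFn(url, { redirect: 'manual' });
    const redirectedTo = response.headers.get('location');
    const body = redirectedTo ? '' : await response.text();
    const titleMatch = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(body);
    return {
      url,
      exists: response.status < 400,
      status: response.status,
      redirectedTo,
      title: titleMatch ? titleMatch[1].trim() : null,
    };
  } catch (error) {
    // Network failure (DNS error, refused connection...): the URL could not be reached.
    return { url, exists: false, status: null, redirectedTo: null, title: null };
  }
}
```

For instance, `checkUrl('https://www.sitemaps.org/').then(console.log)` would print the status and title of that page, provided the machine has network access.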
- the Node.js back-end express app uses:
- the AngularJS front-end app uses:
  - ui-router and the ui-router-menu-service I designed to help with routing and menu highlighting in ng1.x apps
- server-client communication is done via socket.io and uses the socket-io-ng-service module I packaged