A SQL-inspired query language for HTML documents.
I have two goals for this project:
- Learn/practice Python: I am just now learning Python and wanted a project to practice with. I figured this might be a good one since it offers opportunities for data analysis.
- Create a new language for fun: I thought it'd be an interesting experiment to create a new language.
Since these are the main reasons, I'm expecting some big diffs, big rewrites, and probably a few start-overs. We'll see how this goes.
I honestly do not know how you are supposed to install this package... Supposedly the command below will do it, but I'm not so sure...
If you don't use pipsi, you're missing out. See the pipsi project for its installation instructions.
Simply run:
$ pipsi install .
$ hql https://www.python.org
Start up the app by running hql. You may also add an optional URL to connect and request an HTML file in one go.
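One way to accept that optional startup URL is an argparse positional argument with nargs="?", so hql and hql https://www.python.org both parse. This is a sketch assuming argparse; parse_args is a hypothetical helper, and HQL's real entry point may be built differently.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Hypothetical CLI parser: the URL positional is optional (nargs="?")."""
    parser = argparse.ArgumentParser(
        prog="hql", description="SQL-inspired queries over HTML documents"
    )
    parser.add_argument(
        "url",
        nargs="?",
        default=None,
        help="optional URL to request as soon as the environment starts",
    )
    return parser.parse_args(argv)


args = parse_args(["https://www.python.org"])
print(args.url)
```

When no URL is given, args.url is None and the environment would simply start without a response loaded.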
There are several commands you can run once inside the HQL environment.
- exit - to exit the environment
- help - to see the help menu (it looks a lot like this list)
- refresh - to re-query the currently set URL
- response - to see the HTML response
- url - to print or set the current URL
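An environment like this usually boils down to dispatching on the first word of each input line. The sketch below is not HQL's actual code: Session and handle_command are hypothetical names, refresh only describes what it would do instead of making a request, and echoing the new URL after url <new_url> is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Session:
    """Hypothetical state for one HQL-style session."""
    url: Optional[str] = None
    response: str = ""


def handle_command(session: Session, line: str) -> Optional[str]:
    """Return text to print, or None when the user asks to exit."""
    parts = line.split()
    if not parts:
        return ""
    command, args = parts[0].lower(), parts[1:]
    if command == "exit":
        return None
    if command == "help":
        return "exit, help, refresh, response, url [new_url]"
    if command == "response":
        return session.response
    if command == "url":
        if args:
            # Assumption: a second argument sets the URL, which is then echoed.
            session.url = args[0]
        return session.url or "(no url set)"
    if command == "refresh":
        # A real implementation would re-request session.url here.
        return f"would re-query {session.url}"
    return f"unknown command: {command}"
```

Returning None for exit lets the surrounding input loop stop cleanly without the dispatcher calling sys.exit itself.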
The url command can be used two ways. Typing it alone prints the currently set URL:
url
It also takes an optional second argument that can be used to set the URL.
url https://pypi.python.org/pypi
Querying is currently set up to mimic SQL queries, but please know that it is nowhere near as full-featured at the moment. The four accepted query sections are (in order):
- SELECT: (required) followed by the properties and attributes that you'd like to select from the returned HTML
- WHERE: followed by the selection criteria for the query
- LIMIT: the maximum number of results to return
- OFFSET: how many results to skip at the beginning
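A query with the four sections listed above can be split into its parts by scanning whitespace-separated tokens and starting a new section at each keyword. This is a minimal sketch, assuming keywords are separated by spaces and must appear in the stated order; split_query is a hypothetical name, not HQL's parser.

```python
from typing import Dict, List

# The four sections, in the order a query must use them.
SECTIONS = ["SELECT", "WHERE", "LIMIT", "OFFSET"]


def split_query(query: str) -> Dict[str, List[str]]:
    """Hypothetical splitter: map each section keyword to its tokens."""
    tokens = query.split()
    if not tokens or tokens[0].upper() != "SELECT":
        raise ValueError("query must start with SELECT")
    sections: Dict[str, List[str]] = {}
    current = "SELECT"
    last_index = -1
    for token in tokens:
        upper = token.upper()
        if upper in SECTIONS:
            index = SECTIONS.index(upper)
            # Reject repeated or out-of-order sections.
            if index <= last_index:
                raise ValueError(f"{upper} is repeated or out of order")
            current, last_index = upper, index
            sections[current] = []
        else:
            sections[current].append(token)
    return sections
```

Keywords are matched case-insensitively here, which is a design choice of the sketch; a selected property literally called "limit" would be misread as a keyword.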
You can select any number of values, but currently you cannot use * to select them all (yet).
SELECT name class id
Currently, the WHERE clause accepts only one equality match.
SELECT name WHERE class = btn
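A single equality match like the WHERE example above can be sketched as "parse exactly one property = value comparison, then keep matching elements." This assumes elements are plain dicts of property names to strings, which is a hypothetical shape rather than HQL's internal one. It also compares whole attribute strings, so an element with class="btn primary" would not match class = btn; whether HQL behaves that way is not stated.

```python
from typing import Dict, List, Tuple

# Hypothetical element shape: tag name plus attributes as a flat dict.
Element = Dict[str, str]


def parse_where(tokens: List[str]) -> Tuple[str, str]:
    """Accept exactly one 'property = value' comparison."""
    if len(tokens) != 3 or tokens[1] != "=":
        raise ValueError("WHERE supports a single 'property = value' match")
    return tokens[0], tokens[2]


def filter_elements(elements: List[Element], where: List[str]) -> List[Element]:
    """Keep elements whose property equals the value exactly."""
    prop, value = parse_where(where)
    return [el for el in elements if el.get(prop) == value]
```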
LIMIT and OFFSET each accept one number. They can be used together or separately.
SELECT name LIMIT 10 OFFSET 5
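Assuming HQL follows the usual SQL meaning (skip OFFSET results first, then return at most LIMIT of what remains), both clauses reduce to one list slice. apply_limit_offset is a hypothetical helper for illustration.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def apply_limit_offset(results: Sequence[T], limit: Optional[int] = None,
                       offset: Optional[int] = None) -> List[T]:
    """Skip `offset` results, then keep at most `limit` of the rest."""
    start = offset or 0
    if start < 0 or (limit is not None and limit < 0):
        raise ValueError("LIMIT and OFFSET must be non-negative")
    end = None if limit is None else start + limit
    return list(results[start:end])


# SELECT name LIMIT 10 OFFSET 5 over 20 results keeps results 5 through 14.
print(apply_limit_offset(list(range(20)), limit=10, offset=5))
```

Either argument can be omitted, matching the note that the two clauses work separately.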
Currently you can query and select these properties of an HTML element:
- attributes (like class, id, href, etc., except name for now)
- children
- name (the tag name, like div)
- parent
- text
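The properties above map naturally onto a small element tree. The sketch below builds one with the standard library's html.parser and resolves a property name the way the list suggests: name, text, parent and children are special, and anything else is looked up as an attribute (which is also why an attribute called name cannot be selected). Node, TreeBuilder and select are hypothetical names; returning the parent's tag name and the children's tag names is an assumption, and text here is only the element's direct text.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional

# Void elements never get a closing tag, so they must not stay on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class Node:
    """Hypothetical element node holding the selectable properties."""

    def __init__(self, name: str, attrs: Dict[str, Optional[str]],
                 parent: Optional["Node"]) -> None:
        self.name = name
        self.attributes = attrs
        self.parent = parent
        self.children: List["Node"] = []
        self.text = ""


class TreeBuilder(HTMLParser):
    """Builds a Node tree and a flat list of elements in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.root = Node("#document", {}, None)
        self.stack = [self.root]
        self.elements: List[Node] = []

    def handle_starttag(self, tag, attrs):
        node = Node(tag, dict(attrs), self.stack[-1])
        self.stack[-1].children.append(node)
        self.elements.append(node)
        if tag not in VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].name == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        # Only direct text is recorded on the innermost open element.
        self.stack[-1].text += data


def select(node: Node, prop: str):
    """Resolve one selectable property; anything else is an attribute."""
    if prop == "name":
        return node.name
    if prop == "text":
        return node.text.strip()
    if prop == "parent":
        return node.parent.name if node.parent else None
    if prop == "children":
        return [child.name for child in node.children]
    return node.attributes.get(prop)
```

Usage: feed the HTML response to a TreeBuilder, call close(), then run select over builder.elements for each property named in SELECT.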