superkeka / artykul-instructions

The small format for describing web pages and their content.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Artykul Instructions ๐Ÿ“œ

Artykul Instructions is the small format for describing web pages and their content. With the help of instructions, you can modify web pages before converting and provide more meta information from the HTML.

Filename

Every instruction in this repository should be named like [hostname].json.

If a hostname starts with www., please remove it. Artykul drops subdomains to the top-level known domain name.

www.blog.com        ->       blog.com.json
movies.blog.com     ->       blog.com.json
www.movies.blog.com ->       blog.com.json
news.blog.com       ->  news.blog.com.json

How to activate the debugger?

Open "Settings" and scroll down to the bottom. Tap 10 times to the version string and hold down a finger on it to open the developer menu.

You can use the debugger to validate instructions without importing them to the server.

How to use the debugger?

We recommend using macOS with Universal Clipboard feature enabled. Developing instructions on the phone is not a good idea.

Disable JavaScript in your browser; Artykul does not execute JS. The standard developer panel in Safari or Chrome is enough to explore markup. If something goes completely wrong, then install the following user-agent:

Artykul/1.0 ALCore/1.0 (like TwitterBot)

Structure

{
    meta?: {
        title?: ALFunctionText
        description?: ALFunctionText
        image?: ALFunctionText
        author?: {
            name?: ALFunctionText
            twitter?: ALFunctionText
        }
        site?: {
            title?: ALFunctionText
            description?: ALFunctionText
            image?: ALFunctionText
            twitter?: ALFunctionText
            telegram?: ALFunctionText
        }
        publishedTime?: ALFunctionText
        modifiedTime?: ALFunctionText
    }
    exists?: (ALFunction = isElementExists)
    content?: []ALFunctionElement
}
  • Fields that end with ? are optional.
  • ALFunctionText is the function that returns a string.
  • ALFunctionElement is the function that returns an element.

Please, don't describe elements that Artykul already finds in a document. Only if it is a content element. Here is an example of a basic template with all required fields ๐Ÿ™ƒ

{}

exists

The field only accepts isElementExists function. If it returns false, Artykul viewer will be disabled for a page.

Artykul Viewer is designed for articles; it's not a web browser.

content

The field only accepts array of getElement functions. The first exists selector will be converted.

You can use multiple getElement for processing different types of pages from the web site.

Functions

You can find examples of using any of the listed functions in the repository.

We only use the CSS selector in queries. Queries work relative to the parent.

You can use "self" selector in some cases. This selector is often needed in element conversion tasks. Example.

getText

{
    @: "getText"
    query: string 
}

getJoinedText

You can use this function to merge multiple elements in one string joined by a separator.

{
    @: "getJoinedText"
    separator: string
    query: string 
}

getAttribute

{
    @: "getAttribute"
    query: string 
    attribute: string
}

getElement

The function returns the first element and is identical to querySelector() in JavaScript. Use beforeReturn for removing and replacing elements.

{
    @: "getElement"
    query?: string 
    queries?: []string 
    beforeReturn?: []ALFunction
}

getBackgroundImage

The function returns background-image from inline CSS.

{
    @: "getBackgroundImage"
    query: string 
}

isElementExists

{
    @: "isElementExists"
    query?: string
    constant?: bool 
}

removeElement

{
    @: "removeElement"
    query: string
}

constant

Use it only for well-known sites when Artykul detects meta-value incorrectly.

{
    @: "constant"
    value: string
}

replaceElement

{
    @: "replaceElement"
    query: string
    newElements: []ALFunctionElement
}

createElement

{
    @: "createElement"
    tagName: string
    attributes: [](ALFunction = createAttribute || ALFunction = getAttribute)
    content: []ALFunctionElement
}

createAttribute

{
    @: "createElement"
    key: string
    value?: []ALFunctionText
}

Good to know

Order of execution

Getting Meta -> Getting Content Element -> Converting Content Element to Artykul Format

beforeReturn (getElement) rules are executed in the order in which they are specified.

Markers

Currently, Artykul supports only .artykul-content selector. If you see a page with this element, don't write content rules for it. It's the right of a web developer to define the content or even disable processing.

Ignored elements

Artykul already contains a dictionary of elements to be removed from the page. Here is the list:

comment, related, tags, date, share, author, promo, favorite, rating

If there are such elements in the article, but they are important, you will have to rename them in the rules.

When will my changes take effect?

Usually, we import changes to the server after commits. The user must restart the application to receive updated instructions.

Fix lazy image

<p>...</p>
<div class="lazy-image" style="..." data-image-src="...">
    <div class="inner" style="..."></div>
</div>
<p>...</p>
{
    "@": "replaceElement",
    "query": ".lazy-image",
    "newElements": [
        {
            "@": "createElement",
            "tagName": "img",
            "attributes": [
                {
                    "@": "createAttribute",
                    "key": "src",
                    "value": [
                        {
                            "@": "getAttribute",
                            "query": "self",
                            "attribute": "data-image-src"
                        }
                    ]
                }
            ]
        }
    ]
}

About

The small format for describing web pages and their content.