XML mut

XML mut (XML mutation) - a simple XML mutation definition language resembling SQL. Define your simple XML transformations in an easy and readable way.

Example

Let us say you have some simple XML, but you are still not happy and you would like to simplify it a bit more. In the bellow XML, you know that the PackageReference sub-node's Version text could just be placed directly as an attribute of PackageReference. But you are too lazy to do it manually 😒.

<Project>
    <ItemGroup>
        <PackageReference Include="System.Text.Json">
            <Version>7.0.2</Version>
        </PackageReference>
        <PackageReference Include="Mono.Cecil">
            <Version>0.11.4</Version>
        </PackageReference>
    </ItemGroup>
</Project>

So instead you use xml-mut to do the work for you. You have seen some SQL in the past, and this task should be trivial. You brace yourself 💪 and write a simple XML mutation.

GET Project/ItemGroup/PackageReference
SET [@Version] = Version[text]
DELETE Version

Here we are saying we need to get an XML node having a path of Project, ItemGroup, and PackageReference (PackageReference node will be the target of the mutation). For that node, we set the Version attribute ([@Version]) with a value from the Version sub-node text (Version[text]). Since the Version sub-node is no longer needed we delete it. Applying an XML mutation we get XML shown below 🥧.

<Project>
    <ItemGroup>
        <PackageReference Include="System.Text.Json" Version="7.0.2"/>
        <PackageReference Include="Mono.Cecil" Version="0.11.4"/>
    </ItemGroup>
</Project>

Your eyes hurt much less looking into XML above 😎.

Mutation syntax

The syntax reminds SQL. It is plain and simple. It all starts with GET.

GET

Get syntax is expressed as shown below:

GET {node_path}

GET is a mandatory clause containing only the path of the node you intend to mutate. A simple example is below:

GET Candy/Sweet

Here it will try to find all Sweet XML nodes having a parent node Candy. So if we had XML as below, we would match on 2 nodes. <Sweet name="Caramel"/> and <Sweet name="Lolipop" />

<KidsJoy>
    <Candy>
        <Sweet name="Caramel"/>
        <Sweet name="Lolipop" />
        <Salty name="Lacris" />
    </Candy>
    <Sweet name="Potato"/>
</KidsJoy>

<Sweet name="Potato"/> does not match because it does not have a parent of Candy. <Salty name="Lacris" /> is just too salty.

WHERE

WHERE {predicate} and {predicate} and ...

Optional where clause allows filtering down desired nodes when node name match is not enough. You can have multiple predicates and you have to separate them with and. There are 2 kinds of predicates. EXISTS and EQUALS.

Exists

Exists predicate syntax is expressed as shown below:

EXISTS {node_path}

A simple example where clause with exists predicate:

WHERE EXISTS Sprinkles/Round

Here it will require that the node would contain sub-node Sprinkles and then Sprinkles would contain sub node Round. This is similar to how GET works but instead of requiring a path up to the node (parent path), it requires a path down the node (child path). So if we would combine it with the previous GET we would get this:

GET Candy/Sweet
WHERE EXISTS Sprinkles/Round

and if we had XML like the below we would match on 1 node <Sweet name="Lolipop">.

<KidsJoy>
    <Candy>
        <Sweet name="Caramel" />
        <Sweet name="Lolipop">
            <Sprinkles>
                <Round />
            </Sprinkles>
        </Sweet>
        <Salty name="Lacris" />
    </Candy>
    <Sweet name="Potato"/>
</KidsJoy>

Equals

equals predicate syntax is expressed as shown below:

{value_path} == {value_variant}
{value_path} ::= {node_path}{value_selector}
{value_variant} ::= {value_path} | {value_literal}
{value_literal} ::= {dis_just_a_string}

Oh boy, it is so unreadable above. Let us look at a simple example of where clause with equals predicate instead:

WHERE Sprinkles/Round[@color] == "pink"

Here we are saying that our desired node should have child path Sprinkles/Round. That is it should have a Sprinkles sub node and that Sprinkles sub-node should contain a Round sub-node. Then we say that this Round node should have an attribute color with a literal value of pink. Let us combine our previous GET with this where clause and explore.

GET Candy/Sweet
WHERE Sprinkles/Round[@color] == "pink"

Here we say that we want a Sweet node with parent node Candy and with child node path Sprinkles/Round where the Round attribute color is equal to pink. So the below XML would match on 1 node <Sweet name="Caramel">.

<KidsJoy>
    <Candy>
        <Sweet name="Caramel">
            <Sprinkles>
                <Round color="pink"/>
            </Sprinkles>
        </Sweet>
        <Sweet name="Lolipop">
            <Sprinkles>
                <Round color="jade"/>
            </Sprinkles>
        </Sweet>
    </Candy>
</KidsJoy>

Of course, you can always match on direct node attributes.

GET Candy/Sweet
WHERE [@name] == "Lolipop"

This will instead pick <Sweet name="Lolipop"> node from the XML above. To see what kind of value selectors are possible refer to the value selectors section.

Mixing and matching

You can include as many predicates as you need so where clause like below is valid.

WHERE [@name] == "Lolipop" 
    and EXISTS Sprinkles
    and EXISTS Filler

SET

The set is where the mutation part begins. set clause syntax is expressed as shown below:

SET {value_assignment}, {value_assignment}, ...
{value_assignment} ::= {value_path} = {value_variant}

The only difference from the EQUALS predicate is it uses a single = sign to denote the assignment (and confuse people). The main difference is how it behaves. Let us look at some examples.

SET [@name] = "ToughCaramel"

Here we are saying we want to set the nodes name attribute with the literal value ToughCaramel.

If we had a full mutation as below.

GET Candy/Sweet
WHERE [@name] == "Caramel"
SET [@name] = "ToughCaramel"

and would apply it to XML like below.

<KidsJoy>
    <Candy>
        <Sweet name="Caramel"/>
        <Sweet name="Lolipop" />
    </Candy>
</KidsJoy>

we would get the following.

<KidsJoy>
    <Candy>
        <Sweet name="ToughCaramel"/>
        <Sweet name="Lolipop" />
    </Candy>
</KidsJoy>

A simple syntax for a simple task.

Value selectors

You might notice that both the equals and value assignment end with a square bracket indexer []. Currently, it supports 4 types of value selectors.

Attribute

The attribute selector starts with @ sign and looks like this: [@name]. Here we are saying we want to pick a value from the node name attribute. Example in a where clause:

GET Project/ItemGroup/PackageReference
WHERE [@Include] == "Mono.Cecil"

if we had XML like the below:

<Project>
    <ItemGroup>
        <PackageReference Include="System.Text.Json" Version="7.0.2"/>
        <PackageReference Include="Mono.Cecil" Version="0.11.4"/>
    </ItemGroup>
</Project>

A single node <PackageReference Include="Mono.Cecil" Version="0.11.4"/> would match the predicate.

Text

Text selector must always contain text literal inside. it always looks like this: [text]. Here we are saying we want to pick nodes text as a value. Example in a set clause:

GET Project/ItemGroup/PackageReference
SET [@Version] = Version[text]

so if we had XML like the below:

<Project>
    <ItemGroup>
        <PackageReference Include="System.Text.Json">
            <Version>7.0.2</Version>
        </PackageReference>
        <PackageReference Include="Mono.Cecil">
            <Version>0.11.4</Version>
        </PackageReference>
    </ItemGroup>
</Project>

Applying the mutation would result in XML like the below:

<Project>
    <ItemGroup>
        <PackageReference Include="System.Text.Json" Version="7.0.2">
            <Version>7.0.2</Version>
        </PackageReference>
        <PackageReference Include="Mono.Cecil" Version="0.11.4">
            <Version>0.11.4</Version>
        </PackageReference>
    </ItemGroup>
</Project>

Here we are missing a delete so most probably that is not an intended result. But it demonstrates the transfer of text value from the Version sub-node to the Version attribute.

Tail

Like with text, the tail will always look the same: [tail]. Here we are saying we want to pick node tail as a value. Node tail is a text after the nodes closing tag. Example tail usage below.

GET Project/ItemGroup
SET PackageReference[tail] = ""

If we had XML like below.

<Project>
    <ItemGroup>
        <PackageReference Include="System.Text.Json"/>
        <PackageReference Include="Mono.Cecil"/>
    </ItemGroup>
</Project>

and would apply the mutation the result would be like the below.

<Project>
    <ItemGroup>
        <PackageReference Include="System.Text.Json"/><PackageReference Include="Mono.Cecil"/>
    </ItemGroup>
</Project>

Name

The name will also always look the same: [name]. It gets the node name as a value. Example name usage below.

GET Project/ItemGroup/PackageReference
SET [name] = "ProjectReference"

If we had XML like below.

<Project>
    <ItemGroup>
        <PackageReference Include="System.Text.Json"/>
        <PackageReference Include="Mono.Cecil"/>
    </ItemGroup>
</Project>

And would apply the mutation the result would be like the below.

<Project>
    <ItemGroup>
        <ProjectReference Include="System.Text.Json"/>
        <ProjectReference Include="Mono.Cecil"/>
    </ItemGroup>
</Project>

DELETE

DELETE {path_variant}, {path_variant}, ...

The delete clause allows removing parts of an XML. A path variant can be either a node path (target a specific XML to be removed) or a value selector (target a specific value to be removed). An example was already presented in the beginning.

DELETE Version

Here we are saying that we want to delete the Version sub-node. So if we had a full Mutation like below.

GET Project/ItemGroup/PackageReference
DELETE Version

And applied it to the XML like below.

<Project>
    <ItemGroup>
        <PackageReference Include="System.Text.Json">
            <Version>7.0.2</Version>
        </PackageReference>
        <PackageReference Include="Mono.Cecil">
            <Version>0.11.4</Version>
        </PackageReference>
    </ItemGroup>
</Project>

The result would be like the one below.

<Project>
    <ItemGroup>
        <PackageReference Include="System.Text.Json"/>
        <PackageReference Include="Mono.Cecil"/>
    </ItemGroup>
</Project>

CLI Instalation

Now that you know the language lets put it into use with some CLI. You can install the CLI tool using cargo install. In case you have not installed the Rust language yet do it first.

 cargo install xml-mut-cli --git https://github.com/tomuxmon/xml-mut

Some day it willl be mature enough, and we will be able to ommit the --git part.

CLI Usage Examples

When using CLI you need to supply a path to your XML mutation file. A single XML mutation file can contain multiple mutation definitions. Next, you decide if you want to specify XML files one by one and use the include command or scan the directory and use the scan command. You can consult how to use CLI with a help call.

xml-mut --help

include command

You can consult how to use the include command with a help call:

xml-mut include --help

The usage is as follows:

xml-mut <XML_MUT_PATH> include --xml-path <XML_PATH>

Here <XML_MUT_PATH> is a path to your XML mutation file. You can give it .xmlmut extension but it is not mandatory so far. --xml-path or -x argument can be repeated allowing you to include multiple XML files to be mutated. So a call could look something like this:

xml-mut ~/pref-version-fix.xmlmut include -x ~/code/awesome.csproj -x ~/code/amazing.fsproj

scan command

You can consult how to use the scan command with a help call:

xml-mut scan --help

The usage is as follows:

xml-mut <XML_MUT_PATH> scan --extension <EXTENSION> <BASE_PATH>

Here <XML_MUT_PATH> is a path to your XML mutation file. You can give it .xmlmut extension but it is not mandatory so far. --extension or -e allows specifying what file extensions to include when scanning the directory. You can specify multiple extensions. <BASE_PATH> defines a path you want to scan. So a call could look something like this:

xml-mut ~/pref-version-fix.xmlmut scan -e csproj -e fsproj ~/code

Architexture

There are 4 crates so far. xml-mut-data - is where all data structures of the XML mutation language reside. xml-mut-parse crate uses nom to parse XML mutation definition. xml-mut-impl uses roxmltree (read-only XML tree) and extends it to be able to process XML with mutation definitions. xml-mut-cli uses clap to combine both XML mutation parsing and read-only XML extensions to process XML with mutation definitions. All of this is just in a proof of concept stage and will likely change in the future.

Is it stable

it is still version 0.0.0. But you can try it out and report any issues you had.

But tell me why

Currently, the only option to transform your XML is using XSLT. Most of the time it is overkill. Let us compare with an example we had in the beggining. What we actually wanted is a "simple xslt to get version sub node and place it as an attribute of the node". It has been a long long time since I wrote XSLT. So instead we will cheat and ask AI to write it for us.

<xsl:template match="node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="version">
    <xsl:attribute name="version">
        <xsl:value-of select="."/>
    </xsl:attribute>
</xsl:template>

This does not look simple. We have to learn the complicated XSLT language and the syntax. Also, XSLT is same old XML. Instead, for simple transformations, simple readable definitions should be enough. Above XSLT could be expressed in a much simpler way with xml-mut below.

GET Project/ItemGroup/PackageReference
SET [@Version] = Version[text]
DELETE Version

So xml-mut is trying to bring the simplicity of SQL (only the simple parts 😅) into XML transformation land.

License

MIT License (LICENSE-MIT or http://opensource.org/licenses/MIT)
Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)

tomuxmon / xml-mut