ImJeremyHe / xmlserde

A user-friendly Rust library for serializing or deserializing the XML files

Support tags containing child, text or both

Shohaii opened this issue

Hi Jeremy, thanks for the great crate.

I am trying to deserialize the following XML:

<office:document-content>
    <office:body>
        <office:text text:use-soft-page-breaks="true">
            <text:p text:style-name="Normal">
                <text:span text:style-name="T2">Progress:<text:s/>
                </text:span>
                <text:span text:style-name="T3">
                    <office:annotation office:name="0" xml:id="1825723351">
                        <dc:creator>Name Surname</dc:creator>
                        <dc:date>2024-03-12T10:30:00</dc:date>
                        <meta:creator-initials>NS</meta:creator-initials>
                        <text:p text:style-name="CommentText">progress indicator</text:p>
                    </office:annotation>100%</text:span>
                <text:span text:style-name="CommentReference">
                    <office:annotation-end office:name="0"/>
                </text:span>
            </text:p>
            <text:p text:style-name="P4">Task completed!</text:p>
            <text:p text:style-name="P5"/>
        </office:text>
    </office:body>
</office:document-content>

Yes, I know, it has nothing in common with LogiSheets, but your crate works almost perfectly even for ODT files.

The problem is preparing the Rust structs and enums for deserializing:

  • <text:p> - can have children, text, or neither
  • <text:span> - can have the children <office:annotation>, <office:annotation-end> and <text:s/>, as well as text

I tried to implement it like this (ignoring attributes):

#[derive(Debug, XmlSerialize, XmlDeserialize)]
pub struct OfficeText {
    #[xmlserde(name = b"text:p", ty = "child")]
    pub text_p: TextP,
}

#[derive(Debug, XmlSerialize, XmlDeserialize)]
pub struct TextP {
    #[xmlserde(ty = "untag")]
    pub text_p_content: Option<TextPContent>,
}

#[derive(Debug, XmlSerialize, XmlDeserialize)]
pub enum TextPContent {
    #[xmlserde(ty = "text")]
    Text(String),
    #[xmlserde(name = b"text:span")]
    TextSpans(Vec<TextSpan>)
}

#[derive(Debug, XmlSerialize, XmlDeserialize)]
pub struct TextSpan {
    #[xmlserde(ty = "untag")]
    pub text_span_content: Vec<TextSpanContent>,
}

#[derive(Debug, XmlSerialize, XmlDeserialize)]
pub enum TextSpanContent {
    #[xmlserde(ty = "text")]
    Text(String),
    #[xmlserde(name = b"text:s", ty = "sfc")]
    TextS,
    #[xmlserde(name = b"office:annotation")]
    OfficeAnnotation(OfficeAnnotation),
    #[xmlserde(name = b"office:annotation-end", ty = "sfc")]
    OfficeAnnotationEnd(OfficeAnnotationEnd),
}

#[derive(Debug, XmlSerialize, XmlDeserialize)]
pub struct OfficeAnnotation {
    // whatever
}

#[derive(Debug, XmlSerialize, XmlDeserialize)]
pub struct OfficeAnnotationEnd {
    // whatever
}

Unfortunately, this does not compile, but I wonder whether there is a way to make it work with the xmlserde crate.
Sorry if I made a silly mistake in the code; I am new to Rust.

There might be three ways to make it work:

  1. I could try writing custom implementations of the XmlSerialize and XmlDeserialize traits for these situations
  2. You might consider adding support for these situations to the xmlserde crate
  3. You might consider making the members of the "Unparsed" struct public, so that these tricky parts could be parsed/deserialized manually
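For option 1, the core difficulty is keeping text and child elements in document order. The following is only a minimal sketch of that idea in plain Rust: it does not use xmlserde's traits (their signatures are not shown in this thread), and the names `Segment` and `split_mixed` are hypothetical. It also ignores comments, CDATA, entities, and `>` inside attribute values.

```rust
/// One piece of mixed content, kept in document order.
/// `Segment` and `split_mixed` are hypothetical names used for illustration.
#[derive(Debug, PartialEq)]
pub enum Segment {
    /// Character data between tags.
    Text(String),
    /// Raw tag body without the angle brackets, e.g. "text:s/".
    Tag(String),
}

/// Splits a fragment into text runs and raw tags.
/// This is a naive scan on '<' and '>', not a real XML parser.
pub fn split_mixed(input: &str) -> Vec<Segment> {
    let mut out = Vec::new();
    let mut rest = input;
    while !rest.is_empty() {
        match rest.find('<') {
            Some(0) => match rest.find('>') {
                // A complete tag: keep its body and continue after '>'.
                Some(end) => {
                    out.push(Segment::Tag(rest[1..end].to_string()));
                    rest = &rest[end + 1..];
                }
                // An unterminated tag is kept as plain text.
                None => {
                    out.push(Segment::Text(rest.to_string()));
                    rest = "";
                }
            },
            // Text up to the next tag.
            Some(pos) => {
                out.push(Segment::Text(rest[..pos].to_string()));
                rest = &rest[pos..];
            }
            // No more tags: the remainder is text.
            None => {
                out.push(Segment::Text(rest.to_string()));
                rest = "";
            }
        }
    }
    out
}

fn main() {
    // Content of the first <text:span> in the example document.
    for segment in split_mixed("Progress:<text:s/>") {
        println!("{:?}", segment);
    }
}
```

A real custom implementation would work on the events of an actual XML reader rather than on raw strings; the sketch only shows why the result has to be an ordered list of variants rather than one field per kind of content.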

I would like to know your opinion, thanks.

@Shohaii Thank you for your report.
If I understand correctly, you want an enum whose variants can be either child elements or text, right?
Like this:

pub enum TestEnum {
    #[xmlserde(ty = "text")]
    V1(String),
    #[xmlserde(ty = "child")]
    V2(StructA),
}

I think this is a good idea. I would like to add a feature like that.

@ImJeremyHe Thank you for your quick reply.
Yes, your suggestion looks great.

I hope that an implementation like this:

#[derive(Debug, XmlSerialize, XmlDeserialize)]
pub struct TextP {
    #[xmlserde(ty = "untag")]
    pub text_p_content: Vec<TextPContent>,
}

#[derive(Debug, XmlSerialize, XmlDeserialize)]
pub enum TextPContent {
    #[xmlserde(ty = "text")]
    Text(String),
    #[xmlserde(name = b"text:span", ty = "child")]
    TextSpan(TextSpan),
}

would solve all my issues:

  1. a vector of children, text, or both
  2. an empty vector for the <text:p text:style-name="P5"/> case

Or maybe you can think of a better way.
Anyway, something like this would be great.
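To make the two cases concrete, here is a sketch of the values such a `Vec<TextPContent>` would be expected to hold. It uses plain Rust stand-ins without the xmlserde derives or attributes, `TextSpan` is reduced to a single `text` field, and `plain_text` is a hypothetical helper; none of this is the crate's actual output.

```rust
// Plain-Rust stand-ins for the derived types; no xmlserde attributes are used.
#[derive(Debug, PartialEq)]
pub struct TextSpan {
    // Simplified: a real span would hold its own mixed content.
    pub text: String,
}

#[derive(Debug, PartialEq)]
pub enum TextPContent {
    Text(String),
    TextSpan(TextSpan),
}

#[derive(Debug, PartialEq, Default)]
pub struct TextP {
    pub text_p_content: Vec<TextPContent>,
}

/// Hypothetical helper: concatenates the visible text of a paragraph in order.
pub fn plain_text(p: &TextP) -> String {
    p.text_p_content
        .iter()
        .map(|c| match c {
            TextPContent::Text(t) => t.as_str(),
            TextPContent::TextSpan(s) => s.text.as_str(),
        })
        .collect()
}

fn main() {
    // <text:p text:style-name="P4">Task completed!</text:p>
    let text_only = TextP {
        text_p_content: vec![TextPContent::Text("Task completed!".to_string())],
    };

    // <text:p text:style-name="P5"/> maps to an empty vector.
    let empty = TextP::default();

    // A simplified mixed paragraph: text followed by one span.
    let mixed = TextP {
        text_p_content: vec![
            TextPContent::Text("Progress: ".to_string()),
            TextPContent::TextSpan(TextSpan { text: "100%".to_string() }),
        ],
    };

    println!("{:?}", plain_text(&text_only));
    println!("{:?}", plain_text(&empty));
    println!("{:?}", plain_text(&mixed));
}
```

Whether the derive produces exactly these values for each case depends on how the crate implements the feature.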

Thank you for your time.

I will work on this later. Thanks for your confirmation.

@Shohaii Please have a look at this unit test.

Hi @ImJeremyHe, the feature works great. Thank you.
Long live the crate.