sgsinclair / Voyant

Need help splitting a corpus into multiple documents with XPath

wilcar opened this issue

I have a press corpus with multiple articles from different newspapers, which I treat as authors. I want to do text mining that distinguishes between the different authors.
I have an XML file and I am a beginner: can you help me fill in the import options?
Thank you for your help.

Here is the beginning of my XML file:

    <?xml version="1.0" encoding="UTF-8"?>
    <root encoding="UTF-8">
      <record>
        <content>
          EVENEMENT, jeudi 12 mars 1998 555 mots, p. 4&#13;
          "Le plus complexe, c'est l'information du malade". Un médecin réanimateur a mené une&#13;
          enquête sur les attentes des patients.&#13;
        </content>
        <author>Libération</author>
        <dates>jeudi 12 mars 1998</dates>
        <publication_date>1998-03-12</publication_date>
        <longueur>5129</longueur>
      </record>
    </root>

Try the following XPaths:
Content: //content
Author: //author
Documents: //record
Publication date: //publication_date
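
If you want to check the expressions outside Voyant first, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. It is not how Voyant works internally. The `SAMPLE` string is a shortened copy of the record above, and `split_records` is a hypothetical helper name. ElementTree only supports a subset of XPath and rejects absolute paths on an element, so the expressions are written as `.//record` rather than `//record`. Note that the element in the sample is `<content>` (singular), so that is the name used here.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the record from the question (assumption: one record only).
SAMPLE = """<root encoding="UTF-8">
  <record>
    <content>
      EVENEMENT, jeudi 12 mars 1998 555 mots, p. 4
    </content>
    <author>Libération</author>
    <dates>jeudi 12 mars 1998</dates>
    <publication_date>1998-03-12</publication_date>
    <longueur>5129</longueur>
  </record>
</root>"""


def split_records(xml_text):
    """Return one dict per <record>, mirroring the four XPath fields."""
    root = ET.fromstring(xml_text)
    docs = []
    # Documents: each <record> becomes its own document.
    for record in root.findall(".//record"):
        docs.append({
            # Content, author and date are looked up inside the record.
            "content": (record.findtext(".//content") or "").strip(),
            "author": record.findtext(".//author"),
            "publication_date": record.findtext(".//publication_date"),
        })
    return docs


docs = split_records(SAMPLE)
for doc in docs:
    print(doc["author"], doc["publication_date"], doc["content"][:40])
```

An expression that matches nothing (for example a misspelled element name) simply returns an empty list here, which makes it easy to spot a typo before importing.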

Thank you for helping. It works great.