Need help splitting a corpus into multiple documents with XPath
wilcar opened this issue
Wilfrid Cariou commented
I have a press corpus with multiple articles from different newspapers, which I treat as authors. I want to do text mining to understand the different authors.
I have an XML file and I am a beginner: can you help me fill in the import options?
Thank you for your help.
Here is the beginning of my XML file:
<?xml version="1.0" encoding="UTF-8"?>
<root encoding="UTF-8">
<record>
<content>
EVENEMENT, jeudi 12 mars 1998 555 mots, p. 4
"Le plus complexe, c'est l'information du malade". Un médecin réanimateur a mené une
enquête sur les attentes des patients.
</content>
<author>Libération</author>
<dates>jeudi 12 mars 1998</dates>
<publication_date>1998-03-12</publication_date>
<longueur>5129</longueur>
</record>
</root>
Andrew MacDonald commented
Try the following XPaths (the element in your sample is named content, not contents):
contenu (content): //content
auteur (author): //author
documents: //record
date de publication (publication date): //publication_date
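To check the expressions outside the import form, here is a minimal Python sketch that splits the sample into one document per record. It is only an illustration, not how the tool itself works. It assumes the standard-library xml.etree.ElementTree module, which supports only a subset of XPath, so the record path is written as .//record and the fields are looked up by child element name (content, as in the sample). The split_records helper and the SAMPLE string are hypothetical names introduced for this example.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample from the question (assumption: real files
# contain many <record> elements with the same children).
SAMPLE = """<root encoding="UTF-8">
<record>
<content>
EVENEMENT, jeudi 12 mars 1998 555 mots, p. 4
"Le plus complexe, c'est l'information du malade".
</content>
<author>Libération</author>
<dates>jeudi 12 mars 1998</dates>
<publication_date>1998-03-12</publication_date>
<longueur>5129</longueur>
</record>
</root>"""


def split_records(xml_text):
    """Return one dict per <record>, mirroring the four import fields."""
    root = ET.fromstring(xml_text)
    docs = []
    # ".//record" is the ElementTree spelling of the "//record" XPath.
    for record in root.findall(".//record"):
        docs.append({
            "content": (record.findtext("content") or "").strip(),
            "author": (record.findtext("author") or "").strip(),
            "date": (record.findtext("publication_date") or "").strip(),
        })
    return docs


docs = split_records(SAMPLE)
for doc in docs:
    # Print a short preview of each document.
    print(doc["author"], doc["date"], doc["content"][:40])
```

If a field comes back empty for every record, the element name in the corresponding XPath probably does not match the file.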
Wilfrid Cariou commented
Thank you for helping. It works great.