hou2zi0 / csv-to-epidoc

Converts a character-separated value file into basic TEI-XML EpiDoc files that are used in epigraphic research in the Humanities.

csv-to-epidoc

Many scholars, in epigraphy as well as in other fields associated with scholarly editions, record and organize the data they compile on field or archive trips as tabular data before they start to compose a historical narrative or a printable scholarly edition.

This tool aims to provide an easy way to map the columns of the tabular data to specific fields in a basic EpiDoc template, and thereby to convert CSV files into basic EpiDoc. Starting from these basic EpiDoc files, the scholar may dive further into XML-based approaches to digital scholarly editing, or simply fill in missing values and provide basic EpiDoc versions of her scholarship as reusable research data.

Mapping

Converts a character-separated value file into TEI-XML EpiDoc files. The conversion is based on a basic EpiDoc template and – currently very basic – conversion functions associated with different sections of the EpiDoc template.

The user loads a CSV file and maps its columns to EpiDoc sections using dropdown menus that are generated from the CSV file’s columns. The mapping is then applied to each row of the CSV. The generated EpiDoc files are bundled into a teiCorpus and downloaded. Because not all attribute values can be set automatically, the resulting file will be well-formed but not valid.
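The row-by-row application of a mapping can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: the `{{section}}` placeholder syntax, the shortened template, and the names `fillTemplate` and `toTeiCorpus` are all assumptions made for this example.

```javascript
// Hypothetical, heavily shortened template. The {{name}} placeholders are an
// assumption of this sketch, not the tool's real template syntax.
const template = `<TEI>
  <teiHeader><fileDesc><titleStmt><title>{{title}}</title></titleStmt></fileDesc></teiHeader>
  <text><body><div type="edition">{{edition}}</div></body></text>
</TEI>`;

// mapping: EpiDoc section name -> CSV column name selected by the user.
function fillTemplate(template, row, mapping) {
  return template.replace(/\{\{(\w+)\}\}/g, (match, section) => {
    const column = mapping[section];
    // Unmapped sections and missing cells become empty strings.
    // Values are inserted as-is; no XML escaping is done in this sketch.
    return column !== undefined && row[column] != null ? String(row[column]) : '';
  });
}

// Apply the same mapping to every row and bundle the results.
function toTeiCorpus(rows, mapping) {
  const docs = rows.map((row) => fillTemplate(template, row, mapping));
  return `<teiCorpus xmlns="http://www.tei-c.org/ns/1.0">\n${docs.join('\n')}\n</teiCorpus>`;
}
```

Called with an array of row objects and a mapping such as `{ title: 'Titel', edition: 'Inschrift' }` (hypothetical column names), `toTeiCorpus` returns one string with one `<TEI>` element per row.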

Upload files

Mapping of column names to EpiDoc sections

Filled in EpiDoc template

Example file

An example csv file with pipe separators may be found here.
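To illustrate what reading such a pipe-separated file involves, here is a deliberately naive sketch in plain JavaScript. The function name `parseSeparated` is hypothetical, and the sketch does not handle quoted fields, so cells that themselves contain newlines or separators will not survive it. In practice a full parser is the better choice; D3, which the tool builds on, provides `d3.dsvFormat("|").parse(...)` for this purpose.

```javascript
// Naive parser: the first line is the header, every further line is one row.
// Assumption: no quoted fields, no separators or newlines inside a cell.
function parseSeparated(text, separator = '|') {
  const lines = text.split(/\r?\n/).filter((line) => line.trim() !== '');
  if (lines.length === 0) return [];
  const header = lines[0].split(separator).map((name) => name.trim());
  return lines.slice(1).map((line) => {
    const cells = line.split(separator);
    const row = {};
    header.forEach((name, i) => {
      // Missing trailing cells become empty strings.
      row[name] = (cells[i] ?? '').trim();
    });
    return row;
  });
}
```

Each returned object has one property per column name, which is the shape the column-to-section mapping works on.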

The example file was provided by Thomas Kollatz (GitHub, ORCID) and is licensed as follows:

Steinheim-Institut

http://www.steinheim-institut.de:80/cgi-bin/epidat?id=ffb is licensed under a Creative Commons Attribution 4.0 International License.

Conversion process and customizability

The in-file conversion procedure is controlled by a switch statement with one arm for each EpiDoc element or section that needs further processing, so it is easy to customize. See below:

// [...]
switch (element) {
  case 'person':
    // One <person> entry per input line; the xml:id is derived from the line's text.
    return text.split('\n')
      .map((textblock) => {
        return `<person xml:id="${generateID(textblock)}" sex="1">
                    <persName>
                    ${textblock.trim()}
                    </persName>
                    <birth/>
                    <death/>
                    <floruit/>
                </person>`;
      })
      .join('\n');
  case 'lb':
    // Prefix each input line with a numbered <lb/> element.
    return text.split('\n')
      .map((textblock, index) => {
        return `<${element} n="${index + 1}"/>${textblock.trim()}`;
      })
      .join('\n');
// [...]

Specific conversions

Paragraphs and linebreaks

EpiDoc sections that contain linebreaks or paragraphs split the incoming text on newlines (“enter key”) and then apply a basic conversion to each part to produce the EpiDoc XML markup required for that section.
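As a small worked example of this splitting, the following standalone sketch mirrors the `lb` arm of the switch statement shown earlier. The function names are made up for illustration, and the `<p>` variant is an assumption about how paragraphs might be handled, since the source only shows the linebreak case.

```javascript
// Each input line is prefixed with a numbered <lb/> element.
function toLineBreaks(text) {
  return text
    .split('\n')
    .map((line, index) => `<lb n="${index + 1}"/>${line.trim()}`)
    .join('\n');
}

// Assumed paragraph variant: each input line is wrapped in <p>...</p>.
function toParagraphs(text) {
  return text
    .split('\n')
    .map((line) => `<p>${line.trim()}</p>`)
    .join('\n');
}

console.log(toLineBreaks('first line\n second line '));
// <lb n="1"/>first line
// <lb n="2"/>second line
```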

Other list-like structures

Other “list-like structures”, such as handNote or persList, are handled the same way: the incoming text is split on newlines (“enter key”) and a basic conversion is applied to each part to produce the EpiDoc XML markup required for that section.
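For a person list, the conversion could look roughly like the sketch below. Note that `generateID` is called in the tool's switch statement but its implementation is not shown in this README, so the version here is purely hypothetical. Skipping blank lines is likewise an assumption of this sketch, not documented behavior.

```javascript
// Hypothetical ID helper: keeps letters and digits, joins them with hyphens,
// and adds a prefix so the ID never starts with a digit.
// Two identical names yield the same ID; this sketch does not deduplicate.
function generateID(text) {
  const slug = text
    .trim()
    .toLowerCase()
    .replace(/[^\p{L}\p{N}]+/gu, '-')
    .replace(/^-+|-+$/g, '');
  return `p-${slug}`;
}

// One <person> element per non-empty input line.
function toPersons(text) {
  return text
    .split('\n')
    .filter((name) => name.trim() !== '')
    .map((name) => `<person xml:id="${generateID(name)}">
  <persName>${name.trim()}</persName>
</person>`)
    .join('\n');
}
```

For the input `'Anna Maria\nBen'` this yields two `<person>` elements with the IDs `p-anna-maria` and `p-ben`.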

Dimensions

The dimensions section expects a single string of the form height x width x depth and tries to split it on x (this will become customizable, see To Do).
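A minimal sketch of this split, assuming the result is written into a TEI `<dimensions>` element with `<height>`, `<width>` and `<depth>` children. The function name `toDimensions` and the configurable `separator` parameter are illustrative and not taken from the tool.

```javascript
// Splits "height x width x depth" on the separator and fills the three slots.
// Missing parts become empty elements. A value that itself contains the
// separator character would be split incorrectly.
function toDimensions(text, separator = 'x') {
  const [height = '', width = '', depth = ''] = text
    .split(separator)
    .map((part) => part.trim());
  return `<dimensions>
  <height>${height}</height>
  <width>${width}</width>
  <depth>${depth}</depth>
</dimensions>`;
}
```

With the input `'40 x 30 x 10'` the three elements contain `40`, `30` and `10`; with `'40 x 30'` the `<depth>` element stays empty and can be filled in by hand later.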

Software used

The tool is based on the JavaScript libraries D3 and Lodash. It was inspired by the Learn JS Data tutorial.

To Do

  • Make more customization possible, e.g. for the splitting of text parts.
  • Provide a way to add a CSV file containing data about the persons referenced in persList.
  • Improve support for none/NaN and missing values.
  • Add a dropdown for the facsimile section.

License

MIT License

Copyright (c) 2018 Max Grüntgens (猴子)

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Languages

HTML 71.8%, JavaScript 28.2%