francoisforster / gedcom-cleanup

A simple Kotlin library that compares GEDCOM files, cleans them and performs limited validation

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

A simple Kotlin library that cleans up GEDCOM (https://en.wikipedia.org/wiki/GEDCOM) files by:

  • removing unreachable individuals and families (from a given starting individual)
  • canonicalizing Notes and Sources and removing duplicates and unreachable ones

It can also compares 2 GEDCOM files from a specified root individual, providing information on differences of names and event places or dates, as well as events and individuals (parents, children, spouses) missing from either file. Optionally it copies missing individuals/families/events found in the diffs, if specifying the -createMissingFrom option

It can also perform some validation on individuals, such as ensuring that sources are specified for all events (birth, death, marriage) that took place at a particular place.

For convenience, executable jars are provided to perform cleanup and comparison (only java runtime is needed to run):

  • java -jar gedcom-cleanup.jar <Input GEDCOM filename> <Starting Individual Reference Id> <Output GEDCOM filename>
  • java -jar gedcom-compare.jar <Input GEDCOM filename> <Starting Individual Reference Id> <Other Input GEDCOM filename> <Starting Individual Reference Id from Other GEDCOM file> [-createMissingFrom {LEFT|RIGHT} <output GEDCOM filename>]

where individual reference id is in the form "@I35@"

The library assumes a well-formed GEDCOM file, doesn't enforce the format and doesn't implement all its specifications.

Library Usage example:

import java.io.FileWriter

fun main() {
    // clean up GEDCOM file
    val gedcom = Gedcom()
    gedcom.parseFile("<Input GEDCOM filename>")
    val rootIndividual = gedcom.getIndividual("<Starting Individual Reference Id>")
    if (rootIndividual != null) {
        gedcom.removeUnreachable(rootIndividual)
    }
    gedcom.cleanUpReferences(SOURCE_TAG, SOURCE_REFERENCE_PREFIX)
    gedcom.cleanUpReferences(NOTE_TAG, NOTE_REFERENCE_PREFIX)
    val writer = FileWriter("<Output GEDCOM filename>")
    gedcom.write(writer)
    writer.close()
 
    // validate sources
    gedcom.validateEvents(::selectByPlaceAndYear, ::validateSource)

    // compare GEDCOM files
    val otherGedcom = Gedcom()
    otherGedcom.parseFile("<Other GEDCOM filename>")
    val gedcomCompare = GedcomCompare(gedcom, otherGedcom)
    gedcomCompare.compareFrom("<Starting Individual Reference Id in GEDCOM file>", "<Starting Individual Reference Id in other GEDCOM file>")
}

fun selectByPlaceAndYear(event: Event): Boolean {
    val year = event.getYear()
    return event.place?.contains("<PLACE>") == true && year != null && year >= <FROM YEAR> && year <= <TO YEAR>
}

fun validateSource(event: Event, gedcom: Gedcom): Boolean {
    val source = gedcom.getSource(event.source)
    return source != null && source.getSourceText()?.contains("http://...") == true
}

About

A simple Kotlin library that compares GEDCOM files, cleans them and performs limited validation


Languages

Language:Kotlin 100.0%