inbo / data-publication

🔓 Open biodiversity data publication by the INBO

Home Page: https://ipt.inbo.be

Cannot republish Florabank

peterdesmet opened this issue

I cannot republish the Florabank dataset.

The dataset is currently set to monthly auto-publishing. However, auto-publication has failed due to an unknown error, and the dataset is shown in red in the list of resources. The last successful publication date is 2015-08-17 (also listed as such on the resource page and on GBIF). The last publication date shown in the list of resources, however, is 2015-09-09. The difference between those dates might be the reason why the resource page indicates:

Please be aware, this is an old version of the dataset.

... even though it is the latest version.

Steps I took to republish:

  1. I choose "select interval" to turn off auto-publishing
  2. I click Publish & turn off auto-publishing
  3. A dialog box pops up
  4. I enter a summary of the changes
  5. I click Yes to publish
  6. Nothing happens

@kbraak, could this be caused by the double publication date? How do I fix this?

This issue is related to #84 and #103

It's symptomatic of IPT issue #1223. Please try following the instructions here to fix it. This bug fix is of course included in the next minor version of the IPT (2.3.3).

Thanks @kbraak, that fixed the issue. We've now discovered duplicate occurrenceIDs, so we'll fix those first and try to publish again.
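
To double-check an export of the data for missing or repeated occurrenceIDs outside the IPT, a minimal Python sketch can count the values in one pass. It assumes the data is available as a tab-delimited text file with a header row (as in the occurrence.txt of a DwC-A) and that values are not quoted; the function name `check_occurrence_ids` and its parameters are hypothetical, not part of the IPT.

```python
import csv
from collections import Counter


def check_occurrence_ids(path, id_column="occurrenceID", delimiter="\t"):
    """Hypothetical helper: report missing and duplicate IDs in a delimited file.

    Returns (missing_rows, duplicates):
      missing_rows: 1-based data row numbers whose ID is empty or whitespace-only
      duplicates:   dict mapping each ID that occurs more than once to its count
    """
    counts = Counter()
    missing_rows = []
    with open(path, newline="", encoding="utf-8") as f:
        # QUOTE_NONE: assume values are unquoted, so quote characters are kept literally
        reader = csv.DictReader(f, delimiter=delimiter, quoting=csv.QUOTE_NONE)
        for row_number, row in enumerate(reader, start=1):
            # A short row yields None for the missing column; treat it as empty
            value = row.get(id_column) or ""
            if value.strip():
                counts[value] += 1
            else:
                missing_rows.append(row_number)
    duplicates = {value: n for value, n in counts.items() if n > 1}
    return missing_rows, duplicates
```

For example, `missing, dups = check_occurrence_ids("occurrence.txt")` followed by `print(len(missing), len(dups))`. Keeping all IDs in a `Counter` is simple and should fit in memory for a few million records; an on-disk sort would use less memory but needs extra free disk space instead.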

We can't find any missing or non-unique occurrenceIDs in the source data, so we don't know why the validation fails (publication log):

Archive generation started for version #45.4
Start writing data file for Darwin Core Occurrence
No lines were skipped due to errors for mapping Darwin Core Occurrence in source florabank1
No lines were skipped due to errors for mapping Darwin Core Occurrence in source florabank1
No lines with fewer columns than mapped for mapping Darwin Core Occurrence in source florabank1
All lines match the filter criteria for mapping Darwin Core Occurrence in source florabank1
Data file written for Darwin Core Occurrence with 3779710 records and 34 columns
All data files completed
EML file added
meta.xml archive descriptor written
Validating the core file: occurrence.txt. Depending on the number of records, this can take a while.
? Validating the core basisOfRecord is always present is always present and its value matches the Darwin Core Type Vocabulary.
? Validating the core ID field occurrenceID is always present and unique.
Archive generation failed!
org.gbif.ipt.task.GeneratorException: Problem occurred while validating DwC-A
    at org.gbif.ipt.task.GenerateDwca.validate(GenerateDwca.java:377)
    at org.gbif.ipt.task.GenerateDwca.call(GenerateDwca.java:952)
    at org.gbif.ipt.task.GenerateDwca.call(GenerateDwca.java:64)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: java.io.IOException: No space left on device
    at java.io.FileOutputStream.writeBytes(Native Method)
    at java.io.FileOutputStream.write(FileOutputStream.java:345)
    at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
    at sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:282)
    at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:125)
    at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:135)
    at java.io.OutputStreamWriter.write(OutputStreamWriter.java:220)
    at java.io.Writer.write(Writer.java:157)
    at org.gbif.utils.file.FileUtils.sortAndWrite(FileUtils.java:802)
    at org.gbif.utils.file.FileUtils.sortInJava(FileUtils.java:625)
    at org.gbif.utils.file.FileUtils.sort(FileUtils.java:592)
    at org.gbif.ipt.task.GenerateDwca.sortCoreDataFile(GenerateDwca.java:411)
    at org.gbif.ipt.task.GenerateDwca.validateCoreDataFile(GenerateDwca.java:657)
    at org.gbif.ipt.task.GenerateDwca.validate(GenerateDwca.java:367)
    ... 6 more

I have posted a message on ipt@lists.gbif.org

Caused by: java.io.IOException: No space left on device

How much free disk space is there on the server? That could be the problem. The current DwC-A needs 3.3 GB unpacked, and it looks like the IPT is working with it unpacked.
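
Before republishing, the free space on the server can be compared against the unpacked archive size with a short Python sketch. The stack trace above shows validation going through `sortCoreDataFile` and writing a sorted copy of the core file, so the sketch applies a safety factor; the function name, the factor of 2 and the data directory path in the usage note are hypothetical assumptions, not IPT settings.

```python
import shutil


def has_enough_space(path, required_bytes, safety_factor=2.0):
    """Hypothetical check: True if the filesystem holding `path` has at least
    required_bytes * safety_factor bytes free.

    safety_factor is an assumed margin: validation appears to write a sorted
    copy of the core data file, so usage can briefly exceed the unpacked size.
    """
    free = shutil.disk_usage(path).free  # bytes available on that filesystem
    return free >= required_bytes * safety_factor
```

For example, `has_enough_space("/path/to/ipt-data", int(3.3 * 10**9))`, where the path stands in for the IPT data directory on the server and 3.3 GB is read as decimal gigabytes.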

Right! Good suggestion, thanks!

Increasing storage solved the issue.