srappel / btaa-metadata-harvesting-guide


harvesting-guide

About

This repository contains Jupyter Notebooks for harvesting metadata for the BTAA Geoportal.

The BTAA Geoportal holds metadata records that point to geospatial data, maps, aerial imagery, web services, and websites hosted online by external organizations. The most common way to obtain this metadata is to harvest it programmatically from an organization's website. These websites may take the form of a data portal, a static web page, or a custom platform. Because they are structured in many different ways, we maintain several workflows for obtaining the metadata.
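As an illustration of what "programmatically harvesting" can mean for a data portal, the sketch below assumes the portal publishes a DCAT-US (Project Open Data) `data.json` catalog, which many open data portals do. It uses only the Python standard library. The function names and the set of fields kept are hypothetical choices for this example, not the workflow used in the Recipes.

```python
import json
import urllib.request


def fetch_catalog(url: str, timeout: float = 30.0) -> dict:
    """Download and parse a JSON catalog, e.g. a DCAT-US data.json file."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # json.load accepts the bytes returned by response.read()
        return json.load(response)


def extract_records(catalog: dict) -> list[dict]:
    """Flatten each dataset entry into a few fields a harvest might keep."""
    records = []
    for dataset in catalog.get("dataset", []):
        # DCAT-US lists downloads and services under "distribution";
        # prefer a direct download link, else fall back to the access link.
        urls = [
            dist.get("downloadURL") or dist.get("accessURL")
            for dist in dataset.get("distribution", [])
        ]
        records.append({
            "identifier": dataset.get("identifier", ""),
            "title": dataset.get("title", ""),
            "description": dataset.get("description", ""),
            "keywords": "|".join(dataset.get("keyword", [])),
            "modified": dataset.get("modified", ""),
            "links": [u for u in urls if u],  # drop distributions with no URL
        })
    return records
```

A typical call would be `extract_records(fetch_catalog("https://example.org/data.json"))`, where the URL is a placeholder. Static pages and custom platforms generally need HTML scraping or platform-specific APIs instead, which is why separate workflows exist.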

Who are these Notebooks for?

They are primarily intended for the BTAA-GIN Product Manager and Graduate Research Assistants. However, anyone interested in batch metadata harvesting and processing may find the techniques presented here useful.

Download the workflow scripts

This Guide is hosted on GitHub at https://github.com/geobtaa/harvesting-guide. The repository contains all of the files needed to run the tutorials, as well as the Jupyter Notebooks for running the Recipes. To get started, make a fork or a new branch of the repository.

Tutorials

The Tutorials section contains short, easy-to-complete exercises that teach the basics of running and writing scripts to harvest metadata.

Recipes

The recipes are step-by-step workflows for harvesting metadata from specific websites, or from groups of portals that use the same technology. They may involve multiple steps and occasionally require manual troubleshooting. These guides need regular maintenance and updates because the source websites may be upgraded, restructured, or taken offline.
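One way to catch changes on a source website is to compare each new harvest against the previous one. The sketch below assumes each record is a dictionary with a stable `identifier` field. This record structure and the function name `compare_harvests` are hypothetical and not taken from the Recipes. The function reports new, retired, and changed records, and flags a source that returned nothing at all, which may mean the site moved or disappeared.

```python
def compare_harvests(previous: list[dict], current: list[dict],
                     key: str = "identifier") -> dict:
    """Compare two harvests of the same source by a stable record key."""
    old = {record[key]: record for record in previous}
    new = {record[key]: record for record in current}
    return {
        # present now but not last time
        "added": sorted(new.keys() - old.keys()),
        # present last time but gone now (candidates for retirement)
        "retired": sorted(old.keys() - new.keys()),
        # present in both harvests but with different field values
        "changed": sorted(k for k in new.keys() & old.keys() if new[k] != old[k]),
        # an empty harvest from a previously populated source needs a manual check
        "source_empty": bool(previous) and not current,
    }
```

Keying on an identifier, rather than comparing whole lists, keeps the comparison independent of the order in which a portal returns its records.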

Credits

The tutorials and recipes were prepared by Alexander Danielson and Karen Majewicz in April 2023.

The recipes also contain code contributed by **Melinda Kernik** and by former BTAA graduate research assistants, including:

  • Ziying (Gene) Cheng (2020-2022)
  • Yijing (Zoey) Zhou (2020-2021)
  • Emily Ruetz (2018-2020)
  • Andrew Smith (2017-2019)
  • Lewei Hi (2017)
