flash1293 / airbyte-source-git-file

Airbyte source connector for file revisions from git repository

Git File Source

This is the repository for the Git File source connector, written in Python. For information about how to use this connector within Airbyte, see the documentation.

Data model

The records produced by this source are revisions of files in the configured git repository on the configured branch. The history is linearized by always picking the first parent in the case of merges.
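
First-parent linearization can be illustrated with a minimal sketch. It assumes the commit graph is held in a plain dictionary; the `parents` mapping and the `linearize` helper are illustrative names and not part of the connector. The resulting order corresponds to what `git log --first-parent` would list.

```python
from typing import Dict, List, Optional


def linearize(head: str, parents: Dict[str, List[str]]) -> List[str]:
    """Walk from `head` back to the root, following only the first parent."""
    history: List[str] = []
    current: Optional[str] = head
    while current is not None:
        history.append(current)
        commit_parents = parents.get(current, [])
        # A merge commit has several parents; only the first one is followed,
        # so commits reachable solely through the merged branch are skipped.
        current = commit_parents[0] if commit_parents else None
    return history


# Hypothetical graph: "m" merges feature commit "f" into the main line a <- b.
parents = {"a": [], "b": ["a"], "f": ["a"], "m": ["b", "f"]}
print(linearize("m", parents))  # ['m', 'b', 'a']
```

In this example the feature commit `f` never shows up in the linearized history, because it is only reachable through the second parent of the merge.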

Each record has the following fields:

field        description
hash         The hash of the commit
authorName   The name of the author of the commit
authorEmail  The email of the author of the commit
date         The date of the commit in ISO format
message      The message of the commit
branch       The branch the commit is from
repository   The URL of the repository
size         The size in bytes of the file
loc          The number of lines in the file if it's a text file
fileName     The path of the file from the root of the repository
operation    Either "add" or "remove", signifying whether the referenced commit added or removed the file. Modifying a file produces two records.
newContent   The content of the file after the commit is applied
oldContent   The content of the file before the commit was applied. Only exists if there used to be a previous revision of this file
oldSize      The size in bytes of the file before the commit was applied
oldLoc       The number of lines in the file before the commit was applied, if it's a text file
oldFileName  The name of the file before the commit was applied. Only exists if there used to be a previous revision of this file (which might be under another file name)
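
As a rough illustration of the record shape, the fields above could be typed as follows. The Python types are assumptions (the list only gives prose descriptions), and every value in the example record is made up. The example shows an "add" record for a newly created file, so none of the old* fields are present.

```python
from typing import TypedDict


class FileRevision(TypedDict, total=False):
    # Field names come from the list above; the types are assumed.
    hash: str
    authorName: str
    authorEmail: str
    date: str  # ISO format
    message: str
    branch: str
    repository: str
    size: int
    loc: int
    fileName: str
    operation: str  # "add" or "remove"
    newContent: str
    oldContent: str
    oldSize: int
    oldLoc: int
    oldFileName: str


content = "# Title\n\nHello\n"

# Hypothetical record for a commit that creates docs/intro.md.
example: FileRevision = {
    "hash": "0123456789abcdef0123456789abcdef01234567",
    "authorName": "Jane Doe",
    "authorEmail": "jane@example.com",
    "date": "2023-01-01T12:00:00+00:00",
    "message": "Add intro",
    "branch": "main",
    "repository": "https://example.com/org/repo.git",
    "size": len(content.encode("utf-8")),
    "loc": len(content.splitlines()),
    "fileName": "docs/intro.md",
    "operation": "add",
    "newContent": content,
}
print(example["size"], example["loc"])  # 15 3
```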

To keep the history of your repository in sync, run incremental syncs, which only add new revisions.

To keep only an image of the latest state, set depth to 1 and always run a full sync.
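
The difference between the two modes can be sketched as below. This is only an illustration of the idea, assuming the cursor is the hash of the last synced commit; how the connector actually stores its state is not described here, and `new_revisions` is a hypothetical helper.

```python
from typing import List, Optional


def new_revisions(history: List[str], last_synced: Optional[str]) -> List[str]:
    """Return the commits in `history` (oldest first) that come after `last_synced`."""
    if last_synced is None or last_synced not in history:
        # No usable cursor: behave like a full sync.
        return list(history)
    return history[history.index(last_synced) + 1:]


history = ["a", "b", "c", "d"]  # linearized history, oldest first
print(new_revisions(history, None))  # ['a', 'b', 'c', 'd']
print(new_revisions(history, "b"))   # ['c', 'd']
print(history[-1:])                  # ['d'], what a depth of 1 would cover
```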

Local development

Prerequisites

To iterate on this connector, make sure to complete this prerequisites section.

Minimum required Python version: 3.9.0

Build & Activate Virtual Environment and install dependencies

From this connector directory, create a virtual environment:

python -m venv .venv

This will generate a virtualenv for this module in .venv/. Make sure this venv is active in your development environment of choice. To activate it from the terminal, run:

source .venv/bin/activate
pip install -r requirements.txt

If you are in an IDE, follow your IDE's instructions to activate the virtualenv.

Note that while we are installing dependencies from requirements.txt, you should only edit setup.py for your dependencies. requirements.txt is used for editable installs (pip install -e) to pull in Python dependencies from the monorepo and will call setup.py. If this is mumbo jumbo to you, don't worry about it, just put your deps in setup.py but install using pip install -r requirements.txt and everything should work as you expect.

Building via Gradle

From the Airbyte repository root, run:

./gradlew :airbyte-integrations:connectors:source-git-file:build

Create credentials

If you are a community contributor, follow the instructions in the documentation to generate the necessary credentials. Then create a file secrets/config.json conforming to the source_git_file/spec.yaml file. Note that the secrets directory is gitignored by default, so there is no danger of accidentally checking in sensitive information. See integration_tests/sample_config.json for a sample config file.

If you are an Airbyte core member, copy the credentials stored in LastPass under the secret name source git-file test creds and place them into secrets/config.json.

Locally running the connector

python main.py spec
python main.py check --config secrets/config.json
python main.py discover --config secrets/config.json
python main.py read --config secrets/config.json --catalog integration_tests/configured_catalog.json
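
The read command prints Airbyte protocol messages to stdout as one JSON object per line. A small sketch for pulling the record payloads out of that output follows; the stream name "revisions" and the sample values are assumptions made for the example.

```python
import json
from typing import Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the `data` payload of every RECORD message in the output."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore anything that is not a JSON message
        if isinstance(message, dict) and message.get("type") == "RECORD":
            yield message["record"]["data"]


# Hypothetical output lines; real output comes from `python main.py read ...`.
sample_output = [
    '{"type": "LOG", "log": {"level": "INFO", "message": "starting"}}',
    '{"type": "RECORD", "record": {"stream": "revisions", "data": '
    '{"hash": "abc123", "fileName": "README.md", "operation": "add"}, '
    '"emitted_at": 0}}',
]
for record in iter_records(sample_output):
    print(record["fileName"], record["operation"])  # README.md add
```

To use it on live output, pass `sys.stdin` instead of `sample_output` and pipe the read command into the script.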

Locally running the connector docker image

Build

First, make sure you build the latest Docker image:

docker build . -t airbyte/source-git-file:dev

You can also build the connector image via Gradle:

./gradlew :airbyte-integrations:connectors:source-git-file:airbyteDocker

When building via Gradle, the docker image name and tag, respectively, are the values of the io.airbyte.name and io.airbyte.version LABELs in the Dockerfile.
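
To see which image name and tag a Gradle build would produce, the two labels can be read out of the Dockerfile. The sketch below only handles simple one-label-per-line `LABEL key=value` instructions, and the Dockerfile contents shown are placeholders, not the connector's actual file.

```python
import re
from typing import Dict


def read_labels(dockerfile_text: str) -> Dict[str, str]:
    """Collect simple `LABEL key=value` lines from a Dockerfile."""
    pattern = re.compile(r"^LABEL[ \t]+([\w.-]+)=(\S+)[ \t]*$", re.MULTILINE)
    labels = {}
    for match in pattern.finditer(dockerfile_text):
        # Strip optional double quotes around the value.
        labels[match.group(1)] = match.group(2).strip('"')
    return labels


# Placeholder Dockerfile contents for illustration only.
dockerfile_text = """\
FROM python:3.9-slim
LABEL io.airbyte.version=0.1.0
LABEL io.airbyte.name=airbyte/source-git-file
"""
labels = read_labels(dockerfile_text)
print(f'{labels["io.airbyte.name"]}:{labels["io.airbyte.version"]}')
# airbyte/source-git-file:0.1.0
```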

Run

Then run any of the connector commands as follows:

docker run --rm airbyte/source-git-file:dev spec
docker run --rm -v $(pwd)/secrets:/secrets airbyte/source-git-file:dev check --config /secrets/config.json
docker run --rm -v $(pwd)/secrets:/secrets airbyte/source-git-file:dev discover --config /secrets/config.json
docker run --rm -v $(pwd)/secrets:/secrets -v $(pwd)/integration_tests:/integration_tests airbyte/source-git-file:dev read --config /secrets/config.json --catalog /integration_tests/configured_catalog.json

Testing

Make sure to familiarize yourself with pytest test discovery to know how your test files and methods should be named. First install test dependencies into your virtual environment:

pip install '.[tests]'

Unit Tests

To run unit tests locally, from the connector directory run:

python -m pytest unit_tests

Integration Tests

There are two types of integration tests: Acceptance Tests (Airbyte's test suite for all source connectors) and custom integration tests (which are specific to this connector).

Custom Integration tests

Place custom tests inside the integration_tests/ folder. Then, from the connector root, run:

python -m pytest integration_tests

Acceptance Tests

Customize the acceptance-test-config.yml file to configure tests. See Source Acceptance Tests for more information. If your connector needs to create or destroy resources for use during acceptance tests, create fixtures for them and place them inside integration_tests/acceptance.py. To run your integration tests together with the acceptance tests, from the connector root, run:

python -m pytest integration_tests -p integration_tests.acceptance

To run your integration tests with docker

Using gradle to run tests

All commands should be run from the Airbyte project root. To run unit tests:

./gradlew :airbyte-integrations:connectors:source-git-file:unitTest

To run acceptance and custom integration tests:

./gradlew :airbyte-integrations:connectors:source-git-file:integrationTest

Dependency Management

All of your dependencies should go in setup.py, NOT requirements.txt. The requirements file is only used to connect internal Airbyte dependencies in the monorepo for local development. Dependencies are split into two groups:

  • dependencies required for your connector to work go in the MAIN_REQUIREMENTS list.
  • dependencies required only for testing go in the TEST_REQUIREMENTS list.
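
A sketch of how the two lists might be wired up in setup.py is shown below. The package names are placeholders rather than the connector's real dependencies, and the keyword arguments are collected in a hypothetical `SETUP_KWARGS` dictionary so the excerpt can be run on its own; an actual setup.py would pass them to `setuptools.setup`.

```python
# Hypothetical excerpt in the shape of a connector setup.py. Package names are
# placeholders, not the connector's actual dependencies.
MAIN_REQUIREMENTS = [
    "airbyte-cdk",  # runtime dependency (assumed)
]

TEST_REQUIREMENTS = [
    "pytest",  # only needed to run the test suites
]

SETUP_KWARGS = dict(
    name="source_git_file",
    packages=["source_git_file"],
    install_requires=MAIN_REQUIREMENTS,
    # The "tests" extra is what `pip install .[tests]` pulls in.
    extras_require={"tests": TEST_REQUIREMENTS},
)

# A real setup.py would end with: setuptools.setup(**SETUP_KWARGS)
print(SETUP_KWARGS["extras_require"])  # {'tests': ['pytest']}
```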

Publishing a new version of the connector

You've checked out the repo, implemented a million dollar feature, and you're ready to share your changes with the world. Now what?

  1. Make sure your changes are passing unit and integration tests.
  2. Bump the connector version in Dockerfile -- just increment the value of the LABEL io.airbyte.version appropriately (we use SemVer).
  3. Create a Pull Request.
  4. Pat yourself on the back for being an awesome contributor.
  5. Someone from Airbyte will take a look at your PR and iterate with you to merge it into master.
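
For the version bump in step 2, the SemVer increment rules can be sketched as follows. The `bump` helper is illustrative and only handles plain MAJOR.MINOR.PATCH strings, without pre-release or build metadata.

```python
def bump(version: str, part: str = "patch") -> str:
    """Increment one component of a MAJOR.MINOR.PATCH version string."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"  # breaking change: reset minor and patch
    if part == "minor":
        return f"{major}.{minor + 1}.0"  # new feature: reset patch
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"  # bug fix
    raise ValueError(f"unknown part: {part}")


print(bump("0.1.0"))           # 0.1.1
print(bump("0.1.4", "minor"))  # 0.2.0
print(bump("0.9.3", "major"))  # 1.0.0
```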
