mrueda / pheno-search

Streamlined Searching in GA4GH-Standard Phenotypic and Clinical Data Repositories and Beyond

Pheno-Search

License: Artistic-2.0

Documentation: https://mrueda.github.io/pheno-search

Docker Hub Image: https://hub.docker.com/r/manuelrueda/pheno-search/tags

Download and Installation

Installing Elasticsearch

Elasticsearch is distributed under its own license; review the Elasticsearch LICENSE before using it.

From Docker Image:

To pull the Docker image, use the following command:

docker pull docker.elastic.co/elasticsearch/elasticsearch:7.10.0

Running the Image

To run the image, execute:

docker run -d --name elasticsearch -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.10.0
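Elasticsearch can take a little while after `docker run` before it accepts requests. A minimal Python sketch, assuming the port mapping above (localhost:9200), that polls the standard `_cluster/health` endpoint until the cluster reports green or yellow; the function names are illustrative and not part of Pheno-Search.

```python
import json
import time
import urllib.request

# Assumption: the container from the docker run command above, exposed on port 9200.
ES_URL = "http://localhost:9200"


def is_ready(health: dict) -> bool:
    """Treat "green" and "yellow" as ready.

    A single-node cluster reports "yellow" whenever an index has unassigned
    replica shards, which is normal for this setup.
    """
    return health.get("status") in ("green", "yellow")


def fetch_health(base_url: str = ES_URL) -> dict:
    """GET /_cluster/health and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/_cluster/health", timeout=5) as resp:
        return json.load(resp)


def wait_for_elasticsearch(fetch=fetch_health, attempts: int = 30, delay: float = 2.0) -> bool:
    """Poll the health endpoint until the cluster is ready or attempts run out."""
    for _ in range(attempts):
        try:
            if is_ready(fetch()):
                return True
        except OSError:
            # Connection refused or reset while the container is still starting
            # (urllib's URLError is a subclass of OSError).
            pass
        time.sleep(delay)
    return False


if __name__ == "__main__":
    print("ready" if wait_for_elasticsearch() else "not ready")
```

The `fetch` parameter is injectable so the polling logic can be exercised without a running cluster.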

Installing jq

To install jq on Debian/Ubuntu, run:

sudo apt-get install jq

Data Ingestion

Suppose you have a file named data/individuals.json that contains a JSON array of 100 entries. The Elasticsearch _bulk API expects newline-delimited JSON (NDJSON) in which every document is preceded by an action line, so first convert the file with jq:

jq -c '.[] | {"index": {"_index": "dataset1"}}, .' data/individuals.json > dataset1.json
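If jq is not available, the same transformation can be sketched in Python. This is an illustrative equivalent of the jq filter above, assuming data/individuals.json holds a single JSON array; `to_bulk_ndjson` is a hypothetical helper name.

```python
import json


def to_bulk_ndjson(records: list, index: str) -> str:
    """Mirror the jq filter: for each record, emit an action line, then the record."""
    lines = []
    for record in records:
        # Action line: tells _bulk to index the following line into `index`.
        lines.append(json.dumps({"index": {"_index": index}}, separators=(",", ":")))
        # Source line: the document itself, compact like `jq -c`.
        lines.append(json.dumps(record, separators=(",", ":")))
    # _bulk requires every line, including the last one, to end with "\n".
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    with open("data/individuals.json", encoding="utf-8") as fh:
        individuals = json.load(fh)  # expected to be a JSON array
    with open("dataset1.json", "w", encoding="utf-8") as out:
        out.write(to_bulk_ndjson(individuals, "dataset1"))
```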

Now perform the data ingestion:

curl -H "Content-Type: application/json" -XPOST "http://localhost:9200/dataset1/_bulk?pretty" --data-binary "@dataset1.json"
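A _bulk request returns HTTP 200 even when individual documents are rejected; the top-level `errors` flag and the per-item `error` objects carry the failures. A minimal Python sketch, assuming localhost:9200 and the dataset1.json file produced above, that posts the file and summarizes the response; the helper names are hypothetical, and it sends `application/x-ndjson`, which _bulk accepts alongside the `application/json` used in the curl command.

```python
import json
import urllib.request


def summarize_bulk_response(response: dict) -> dict:
    """Count total and failed items in a _bulk response."""
    items = response.get("items", [])
    failures = []
    for item in items:
        # Each item has exactly one key naming the action ("index", "create", ...).
        result = next(iter(item.values()))
        if "error" in result:
            failures.append(result["error"])
    return {"total": len(items), "failed": len(failures), "errors": failures}


def post_bulk(path: str, base_url: str = "http://localhost:9200", index: str = "dataset1") -> dict:
    """POST an NDJSON file to /<index>/_bulk and return the decoded response."""
    with open(path, "rb") as fh:
        payload = fh.read()
    req = urllib.request.Request(
        f"{base_url}/{index}/_bulk",
        data=payload,
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print(summarize_bulk_response(post_bulk("dataset1.json")))
```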

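Without an explicit mapping, Elasticsearch maps each array of objects (such as diseases) with the default object type, which flattens it internally: the values of each sub-field are pooled across all array elements, so the link between, for example, the code and the label of one particular disease is lost. Queries that depend on that link, such as the nested query under Data Query, require those fields to be declared as nested in a mapping file (data/mapping.json) before the data are ingested.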
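To make the flattening concrete, here is a small pure-Python model of the two behaviours. It is a simplification (exact value comparison, no text analysis), the individual is hypothetical, and the codes `ID:1`/`ID:2` are placeholders rather than real ontology identifiers.

```python
def flatten_objects(doc: dict, prefix: str = "") -> dict:
    """Approximate object-type indexing: map each dotted path to the list of
    every leaf value found under it, pooled across all array elements."""
    flat: dict = {}

    def walk(value, path):
        if isinstance(value, dict):
            for key, sub in value.items():
                walk(sub, f"{path}.{key}" if path else key)
        elif isinstance(value, list):
            for element in value:
                walk(element, path)
        else:
            flat.setdefault(path, []).append(value)

    walk(doc, prefix)
    return flat


def matches_flattened(flat: dict, criteria: dict) -> bool:
    """Object semantics: each criterion may be satisfied by a different element."""
    return all(value in flat.get(path, []) for path, value in criteria.items())


def matches_nested(elements: list, criteria: dict, path: str) -> bool:
    """Nested semantics: one single element must satisfy every criterion."""
    return any(matches_flattened(flatten_objects(e, path), criteria) for e in elements)


# Hypothetical individual with two diseases.
individual = {
    "diseases": [
        {"diseaseCode": {"id": "ID:1", "label": "Alzheimer disease"}},
        {"diseaseCode": {"id": "ID:2", "label": "Asthma"}},
    ]
}
criteria = {"diseases.diseaseCode.label": "Alzheimer disease", "diseases.diseaseCode.id": "ID:2"}
print(matches_flattened(flatten_objects(individual), criteria))  # True: a false positive
print(matches_nested(individual["diseases"], criteria, "diseases"))  # False
```

No single disease has both the label "Alzheimer disease" and the code ID:2, yet the flattened view matches; the nested view correctly does not.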

Deleting the Old Index

Because an existing field cannot be changed from object to nested in place, first delete the index created above:

curl -X DELETE "http://localhost:9200/dataset1"
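When scripting this step, deleting an index that does not exist returns HTTP 404, which is usually harmless. A minimal Python sketch, assuming localhost:9200, that treats 404 as "nothing to delete" and re-raises any other error; `delete_index` is a hypothetical helper, and the `opener` parameter exists only so it can be tested without a server.

```python
import urllib.error
import urllib.request


def delete_index(index: str, base_url: str = "http://localhost:9200",
                 opener=urllib.request.urlopen) -> bool:
    """DELETE /<index>; return True if deleted, False if the index did not exist."""
    req = urllib.request.Request(f"{base_url}/{index}", method="DELETE")
    try:
        with opener(req) as resp:
            return 200 <= resp.status < 300
    except urllib.error.HTTPError as err:
        if err.code == 404:  # index_not_found_exception: nothing to delete
            return False
        raise


if __name__ == "__main__":
    print("deleted" if delete_index("dataset1") else "index did not exist")
```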

Creating the Index with a Mapping

Then, create the index with the correct structure:

curl -X PUT "http://localhost:9200/dataset1" -H 'Content-Type: application/json' -d '@data/mapping.json'
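The contents of data/mapping.json are not reproduced here. As a hedged illustration of what such a file needs at minimum, this Python sketch builds an index body that declares `diseases` (the path used by the nested query) as type `nested` and leaves every other field, including the sub-fields inside each nested object, to dynamic mapping. The helper name and field list are assumptions; the real mapping may declare more. It prints the result instead of overwriting the repository's file.

```python
import json


def build_nested_mapping(nested_paths: list) -> dict:
    """Minimal Elasticsearch 7.x index body declaring top-level fields as nested."""
    return {
        "mappings": {
            "properties": {path: {"type": "nested"} for path in nested_paths}
        }
    }


if __name__ == "__main__":
    # Hypothetical: only "diseases" is declared nested in this sketch.
    print(json.dumps(build_nested_mapping(["diseases"]), indent=2))
```

Only top-level paths are handled; a nested field inside another object would need its own `properties` block.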

Now perform the data ingestion:

curl -H "Content-Type: application/json" -XPOST "http://localhost:9200/dataset1/_bulk?pretty" --data-binary "@dataset1.json"
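The delete, create, and ingest steps must run in that order: if the bulk request runs first, Elasticsearch auto-creates dataset1 with dynamic (object) mappings, and the later PUT then fails because the index already exists. A minimal Python sketch of the sequence, assuming localhost:9200; `plan_reindex` and `run_plan` are hypothetical helpers, and the mapping argument is whatever you load (for example from data/mapping.json).

```python
import json
import urllib.error
import urllib.request


def plan_reindex(index, mapping, ndjson, base_url="http://localhost:9200"):
    """Return the three requests as (method, url, content_type, body) tuples."""
    return [
        ("DELETE", f"{base_url}/{index}", None, None),
        ("PUT", f"{base_url}/{index}", "application/json", json.dumps(mapping)),
        ("POST", f"{base_url}/{index}/_bulk", "application/x-ndjson", ndjson),
    ]


def run_plan(plan, opener=urllib.request.urlopen):
    """Send each request in order; a 404 on DELETE just means there was no old index."""
    for method, url, content_type, body in plan:
        headers = {"Content-Type": content_type} if content_type else {}
        data = body.encode("utf-8") if body is not None else None
        req = urllib.request.Request(url, data=data, headers=headers, method=method)
        try:
            with opener(req) as resp:
                resp.read()
        except urllib.error.HTTPError as err:
            if not (method == "DELETE" and err.code == 404):
                raise


if __name__ == "__main__":
    with open("data/mapping.json", encoding="utf-8") as fh:
        mapping = json.load(fh)
    with open("dataset1.json", encoding="utf-8") as fh:
        ndjson = fh.read()
    run_plan(plan_reindex("dataset1", mapping, ndjson))
```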

Data Query

To search for individuals with a disease labeled "Alzheimer disease, susceptibility to", send a nested query with curl:

curl -X GET "http://localhost:9200/dataset1/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "nested": {
      "path": "diseases",
      "query": {
        "bool": {
          "must": [
            { "match": { "diseases.diseaseCode.label": "Alzheimer disease, susceptibility to" }}
          ]
        }
      }
    }
  }
}
'
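The same query can be sent from Python. A minimal sketch, assuming localhost:9200 and the dataset1 index; the helper names are hypothetical. Note that `match` analyzes the label and, by default, matches documents containing any of its terms, so other labels containing words such as "disease" may also be returned with lower scores; `match_phrase` is a stricter alternative if that matters.

```python
import json
import urllib.request


def build_disease_query(label: str) -> dict:
    """Nested query on diseases.diseaseCode.label, mirroring the curl example."""
    return {
        "query": {
            "nested": {
                "path": "diseases",
                "query": {
                    "bool": {"must": [{"match": {"diseases.diseaseCode.label": label}}]}
                },
            }
        }
    }


def hit_ids(response: dict) -> list:
    """Return the _id of every hit in a _search response."""
    return [hit["_id"] for hit in response.get("hits", {}).get("hits", [])]


def search(index: str, body: dict, base_url: str = "http://localhost:9200") -> dict:
    req = urllib.request.Request(
        f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    result = search("dataset1", build_disease_query("Alzheimer disease, susceptibility to"))
    print(hit_ids(result))
```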

Pheno-Search

To install the required modules, run:

pip install -r requirements.txt

To execute the code, run:

python3 pheno-search.py
