kasraavand / PubData

Smart search engine for all bioinformatics databases worldwide

Home Page: http://www.pubdata.bio


PubData

About

PubData is a search engine and file retrieval system for all bioinformatics databases worldwide. PubData searches biomedical FTP data in a user-friendly fashion, similar to how PubMed searches biomedical literature. Unlike PubMed, which is an online web service, PubData runs as a standalone GUI software program. PubData is built on novel network programming and natural language processing algorithms that can patch into the FTP servers of any user-specified bioinformatics database, query its contents, and retrieve files for download.
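To make the "retrieve files for download" step concrete, here is a minimal sketch using Python's standard ftplib module. It is not taken from the PubData source; the function names fetch_file and download are hypothetical, and anonymous login is an assumption that holds for many public bioinformatics FTP servers but not all.

```python
import ftplib
import io


def fetch_file(ftp, remote_path):
    """Return the bytes of remote_path using an already-connected FTP session."""
    buffer = io.BytesIO()
    # RETR streams the file in binary chunks; each chunk is appended to the buffer.
    ftp.retrbinary("RETR " + remote_path, buffer.write)
    return buffer.getvalue()


def download(host, remote_path, local_path):
    """Log in anonymously to host and save remote_path to local_path."""
    ftp = ftplib.FTP(host)  # connects when a host is given
    try:
        ftp.login()  # no arguments means anonymous login (assumption)
        data = fetch_file(ftp, remote_path)
    finally:
        ftp.close()
    with open(local_path, "wb") as handle:
        handle.write(data)
```

Buffering the whole file in memory keeps the sketch short; for large data files it may be preferable to pass the write method of an open local file to retrbinary directly.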

Future plans include adding web server support for PubData, and contributions from the open source community are welcome. Refer to the PubData paper for more info: http://dx.doi.org/10.1101/069575

PubData is designed as a graphical user interface (GUI) software program written in the Python programming language and PyQt4 (Python binding of the cross-platform GUI toolkit Qt). PubData can remotely search, access, view, and retrieve files from the deeply nested directory trees of any major bioinformatics database via a local computer network.
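As an illustration of what browsing a deeply nested FTP directory tree involves, the sketch below walks a server recursively with ftplib. It is an assumption-laden example rather than PubData's own implementation: walk_ftp, is_directory, and the max_depth limit are hypothetical, and the "try to change into it" directory test is a common workaround for servers that do not support richer listing commands.

```python
import ftplib


def is_directory(ftp, path):
    """Treat path as a directory if the server lets us change into it."""
    try:
        ftp.cwd(path)
        return True
    except ftplib.error_perm:
        return False


def walk_ftp(ftp, root="/", max_depth=3):
    """Yield file paths under root, descending at most max_depth levels."""
    try:
        names = ftp.nlst(root)
    except ftplib.error_perm:
        return  # empty or unreadable directory
    for name in names:
        if name.rsplit("/", 1)[-1] in (".", ".."):
            continue  # skip self and parent entries if the server lists them
        # Some servers return bare names, others absolute paths.
        path = name if name.startswith("/") else root.rstrip("/") + "/" + name
        if is_directory(ftp, path):
            if max_depth > 0:
                for child in walk_ftp(ftp, path, max_depth - 1):
                    yield child
        else:
            yield path
```

The depth limit is there because public databases can hold very large trees, and an unbounded walk over a slow connection may take a long time.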

By assembling all major bioinformatics databases under the roof of one software program, PubData spares the user the hassle and non-standardized complexities of accessing databases one by one in an Internet browser. More importantly, it allows a user to query multiple databases simultaneously for user-specified keywords (e.g., human, cancer, transcriptome). As such, PubData allows researchers to search, access, view, and download files from the FTP servers of any major bioinformatics database from one centralized GUI on their local computer.
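The idea of querying several databases for the same keywords can be sketched as a simple filter over file listings. This is only an illustration: it uses plain case-insensitive substring matching, not PubData's natural language processing algorithms, and the function names and the listings structure (database name mapped to file paths) are hypothetical.

```python
def matches(path, keywords):
    """True if every keyword occurs in path, ignoring case."""
    lowered = path.lower()
    return all(keyword.lower() in lowered for keyword in keywords)


def search(listings, keywords):
    """Map each database name to the paths that match all keywords.

    listings: dict of database name -> iterable of file paths (hypothetical input).
    """
    results = {}
    for database, paths in listings.items():
        hits = [path for path in paths if matches(path, keywords)]
        if hits:
            results[database] = hits  # databases without hits are omitted
    return results
```

For example, calling search with the keywords human and cancer would keep a path such as /human/cancer/transcriptome.gz and drop /mouse/liver.gz.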

PubData is an ongoing bioinformatics software project financially supported by the United States Department of Defense (DoD) through the National Defense Science and Engineering Graduate Fellowship (NDSEG) Program. This research was conducted with Government support under and awarded by DoD, Army Research Office (ARO), National Defense Science and Engineering Graduate (NDSEG) Fellowship, 32 CFR 168a.

Please cite: "Khomtchouk et al.: 'PubData: search engine for bioinformatics databases worldwide', 2016: http://dx.doi.org/10.1101/069575" within any source that makes use of any methods inspired by PubData.

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.

Installation

Requirements

  • Python >= 2.7

Linux:

  • Make sure you have Python 2.7 installed.
  • Python libraries needed: PyQt4, PyQt
  • Clone the PubData repository with git.
  • Navigate to the /interface directory of the cloned repository.
  • Run “python GUI.py”.
  • If a required Python library is missing, Python reports it when you run this command.
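If you would rather check for missing libraries before launching the GUI, a small script along these lines may help. It is a sketch, not part of PubData: the missing_modules helper is hypothetical, and the only module name checked is PyQt4, which is the library named in the steps above.

```python
import importlib


def missing_modules(names):
    """Return the module names from names that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # PyQt4 is the library listed in the installation steps.
    for name in missing_modules(["PyQt4"]):
        print("Missing Python library: " + name)
```

The script prints nothing when every listed library can be imported.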

Screenshots

s1_lean s2_lean s3_lean s4_lean s5_lean

When you open PubData, first pick a bioinformatics database to log in to:

s6_lean

Logged in to the PANTHER (Protein ANalysis THrough Evolutionary Relationships) Classification System database:

s7_lean

If you don’t see your favorite database in the list, you can add it manually (convenient for recently published databases):

s8_lean s9_lean

Let’s say you want to "Google search" multiple databases simultaneously:

s10_lean

Keyword search for ChIP-seq files across these selected databases (multiple keywords may be used as well):

s11_lean

Showing all relevant search results pertaining to ChIP-seq files across all selected databases:

s13_lean

Keyword search for RNA-seq files across these selected databases (multiple keywords may be used as well):

s12_lean

Showing all relevant search results pertaining to RNA-seq files (from the selected databases):

s14_lean

Languages

Language: Python 100.0%