Innoplexus (innoplexus)

Company: Innoplexus Consulting Services Pvt Ltd

Location: India

Home Page: https://www.innoplexus.com

Innoplexus's starred repositories

flink

Apache Flink

Language: Java | License: Apache-2.0 | Stargazers: 23897 | Issues: 947 | Issues: 0

luigi

Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, etc. It also comes with Hadoop support built in.

Language: Python | License: Apache-2.0 | Stargazers: 17758 | Issues: 473 | Issues: 995

pyspider

A Powerful Spider (Web Crawler) System in Python.

Language: Python | License: Apache-2.0 | Stargazers: 16475 | Issues: 895 | Issues: 824

ceph

Ceph is a distributed object, block, and file storage platform

Language: C++ | License: NOASSERTION | Stargazers: 13981 | Issues: 659 | Issues: 0

WebFundamentals

Former git repo for WebFundamentals on developers.google.com

Language: JavaScript | License: Apache-2.0 | Stargazers: 13851 | Issues: 652 | Issues: 3930

arangodb

🥑 ArangoDB is a native multi-model database with flexible data models for documents, graphs, and key-values. Build high performance applications using a convenient SQL-like query language or JavaScript extensions.

Language: C++ | License: NOASSERTION | Stargazers: 13532 | Issues: 328 | Issues: 4628

beef

The Browser Exploitation Framework Project

GloVe

Software in C and data files for the popular GloVe model for distributed word representations, a.k.a. word vectors or embeddings

Language: C | License: Apache-2.0 | Stargazers: 6840 | Issues: 228 | Issues: 162

tabula

Tabula is a tool for liberating data tables trapped inside PDF files

Language: CSS | License: MIT | Stargazers: 6716 | Issues: 195 | Issues: 0

titan

Distributed Graph Database

Language: Java | License: Apache-2.0 | Stargazers: 5248 | Issues: 404 | Issues: 1125

openlibrary

One webpage for every book ever published!

Language: Python | License: AGPL-3.0 | Stargazers: 5129 | Issues: 172 | Issues: 4263

oboe.js

A streaming approach to JSON. Oboe.js speeds up web applications by providing parsed objects before the response completes.

Language: JavaScript | License: NOASSERTION | Stargazers: 4785 | Issues: 95 | Issues: 163

flasgger

Easy OpenAPI specs and Swagger UI for your Flask API

Language: Python | License: MIT | Stargazers: 3601 | Issues: 54 | Issues: 418

heritrix3

Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.

Language: Java | License: NOASSERTION | Stargazers: 2790 | Issues: 188 | Issues: 156

conceptnet5

Code for building ConceptNet from raw data.

Language: Roff | License: NOASSERTION | Stargazers: 2776 | Issues: 173 | Issues: 170

facebook-sdk

Python SDK for Facebook's Graph API

Language: Python | License: Apache-2.0 | Stargazers: 2739 | Issues: 200 | Issues: 239

gitinspector

📊 The statistical analysis tool for git repositories

Language: Python | License: GPL-3.0 | Stargazers: 2374 | Issues: 59 | Issues: 200

abot

Cross Platform C# web crawler framework built for speed and flexibility. Please star this project! +1.

Language: C# | License: Apache-2.0 | Stargazers: 2235 | Issues: 199 | Issues: 183

dr-elephant

Dr. Elephant is a job and flow-level performance monitoring and tuning tool for Apache Hadoop and Apache Spark

Language: Java | License: Apache-2.0 | Stargazers: 1353 | Issues: 128 | Issues: 316

rappor

RAPPOR: Privacy-Preserving Reporting Algorithms

Language: R | License: Apache-2.0 | Stargazers: 858 | Issues: 62 | Issues: 39

wayback

IA's public Wayback Machine (moved from SourceForge)

brozzler

brozzler - distributed browser-based web crawler

Language: Python | License: Apache-2.0 | Stargazers: 658 | Issues: 36 | Issues: 52

linkedin-scraper

Scrapes the public profile of a LinkedIn page

Language: Ruby | License: MIT | Stargazers: 553 | Issues: 50 | Issues: 58

ebot

Ebot is an open-source web crawler built on top of a NoSQL database (Apache CouchDB, Riak), an AMQP message broker (RabbitMQ), Webmachine and MochiWeb. Ebot is written in Erlang and is a very scalable, distributed and highly configurable web crawler. See the wiki pages for more details.

Language: Erlang | License: GPL-3.0 | Stargazers: 330 | Issues: 27 | Issues: 24

OpenNlp

Open source NLP tools (sentence splitter, tokenizer, chunker, coref, NER, parse trees, etc.) in C#

Language: C# | License: MIT | Stargazers: 281 | Issues: 38 | Issues: 30

commoncrawl-crawler

The Common Crawl Crawler Engine and Related MapReduce code (2008-2012)

Language: Java | License: GPL-3.0 | Stargazers: 214 | Issues: 45 | Issues: 1

cdx-index-client

A command-line tool for using CommonCrawl Index API at http://index.commoncrawl.org/

Language: Python | License: MIT | Stargazers: 179 | Issues: 12 | Issues: 6

Language: Jupyter Notebook | License: MIT | Stargazers: 91 | Issues: 9 | Issues: 6

analyze_ocr

Parse OCR result files for page numbers, tables of contents, etc.

Language: Python | Stargazers: 14 | Issues: 8 | Issues: 0

webarchive-indexing

Tools for bulk indexing of WARC/ARC files on Hadoop, EMR or local file system.

Language: Python | License: MIT | Stargazers: 4 | Issues: 8 | Issues: 5