Tim Allison (tballison)

Company: Rhapsode Consulting LLC

Home Page: https://mastodon.social/@tallison

Tim Allison's starred repositories

elasticsearch

Free and Open, Distributed, RESTful Search Engine

Language: Java | License: NOASSERTION | Stargazers: 68785 | Issues: 2687 | Issues: 35742

tesseract

Tesseract Open Source OCR Engine (main repository)

Language: C++ | License: Apache-2.0 | Stargazers: 60054 | Issues: 1687 | Issues: 2626

markdown-here

Google Chrome, Firefox, and Thunderbird extension that lets you write email in Markdown and render it before sending.

Language: JavaScript | License: MIT | Stargazers: 59591 | Issues: 1030 | Issues: 616

jsoup

jsoup: the Java HTML parser, built for HTML editing, cleaning, scraping, and XSS safety.

OpenRefine

OpenRefine is a free, open source power tool for working with messy data and improving it

Language: Java | License: BSD-3-Clause | Stargazers: 10673 | Issues: 472 | Issues: 3084

tabula

Tabula is a tool for liberating data tables trapped inside PDF files

Language: CSS | License: MIT | Stargazers: 6627 | Issues: 194 | Issues: 0

pdfcpu

A PDF processor written in Go.

Language: Go | License: Apache-2.0 | Stargazers: 6561 | Issues: 77 | Issues: 780

caldera

Automated Adversary Emulation Platform

Language: Python | License: Apache-2.0 | Stargazers: 5402 | Issues: 167 | Issues: 735

lucene-solr

Apache Lucene and Solr open-source search software

License: Apache-2.0 | Stargazers: 4368 | Issues: 303 | Issues: 0

sqlite-jdbc

SQLite JDBC Driver

Language: Java | License: Apache-2.0 | Stargazers: 2756 | Issues: 104 | Issues: 598

mp4parser

A Java API to read, write and create MP4 files

Language: Java | License: Apache-2.0 | Stargazers: 2742 | Issues: 110 | Issues: 404

tika

The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).

Language: Java | License: Apache-2.0 | Stargazers: 2339 | Issues: 100 | Issues: 0

open-semantic-search

Open Source research tool to search, browse, analyze and explore large document collections by Semantic Search Engine and Open Source Text Mining & Text Analytics platform (Integrates ETL for document processing, OCR for images & PDF, named entity recognition for persons, organizations & locations, metadata management by thesaurus & ontologies, search user interface & search apps for fulltext search, faceted search & knowledge graph)

Language: Shell | License: GPL-3.0 | Stargazers: 946 | Issues: 54 | Issues: 464

action-automatic-releases

READONLY: Auto-generated mirror for https://github.com/marvinpinto/actions/tree/master/packages/automatic-releases

License: MIT | Stargazers: 724 | Issues: 4 | Issues: 0

guides

Now stored here:

License: CC-BY-SA-4.0 | Stargazers: 410 | Issues: 58 | Issues: 0

juniversalchardet

Originally exported from code.google.com/p/juniversalchardet

Language: Java | License: NOASSERTION | Stargazers: 326 | Issues: 17 | Issues: 51

junrar

Plain Java unrar library

Language: Java | License: NOASSERTION | Stargazers: 283 | Issues: 15 | Issues: 47

dd-plist

A Java library providing support for ASCII, XML, and binary property lists.

Language: Java | License: NOASSERTION | Stargazers: 257 | Issues: 37 | Issues: 51

kelinci

AFL-based fuzzing for Java

Language: Java | License: Apache-2.0 | Stargazers: 230 | Issues: 12 | Issues: 12

go-tika

Go package for using Apache Tika

Language: Go | License: Apache-2.0 | Stargazers: 225 | Issues: 11 | Issues: 14

rated-ranking-evaluator

Search Quality Evaluation Tool for Apache Solr & Elasticsearch search-based infrastructures

Language: Java | License: Apache-2.0 | Stargazers: 175 | Issues: 12 | Issues: 66

chorus

Towards an open source stack for e-commerce search

Language: Ruby | License: Apache-2.0 | Stargazers: 139 | Issues: 9 | Issues: 58

tika-docker

Convenience Docker images for Apache Tika Server

Language: Shell | License: Apache-2.0 | Stargazers: 117 | Issues: 15 | Issues: 0

arlington-pdf-model

A vendor- and implementation-independent specification-derived, machine-readable model of PDF.

Language: C | License: Apache-2.0 | Stargazers: 73 | Issues: 19 | Issues: 81

ocrevalUAtion

OCR evaluation brought to you by University of Alicante

Language: HTML | License: Apache-2.0 | Stargazers: 66 | Issues: 15 | Issues: 22

htmlparser

The Validator.nu HTML parser https://about.validator.nu/htmlparser/

Language: Java | License: NOASSERTION | Stargazers: 56 | Issues: 14 | Issues: 19

solr-ocrpayload-plugin

Efficient indexing and retrieval of OCR bounding boxes in Solr

Language: Java | License: MIT | Stargazers: 22 | Issues: 13 | Issues: 2

file-tests

File-tests is a test suite for the File tool. Previous home: https://fedorahosted.org/file-tests/

Language: C++ | License: GPL-2.0 | Stargazers: 17 | Issues: 6 | Issues: 4

dropwizard-tika-server

A DropWizard wrapper around Apache Tika.

Language: Java | Stargazers: 10 | Issues: 5 | Issues: 0

tika-gui-v2

Unofficial user interface for Apache Tika

Language: HTML | License: Apache-2.0 | Stargazers: 6 | Issues: 3 | Issues: 71