Tim Allison (tballison)

Company: Rhapsode Consulting LLC

Home Page: https://mastodon.social/@tallison

Tim Allison's repositories

quaerite

Search relevance evaluation toolkit

Language: Java | License: NOASSERTION | Stargazers: 30 | Issues: 4 | Issues: 12

commoncrawl-fetcher-lite

Simplified version of a common crawl fetcher

Language: Java | License: Apache-2.0 | Stargazers: 9 | Issues: 3 | Issues: 15

file-observatory

Single server/laptop grade file-observatory

Language: Java | License: Apache-2.0 | Stargazers: 9 | Issues: 6 | Issues: 8

tika-gui-v2

Unofficial user interface for Apache Tika

Language: HTML | License: Apache-2.0 | Stargazers: 5 | Issues: 3 | Issues: 71

SimpleCommonCrawlExtractor

Simple wrapper around IIPC Web Commons to take a literal warc.gz and extract standalone binaries

Language: Java | License: Apache-2.0 | Stargazers: 4 | Issues: 5 | Issues: 0

cord-19

Data munging for CORD-19

Language: Java | License: NOASSERTION | Stargazers: 3 | Issues: 2 | Issues: 0

share

Public share

awesome-digital-preservation

Carefully curated list of awesome digital preservation resources.

Language: JavaScript | License: CC0-1.0 | Stargazers: 1 | Issues: 1 | Issues: 0

hodgepodge

One-off dev repo, very experimental

Language: HTML | Stargazers: 1 | Issues: 2 | Issues: 0

language-detector

Language Detection Library for Java

License: Apache-2.0 | Stargazers: 1 | Issues: 0 | Issues: 0

tika-addons

Addons not part of the official Tika release

any23

Apache Anything To Triples (Any23) is a library, a web service and a command line tool that extracts structured data in RDF format from a variety of Web documents.

License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

commons-compress

Mirror of Apache Commons Compress

Language: Java | License: Apache-2.0 | Stargazers: 0 | Issues: 3 | Issues: 0

commons-io

Apache Commons IO

Language: Java | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

droid

DROID (Digital Record and Object Identification)

Language: Java | License: BSD-3-Clause | Stargazers: 0 | Issues: 1 | Issues: 0
Language: Java | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

incubator-stormcrawler

A scalable, mature and versatile web crawler based on Apache Storm

License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

junrar

Plain Java unrar util (former SourceForge project)

Language: Java | License: NOASSERTION | Stargazers: 0 | Issues: 2 | Issues: 0
Language: Java | License: Apache-2.0 | Stargazers: 0 | Issues: 3 | Issues: 1

metadata-extractor

Extracts Exif, IPTC, XMP, ICC and other metadata from image files

Language: Java | License: Apache-2.0 | Stargazers: 0 | Issues: 2 | Issues: 0

nanite

Nanite - a friendly swarm of format-identifying robots.

Language: Java | Stargazers: 0 | Issues: 2 | Issues: 0

nutch

Apache Nutch is an extensible and scalable web crawler

Language: Java | License: Apache-2.0 | Stargazers: 0 | Issues: 1 | Issues: 0

opensearch-java

Java Client for OpenSearch

Language: Java | License: Apache-2.0 | Stargazers: 0 | Issues: 1 | Issues: 0

poi

Mirror of Apache POI

Language: Java | Stargazers: 0 | Issues: 0 | Issues: 0

tika-arlington-pdf-model

Simple wrapper around the Arlington PDF model's TestGrammar

Language: Dockerfile | License: Apache-2.0 | Stargazers: 0 | Issues: 2 | Issues: 0

tika-detector-stormcrawler

Wraps the charset detection logic from StormCrawler as a Tika module

Language: Java | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

tika-docker

Convenience Docker images for Apache Tika Server

Language: Shell | License: Apache-2.0 | Stargazers: 0 | Issues: 2 | Issues: 0

tika-eval-multi-comparer

Demo tika-eval-multi-comparer

Language: Java | Stargazers: 0 | Issues: 0 | Issues: 0