USC Information Retrieval & Data Science (USCDataScience)

USC Information Retrieval and Data Science Group

Location:Los Angeles, CA

Home Page:http://irds.usc.edu/

USC Information Retrieval & Data Science's repositories

sparkler

Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.

Language: Java | License: Apache-2.0 | Stargazers: 409 | Issues: 47 | Issues: 153

supervising-ui

Web UI for labelling datasets for supervised learning.

Language: Python | License: Apache-2.0 | Stargazers: 77 | Issues: 6 | Issues: 5

Image-Similarity-Deep-Ranking

Deep Ranking-based image similarity, to be developed as a plugin for ImageSpace. Reference paper: https://users.eecs.northwestern.edu/~jwa368/pdfs/deep_ranking.pdf
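As we read the linked deep ranking paper, the network is trained on triplets (query, positive, negative) with a hinge loss that asks the query to be closer to the positive than to the negative by at least a gap g, using squared Euclidean distance between embeddings. The sketch below shows only that loss on plain Python vectors; the CNN that would produce the embeddings is out of scope, and the function names and the default gap of 1.0 are illustrative assumptions, not code from this repository.

```python
from typing import Sequence


def squared_euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length embeddings."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_hinge_loss(query, positive, negative, gap: float = 1.0) -> float:
    """Hinge loss for one triplet: zero once the negative is at least
    `gap` farther (in squared distance) from the query than the positive.
    The default gap is an assumed value, not taken from the repository."""
    d_pos = squared_euclidean(query, positive)
    d_neg = squared_euclidean(query, negative)
    return max(0.0, gap + d_pos - d_neg)


# Hypothetical 2-D embeddings standing in for CNN outputs.
q, p, n = [0.0, 0.0], [1.0, 0.0], [0.5, 0.0]
print(triplet_hinge_loss(q, p, n))  # 1.0 + 1.0 - 0.25 = 1.75
```

A triplet whose negative is already far enough away, such as `[3.0, 0.0]` here, contributes zero loss, so training effort concentrates on triplets that are still ranked wrongly.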

autoextractor

A toolkit for clustering web pages based on various similarity measures.

Language: Java | License: Apache-2.0 | Stargazers: 33 | Issues: 7 | Issues: 8
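The listing does not say which similarity measures autoextractor implements, so the following is a hypothetical sketch of one structural measure: Jaccard similarity over pairs of consecutive HTML start tags, combined with a simple greedy threshold clustering. The names `tag_shingles`, `jaccard`, `cluster_pages` and the 0.5 threshold are assumptions for illustration, not the toolkit's API.

```python
from html.parser import HTMLParser
from typing import Dict, List, Set, Tuple

Shingle = Tuple[str, ...]


class TagCollector(HTMLParser):
    """Records start-tag names in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: List[str] = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tag_shingles(html: str, k: int = 2) -> Set[Shingle]:
    """Set of k consecutive start tags, a rough fingerprint of page structure."""
    parser = TagCollector()
    parser.feed(html)
    parser.close()
    tags = parser.tags
    return {tuple(tags[i:i + k]) for i in range(len(tags) - k + 1)}


def jaccard(a: Set[Shingle], b: Set[Shingle]) -> float:
    """|A intersect B| / |A union B|, treating two empty sets as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_pages(pages: Dict[str, str], threshold: float = 0.5) -> List[List[str]]:
    """Greedy clustering: each page joins the first cluster whose first page
    is at least `threshold` similar; otherwise it starts a new cluster."""
    clusters: List[Tuple[Set[Shingle], List[str]]] = []
    for name, html in pages.items():
        shingles = tag_shingles(html)
        for representative, members in clusters:
            if jaccard(representative, shingles) >= threshold:
                members.append(name)
                break
        else:
            clusters.append((shingles, [name]))
    return [members for _, members in clusters]


pages = {
    "a": "<html><body><div><p>Sea ice</p></div></body></html>",
    "b": "<html><body><div><p>Glaciers</p></div></body></html>",
    "c": "<html><body><table><tr><td>1</td></tr></table></body></html>",
}
print(cluster_pages(pages))  # [['a', 'b'], ['c']]
```

Pages "a" and "b" share a template despite different text, so they cluster together. A content-based measure (for example, term overlap) could be swapped into `jaccard` without changing the clustering loop.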

SentimentAnalysisParser

Combines Apache OpenNLP and Apache Tika and provides facilities for automatically deriving sentiment from text.

NLTKRest

A REST server endpoint built with Flask and Python.

Language: Java | License: Apache-2.0 | Stargazers: 23 | Issues: 4 | Issues: 7

tika-dockers

A suite of machine learning / deep learning Dockerfiles that allow Apache Tika to extract objects from images and video and to produce textual captions for them.

AgePredictor

Age classification from text using PAN16, blogs, Fisher Callhome, and Cancer Forum

Language: Java | License: Apache-2.0 | Stargazers: 15 | Issues: 2 | Issues: 10

polar.usc.edu

Polar activities at the University of Southern California (USC) related to the NSF Polar CyberInfrastructure program.

Language: HTML | License: Apache-2.0 | Stargazers: 15 | Issues: 7 | Issues: 51

polar-deep-insights

Conceptual, temporal, and spatial analysis of the TREC Polar dataset.

parser-indexer-py

Python tools for parsing documents and building an inverted index with enriched metadata. A Java version with slightly different features: https://github.com/USCDataScience/parser-indexer

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 9 | Issues: 7 | Issues: 22
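For readers unfamiliar with the data structure, an inverted index maps each term to the documents that contain it, so a multi-term query becomes a set intersection. The plain-Python sketch below illustrates only that idea; it is not parser-indexer-py's implementation, and the enriched metadata the repository adds (fields extracted during parsing) is not modelled. The function names are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lower-cased word tokens (letters, digits, underscore)."""
    return re.findall(r"\w+", text.lower())


def build_inverted_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return dict(index)


def search_all(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """IDs of documents containing every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    "d1": "Sea ice extent in the Arctic",
    "d2": "Ice sheet melt in Greenland",
}
index = build_inverted_index(docs)
print(sorted(index["ice"]))          # ['d1', 'd2']
print(search_all(index, "sea ice"))  # {'d1'}
```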

uscdatascience.github.io

USC Information Retrieval and Data Science Group

Language: HTML | License: Apache-2.0 | Stargazers: 9 | Issues: 4 | Issues: 15

cmu-fg-bg-similarity

CMU Foreground/Background Similarity Server from DARPA MEMEX

Language: C++ | License: Apache-2.0 | Stargazers: 7 | Issues: 9 | Issues: 0

img2text

Models and associated helper code for the GSoC 2017 project "TensorFlow Image to Text in Apache Tika".

Language: Python | License: Apache-2.0 | Stargazers: 7 | Issues: 9 | Issues: 0

ufo.usc.edu

Collection of projects from IRDS students studying unidentified flying objects

Language: HTML | License: Apache-2.0 | Stargazers: 6 | Issues: 8 | Issues: 1

deepsentirank

Deep Learning based Sentiment Ranking for Multimedia

Language: Python | License: Apache-2.0 | Stargazers: 5 | Issues: 9 | Issues: 0

file-content-analyzer

A set of Python modules to perform Byte Frequency Analysis, Byte Frequency Correlation, Cross Correlation, and FHT analysis on files.

Language: Python | Stargazers: 5 | Issues: 3 | Issues: 0
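Byte Frequency Analysis fingerprints a file by counting how often each of the 256 possible byte values occurs. The sketch below is a minimal illustration, assuming the histogram is scaled by the count of the most frequent byte value and using plain Pearson correlation as a stand-in for comparing two fingerprints. Both choices and the function names are assumptions, not file-content-analyzer's code, and its cross-correlation and FHT analyses are not shown.

```python
from collections import Counter
from typing import List


def byte_frequency(data: bytes) -> List[float]:
    """256-bin histogram of byte values, scaled so the most frequent
    byte value is 1.0 (all zeros for empty input)."""
    counts = Counter(data)  # iterating bytes yields ints 0..255
    peak = max(counts.values(), default=0)
    if peak == 0:
        return [0.0] * 256
    return [counts.get(value, 0) / peak for value in range(256)]


def correlation(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length fingerprints
    (0.0 if either fingerprint is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


fingerprint = byte_frequency(b"aab")
print(fingerprint[ord("a")], fingerprint[ord("b")])  # 1.0 0.5
```

Comparing, say, `byte_frequency(open(path, "rb").read())` for two files gives a correlation near 1.0 when their byte distributions are similar, which is the intuition behind using these fingerprints to identify file types.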

pdi-topics

LDA Topic Modeling for Polar Data Insights

Language: HTML | License: LGPL-3.0 | Stargazers: 5 | Issues: 3 | Issues: 0

PolarDataCollection

Uses the Google Search API to collect URLs relevant to the polar domain for deep insights and intelligent crawling.

Language: HTML | Stargazers: 3 | Issues: 9 | Issues: 0

PolarPostProcessing

Connects to the Solr database created for Sparkler-crawled data to perform further data extraction, classification, filtering, and insight generation using various machine learning models. The ML models can take a keyword list from the user, extract features from URL content, classify (score) the output, and update the Solr parameters accordingly. Sparkler: https://github.com/USCDataScience/sparkler

Language: Jupyter Notebook | Stargazers: 3 | Issues: 7 | Issues: 0

sweet-neo4j

A Ruby parser that uses linkeddata and RDF to fetch the JPL SWEET ontology and load it into Neo4j for graph queries and examination.

Language: Ruby | Stargazers: 3 | Issues: 4 | Issues: 0

liresolr

Putting LIRE into Solr - an ongoing project

Language: Java | License: GPL-2.0 | Stargazers: 2 | Issues: 8 | Issues: 0

pdftabextract

A set of tools for extracting tables from PDF files, helping with data mining on (OCR-processed) scanned documents.

Language: Python | License: Apache-2.0 | Stargazers: 2 | Issues: 7 | Issues: 0

tika-dl-models

A place to release saved machine learning models for tika-dl

License: Apache-2.0 | Stargazers: 2 | Issues: 9 | Issues: 0

tika-ner-corenlp

Stanford CoreNLP NER add-on for Apache Tika's NamedEntityParser.

Language: Java | License: GPL-3.0 | Stargazers: 1 | Issues: 7 | Issues: 0

Language: Python | Stargazers: 0 | Issues: 9 | Issues: 0

Ocean_Observation_FacetView

A FacetView setup for crawled ocean observation data.

Language: JavaScript | License: Apache-2.0 | Stargazers: 0 | Issues: 8 | Issues: 0

sce-domain-discovery

Domain Discovery for the Sparkler Crawl Environment

Language: HTML | License: Apache-2.0 | Stargazers: 0 | Issues: 8 | Issues: 0