USC Information Retrieval & Data Science's repositories
supervising-ui
Web UI for labelling datasets for supervised learning.
Image-Similarity-Deep-Ranking
Deep Ranking-based image similarity, to be developed as a plugin for ImageSpace. https://users.eecs.northwestern.edu/~jwa368/pdfs/deep_ranking.pdf
autoextractor
A toolkit for clustering web pages based on various similarity measures.
SentimentAnalysisParser
Combines Apache OpenNLP and Apache Tika and provides facilities for automatically deriving sentiment from text.
tika-dockers
A suite of Machine Learning / Deep Learning Dockerfiles to allow Apache Tika to extract objects and to produce textual captions for images and video
AgePredictor
Age classification from text using the PAN16, blog, Fisher Callhome, and Cancer Forum datasets
polar.usc.edu
Polar activities at the University of Southern California (USC) related to the NSF Polar CyberInfrastructure program
polar-deep-insights
Conceptual, temporal, and spatial analysis of the TREC Polar dataset
parser-indexer-py
Python tools for parsing documents and building an inverted index with enriched metadata. A Java version with slightly different features is available at https://github.com/USCDataScience/parser-indexer
uscdatascience.github.io
USC Information Retrieval and Data Science Group
cmu-fg-bg-similarity
CMU Foreground/Background Similarity Server from DARPA MEMEX
ufo.usc.edu
Collection of projects from IRDS students studying unidentified flying objects
deepsentirank
Deep Learning based Sentiment Ranking for Multimedia
file-content-analyzer
A set of Python modules for performing Byte Frequency Analysis, Byte Frequency Correlation, Cross Correlation, and FHT analysis on files
pdi-topics
LDA Topic Modeling for Polar Data Insights
PolarDataCollection
Uses the Google Search API to collect URLs relevant to the polar domain for deep insights and intelligent crawling
PolarPostProcessing
Connects to the Solr database created for Sparkler-crawled data to perform further data extraction, classification, filtering, and insight generation using various machine learning models. The models can take a keyword list from the user, extract features from URL content, classify (score) the output, and update the corresponding Solr parameter. Apache Sparkler link: https://github.com/USCDataScience/sparkler
sweet-neo4j
A Ruby parser that uses linkeddata and RDF to fetch the JPL SWEET ontology and load it into Neo4j for graph queries and exploration.
pdftabextract
A set of tools for extracting tables from PDF files to support data mining on (OCR-processed) scanned documents.
tika-dl-models
A place to release saved machine learning models for tika-dl
tika-ner-corenlp
Stanford CoreNLP NER add-on for Apache Tika's NamedEntityParser
Ocean_Observation_FacetView
A FacetView setup for crawled ocean observation data.
sce-domain-discovery
Domain Discovery for the Sparkler Crawl Environment