Shriphani Palakodety's repositories
enlive-helper
A more powerful html-resource for use with enlive's functions
clj-heritrix
Clojure implementation of the Heritrix REST API
structural_similarity
Compare HTML documents for structural (or template) similarity
consistent-hashing
Consistent hashing implementation in Clojure
index-page-crawler
Follow pagination links on index pages and fetch the linked pages
web-corpus
ClueWeb web corpus pipeline
armies-dataset
Data visualization experiment with Incanter
art
To enrich the mind
clj-kba
Clojure code to read and parse the KBA corpus
clueweb-clj
Clojure interface to the ClueWeb12 dataset
cpi-vis
Visualizing CPI data
detect-leaf
Clojure module for forum leaf detection
enlive
A selector-based (à la CSS) templating and transformation system for Clojure
google-phantomjs-expt
Google PhantomJS experiment
gorilla-repl
A rich REPL for Clojure in the notebook style.
heritrix-3.2.0
Personal mirror of Heritrix 3.2.0 (stable) version.
html-mining-utils
Utilities for mining HTML documents
hw0-spalakod
Homework 1
kba-2013-clj
Clojure tools to work with the 2013 streamcorpus
near-duplicate
Near-duplicate detection for web pages that share the same template
papers-we-love
Papers from the computer science community to read and discuss.
process-common-crawl
Process the Common Crawl dataset for ClueWeb
query_indri
Experiments on a ClueWeb Indri index
single-hop-heritrix-job
A Heritrix config file for a single-hop crawl from a given seed.
sketchy
Sketching algorithms for Clojure (Bloom filter, MinHash, HyperLogLog, count-min sketch)
vertexAPI2
A vertex-centric CUDA/C++ API for large graph analytics on GPUs using the Gather-Apply-Scatter abstraction
visualizations
Visualizations I have made
warc-clojure
Clojure wrapper around a Java library for reading WARC files.