Master's degree in Computer Science, Università di Pisa. Notes for the Information Retrieval course taught by Prof. Ferragina.
The Markdown files are meant to be converted with Pandoc:
pandoc ??_*.md -o ir-2019.pdf
An up-to-date PDF copy of the notes is automatically generated as an artifact in the Actions tab.
A possibly outdated PDF copy is also included in the repository as ir-2019.pdf.
- Introduction
- Crawling
- LSH
- Deduplication
- Compressed Storage
- Index Construction
- Document Compression
- Document Parsing
- Search
- Posting Compression
- Query Processing
- Document Ranking
- Web ranking
- Packing to fewer dimensions
- Semantic annotation
- LSH k-means comparison table kind of sucks.
- Cosine distance in 03 (now draft)
- Copy block in 04
- MapReduce in distributed indexing
- Fancy-hits heuristic in 11
- Skip Pointers with known distribution in 10
- Confusing WAND description
- Personalized PageRank in 12
- Eigenvalues ↔ singular values conditions in 13
- Proof of the cosine-distance bound in 13
- Wiser presentation in 14