ZayneLiu / Librarian

A Cross-platform full-text search engine for local filesystem.

dotnet-core csharp full-text-search

Report can be found here

PDF version can be found here

Librarian

A cross-platform full-text search engine for local filesystem.

An alternative to Apache Tika for the C# ecosystem, written in C# on .NET Core.

Project Structure

Custodian API ( Searching / Indexing API )
- Indexing Module
  - OCR
- Searching Module
  - Semantic search
Custodian Web API ( REST Web API )
librarian-ui-nw-vue (NW.js + Vue)
[Planned] Librarian ( Cross-platform Native UI )
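
To make the split between the Indexing Module and the Searching Module concrete, here is a minimal sketch of a keyword-based inverted index. It is not taken from the Custodian API: the class name `TinyIndex`, its methods, and the AND-only query semantics are all assumptions made for illustration, and it does not attempt semantic search or OCR.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical illustration only; not a type from the Librarian code base.
public sealed class TinyIndex
{
    // Maps each term (case-insensitive) to the set of file paths containing it.
    private readonly Dictionary<string, HashSet<string>> _postings =
        new Dictionary<string, HashSet<string>>(StringComparer.OrdinalIgnoreCase);

    // Splits text into runs of letters and digits; everything else is a separator.
    private static IEnumerable<string> Tokenize(string text)
    {
        return Regex.Matches(text, @"[\p{L}\p{Nd}]+").Cast<Match>().Select(m => m.Value);
    }

    // Indexing: record every term of the document under its path.
    public void Add(string path, string text)
    {
        foreach (var term in Tokenize(text))
        {
            if (!_postings.TryGetValue(term, out var paths))
            {
                paths = new HashSet<string>();
                _postings[term] = paths;
            }
            paths.Add(path);
        }
    }

    // Searching: return the sorted paths of documents containing every query term.
    public List<string> Search(string query)
    {
        var result = new HashSet<string>();
        var first = true;
        foreach (var term in Tokenize(query))
        {
            if (!_postings.TryGetValue(term, out var paths))
            {
                return new List<string>(); // one missing term means no match
            }
            if (first)
            {
                result.UnionWith(paths);
                first = false;
            }
            else
            {
                result.IntersectWith(paths);
            }
        }
        return result.OrderBy(p => p, StringComparer.Ordinal).ToList();
    }
}
```

With this sketch, calling `Add` once per extracted document and then `Search("search engine")` would return only the paths whose text contains both terms. A real engine would also store term positions and ranking data, which are omitted here.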

Progress

Icon			🚧	⚠️
Meaning	Planned	Done	WIP	Potential issues

API
- 64-bit multi-platform support.
- Indexing.
- Searching.
- Semantic Search. !!!
- Web API + API Documentation.
File type support
- Microsoft Office 2007+ documents support.
  - .docx, .docm | Word 2007+ documents.
  - .pptx, .pptm | PowerPoint 2007+ documents. ⚠️
  - .xlsx, .xlsm | Excel 2007+ documents. ⚠️
- Plain-text documents.
  - .txt | Text documents.
  - .md | Markdown documents. 🚧
  - .rtf | Rich text format documents.
- .pdf | PDF documents.
  - Text-based PDF.
  - Image-only PDF. 🚧
- .epub, .mobi
Testing

Icon	Meaning
✅	Fully Tested
✔️	Partially Tested ( Basic functionalities )

File Type	Testing Status
.txt	✅
.docx, .docm	✅
.pptx, .pptm	✔️
.xlsx, .xlsm	✔️
.pdf	✔️

Notes

~~Abandon Electron for Node.js API~~
~~Adopting C# + Xamarin.Mac due to better performance (benchmark)~~
Focus on CLI & API.
Use NW.js + Vue.js only to demonstrate the workflow.
Go through Apache Tika's source code to find a solution for OOXML parsing.
A potential C# text extraction framework (Tika C# alternative).
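
As a starting point for the OOXML parsing note above, here is a minimal sketch of pulling plain text out of a .docx using only the .NET base class library. It is not Librarian's or Tika's implementation. The class name `DocxText` is hypothetical, and the sketch assumes the main document part lives at the conventional path `word/document.xml` instead of resolving it through the package relationships.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text;
using System.Xml.Linq;

// Hypothetical illustration only; not a type from the Librarian code base.
public static class DocxText
{
    // WordprocessingML main namespace used by w:p (paragraph) and w:t (text run).
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Reads a .docx stream and returns one line of text per paragraph.
    public static string Extract(Stream docx)
    {
        // A .docx file is a ZIP package; leave the caller's stream open.
        using (var zip = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true))
        {
            // Assumption: the main part is at the conventional location.
            var entry = zip.GetEntry("word/document.xml");
            if (entry == null)
            {
                throw new InvalidDataException("word/document.xml not found");
            }

            using (var xml = entry.Open())
            {
                var document = XDocument.Load(xml);
                var text = new StringBuilder();

                // Limitation: paragraphs nested inside other paragraphs
                // (for example text boxes) would be emitted twice.
                foreach (var paragraph in document.Descendants(W + "p"))
                {
                    var runs = paragraph.Descendants(W + "t").Select(t => t.Value);
                    text.AppendLine(string.Concat(runs));
                }
                return text.ToString();
            }
        }
    }
}
```

Calling `DocxText.Extract(File.OpenRead(path))` would return the paragraph text ready to hand to an indexer. The .pptx and .xlsx formats use the same ZIP packaging but different part names and XML vocabularies, so they would need their own extractors.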

About

MIT License

Languages

C# 69.2%, Vue 28.3%, HTML 1.5%, TypeScript 0.8%, JavaScript 0.1%