Report can be found here
PDF version can be found here
A cross-platform full-text search engine for the local filesystem.
An alternative to Apache Tika for the C# ecosystem, written in C# on .NET Core.
- Custodian API (Searching / Indexing API)
  - Indexing Module
    - OCR
  - Searching Module
    - Semantic search
- Custodian Web API (REST Web API)
- librarian-ui-nw-vue (NW.js + Vue)
- [Planned] Librarian (Cross-platform Native UI)
Icon | 🚧 | |||
---|---|---|---|---|
Meaning | Planned | Done | WIP | Potential issues |
- API
  - 64-bit multi-platform support.
  - Indexing.
  - Searching.
  - Semantic Search. !!!
  - Web API + API Documentation.
- File type support
  - Microsoft Office 2007+ documents support.
    - `.docx`, `.docm` | Word 2007+ documents.
    - `.pptx`, `.pptm` | PowerPoint 2007+ documents. ⚠️
    - `.xlsx`, `.xlsm` | Excel 2007+ documents. ⚠️
  - Plain-text documents.
    - `.txt` | Text documents.
    - `.md` | Markdown documents. 🚧
    - `.rtf` | Rich text format documents.
  - `.pdf` | PDF documents.
    - Text-based PDF.
    - Image-only PDF. 🚧
  - `.epub`, `.mobi`
- Testing

  Icon | Meaning |
  ---|---|
  ✅ | Fully Tested |
  ✔️ | Partially Tested (basic functionalities) |

  File Type | Testing Status |
  ---|---|
  `.txt` | ✅ |
  `.docx`, `.docm` | ✅ |
  `.pptx`, `.pptm` | ✔️ |
  `.xlsx`, `.xlsm` | ✔️ |
  `.pdf` | ✔️ |

- ~~Abandon Electron for Node.js API~~
  Adopting C# + Xamarin.Mac due to better performance (benchmark)
  - Focus on CLI & API.
- Use nw.js + vue.js just for the demonstration of the workflow.
- Go through Apache Tika's source code to find a solution for OOXML parsing.
- A potential C# text extraction framework (Tika C# alternative).
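
To illustrate what OOXML text extraction involves, here is a minimal sketch for `.docx` files. It is not Librarian's implementation and not Tika's approach. It relies only on the fact that a `.docx` file is a ZIP package whose main part conventionally lives at `word/document.xml`, with visible text stored in `w:t` elements inside `w:p` paragraphs. The class name `DocxTextSketch` is hypothetical, and the hard-coded part path is a simplifying assumption (a robust reader would resolve it through the package relationships).

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text;
using System.Xml.Linq;

// Hypothetical helper for illustration only; not part of the Custodian API.
public static class DocxTextSketch
{
    // WordprocessingML main namespace used by w:p (paragraph) and w:t (text run).
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    public static string ExtractText(Stream docx)
    {
        using (var zip = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true))
        {
            // Assumption: the main document part sits at the conventional path.
            var entry = zip.GetEntry("word/document.xml")
                ?? throw new InvalidDataException("word/document.xml not found");

            using (var part = entry.Open())
            {
                var xml = XDocument.Load(part, LoadOptions.PreserveWhitespace);
                var sb = new StringBuilder();

                // Emit one output line per paragraph, joining its text runs.
                foreach (var paragraph in xml.Descendants(W + "p"))
                {
                    var runs = paragraph.Descendants(W + "t").Select(t => t.Value);
                    sb.AppendLine(string.Concat(runs));
                }
                return sb.ToString();
            }
        }
    }
}
```

A caller might open a file with `File.OpenRead("report.docx")` (a placeholder path) and pass the stream to `DocxTextSketch.ExtractText`. Headers, footers, footnotes, and tables split across parts are ignored in this sketch.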