ZayneLiu / Librarian

A cross-platform full-text search engine for the local filesystem.

Report can be found here

PDF version can be found here

Librarian

A cross-platform full-text search engine for the local filesystem.

An alternative to Apache Tika for the C# ecosystem, written in C# on .NET Core.

Project Structure

  • Custodian API ( Searching / Indexing API )
    • Indexing Module
      • OCR
    • Searching Module
      • Semantic search
  • Custodian Web API ( REST Web API )
  • librarian-ui-nw-vue (NW.js + Vue)
  • [Planned] Librarian ( Cross-platform Native UI )
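
The README does not describe how the Indexing and Searching modules work internally. As a rough illustration of the general idea behind full-text indexing, here is a minimal inverted-index sketch. It is written in Python for brevity, not in the project's C#, and the names `tokenize`, `InvertedIndex`, `add`, and `search` are hypothetical, not part of the Custodian API.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Maps each token to the set of document paths that contain it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, path, text):
        """Index one document's extracted text under its path."""
        for token in tokenize(text):
            self.postings[token].add(path)

    def search(self, query):
        """Return the sorted paths of documents containing every query token."""
        tokens = tokenize(query)
        if not tokens:
            return []
        # Start from the first token's postings and intersect with the rest.
        result = set(self.postings.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= self.postings.get(token, set())
        return sorted(result)


# Hypothetical sample documents, for illustration only.
index = InvertedIndex()
index.add("notes/a.txt", "Full-text search for the local filesystem")
index.add("notes/b.txt", "Local notes about OCR")
print(index.search("local search"))  # prints ['notes/a.txt']
```

A real engine would add ranking, persistence, and incremental updates on top of this. The sketch only shows why indexing is a separate step from searching: extracted text is tokenized once at index time, so a query becomes a set intersection instead of a scan over every file.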

Progress

Icon  Meaning
      Planned
      Done
🚧    WIP
⚠️    Potential issues
  • API

    • 64-bit multi-platform support.
    • Indexing.
    • Searching.
    • Semantic Search. !!!
    • Web API + API Documentation.
  • File type support

    • Microsoft Office 2007+ documents support.
      • .docx, .docm | Word 2007+ documents.
      • .pptx, .pptm | PowerPoint 2007+ documents. ⚠️
      • .xlsx, .xlsm | Excel 2007+ documents. ⚠️
    • Plain-text documents.
      • .txt | Text documents.
      • .md | Markdown documents. 🚧
      • .rtf | Rich text format documents.
    • .pdf | PDF documents.
      • Text-based PDF.
      • Image-only PDF. 🚧
    • .epub, .mobi
  • Testing

    Icon  Meaning
          Fully Tested
    ✔️    Partially Tested ( Basic functionalities )

    File Type       Testing Status
    .txt
    .docx, .docm
    .pptx, .pptm    ✔️
    .xlsx, .xlsm    ✔️
    .pdf            ✔️

Notes

  • ~~Abandon Electron for Node.js API~~
  • Adopting C# + Xamarin.Mac due to better performance (benchmark)
  • Focus on CLI & API.
  • Use NW.js + Vue.js just to demonstrate the workflow.
  • Go through Apache Tika's source code to find a solution for OOXML parsing.
  • A potential C# text extraction framework (Tika C# alternative).
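
To make the OOXML note above concrete: a .docx file is a ZIP archive, and its body text lives in `w:t` elements inside the XML part conventionally stored at `word/document.xml`. The sketch below shows that idea using only the Python standard library. It is an illustration, not the project's C# implementation and not Tika's approach. The function name `extract_docx_text` is hypothetical, and the code assumes the main part sits at the conventional path instead of resolving it through the package relationships.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml in .docx files.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_docx_text(source):
    """Return the body text of a .docx file, one line per paragraph.

    `source` is a file path or a binary file-like object.
    Assumption: the main document part is at word/document.xml.
    """
    with zipfile.ZipFile(source) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for paragraph in root.iter(W_NS + "p"):
        # Join every text run (w:t) found inside this paragraph (w:p).
        # Limitation: paragraphs nested in text boxes are visited twice.
        runs = [node.text or "" for node in paragraph.iter(W_NS + "t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)


# Build a minimal stand-in archive in memory so the sketch runs without
# a real document. It has only the one part that the function reads.
sample_xml = (
    '<w:document xmlns:w="http://schemas.openxmlformats.org/'
    'wordprocessingml/2006/main"><w:body>'
    "<w:p><w:r><w:t>Hello </w:t></w:r><w:r><w:t>world</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>"
    "</w:body></w:document>"
)
sample = io.BytesIO()
with zipfile.ZipFile(sample, "w") as archive:
    archive.writestr("word/document.xml", sample_xml)
sample.seek(0)
print(extract_docx_text(sample))
```

The .pptx and .xlsx formats use the same ZIP-plus-XML packaging, but their text is spread across several parts (slides, shared strings). That may be one reason those formats carry the ⚠️ marker in the progress list.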

About

License: MIT License


Languages

C# 69.2%, Vue 28.3%, HTML 1.5%, TypeScript 0.8%, JavaScript 0.1%