Google Books alternative - Full text browsing and search

Question

ocdtrekkie opened this issue 8 years ago · comments

Project description

I have a lot of ebooks, mostly in PDF format, but some EPUBs as well, and I feel the information in them is often better than on the Internet. But I've got no way to find that information easily. I'd like a self-hosted (ideally web-based) platform where I can just upload all my books, and then search and browse them from wherever I am.

I asked for something like this on HN, and other than general full text search apps to run on my desktop, there really wasn't anything to do this.

Relevant Technology

Needs to support hosting on a Linux box. Should ideally be usable from Windows, Linux, Android, Mac, etc. through a web interface.

Who is this for

Probably at least somewhat savvy users since we're talking about self-hosting here, but hopefully one could admin a server for their family to use or something.

Karuppiah Natarajan commented 7 years ago

👍

Mikael Brevik · Answer 1 · Tue Oct 18 2016 01:12:31 GMT+0800 (China Standard Time)

Some technology one could build on for indexing is Elasticsearch and the mapper-attachments plugin.

Deleted user · Answer 2 · Tue Oct 18 2016 04:49:17 GMT+0800 (China Standard Time)

I'm doing this. Will update here :)

Mikael Brevik · Answer 3 · Tue Oct 18 2016 04:55:08 GMT+0800 (China Standard Time)

@mysticmode Cool! Really looking forward to this. There's a lot of potential here to make an open source ebook-indexing library. One could do auto-fetching of book covers, an overview of all books, fetching reviews, highlights/bookmarks, etc.

I think there is a lot of work that could be done here and there might be room for collaboration among several people at some point.

This'll be awesome 💪

Mikael Brevik · Answer 4 · Tue Oct 18 2016 05:01:05 GMT+0800 (China Standard Time)

This could also be easily wrapped in something like electron if that adds any value.

Fredrik A. Madsen-Malmo · Answer 5 · Tue Oct 18 2016 05:28:28 GMT+0800 (China Standard Time)

To make it a bit easier you could just make a website and use kiosk-view/app-view in Chrome. They probably have something similar in Firefox etc. That way you don't have to deal with Electron as well.

Jan van Brügge · Answer 6 · Tue Oct 18 2016 05:31:20 GMT+0800 (China Standard Time)

There is already a server mode for the ebook management software calibre, but it's rather ugly and feature-poor. Developing something with a good server client architecture would be good. Server should do the scraping, searching and managing of ebooks (maybe throw in a few converter plugins for different formats), the client would display it neatly, possibly via chrome or electron. Would be interested, especially serverside.

Cyris · Answer 7 · Tue Oct 18 2016 06:35:44 GMT+0800 (China Standard Time)

What about building an Electron-wrapped app that authenticates and syncs with Dropbox? We could use the API to search the contents of PDF files on your Dropbox and bring the results back to the Electron app. I don't think the contents search supports EPUBs though.

Jacob Weisz · Answer 8 · Tue Oct 18 2016 06:39:25 GMT+0800 (China Standard Time)

A dependency on Dropbox, a proprietary cloud service, would mostly defeat the point.

Alex · Answer 9 · Tue Oct 18 2016 12:09:26 GMT+0800 (China Standard Time)

Great idea! I'm a heavy user of Google Books, but would really like some open source solution. How do you plan to search in PDF files? Some kind of OCR will be required. It's also worth mentioning that I have non-English books as well.

Jacob Weisz · Answer 10 · Tue Oct 18 2016 12:15:12 GMT+0800 (China Standard Time)

@la0rg Most PDFs are already OCR'd. Or, in many cases, were always digital to begin with. Page images are rare; PDFs are a multimedia format.

Jan van Brügge · Answer 11 · Tue Oct 18 2016 13:30:18 GMT+0800 (China Standard Time)

Yes, pdf as file format should be supported, but for OCR you are better served with tesseract, which is Open Source and does an amazing job.

Deleted user · Answer 12 · Tue Oct 18 2016 13:58:10 GMT+0800 (China Standard Time)

Elastic search is good. I'm trying to take baby steps here and implement something which is well-thought and discussed rather than just another ebook reader which is open source and web client based.

First step. Domain registration http://www.libreread.org/ :)
I'm not sure if we can discuss the whole process here. So, I'll create a Slack team and let you know.

"I feel the information in them is often better than on the Internet. But I've got no way to find that information easily"

@ocdtrekkie Could you elaborate, with examples, on how you need the full-text search to work?

Also, if you take existing ebook readers like Kindle, iBooks, etc., apart from their being proprietary, which features that you need are missing from them or from other ebook readers?

Answer from anyone on the features is appreciated! :)

Thanks!

Jacob Weisz · Answer 13 · Tue Oct 18 2016 14:05:33 GMT+0800 (China Standard Time)

@mysticmode If I'm looking for a section on a programming structure, for example, in my programming books, I'd expect searching it to show me which books mention it, and let me open that book to the first mention.
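The behaviour described above (list the books that mention a term, and jump to the first mention) can be sketched in a few lines. This is only an illustration under stated assumptions: the in-memory `library` layout, the `Hit` type, and `search_books` are hypothetical names, and a real system would query a search index rather than scan page texts.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    book: str        # book title
    first_page: int  # 1-based page number of the first mention
    mentions: int    # number of pages that mention the term


def search_books(library: dict[str, list[str]], term: str) -> list[Hit]:
    """Return one hit per book that mentions `term`, most mentions first.

    `library` maps a book title to its list of page texts (a hypothetical
    layout chosen for illustration only).
    """
    needle = term.lower()
    hits = []
    for title, pages in library.items():
        matching = [i + 1 for i, text in enumerate(pages) if needle in text.lower()]
        if matching:
            hits.append(Hit(title, matching[0], len(matching)))
    # Books with more matching pages are listed first.
    return sorted(hits, key=lambda h: h.mentions, reverse=True)


library = {
    "Book A": ["Intro", "A linked list stores nodes", "More on the linked list"],
    "Book B": ["Arrays", "Hash maps"],
    "Book C": ["Trees", "Graphs", "Linked list recap"],
}
results = search_books(library, "linked list")
for hit in results:
    print(f"{hit.book}: first mention on page {hit.first_page}")
```

The `first_page` value is what a reader view would need in order to open the book at the right place.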

Another feature that would be super important would be the ability to import a set of my ebooks data in some format I could easily create. In my case, I have a homegrown (and relatively shoddy) database app which indexes my books, and I'd want to be able to import the metadata I have into a format I could feed into this system, rather than having to sit there and reenter all of that information.
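One easily created format for such an import would be CSV. The sketch below assumes a made-up column layout (`filename`, `title`, `author`); nothing in this thread defines the actual format, so treat the names as placeholders.

```python
import csv
import io


def load_metadata(csv_text: str) -> dict[str, dict[str, str]]:
    """Parse CSV metadata into a mapping keyed by file name.

    The column names (filename, title, author) are an assumed layout for
    illustration, not a format defined by any existing tool.
    """
    records = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        filename = row["filename"].strip()
        if not filename:
            continue  # skip rows that cannot be matched to a file
        records[filename] = {
            "title": row["title"].strip(),
            "author": row["author"].strip(),
        }
    return records


sample = """filename,title,author
algorithms.pdf,Example Algorithms Book,A. Author
,Orphan row,Nobody
networks.epub,Example Networking Book,B. Writer
"""
metadata = load_metadata(sample)
print(metadata["networks.epub"]["title"])
```

Keying the records by file name lets the importer attach metadata to books that were uploaded separately.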

I'm not super concerned about the actual reading elements of this, much more than embedding the PDF reader in my web browser. But that's just my personal use case, I suppose.

Deleted user · Answer 14 · Tue Oct 18 2016 14:11:06 GMT+0800 (China Standard Time)

Features:

Open Source
Support PDF and EPUB
Browser based
Full-text search on the metadata and the content.
Fetch book reviews
Centralised annotations (maybe a browser plugin like hypothes.is)
Highlights
Library with categories

I'm looking into Elastic Search for PDF formats
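For the "full-text search on the metadata and the content" item, a request body along the following lines might work. The field names (`title`, `author`, `content`) are assumptions about how books would be indexed; `multi_match` and `highlight` are standard parts of the Elasticsearch query DSL. The sketch only builds the body and does not contact a server.

```python
import json


def build_search_body(text: str, size: int = 10) -> dict:
    """Build an Elasticsearch request body that searches metadata and content.

    The field names are hypothetical; adjust them to the real mapping.
    """
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": text,
                # "title^2" weights title matches twice as much as the rest.
                "fields": ["title^2", "author", "content"],
            }
        },
        # Ask for matching snippets from the book text.
        "highlight": {"fields": {"content": {}}},
    }


body = build_search_body("binary search tree")
print(json.dumps(body, indent=2))
```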

Jan van Brügge · Answer 15 · Tue Oct 18 2016 14:15:33 GMT+0800 (China Standard Time)

Basically this would be Plex, but for books.

Mikael Brevik · Answer 16 · Tue Oct 18 2016 14:24:38 GMT+0800 (China Standard Time)

It's certainly cool to have a lot of features, and there is a lot of potential here. But I think it's smart to start with the "core feature" and scope out from that when it's solved. I'd say start small, solve the critical, core feature (which is creating a full-text index for PDF ebooks) and expand on that in time. Along the way you'll learn a lot about the problem and probably see new ways to improve it.

Just creating infrastructure for setting up a distributable project with ElasticSearch & attachments is a job in and of itself. Maybe a good way to start would be to just test it out with some ElasticSearch plugin for searching your indexes, and then create a small web server and a UI as an extension of that once you have the index fundamentals in place. When you have that foundation, it's much easier for people who want to join in to work in parallel, I think. Just my two cents. I think the chance of success increases drastically if the scope is small from the beginning. 😄

Markus Mayer · Answer 17 · Tue Oct 18 2016 16:23:21 GMT+0800 (China Standard Time)

I'm with @SuperManitu here about calibre. I've always been thinking of something that is either using calibre's own database (which is probably a bad idea) or something that just imports and/or exports to it as a starting point. I also thought it could be something peer-to-peer, e.g. utilizing torrent technology to broadcast metadata or content. That might really come in handy in an academic context with open data in mind.

Deleted user · Answer 18 · Tue Oct 18 2016 20:20:47 GMT+0800 (China Standard Time)

I agree with @la0rg on OCR support for PDF formats. I think full-text search on the extracted data would work well in most cases for ebooks in PDF format. But OCR support should be in the pipeline though.

As @mikaelbr suggested, I'll try to start with the basic implementation of PDF extraction and search and share it here. Then if people find it good, we can move on from there.

Deleted user · Answer 19 · Wed Oct 19 2016 17:26:59 GMT+0800 (China Standard Time)

I'm looking into pdf.js and elastic-search. Through pdf.js we could get the rendered HTML5.

I'm passing that to Elasticsearch. Using the html_strip char filter, we could get the search content stripped of HTML.
We could run full-text search across all the documents and find the relevant book, since all the book contents are in Elasticsearch as documents.
I'm passing the HTML content stored in the Elasticsearch documents to the client, instead of having the PDF converted and rendered in the browser every time.
I'm keeping the PDF files as a backup. If it is self-hosted, people would want to download the books whenever they need.

I'm writing this so that anyone who knows these technologies can tell me if I'm on the right track :)

I'll try the above approach and share it in a couple of days.
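To make the "strip the HTML before searching" step concrete, here is a small standard-library sketch of the same idea done on the application side. It is not the Elasticsearch char filter itself, only an approximation of what such a filter achieves; `TextExtractor` and `html_to_text` are illustrative names.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, ignoring tags and <script>/<style> bodies."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self) -> str:
        # Join chunks with spaces, then collapse whitespace left by markup.
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


page = "<div class='page'><h1>Chapter 1</h1><p>Hello &amp; welcome.</p><style>p{}</style></div>"
print(html_to_text(page))
```

Joining chunks with a space keeps words from adjacent block elements apart, at the cost of occasionally adding a space around inline tags, which is usually harmless for search.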

Jan van Brügge · Answer 20 · Wed Oct 19 2016 17:39:32 GMT+0800 (China Standard Time)

I don't think pdf.js is needed, as you will almost never read the PDF in the browser. For those cases I would just open the PDF in a new tab and let the browser do its job. pdf.js is rather slow and not a pleasurable reading experience.

Deleted user · Answer 21 · Wed Oct 19 2016 17:43:32 GMT+0800 (China Standard Time)

No. When the user uploads a book, I'd be using pdf.js and converting to HTML through a headless browser like PhantomJS on the server.

Deleted user · Answer 22 · Wed Oct 19 2016 17:45:01 GMT+0800 (China Standard Time)

I tried Python's pdfminer, but the extracted HTML is messy. PDF.js gives clean output.

Jan van Brügge · Answer 23 · Wed Oct 19 2016 17:48:05 GMT+0800 (China Standard Time)

It might be easier to do the server in Java, as Elasticsearch provides a Java API, you can use a REST microframework like Jersey and you have libraries like PDFBox for reading pdfs. Plus Node's performance on bigger files is really bad.

Jan van Brügge · Answer 24 · Wed Oct 19 2016 17:54:07 GMT+0800 (China Standard Time)

I would opt for ElasticSearch + Jersey (or similar) REST Server + clientside SPA (preferably typescript) + optionally (later on) nodejs for Server-rendering the SPA.

As this is a rather complicated setup, provide a zero-config Docker container to run it.

Deleted user · Answer 25 · Wed Oct 19 2016 20:05:42 GMT+0800 (China Standard Time)

Working with filesystems is costly. I need to think about this.

Maybe I could use java as a semi-standalone process to extract pdfs.

But I'm planning to use nodejs as the base for the application. As this is open source, I think using javascript for server is better when it comes to collaboration and it works pretty well for SPA.

I'll try the pdf extractors and see which one suits best. As for text, python pdfminer works well.

Jan van Brügge · Answer 26 · Wed Oct 19 2016 20:24:27 GMT+0800 (China Standard Time)

I don't think JavaScript is better for collaboration; a type system can help you a lot when using other people's code.
I'll create a working prototype in the next few days.

Fredrik A. Madsen-Malmo · Answer 27 · Wed Oct 19 2016 20:31:32 GMT+0800 (China Standard Time)

What language would you use then?

Jan van Brügge · Answer 28 · Wed Oct 19 2016 20:35:19 GMT+0800 (China Standard Time)

As I said, a small REST server written in Java, as Java is easy to adopt and learn, has a good type system and has very good tooling (Maven and Eclipse/IntelliJ).
For the SPA I would use TypeScript, as it enhances JavaScript with an unobtrusive type system and is very popular, so you get type definitions for almost all JavaScript libraries.

Fredrik A. Madsen-Malmo · Answer 29 · Wed Oct 19 2016 20:39:23 GMT+0800 (China Standard Time)

What about C++? Also has good tools and isn't much harder to learn than Java IMO

Fredrik A. Madsen-Malmo · Answer 30 · Wed Oct 19 2016 20:40:18 GMT+0800 (China Standard Time)

We could also write it in Python, which is by far easier to learn for newer programmers, and has great support for platforms, with plenty of libs we could use.

Jan van Brügge · Answer 31 · Wed Oct 19 2016 20:50:23 GMT+0800 (China Standard Time)

I personally like C++ far more than Java, but it has too many disadvantages in this particular case:
Pro:

No JVM required

Cons:

No central module registry (like maven or npm)
Many caveats not clear to new developers (const, object creation and manual memory management to name a few)
complicated dependency management (autotools, cmake, or similar)

Python:
Con:

No type system

Fredrik A. Madsen-Malmo · Answer 32 · Wed Oct 19 2016 20:55:09 GMT+0800 (China Standard Time)

Yeah, those are good points. Dependency management isn't really something contributors need to worry about a lot, considering most of that will be done during the initial phase. Consts and memory management are indeed a bit harder for newer devs.

Python however I don't see why we couldn't use. The lack of types isn't really that big of a problem IMO, but it is handy.

With Python we don't have to worry too much about setups and the like, as we have pip for deps and could use something like pep8 for formatting tools, plus most (if not all) IDEs have support for Python.

Nick Morrison · Answer 33 · Wed Oct 19 2016 21:56:37 GMT+0800 (China Standard Time)

I'm interested in this, and I'm for using Python. The low barrier to entry for newer devs is probably a huge plus for using the language for the backend. Also, @SuperManitu, I believe @mysticmode has already created a repo for this project.

Jan van Brügge · Answer 34 · Wed Oct 19 2016 22:01:12 GMT+0800 (China Standard Time)

I wouldn't use Python, because having a type system is really helpful, Python has the tendency to be rather slow compared to Java, and I had problems with some libs using native code. Plus, using indentation as blocks is ugly.

Nick Morrison · Answer 35 · Wed Oct 19 2016 22:07:51 GMT+0800 (China Standard Time)

@supermanitu, if we were to use Java, what backend framework would you suggest?

Jan van Brügge · Answer 36 · Wed Oct 19 2016 22:15:11 GMT+0800 (China Standard Time)

As said I would use the Jersey RESTful Framework: https://jersey.java.net/documentation/latest/getting-started.html#new-project-structure

Here is a rather good explanation: http://www.vogella.com/tutorials/REST/article.html

Fredrik A. Madsen-Malmo · Answer 37 · Thu Oct 20 2016 00:37:52 GMT+0800 (China Standard Time)

I don't like Python either because of the indentation "thing". If you're up for using something like Python that doesn't use that system, we could use Ruby.

Nick Morrison · Answer 38 · Thu Oct 20 2016 02:03:47 GMT+0800 (China Standard Time)

I'm personally not for using Ruby. PHP and Java are good for me.

Deleted user · Answer 39 · Thu Oct 20 2016 02:37:08 GMT+0800 (China Standard Time)

We are handling files, in our case PDF and EPUB. We could try using Xpdf. I haven't tested it yet, but I read in a few places that it gives better output than PDFBox. It's written in C++ and it's licensed under the GPL. If I start this project, I would like it to be under the GPL. And there are multiple PDF extractors available in C++ which we can test.

As far as the language for the core part of our application goes, I certainly won't choose Java. The confusing licensing philosophy of its ecosystem, the court claims, the object orientation, and the fact that it is certainly hard for a new programmer to pick up make me go for another language that has a fairly good motive towards open philosophy, performs well compared to Java, and is readable.

I would choose Elixir. If we need multi-threading for this application, it does that right away and performs well compared to Java. The web framework Phoenix is Ruby-inspired. The coding approach is readable and far easier to pick up quickly than Java.

Deleted user · Answer 40 · Thu Oct 20 2016 02:41:47 GMT+0800 (China Standard Time)

Gotta say Elixir is far better than Ruby in performance, plus you will experience the taste of Ruby style with the speed of Erlang.

Jan van Brügge · Answer 41 · Thu Oct 20 2016 05:30:56 GMT+0800 (China Standard Time)

Never used Elixir, but from what I've seen it looks good. Should be a good choice for the server.

Ngoc · Answer 42 · Thu Oct 20 2016 05:49:18 GMT+0800 (China Standard Time)

I think using Node.js with Walmart's Electrode as a framework would be great. The project can be super modularized. I believe React would work well for this project, and it is such a popular framework that people can really help with its development and pick it up easily. It's also easy to onboard someone.

Fredrik A. Madsen-Malmo · Answer 43 · Thu Oct 20 2016 05:53:38 GMT+0800 (China Standard Time)

Do we really need electron though? Couldn't it just be a website?

Jacob Weisz · Answer 44 · Thu Oct 20 2016 05:55:28 GMT+0800 (China Standard Time)

I absolutely would like to see a standards-compliant website I can access from anywhere. (As a personal note, I'd like to be able to host it as a Sandstorm.io app, which is possible as long as it's A. web-based and B. runs on 64-bit Linux.)

Jan van Brügge · Answer 45 · Thu Oct 20 2016 05:58:44 GMT+0800 (China Standard Time)

For the SPA I would make a simple website. Using Electron can be done later if needed (but I doubt that). As frontend frameworks there is either React + Redux or Cycle.js (which I'm in favour of). I've used Angular2 and won't use it again.

Deleted user · Answer 46 · Thu Oct 20 2016 13:56:57 GMT+0800 (China Standard Time)

@SuperManitu Could you explain why we need a front-end framework in our use case? I don't think our front-end is complex. So far, going by what @ocdtrekkie pointed out earlier, the browser-based app should be standards-compatible, and feature-wise it should be minimal at first. Most of the processing happens in the backend.

Let's build it with bare-bones JavaScript or maybe TypeScript, and as we go we'll figure out if we need a framework that might help us solve the problems we are facing at that time. I think building the minimal version first -> then get it right -> then get it better is the way to go.

To initiate this project, I'm doing the initial setup now and will share it here once it's ready.

Deleted user · Answer 47 · Thu Oct 20 2016 14:22:14 GMT+0800 (China Standard Time)

I think it's better if we take this discussion to chat. I've created a slack team http://libreread.slack.com

Let's discuss the tools and intricacies of the app in the chat. And if we want to discuss the features, we can post here.

Please share your email if you would like to join the development, so I can add you to the Slack team.

Thanks!

Jan van Brügge · Answer 48 · Thu Oct 20 2016 14:23:31 GMT+0800 (China Standard Time)

Yes, of course, for the beginning we don't need a framework. Just standard TypeScript. Awaiting the setup :)

Email is supermanitu@gmail.com

Nick Morrison · Answer 49 · Thu Oct 20 2016 20:22:43 GMT+0800 (China Standard Time)

@mysticmode email is nickmorrrison09@gmail.com

Deleted user · Answer 50 · Fri Oct 21 2016 03:15:25 GMT+0800 (China Standard Time)

@FutureProg hey, let me know if you get the invite :) It's bouncing here and I couldn't send the invite again. Weird

Nick Morrison · Answer 51 · Fri Oct 21 2016 03:49:29 GMT+0800 (China Standard Time)

@mysticmode Nope, haven't received one yet. I think you can post a join link?

Fredrik A. Madsen-Malmo · Answer 52 · Fri Oct 21 2016 05:27:32 GMT+0800 (China Standard Time)

mail.fredrikaugust@gmail.com :)

Deleted user · Answer 53 · Fri Oct 21 2016 17:24:04 GMT+0800 (China Standard Time)

@FutureProg I revoked and sent it again. Please check now.

Nick Morrison · Answer 54 · Fri Oct 21 2016 23:20:17 GMT+0800 (China Standard Time)

@mysticmode I'm not certain what's going on but I still haven't received an invite (and it says my email isn't in the system when I do a password reset). Are you sure you're sending it to nickmorrison09@gmail.com ?

Jacob Weisz · Answer 55 · Fri Oct 21 2016 23:21:41 GMT+0800 (China Standard Time)

I have no interest in joining Slack. Please just make sure when there's a repository to check out, that the link to it makes it into this thread.

Mikael Brevik · Answer 56 · Fri Oct 21 2016 23:31:15 GMT+0800 (China Standard Time)

FYI: There's also a Slack team that can be used for discussing projects from this repo. You can get an automatic invitation here http://opensourceideas.herokuapp.com/

@ocdtrekkie We'll make sure a link to the project is posted here and in this overview 👍

Deleted user · Answer 57 · Fri Oct 21 2016 23:33:11 GMT+0800 (China Standard Time)

@FutureProg yeah, your email had a spelling mistake previously :)

@ocdtrekkie Sure, we are yet to do a basic setup, we are testing tools and working on the mockup. I need to think about this, I'm keeping slack only for discussing on development. This is the repo https://github.com/mysticmode/libreread
I'll notify you about the process in this thread soon.

danfickle · Answer 58 · Sat Oct 22 2016 13:47:08 GMT+0800 (China Standard Time)

Elasticsearch with the Apache Tika mapper plugin already does search of PDFs and many other formats, you just need to provide a web client.

https://hustbill.wordpress.com/2015/09/11/full-text-search-by-elasticsearch-mapper-attachments-in-pdf-format/
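As a rough sketch of what feeding a PDF to such a plugin involves, the snippet below builds the mapping and document bodies without contacting a server. The `attachment` field type and the base64-encoded content reflect my understanding of how the mapper-attachments plugin worked at the time; later Elasticsearch releases handle attachments differently, so treat both as assumptions to check against the documentation for your version. The field name `file` and the helper names are made up.

```python
import base64
import json


def attachment_mapping(field: str = "file") -> dict:
    """Mapping that marks `field` as an attachment for the plugin to parse.

    Assumes the "attachment" field type of the mapper-attachments plugin.
    """
    return {"properties": {field: {"type": "attachment"}}}


def attachment_document(pdf_bytes: bytes, title: str, field: str = "file") -> dict:
    """Document body with the raw file content base64-encoded."""
    return {"title": title, field: base64.b64encode(pdf_bytes).decode("ascii")}


fake_pdf = b"%PDF-1.4 placeholder bytes"  # stand-in for open(path, "rb").read()
doc = attachment_document(fake_pdf, "Example Book")
print(json.dumps(attachment_mapping()))
print(doc["file"][:16] + "...")
```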

Deleted user · Answer 59 · Sat Dec 10 2016 00:41:50 GMT+0800 (China Standard Time)

Hi, I have created a basic version of LibreRead. You can signup, upload books and it will get indexed in Elastic Search as a background job. Once the indexing is complete, you can search through all books by Metadata and Content. I have implemented PDF.js for UI consistency. Also I'm going to use ePub.js for epubs.

https://github.com/mysticmode/LibreRead

Here are some screenshots of what I built.

There is more work to be done in order to publish this. I think I'm going to roll out the beta version first.

To-Do:

Multiple user Roles
- Super admin (Can manage books and add/revoke users)
- Admin (can upload books)
- User (Only read books)
Book Access
- Public (Any user can read)
- Private (Only the user who uploaded it can read)
- Permissions (You can select specific users for reading access. This option will be available when you upload)
Create Notes and see other users notes while reading.
ePub implementation
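The role and book-access items in the list above could be modelled roughly as follows. This is a sketch of one possible reading, not LibreRead's code: all names are hypothetical, and it assumes that reading rights depend only on the book's access setting, leaving open whether a super admin should also be able to read private books.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    SUPER_ADMIN = "super_admin"
    ADMIN = "admin"
    USER = "user"


class Access(Enum):
    PUBLIC = "public"
    PRIVATE = "private"
    PERMISSIONS = "permissions"


@dataclass
class Book:
    title: str
    owner: str
    access: Access = Access.PRIVATE
    allowed: set = field(default_factory=set)  # only used with PERMISSIONS


def can_upload(role: Role) -> bool:
    # Per the list above, admins upload books and plain users only read.
    return role in (Role.SUPER_ADMIN, Role.ADMIN)


def can_read(username: str, book: Book) -> bool:
    if book.access is Access.PUBLIC or username == book.owner:
        return True
    if book.access is Access.PERMISSIONS:
        return username in book.allowed
    return False  # PRIVATE: only the uploader may read


shared = Book("Shared notes", owner="alice", access=Access.PERMISSIONS, allowed={"bob"})
print(can_read("bob", shared), can_read("carol", shared))
```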

Mikael Brevik · Answer 60 · Thu Dec 15 2016 17:45:39 GMT+0800 (China Standard Time)

This is well underway with brilliant work from all of you and @mysticmode. I'm closing the issue as it's started and progress has been made. Please feel free to continue the discussion even though the issue is closed.

Deleted user · Answer 61 · Fri May 26 2017 22:44:53 GMT+0800 (China Standard Time)

Hi, I'm asking for code contribution help from you guys for LibreRead.

Major help is needed in the backend, which is written in Flask. Please check the README in the repo for the development setup. I can help you with that; see my email below.

I'm working on a new design, and it would be greatly helpful if someone could port the old app code to work with the new design and continue taking part in the development.

Current list of features:

Add books (only pdfs)
Full-text search
Collections
Highlights + Annotations

To do list of features:

Multiple user roles
Book Access
ePub Implementation (Major work, Not sure for the first release)

You can email me for further detail hello@nirm.al or please comment here.

Deleted user · Answer 62 · Mon Jun 05 2017 09:24:43 GMT+0800 (China Standard Time)

UPDATE on LibreRead:

It's a single user product.
Backend being written in Go.
Updated README.md with new features & goals for the initial release.
New design.

Soon I'll show you the version put on the server for testing.

Deleted user · Answer 63 · Thu Jun 08 2017 23:07:18 GMT+0800 (China Standard Time)

I deployed my code to the server http://172.104.59.151:8080/signin
email: rkumarnirmal@gmail.com
password: demo

It's not done yet. Right now I'm testing file upload and full-text search. Would love to know your feedback.

Work needs to be done:

Collections
Account settings
Landing page design
Deployment documentation
Much more testing

Karuppiah Natarajan · Answer 64 · Thu Jun 08 2017 23:56:18 GMT+0800 (China Standard Time)

@mysticmode Pretty Neat! I even tried sign up. But in the confirmation mail I got the URL as "localhost:8080...." instead of "172.104.59.151:8080...."
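A link like this usually ends up pointing at localhost when it is built from the address the server listens on rather than from the address users reach it at. One common approach, sketched here with a hypothetical `base_url` setting and an illustrative `/confirm` path, is to build email links from an explicitly configured public URL.

```python
from urllib.parse import urlencode, urljoin


def confirmation_url(base_url: str, token: str) -> str:
    """Build the link for a confirmation email from a configured base URL.

    `base_url` is a hypothetical setting holding the public address of the
    server; the /confirm path and token parameter are illustrative only.
    """
    root = base_url.rstrip("/") + "/"
    return urljoin(root, "confirm") + "?" + urlencode({"token": token})


print(confirmation_url("http://172.104.59.151:8080", "abc123"))
```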

Deleted user · Answer 65 · Fri Jun 09 2017 00:01:56 GMT+0800 (China Standard Time)

@karuppiah7890 Sorry for that! I should disable signup. It's a single user product :)

Deleted user · Answer 66 · Fri Jun 09 2017 00:15:26 GMT+0800 (China Standard Time)

Adding one more request. The design is responsive. But still needs some fine-tuning. You could check in hand-held devices and please let me know if I need to make any change.

Thanks!

Karuppiah Natarajan · Answer 67 · Fri Jun 09 2017 00:16:43 GMT+0800 (China Standard Time)

@mysticmode Oh yeah, I remember seeing somewhere in discussions that it's for a single user. Self hosted system and for private use.

Karuppiah Natarajan · Answer 68 · Fri Jun 09 2017 00:23:27 GMT+0800 (China Standard Time)

@mysticmode Not very responsive. The menu is responsive, but the books get cut off (with no scroll bar) when the width decreases. Try decreasing your browser width and see it.

And a good tool to try out responsiveness across lots of devices is : https://sizzy.co/
Code for the sizzy project https://github.com/kitze/sizzy

Deleted user · Answer 69 · Fri Jun 09 2017 00:27:20 GMT+0800 (China Standard Time)

@karuppiah7890 Could you check in the mobile/tablet browser once? I've added max-device-width

I think that is the reason why you don't see it respond when you decrease the desktop browser width.

Deleted user · Answer 70 · Fri Jun 09 2017 00:37:01 GMT+0800 (China Standard Time)

@karuppiah7890 I could add multi-user support. But that brings more complexity. For example, you wouldn't like your invited user to fill up the storage space with lots of books. We should be providing a way to control the storage space for each user.

And if there is multi-user support, people would expect to share ebooks with other users. In that case, we need to put up a notice that we are not responsible for the sharing of copyrighted ebooks on the platform. And you have to control the book reading access. You wouldn't want to share it with everyone.

Those will take more thought and time. I'm trying to keep it simple for the initial release.

Karuppiah Natarajan · Answer 71 · Fri Jun 09 2017 00:38:02 GMT+0800 (China Standard Time)

Makes sense!

Karuppiah Natarajan · Answer 72 · Fri Jun 09 2017 00:43:44 GMT+0800 (China Standard Time)

@mysticmode It works fine in my tab!

Karuppiah Natarajan · Answer 73 · Fri Jun 09 2017 00:46:00 GMT+0800 (China Standard Time)

Cool stuff! 😄

Deleted user · Answer 74 · Sun Jun 25 2017 17:56:18 GMT+0800 (China Standard Time)

I've added EPUB support including full-text search feature. But it's still WIP, we need to do more user testing.

For Demo:
http://demo.libreread.org/
email: rkumarnirmal@gmail.com
password: demo

To Do list:

Test by uploading and using more EPUBs.
Highlights & Annotations for both PDF and EPUB
Account settings

I'm planning to launch the beta in the next couple of weeks, once the to-do list above is complete.

Deleted user · Answer 75 · Sun Jun 25 2017 17:59:56 GMT+0800 (China Standard Time)

IMPORTANT NOTICE:
The project repo has been moved to Savannah
https://savannah.nongnu.org/projects/libreread/

Karuppiah Natarajan · Answer 76 · Sun Jun 25 2017 18:11:43 GMT+0800 (China Standard Time)

@mysticmode Does upload work on the demo?

Deleted user · Answer 77 · Sun Jun 25 2017 18:12:09 GMT+0800 (China Standard Time)

@karuppiah7890 Yeah, is there a problem?

Karuppiah Natarajan · Answer 78 · Sun Jun 25 2017 18:14:07 GMT+0800 (China Standard Time)

Oops, sorry. Yes, I just noticed that it does work, but it doesn't show the thumbnail of the ebook. I forgot that it takes some time to index the PDF in the background, so I tried searching immediately after the upload to check whether the book was there, since the thumbnail wasn't shown. I am able to search now. The only problem is that the thumbnail is plain white.

Karuppiah Natarajan · Answer 79 · Sun Jun 25 2017 18:16:02 GMT+0800 (China Standard Time)

The book's first page is a simple title page, and that is what I see as its thumbnail in my file explorer.

Karuppiah Natarajan · Answer 80 · Sun Jun 25 2017 18:16:31 GMT+0800 (China Standard Time)

@mysticmode Check the demo site. The book I uploaded is Eloquent JavaScript

Deleted user · Answer 81 · Sun Jun 25 2017 18:20:21 GMT+0800 (China Standard Time)

Yeah, I can see that. It doesn't generate a cover for some PDFs. I'm using poppler-utils to generate the PDF cover.

A few days ago @ocdtrekkie pointed out that the user should be able to edit the metadata of an ebook (title, author and cover). So you will be able to add a cover to the ebook manually.

I'll be adding that feature in the coming days :)
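For reference, a cover can be rendered from the first page of a PDF with poppler-utils' `pdftoppm`. The sketch below is only an assumption about how that might be wired up, not LibreRead's actual code; the function names are hypothetical. It returns `None` when no image is produced, so the caller can show a placeholder instead of a broken thumbnail.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def build_cover_command(pdf_path: str, out_prefix: str) -> List[str]:
    # -f 1 -l 1 limits rendering to page 1; -png picks the output format;
    # -singlefile writes exactly <out_prefix>.png with no page-number suffix.
    return ["pdftoppm", "-f", "1", "-l", "1", "-png", "-singlefile",
            pdf_path, out_prefix]


def generate_cover(pdf_path: str, out_prefix: str) -> Optional[Path]:
    """Render page 1 of the PDF to <out_prefix>.png, or return None on failure."""
    try:
        subprocess.run(build_cover_command(pdf_path, out_prefix),
                       check=True, capture_output=True)
    except (OSError, subprocess.CalledProcessError):
        # pdftoppm is missing, or it could not read or render the file.
        return None
    cover = Path(out_prefix + ".png")
    return cover if cover.exists() and cover.stat().st_size > 0 else None
```

Note that this only detects a missing or empty image. A first page that renders as plain white would still need the manual cover upload.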

Karuppiah Natarajan · Answer 82 · Sun Jun 25 2017 18:20:37 GMT+0800 (China Standard Time)

I see a 404 error for the thumbnail. I just noticed it while checking out the collections page.

Karuppiah Natarajan · Answer 83 · Sun Jun 25 2017 18:21:28 GMT+0800 (China Standard Time)

That sounds cool. It's true. Sometimes an ebook doesn't have all of its metadata set properly. :)

Deleted user · Answer 84 · Sun Jun 25 2017 18:21:57 GMT+0800 (China Standard Time)

@karuppiah7890 Could you try some other PDFs that have a cover image?

As I said, I'll add a feature that lets you attach a cover to a PDF if one isn't generated automatically.

Karuppiah Natarajan · Answer 85 · Sun Jun 25 2017 18:46:47 GMT+0800 (China Standard Time)

Yes, it works in a pretty smooth manner

Deleted user · Answer 86 · Mon Jun 26 2017 20:50:02 GMT+0800 (China Standard Time)

There is a problem with my EPUB implementation. Right now I'm doing it like this:

Unzip the EPUB
Load all the HTML files into a single file
Show that file

That way I could just show the entire epub content as a single html file. I thought scrolling through a single page would be more natural than clicking next and previous buttons.
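The single-file approach can be sketched roughly as follows. This is a hedged illustration, not the actual implementation: `concat_epub_html` is a hypothetical name, and ordering the files by name is an assumption made here for simplicity.

```python
import zipfile


def concat_epub_html(epub_file) -> str:
    """Join every HTML/XHTML document inside an EPUB into one string.

    Assumption: files are ordered by name, which only approximates the
    book's real reading order.
    """
    with zipfile.ZipFile(epub_file) as zf:  # an EPUB is a zip archive
        names = sorted(
            name for name in zf.namelist()
            if name.lower().endswith((".html", ".xhtml", ".htm"))
        )
        return "\n".join(
            zf.read(name).decode("utf-8", errors="replace") for name in names
        )
```

Once the files are merged like this, any link that targets an individual file inside the archive no longer has a page to open.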

But I have a problem here: the EPUB table of contents doesn't work that way. Each link points to a particular file. I tried to rewrite the links, but it didn't go well.

So, I'm going to build the EPUB viewer the traditional way, like other EPUB readers. I'm going to use Redis for this.

Unzip the EPUB
Fetch the spine data (id and href) and store it in Redis
Load each HTML file based on the spine data, using the next/previous buttons

This way the table of contents will work automatically and I don't need to manipulate anything.
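The spine-based steps above can be sketched as below. This is a minimal illustration under stated assumptions, not the project's code: `read_spine` and `neighbours` are hypothetical names, and the returned Python list stands in for whatever would be stored in Redis. The file locations follow the EPUB container format, where `META-INF/container.xml` points to the package (OPF) file that holds the manifest and the spine.

```python
import posixpath
import xml.etree.ElementTree as ET
import zipfile

NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
}


def read_spine(epub_file):
    """Return [(id, href), ...] in reading order; href is a path inside the zip."""
    with zipfile.ZipFile(epub_file) as zf:
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        opf_path = container.find("c:rootfiles/c:rootfile", NS).attrib["full-path"]
        package = ET.fromstring(zf.read(opf_path))
    base = posixpath.dirname(opf_path)  # manifest hrefs are relative to the OPF
    manifest = {
        item.attrib["id"]: item.attrib["href"]
        for item in package.findall("opf:manifest/opf:item", NS)
    }
    return [
        (ref.attrib["idref"], posixpath.join(base, manifest[ref.attrib["idref"]]))
        for ref in package.findall("opf:spine/opf:itemref", NS)
    ]


def neighbours(spine, current_id):
    """Return the (previous, next) spine entries around current_id, or None at the ends."""
    ids = [item_id for item_id, _ in spine]
    pos = ids.index(current_id)
    prev_item = spine[pos - 1] if pos > 0 else None
    next_item = spine[pos + 1] if pos + 1 < len(spine) else None
    return prev_item, next_item
```

The next/previous buttons would then only need the current spine id to look up which file to load.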

I'll get back when I'm done with this implementation.

Deleted user · Answer 87 · Mon Jun 26 2017 20:58:10 GMT+0800 (China Standard Time)

Things that need to be done for the beta release:

Working EPUB implementation.
Edit/Delete books.
Edit/Delete Collections.
Highlights/Annotations.
Account settings.

Deleted user · Answer 88 · Thu Jul 06 2017 22:43:34 GMT+0800 (China Standard Time)

EPUB support, including the search functionality, is implemented now. Please let me know your feedback.

http://demo.libreread.org
Email: rkumarnirmal@gmail.com
Password: demo

I'm going to work on the Highlights/Annotations feature next.

Deleted user · Answer 89 · Sat Jul 22 2017 20:06:19 GMT+0800 (China Standard Time)

I have finished Highlights & Annotations for PDFs. You can now highlight text in a PDF file and add a comment to it.

I'm starting to do the same for EPUBs now.

But we need to do more user testing on this feature. If you are interested, please check the uploaded book here:
http://demo.libreread.org
Email: rkumarnirmal@gmail.com
Password: demo

Thanks!

Karuppiah Natarajan · Answer 90 · Sat Jul 22 2017 20:18:20 GMT+0800 (China Standard Time)

It's pretty neat, @mysticmode! But I noticed some unusual things while trying it out.

Deleted user · Answer 91 · Sat Jul 22 2017 20:22:44 GMT+0800 (China Standard Time)

@karuppiah7890 Thanks for trying it out! What is it?

Karuppiah Natarajan · Answer 92 · Sat Jul 22 2017 21:00:43 GMT+0800 (China Standard Time)

Once I highlight a line and then delete the highlight, I am not able to highlight it again.

Deleted user · Answer 93 · Sat Jul 22 2017 21:03:26 GMT+0800 (China Standard Time)

@karuppiah7890 Wow! That's a nice find. Thank you! Will fix it :)

Karuppiah Natarajan · Answer 94 · Sat Jul 22 2017 21:06:05 GMT+0800 (China Standard Time)

The blue selection highlight never even appears on it when I try to highlight text that is a superset of it. Everything gets selected except the previously highlighted text.

Deleted user · Answer 95 · Sat Jul 22 2017 21:07:36 GMT+0800 (China Standard Time)

Ah! That's quite complicated for me right now: highlighting over already highlighted text. I'll try to do that.

Karuppiah Natarajan · Answer 96 · Sat Jul 22 2017 21:08:23 GMT+0800 (China Standard Time)

No, I mean highlighting over text that was highlighted and then had its highlight deleted.

Deleted user · Answer 97 · Sat Jul 22 2017 21:08:47 GMT+0800 (China Standard Time)

Yup, I will fix that :)

Karuppiah Natarajan · Answer 98 · Sat Jul 22 2017 21:09:47 GMT+0800 (China Standard Time)

Say the text is "This is an example of a highlighted text". I highlight the word "example" and then delete the highlight. Then, when I try to highlight the whole sentence, the blue selection doesn't show for "example" while I'm selecting.

Karuppiah Natarajan · Answer 99 · Sat Jul 22 2017 21:10:59 GMT+0800 (China Standard Time)

You could also change the color of the highlight selection from blue to something else and make it transparent, so people can read the text while they are highlighting. I see you use tooltips to show the options for changing the color, adding notes and deleting a highlight. I think you could use a tooltip in the same way to ask people whether they want to highlight the text, instead of showing an alert. I am talking about something like Medium's highlight feature.