Room-11 / CVBacklog

StackOverflow CloseVote Backlog

Home Page: http://cvbacklog.gordon-oheim.biz/

Database implementation

PeeHaa opened this issue

I'm thinking of implementing database support for the backlog. This would prevent possible issues with Stack Overflow's throttling limits as well as pave the way for some new features I want to implement.

It would also introduce a way to better integrate the cv-pls plugin.

My question is: would it be hard for you to set up a database for the backlog?

Setting up a db would be fairly easy. How much access to it would you need?

I don't think we need that much. We are still working out what we want or need to implement together with the cv-pls plugin team. I first wanted to find out how big a pain it would be to enable database access before making concrete decisions. I will update this ticket once the requirements are clearer.

What do you have in mind? A server-side database like MySQL, or a client-side database like Local Storage?

@J0celyn I was thinking about server-side storage, because that would open up a lot of possibilities we cannot easily have when relying on client-side storage.

There are still no concrete ideas about why the CVBacklog would need this, though. The throttling issues are gone since I added the API key. So while I am still open to the idea, I am hesitant to implement this just because I can.

I think the first use of a database would be to store the list of questions already extracted from the chat transcripts. If the dates of the earliest and latest parsed chat lines are stored too, the script knows exactly which part of the transcript to browse the next time the data is updated.
The database could also be useful for showing the list of questions in a different order. Right now, the latest questions posted in the chat are the first ones to appear in the list.
Instead, the list could show the questions with the most close votes or delete votes first. Or it could first show the questions that still need close votes or delete votes before the existing votes expire automatically.
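To make the "store what was already parsed" idea concrete, here is a minimal sketch using SQLite. It assumes the scraper yields `(question_id, posted_at)` pairs with Unix timestamps; the table names and the helpers `open_store`, `record_batch` and `resume_point` are hypothetical and not part of the CVBacklog code.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a hypothetical store for scraped question IDs."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS questions ("
        "question_id INTEGER PRIMARY KEY, seen_at INTEGER NOT NULL)"
    )
    # Single-row table holding the earliest/latest chat timestamps parsed so far.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS scrape_state ("
        "id INTEGER PRIMARY KEY CHECK (id = 1), earliest INTEGER, latest INTEGER)"
    )
    return conn


def record_batch(conn, messages):
    """Store (question_id, posted_at) pairs and widen the parsed time window."""
    messages = list(messages)
    if not messages:
        return
    # Duplicate question IDs are ignored thanks to the primary key.
    conn.executemany(
        "INSERT OR IGNORE INTO questions (question_id, seen_at) VALUES (?, ?)",
        messages,
    )
    times = [posted_at for _, posted_at in messages]
    row = conn.execute(
        "SELECT earliest, latest FROM scrape_state WHERE id = 1"
    ).fetchone()
    if row is None:
        earliest, latest = min(times), max(times)
    else:
        earliest, latest = min(row[0], min(times)), max(row[1], max(times))
    conn.execute(
        "INSERT OR REPLACE INTO scrape_state (id, earliest, latest) VALUES (1, ?, ?)",
        (earliest, latest),
    )
    conn.commit()


def resume_point(conn):
    """Return the newest parsed timestamp, or None if nothing was stored yet."""
    row = conn.execute("SELECT latest FROM scrape_state WHERE id = 1").fetchone()
    return None if row is None else row[0]
```

On the next update, the script would only need to parse chat lines newer than `resume_point(conn)` instead of re-reading all transcript pages.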

Scraping is not an issue. It's always only 25 pages, and that's not a bottleneck in terms of speed. Sorting is also not an issue: we can easily sort the results from the SE API in memory. The questions are currently listed by creation date, btw, and not by the order in which they appear in the Chat Search (if they seem to, it's a coincidence or a bug). So the only thing a database would give us is a longer backlog. Do we need that? I don't think so.
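As an illustration of sorting in memory without a database, here is a small sketch. It assumes question items shaped like Stack Exchange API question objects with `question_id`, `creation_date`, `close_vote_count` and `delete_vote_count` fields (the vote counts may require a custom API filter, so missing values default to 0). The `Question` class and the sort helpers are hypothetical, and newest-first is only an assumed direction for the current creation-date ordering.

```python
from dataclasses import dataclass


@dataclass
class Question:
    question_id: int
    creation_date: int  # Unix timestamp, as returned by the SE API
    close_vote_count: int = 0
    delete_vote_count: int = 0


def from_api_item(item):
    """Build a Question from one SE API item dict (vote fields are optional)."""
    return Question(
        question_id=item["question_id"],
        creation_date=item["creation_date"],
        close_vote_count=item.get("close_vote_count", 0),
        delete_vote_count=item.get("delete_vote_count", 0),
    )


def by_creation_date(questions):
    """Order by creation date, newest first (assumed direction)."""
    return sorted(questions, key=lambda q: q.creation_date, reverse=True)


def by_votes(questions):
    """Most close votes first, then most delete votes, then newest."""
    return sorted(
        questions,
        key=lambda q: (q.close_vote_count, q.delete_vote_count, q.creation_date),
        reverse=True,
    )
```

Since the list is rebuilt from at most 25 scraped pages, either ordering can be computed on every request.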

Like I said, it's not that I am against changing to a database. But right now, the backlog does nothing that would really warrant the effort. For the scope of this application, the current approach works quite well without a database.

Since we did not find a concrete reason why we need a database, I am closing this. I suggest opening a new ticket once we have a concrete use case.