mtgjson / mtgjson3

MTGJSON repository for Magic Cards

Home Page: http://mtgjson.com

Full Rewrite

ZeldaZach opened this issue · comments

This summer, I plan on re-writing the entire project (MTGJSONv4) using Python3 over NodeJS9.

This is for sustainability: there are more Python devs than Node devs, and it will be easier for me to manage the system and make changes.

I will be making a new project in the near future and people are free to open tickets/PRs there if they'd like to make changes. Eventually, it will be rolled back to this URL (with this current project being archived).

If there are any questions, comments, or concerns, please feel free to let me know before I get started. I'll be planning everything out (probably on this ticket) before I start work so I can be sure I do it right the first time (🤞)

commented

Some things to consider:

  • We should further separate the code from the product. Ideally, the product should not be in the repository at all. This would reduce noise in the git history as well as ensure that all parts of the build process actually work. It also parallels standard practice on GitHub: the .gitignore files for many compiled programming languages ignore the built product by default.
  • An asyncio-based approach might be helpful to decrease the time spent waiting for downloads.
  • We might want to keep our own manually-maintained database of promos and spoilers not yet present on Gatherer or magiccards.info.
  • We should take this opportunity to make a few backwards-incompatible changes to the format:
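
On the asyncio point above, here is a minimal sketch of what concurrent downloads could look like. It uses only the standard library; `fetch_all`, `fake_fetch`, the example URLs, and the concurrency limit are all made up for illustration, and `fake_fetch` sleeps instead of making a real request. A real version would plug in an actual async HTTP client.

```python
import asyncio
import time


async def fetch_all(urls, fetch, limit=8):
    """Run fetch(url) for every URL, with at most `limit` in flight at once."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(url):
        async with semaphore:
            return await fetch(url)

    # gather() returns results in the same order as `urls`.
    return await asyncio.gather(*(bounded(url) for url in urls))


async def fake_fetch(url):
    # Hypothetical stand-in for an HTTP request: waits, never touches the network.
    await asyncio.sleep(0.1)
    return f"<page for {url}>"


def demo():
    # Placeholder URLs, not real endpoints.
    urls = [f"https://example.invalid/set/{code}" for code in ("LEA", "DOM", "XLN", "AER")]
    start = time.perf_counter()
    pages = asyncio.run(fetch_all(urls, fake_fetch, limit=4))
    elapsed = time.perf_counter() - start
    return pages, elapsed
```

Because the four waits overlap, `demo()` should take roughly as long as one download rather than four. The semaphore keeps us from hammering a source with too many simultaneous requests.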

Also, why don't we run the webpage from GitHub Pages?
There is HTTPS support for custom domains now, too!
That would make deploying easier, and updates to files and the changelog would go live instantly with each merge, for example.

Could have opened an issue on its own, but thought it might fit the "big change" ideas around a rewrite.

That runs counter to the "No output in the repo". But it's an interesting concept.

Note that GitHub Pages has some issues, including a lack of header control and a really short (10-minute) cache expiry.

commented

We could run the website as a separate repo which is automatically maintained by a bot that just builds the latest release (and maybe also master, we could use something like beta.mtgjson.com for that). This would give us easy access to past versions for adding manual patches when Gatherer introduces new errors.

I'd still recommend against using GitHub Pages, though, for the reasons @silasary gave, since we do have our own server.

I think it's worth thinking of mtgjson as three separate projects:

  1. A downloadable set of .json files containing information about all Magic cards.
  2. A set of tools for maintaining the aforementioned .json files.
  3. A website for presenting and describing the .json files and maintenance tools.

We have run into a decent number of historical problems because we conflate the three. Personally, I would like to see each of these treated independently.

If we focus some energy on making sure that the output of the tools is consistent, we can have useful diffs for the .json files and even include some sanity checks to avoid some of the more annoying data regressions we've seen in the past. Staying away from auto-building everything will be safer and prevent data releases that don't actually change anything just because the code has been updated.
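
To make the "consistent output" and "sanity checks" idea concrete, here is a rough sketch. The function names, the `cards` key, and the list of required fields are assumptions for illustration, not the actual mtgjson schema or tooling.

```python
import json

# Hypothetical minimum set of fields; the real schema would define this.
REQUIRED_FIELDS = ("name", "rarity")


def dump_stable(data):
    """Serialize so that identical data always produces identical text."""
    # Sorted keys and fixed indentation keep diffs limited to real changes.
    return json.dumps(data, sort_keys=True, indent=2, ensure_ascii=False) + "\n"


def sanity_check(set_data):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    cards = set_data.get("cards", [])
    if not cards:
        problems.append("set has no cards")
    for index, card in enumerate(cards):
        for field in REQUIRED_FIELDS:
            if not card.get(field):
                problems.append(f"card #{index} is missing {field!r}")
    return problems
```

With stable serialization, a release step can compare the new text against the previous file and skip publishing when nothing changed, which is the "no release just because the code was updated" behaviour described above.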

In the v4 repo there is also a list of planned changes and further goals which should get incorporated in a rewrite as well:
https://github.com/mtgjson/mtgjson4#new-features-and-changes
https://github.com/mtgjson/mtgjson4_temporary#new-features-and-changes

As well as the "v4" milestone issues: https://github.com/mtgjson/mtgjson/milestone/3

Moving forward, we should consider what relationship, if any, we want to have with Scryfall.

In many ways, Scryfall provides more reliable data than Gatherer, magiccards.info, or any of the other sources of Magic data. If I'm being honest, it's more reliable than mtgjson tends to be.

There are downsides to Scryfall: it's rate limited, it doesn't work offline, it tends to have more data than most people want, and, worst of all, you cannot grab a single, zipped file with everything.

I have considered building a scraper for Scryfall that generates mtgjson-style files. It would fall short of mtgjson in only a few ways, mostly the lack of links to Gatherer, magiccards.info, and some other sites.

In constructing a v4 rewrite, it might be easiest to start with a Scryfall downloader and then add Gatherer, magiccards.info, Librarities, and other sources of information as follow-up work.
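
Since rate limiting was listed as a downside, any downloader would need to pace its requests. Here is a minimal sketch of that pacing. The `Throttle` class, `download_all`, and the `fetch` callable are hypothetical, and the interval you pass in is an assumption to be replaced with whatever the source's guidelines actually require.

```python
import time


class Throttle:
    """Enforce a minimum interval between successive calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable so the pacing can be tested
        self._sleep = sleep
        self._last = None      # time of the previous call, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def download_all(urls, fetch, throttle):
    """Fetch each URL in turn, pausing as needed between requests."""
    results = {}
    for url in urls:
        throttle.wait()
        results[url] = fetch(url)  # `fetch` is supplied by the caller
    return results
```

The first request goes out immediately; every later one waits until at least `min_interval` seconds have passed since the previous one. Other sources (Gatherer and so on) could each get their own `Throttle` instance.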

I'll reach out to them and see what we can do

Also, I'm more than happy to have mtgjson and my project (https://magicthegathering.io) get more closely related. That is, have mtgjson and my api stay even closer (or exact) in terms of data and json structure. A single service providing both a downloadable dataset and an API would be pretty awesome for users.

i have been struggling with whether to consider promos as sets or as statuses on a card. there are merits to both, but most projects tend to create them as sets for now. but i wonder if this will break down if wizards begins re-releasing more and more conflicting cards within a set, such as several prerelease promos of the same rare

This already breaks down slightly because we've seen things like:

Search for Azcanta:

  • Pack rare
  • Pack foil
  • Pre-release promo
  • Buy-a-box treasure chest booster promo

Fatal Push:

  • Pack rare
  • Pack foil
  • Pre-release promo
  • FNM promo

Statuses would also need to be maintained on every copy of a card, and it would get particularly confusing as cards are re-printed in new sets, I think.

how would y'all handle a card released as a pre-release promo twice?

They would be two different printings (with different date stamps), so I'd list them twice.

At that point, does there exist any connection to the original set it was (pre..)released in?

Perhaps a "promotional=true" flag and then

"id": "UNIQUE_ID_OF_THIS_PROMO",
"promotional": true,
"promotionalSource": "ID_OF_ORIGINAL_CARD",
"promotionalType": "FNM",  // or "prerelease", "media", "buy-a-box"
// then normal fields
"name": "Island...etc",
"setcode": "DOM"

within the same setcode?
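
As a rough illustration of how that flag approach would handle a card with several promos (including two pre-release promos of the same card), here is a small sketch. The field names follow the proposal above; every ID and the `promos_by_source` helper are invented for the example.

```python
from collections import defaultdict

# Invented records: one regular printing plus three promos that point back to it.
cards = [
    {"id": "dom-island", "name": "Island", "setcode": "DOM", "promotional": False},
    {"id": "dom-island-pre-1", "name": "Island", "setcode": "DOM",
     "promotional": True, "promotionalSource": "dom-island",
     "promotionalType": "prerelease"},
    {"id": "dom-island-pre-2", "name": "Island", "setcode": "DOM",
     "promotional": True, "promotionalSource": "dom-island",
     "promotionalType": "prerelease"},
    {"id": "dom-island-fnm", "name": "Island", "setcode": "DOM",
     "promotional": True, "promotionalSource": "dom-island",
     "promotionalType": "FNM"},
]


def promos_by_source(cards):
    """Map each original card ID to the IDs of its promotional printings."""
    grouped = defaultdict(list)
    for card in cards:
        if card.get("promotional"):
            grouped[card["promotionalSource"]].append(card["id"])
    return dict(grouped)
```

Each promo keeps its own unique `id`, so two pre-release printings are simply two records, and `promotionalSource` preserves the link back to the original card in the set.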

i think that would work for (at least...) promos currently in the 'prerelease' set-based dumping ground. i don't know the behavior of wizards for other promo types.

edit: at tappedout, we have foil: [foil / pre] and the rest are promo sets

Hi, I'd just like to volunteer to help with this if help is still needed.