Add local data source override support back in

jlengstorf opened this issue 7 years ago · comments

@ecwyne it looks like the ability to override data sources with a local version got lost in this migration. Was there a reason you pulled it out, or was it just an oversight that we can revert?

Eric Wyne · Answer 1 · Sat Nov 04 2017 02:15:04 GMT+0800 (China Standard Time)

Currently, this is set up to run gramps start ./path/to/local/source

Technically @gramps/gramps is still looking for process.env.GQL_DATA_SOURCES and will override them.

I think there needs to be a discussion of what the CLI is responsible for and what @gramps/gramps is responsible for regarding loading local sources.

I propose having the CLI in charge of loading local sources and @gramps/gramps is responsible for taking the sources it's given and merging them.

The CLI would look like this.

gramps start ./source1 ./source2 ./source3
and possibly
gramps start ./source1 "@gramps/data-source-xkcd"
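As a minimal sketch of the proposed split, the CLI would load each local source and hand the results to a merge step. The helper name `mergeSources` and the choice to key sources on a `namespace` property are assumptions for illustration, not the actual @gramps/gramps API.

```javascript
// Hypothetical helper: merge installed sources with local overrides.
// Assumes each source is an object with a unique `namespace` property.
function mergeSources(installed, overrides) {
  const byNamespace = new Map();
  for (const source of installed) byNamespace.set(source.namespace, source);
  // Later entries win, so a local override replaces the installed copy.
  for (const source of overrides) byNamespace.set(source.namespace, source);
  return [...byNamespace.values()];
}

// Illustrative data only; `origin` is a made-up field to show which copy won.
const installed = [
  { namespace: 'XKCD', origin: 'node_modules' },
  { namespace: 'Internal', origin: 'node_modules' },
];
const local = [{ namespace: 'Internal', origin: './source1' }];

const merged = mergeSources(installed, local);
console.log(merged.map((s) => `${s.namespace}:${s.origin}`));
```

With this shape, @gramps/gramps would never need to know whether a source came from disk or from node_modules; it only sees the merged list.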

Jason Lengstorf · Answer 2 · Sat Nov 04 2017 02:39:27 GMT+0800 (China Standard Time)

I see where you're going. Let me give you the use case that we have at IBM.

We have a GraphQL µ-service, which imports GrAMPS
That µ-service imports all of the IBM Cloud GrAMPS data sources
During development, we need that µ-service to be running with all data for normal console operation
Internal data sources set the µ-service repo as a dependency
We use the gramps --live --data-source-dir ./ command to allow development with the local override

It's not really feasible to expect every team to keep track of all the other data sources, and unlikely that a given view will only need data from a single data source. (Plus stitching, etc.)

I'm realizing now that the existing command doesn't have a way to specify which µ-service should be used as the "master" service (e.g. the one that's actually registering all the data sources), so it looks like I'll be adding support for that flag. 😅

If we imagine a µ-service called @company/graphql, a data source package.json might look like this:

{
  "name": "@company/data-source-internal",
  "scripts": {
    "dev": "gramps start --data-source-dir ./ --master @company/graphql"
  },
  "devDependencies": {
    "@gramps/cli": "1.0.0",
    "@company/graphql": "1.0.0"
  }
}

This would change the start command by swapping out the rootDir value and pointing to the @company/graphql main script. (Unless you've got a different idea for how this could be managed.)

Having the gramps command transpile and auto-override installed sources eliminated friction for managing local development, which was a huge selling point during the adoption process, so I don't want to lose that.

As GrAMPS is incorporated into other companies, I see this pattern becoming pretty common (where the µ-service that uses GrAMPS is a dependency of internal data sources) — and even if it's not, I need it to continue to exist in order to prevent breaking IBM's dev workflow. 😄

Eric Wyne · Answer 3 · Sat Nov 04 2017 03:07:06 GMT+0800 (China Standard Time)

I think I understand, but have some clarifying questions. For clarity of communication I'll draw a distinction between the central Gateway service and branching Source services.

If I'm working on a Source service:

1. Do I need the Gateway running locally, or just my peerDependencies (which are also Sources)?
2. If I do need to run the complete Gateway service locally while only overriding my Source, what if there was a separate command like gramps start-gateway @company/graphql --override ./

As an aside, I believe the second could be achieved using yarn link.

Eric Wyne · Answer 4 · Sat Nov 04 2017 03:15:34 GMT+0800 (China Standard Time)

If the answer to Question 1 above is that in reality, a Source author only needs to have the peerDependencies loaded, I believe the CLI could automatically load those peers and start a dev server with them all.

I think this would be preferred to limit scope of what's running in development, providing as much isolation to the Source developer as possible - not overloading them with everything that's going on in the Gateway

Jason Lengstorf · Answer 5 · Sat Nov 04 2017 03:22:43 GMT+0800 (China Standard Time)

"Do I need the Gateway running locally, or just my peerDependencies? (which are also Sources)"

A data source can be developed standalone (with peerDependencies only), or within the context of the gateway.

To add more context, we're often working on a UI, and in order to develop locally we need the gateway µ-service running locally as well:

I am working on a UI that makes queries against several data sources, and one of them needs a new field
I clone the data source, make the change, and boot the gateway µ-service with the modified data source as a local override
I test the UI with the updated data source to verify that everything works as expected
I can now confidently submit pull requests to the data source and UI repos

"If the answer to Question 1 above is that in reality, a Source author only needs to have the peerDependencies loaded, I believe the CLI could automatically load those peers and start a dev server with them all."

"I think this would be preferred to limit scope of what's running in development, providing as much isolation to the Source developer as possible - not overloading them with everything that's going on in the Gateway"

I agree that this should be an option, but due to the workflow above, we need both options to be easy and baked in. (In my experience working with GrAMPS, having all the data sources loaded doesn't pose any issues — they're independent, after all — and may be preferable to avoid things like name collisions.)

I think my preference would be that we allow for a single command that just checks where it should start from:

{
  "scripts": {
    "dev-iso": "gramps start --data-source-dir ./",
    "dev": "gramps start --gateway @company/graphql --data-source-dir ./"
  }
}
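A rough sketch of how a single command could check where it should start from. The function name `parseStartArgs` and the returned `mode` field are hypothetical; only the `--gateway` and `--data-source-dir` flags come from the scripts above, and this is not the real @gramps/cli code.

```javascript
// Hypothetical flag parser for the two scripts above.
function parseStartArgs(argv) {
  const options = { gateway: null, dataSourceDirs: [] };
  for (let i = 0; i < argv.length; i += 1) {
    if (argv[i] === '--gateway') {
      options.gateway = argv[i + 1];
      i += 1; // skip the flag's value
    } else if (argv[i] === '--data-source-dir') {
      options.dataSourceDirs.push(argv[i + 1]);
      i += 1;
    }
  }
  // Without --gateway, fall back to an isolated development server.
  options.mode = options.gateway ? 'gateway' : 'isolated';
  return options;
}

const devIso = parseStartArgs(['--data-source-dir', './']);
const dev = parseStartArgs([
  '--gateway', '@company/graphql',
  '--data-source-dir', './',
]);
console.log(devIso.mode, dev.mode, dev.gateway);
```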

I'm proposing we revert to the original start script (with transpilation support, etc.) because it solves the problem and can be extended to support the --gateway flag with relatively low effort. Any opposition to this?

Eric Wyne · Answer 6 · Sat Nov 04 2017 04:00:50 GMT+0800 (China Standard Time)

I may misunderstand how the @company/graphql is structured. I see a Gateway and Source as very much different.

A Source exports {schema, resolvers, namespace} etc... while a Gateway starts its own server and listens on a port etc. The Gateway could do a million other things that gramps has no control over.

gramps start would load the Source and PROVIDE a "development" Gateway

gramps start --gateway would start the specified Gateway (presumably with ENV variables that @gramps/gramps would check)

Jason Lengstorf · Answer 7 · Sat Nov 04 2017 04:06:08 GMT+0800 (China Standard Time)

It might, but that would be set up by the developer. As we use it, our gateway is a Node µ-service that creates a /graphql endpoint using GrAMPS. In order to develop our UI locally with new data sources, we need to be running the /graphql endpoint locally so we can point our local UI dev to it without having to commit it first.

The way I see it, any developer who's building with GrAMPS will likely have a local setup, and will not want to have to develop on a data source locally, commit those changes, and then push them up to see how it all fits together in the combined GraphQL schema. We shouldn't make any assumptions about the gateways, but I do think it's a core requirement that a developer can specify that they'd like to run their full data layer with a modified local source, if they so choose.

Eric Wyne · Answer 8 · Sat Nov 04 2017 05:02:33 GMT+0800 (China Standard Time)

I've been looking into mockery to solve this and want to hear your thoughts.

It allows you to replace the results of require/import;

pseudocode...

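const mockery = require('mockery');

// dataSrcDir: path to the local data source; gateway: module name or path of the Gateway
const {name} = require(`${dataSrcDir}/package.json`);
const localCopy = require(dataSrcDir);

mockery.enable({ warnOnUnregistered: false }); // mocks only take effect once mockery is enabled
mockery.registerMock(name, localCopy);
require(gateway);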

Essentially this will run the provided Gateway like normal but replace every instance of the data source with a local copy wherever it's required/imported.

// gateway.js
import express from 'express';
import { graphqlExpress } from 'apollo-server-express'; // assumed source of graphqlExpress
import gramps from '@gramps/gramps';
import internalSource from '@company/internal-source'; // automatically replaced with local copy

const app = express();
app.all('/graphql', graphqlExpress(gramps({dataSources: [internalSource]}))); // local copy is passed to gramps

Then gramps doesn't need to manually override those data-sources, they're automatically replaced with the local version.
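The substitution idea can also be sketched without the mockery dependency. This is only an illustration of the technique: it patches `Module._load`, an internal and undocumented Node.js hook, and the helper name `overrideModule` is hypothetical. It only affects CommonJS `require`, so ES module imports would need to be transpiled first.

```javascript
const Module = require('module');

// Hypothetical stand-in for mockery.registerMock: intercept module loading
// and return a substitute whenever one specific package name is requested.
function overrideModule(name, replacement) {
  const originalLoad = Module._load; // internal Node.js hook, may change
  Module._load = function patchedLoad(request, parent, isMain) {
    if (request === name) return replacement;
    return originalLoad.call(this, request, parent, isMain);
  };
  // Return a function that restores normal loading.
  return () => {
    Module._load = originalLoad;
  };
}

const localCopy = { namespace: 'Internal', local: true };
const restore = overrideModule('@company/internal-source', localCopy);

// Any require of the package name now yields the local copy,
// even though the package is not installed.
const loaded = require('@company/internal-source');
restore();

console.log(loaded === localCopy);
```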

Jason Lengstorf · Answer 9 · Sat Nov 04 2017 05:13:49 GMT+0800 (China Standard Time)

This would probably work, but let me ask you this: is there any need to rewrite this? It works, is pretty clear, and handles transpilation (something that it doesn’t appear is being addressed in this new build). So while it’s certainly possible to rewrite all of this code, I’m curious why we need to.

Eric Wyne · Answer 10 · Sat Nov 04 2017 05:45:28 GMT+0800 (China Standard Time)

https://github.com/gramps-graphql/gramps/blob/master/src/lib/externalDataSources.js#L25-L27
https://github.com/gramps-graphql/gramps/blob/master/src/lib/externalDataSources.js#L42-L43
https://github.com/gramps-graphql/gramps/blob/master/src/lib/externalDataSources.js#L49

@gramps/gramps is the production tool, @gramps/cli is the development tool

While I would say this isn't something I'm willing to die on the hill for - I would say that there's a lot of flux in gramps and now would be the best time to clearly define the responsibilities of each part. We're currently teasing apart what @gramps/gramps and @gramps/cli are each responsible for, and I would argue that having @gramps/gramps responsible for loading local data sources necessitates tight coupling between the two that need not exist.

If @gramps/cli is to be responsible for local data source loading then the mechanism of doing so needs to be re-considered. Doing so would also make the production @gramps/gramps very lean and actually fairly simple which I think is desirable for production.

I would like to handle transpilation either way.

Eric Wyne · Answer 11 · Sun Nov 05 2017 01:02:08 GMT+0800 (China Standard Time)

@jlengstorf I'm trying to simply copy the current implementation from @gramps/gramps and I'm confused how this is working today at IBM. Looking both at the @gramps/gramps-express and @gramps/gramps bin implementations they both seem to be calling dist/dev/server.js and load the ONE data source (which as far as I understand is functionally equivalent to how I currently have gramps start .)

How are IBM UI developers running the actual Gateway service with one data-source override today? (leaving the remaining data-sources untouched)

I also started looking into copying over the transpilation and see that the following assumptions are made.

The data source is written in javascript (not typescript for example)
The package's (pre-transpiled) "main" file is either /src/index.js or /index.js
All javascript files are either at the root or in /src
The package author's .babelrc is compatible with the gramps .babelrc
The package only uses *.js or *.graphql files (as they're the only files copied to /.tmp)
The package is written in ES module syntax
The data source has no additional external dependencies (I think... Not sure how relative node_modules would work)

All of these assumptions are true when coming from @gramps/data-source-base but I'm not sure we can count on (nor enforce) these assumptions remaining true
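To make one of these assumptions concrete, here is a minimal sketch of checking the entry-point rule before transpiling. The helper name `findEntryPoint` is hypothetical and this is not how @gramps/gramps currently does it; it only looks for src/index.js or index.js in the given directory.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical check for the "main file is /src/index.js or /index.js" assumption.
function findEntryPoint(dataSourceDir) {
  const candidates = ['src/index.js', 'index.js'];
  const found = candidates.find((file) =>
    fs.existsSync(path.join(dataSourceDir, file))
  );
  // null signals that the package does not follow the expected layout.
  return found ? path.join(dataSourceDir, found) : null;
}

// Demo against a throwaway directory.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'data-source-'));
const before = findEntryPoint(demoDir); // no entry file exists yet

fs.mkdirSync(path.join(demoDir, 'src'));
fs.writeFileSync(path.join(demoDir, 'src', 'index.js'), 'export default {};\n');
const after = findEntryPoint(demoDir);

console.log(before, after);
```

A check like this would let the CLI fail early with a clear message instead of copying an unexpected layout into /.tmp.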

Jason Lengstorf · Answer 12 · Sun Nov 05 2017 05:04:05 GMT+0800 (China Standard Time)

Ah, you know what? The run script is directly copied to the GraphQL microservice. So basically, it runs as is, but it’s moved across to the gateway. So we’ll have to fix that. :/

As far as the assumptions go, I think we need to make and enforce them. We can look at TypeScript support, but the intention of GrAMPS is to make data sources somewhat standardized; a standard file structure isn’t out of character for a plugin-style architecture, so I don’t think we’re asking too much in that case.

We can always expand the transpilation support as we get people using different formats, but I think for now we should focus on the current use cases. Otherwise we’re risking a fall down a hypothetical rabbit hole; let’s just make sure it’s all modular enough to be replaced/extended. :)

Jason Lengstorf · Answer 13 · Sun Nov 05 2017 05:08:14 GMT+0800 (China Standard Time)

What I mean by “enforce” is: a GrAMPS-compatible data source needs to meet certain requirements, or it’s not a GrAMPS-compatible data source. Developers can do whatever they want, of course, but GrAMPS only works if they follow our guidelines. I don’t think that’s unreasonable; it’s how every other system works (npm packages, WordPress plugins, etc.). If we run into problems we can reassess, but for now I think it’s best to make recommendations and build toward those recommendations. Make sense?
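As a sketch of what such a compatibility check might look like, the snippet below verifies that a data source exposes the {schema, resolvers, namespace} exports mentioned earlier in the thread. The helper name `missingExports` and the exact list of required keys are assumptions, not a published GrAMPS requirement.

```javascript
// Required exports, assumed from the {schema, resolvers, namespace} shape
// described earlier in the thread.
const REQUIRED_KEYS = ['namespace', 'schema', 'resolvers'];

// Hypothetical validator: returns the names of any missing exports.
function missingExports(dataSource) {
  return REQUIRED_KEYS.filter(
    (key) => dataSource == null || dataSource[key] === undefined
  );
}

// Illustrative data sources, not real packages.
const complete = {
  namespace: 'XKCD',
  schema: 'type Query { latest: String }',
  resolvers: {},
};
const incomplete = { namespace: 'Broken' };

console.log(missingExports(complete));
console.log(missingExports(incomplete));
```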

Jason Lengstorf · Answer 14 · Sun Nov 05 2017 07:34:04 GMT+0800 (China Standard Time)

@ecwyne Are you actively working on this? If not, I can take it.

Eric Wyne · Answer 15 · Sun Nov 05 2017 07:39:43 GMT+0800 (China Standard Time)

@jlengstorf Go for it!

Jason Lengstorf · Answer 16 · Wed Nov 22 2017 01:29:18 GMT+0800 (China Standard Time)

This will be closed by #11