sanity-io / client

JavaScript client for retrieving, creating and patching data from Sanity.io

Proposal for runtime safe typed GROQ

stipsan opened this issue · comments

TypeScript Playground example implementation

Situation

It's useful to type GROQ responses (client.fetch) in a way that's runtime safe.
It brings a lot of value: users can make sure the way they use the data they client.fetch is safe from runtime bugs, with no risk of unexpected errors.
Providing typings on the responses also gives auto-complete support for them so people can write their app code much faster and with less risk of typos.

Complication

What we offer today is just a generic that lets you specify the response:

const data = await client.fetch<{title: string}[]>(`*[]{title}`)
// data is typed to `{title: string}[]`

But the client isn't runtime checking this, so it's possible that the dataset contains documents that don't define title, or set it to null.
This can lead to runtime errors that TypeScript isn't catching, since it trusts the typings provided to the generic to be the "truth":

data.map(({title}) => title.toUpperCase()) // this can throw at runtime if `title` is `null` or `undefined`

Question

How can we offer runtime safety, or at least make it easier for other libraries to provide it?
Can we offer it in a way that doesn't increase the bundle size for people that won't use this feature?

Answer

Libraries like groqd have started using the convention of taking a GROQ string as input, and output an object like: {query: string, schema: {parse(data: unknown): TypesafeResponse}}.

We can support this convention, making it very ergonomic for libraries like groqd to provide the heavy lifting while the Sanity Client stays lightweight.

Suggested API:

import {createClient} from '@sanity/client'

const client = createClient()

const custom = await client.fetch({
  query: `count(*)`,
  schema: {
    parse(data: unknown): number {
      if (isNumber(data)) {
        return data
      }
      throw new TypeError(`data isn't a number`)
    },
  },
})

function isNumber(data: unknown): data is number {
  return Number.isFinite(data)
}

The reason why it makes sense to use {query: string, schema: {parse(data: unknown): TypedResponse}} as opposed to a simpler {query: string, schema(data: unknown): TypedResponse} or {query: string, parse(data: unknown): TypedResponse} is that it makes it easier for libraries to let userland customise the parser. They might want narrower types than their library provides out of the box, maybe widen them, or perhaps coerce date strings to Date instances.
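To illustrate that last point, here's a minimal sketch of userland wrapping a library-provided schema. The `Schema`, `TypedQuery`, and `libraryQuery` names are illustrative assumptions, not actual groqd or @sanity/client APIs:

```typescript
// The convention: a query paired with a schema object exposing `parse`.
type Schema<T> = {parse(data: unknown): T}
type TypedQuery<T> = {query: string; schema: Schema<T>}

// What a library might hand back (hypothetical shape for this example):
const libraryQuery: TypedQuery<{title: string; releaseDate: string}> = {
  query: `*[_type == 'movie'][0]{title, releaseDate}`,
  schema: {
    parse(data: unknown) {
      if (
        typeof data === 'object' &&
        data !== null &&
        typeof (data as any).title === 'string' &&
        typeof (data as any).releaseDate === 'string'
      ) {
        return data as {title: string; releaseDate: string}
      }
      throw new TypeError('unexpected response shape')
    },
  },
}

// Because `schema` is a plain object, userland can wrap its `parse` to
// coerce the ISO date string into a `Date` instance:
const coercedQuery: TypedQuery<{title: string; releaseDate: Date}> = {
  query: libraryQuery.query,
  schema: {
    parse(data: unknown) {
      const parsed = libraryQuery.schema.parse(data)
      return {...parsed, releaseDate: new Date(parsed.releaseDate)}
    },
  },
}
```

With a bare `parse` function on the top level instead, this kind of layering is possible but clumsier: the library would have no natural place to expose the underlying schema for reuse.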

There's more examples exploring this in the playground. As well as the companion codesandbox.

Looks like this proposal would make this possible:

import {q} from 'groqd'

const data = await client.fetch(
  q('*')
    .filter("_type == 'bookmark' && defined(url)")
    .slice(0, 1)
)

And get typed & validated results back.

I think this is a good enhancement of the client API.

It's still unclear to me why {query: string, schema: {parse(data: unknown): TypedResponse}} is better than the less verbose {query: string, validate(data: unknown): TypedResponse}. Maybe you can expand on this?

Would also be good to see an example which includes query params.

@mariuslundgard

It's still unclear to me why {query: string, schema: {parse(data: unknown): TypedResponse}} is better than the less verbose {query: string, validate(data: unknown): TypedResponse}. Maybe you can expand on this?

For libraries like groqd, where you write a typed query, you indeed don't get a lot from the extra nesting, as you don't have much reason to touch schema in your own code. You just want to pass it to @sanity/client and get types back, sure.

It's when you put it in context with typegen libraries that rely on inference (like groqz) that having an object with a parse method starts to make sense. With those, you write an arbitrary query, and types are inferred from it on a best-effort basis. Sometimes the typegen faces something too dynamic and it de-opts, resulting in a wider type than what you might want.

For example, you query:

*[_type=='movie']{..., "popularity": select(
  popularity > 20 => "high",
  popularity > 10 => "medium",
  popularity <= 10 => "low"
)}

In groqd you would write this query in a way that lets TS know that popularity: 'high' | 'medium' | 'low'. It's possible for groqz to do that too, by combining Studio schema information, parsing your query, and doing an introspection evaluation. But Rome wasn't built in a day, so there'll be a lot of cases like these where libraries like groqz won't be able to narrow things down, and the best they can do is give you string, or in the worst case: unknown.

In any case, then it's nice to have easy access to the schema so you can narrow down the typings on your own:

import {groq} from 'groqz'

const typedQuery = groq`*[_type=='movie']{..., "popularity": select(
  popularity > 20 => "high",
  popularity > 10 => "medium",
  popularity <= 10 => "low"
)}`
const narrowerSchema = typedQuery.schema.extend({popularity: z.enum(['high', 'medium', 'low'])})
const data = await client.fetch({...typedQuery, schema: narrowerSchema})

Look at the bottom of the playground to see a couple more scenarios that make use of the easy schema access to do other things that Zod codebases often do.

‘groqd’ & ‘groqz’ look phenomenal and are exactly the kind of help GROQ needs. The thing I’m seeing is that they’re both schema-unaware (for good reason, given the flexibility of the Content Lake). My initial thinking was a builder that is context-aware, i.e. the builder would only provide and allow methods & params that adhere to the schema. I think they’re both valid and honestly very similar approaches. The main reason I lean that way is to keep one source of truth. The context-unaware approaches end up having you use a zod-like approach to validate output, but that ends up mirroring your Sanity schemas again.

@saiichihashimoto we're exploring that aspect of type safety as well and will share our findings once we have a solution or a plan 😌 And groqz will be schema-aware: whatever strategy is used to provide type safety, it's hard to imagine one that doesn't take your Studio schema into account. If you don't have a single source of truth, then is it really runtime safe? 🤔

👋 groqd author here. I love to see this conversation happening!

I think what you have proposed here looks good. It seems like this would work out of the box with groqd, but also allow non-groqd users to provide their own simple, lightweight solutions if they'd like (e.g. they could plug into something like yup or other schema validation libraries, or roll their own).

Is the idea to also provide legacy support for just providing a groq string to client.fetch, and just overload the function?

Is the idea to also provide legacy support for just providing a groq string to client.fetch, and just overload the function?

Yes, if given a string, it works just like before. 👍
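A rough sketch of how that overloaded dispatch could look — this is not the actual @sanity/client implementation, and `clientFetch`/`runQuery` are hypothetical stand-ins for `client.fetch` and the underlying HTTP request:

```typescript
type Schema<T> = {parse(data: unknown): T}
type TypedQuery<T> = {query: string; schema: Schema<T>}

// Legacy path: a plain string, with an optional generic trusted blindly.
async function clientFetch<T = unknown>(query: string): Promise<T>
// New path: a {query, schema} object, validated at runtime.
async function clientFetch<T>(query: TypedQuery<T>): Promise<T>
async function clientFetch<T>(query: string | TypedQuery<T>): Promise<T> {
  const raw = await runQuery(typeof query === 'string' ? query : query.query)
  if (typeof query === 'string') {
    // behaves just like before: no runtime check, the generic is "truth"
    return raw as T
  }
  // validate the response before handing it back
  return query.schema.parse(raw)
}

// Stand-in for the real HTTP request, for illustration only.
async function runQuery(groq: string): Promise<unknown> {
  return groq === 'count(*)' ? 42 : null
}
```

Since the object path only calls `schema.parse`, the validation logic (zod, yup, hand-rolled guards) stays entirely in userland or library code, keeping the client itself lightweight.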

I like the part of the implementation that Sanity client takes care of (though I'd love for Sanity to take care of more). But assuming that Sanity opts to stay out of providing a type-providing implementation...

It feels like an ideal (third or first party) solution would be:

  1. wrap defineType, defineField, defineArrayMember
  2. in those functions, return sanity-schema-types and a typesafe query-builder
  3. query builder then provides query and schema for client.fetch, thus providing appropriate return types
  4. optionally also provides appropriate zod definitions for validation

However, if something like this were made (first or third party); that seems like it could still be compatible with the implementation outlined in this issue.
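The steps above could be sketched roughly like this. To be clear, none of these names (`defineTypedField`, `buildQuery`) exist in sanity, groqd, or groqz — this is a purely hypothetical illustration of a schema-aware builder emitting the {query, schema} shape from this proposal:

```typescript
// Step 1–2: a wrapped field definition that carries its value type.
type FieldDef<Name extends string, T> = {name: Name; __type: T}

function defineTypedField<Name extends string, T>(name: Name, example: T): FieldDef<Name, T> {
  return {name, __type: example}
}

// Step 3: a tiny builder that infers the result type from the picked fields
// and returns the {query, schema} shape client.fetch would accept.
function buildQuery<Fields extends FieldDef<string, unknown>[]>(
  docType: string,
  ...fields: Fields
) {
  type Result = {[F in Fields[number] as F['name']]: F['__type']}
  const projection = fields.map((f) => f.name).join(', ')
  return {
    query: `*[_type == '${docType}']{${projection}}`,
    schema: {
      parse(data: unknown): Result[] {
        // a real builder would validate each field; this only checks the shape
        if (!Array.isArray(data)) throw new TypeError('expected an array')
        return data as Result[]
      },
    },
  }
}
```

Usage would look like `buildQuery('movie', defineTypedField('title', ''), defineTypedField('year', 0))`, which yields a query of `*[_type == 'movie']{title, year}` and a result typed as `{title: string; year: number}[]` — compatible with the convention outlined in this issue.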

Have there been any updates in this area?

Have there been any updates in this area?

Still in the exploratory phase. We remain convinced that typed GROQ brings a lot of value, and we're committed to delivering it. It's still too early for us to share any timelines, as it's still undetermined what tooling and features we want to ship.
There are a lot of interesting ideas bouncing around internally that we discuss on a weekly basis, along with concerns like "can it really be done this way, or is it too ambitious?" and a lot of other unknowns.

For me, for example, the dream setup would be to continue to write GROQ template strings, and have my IDE tell me if there's a typo in my query, or if I'm asking for a field that doesn't exist, or asking for an array when it's actually a string. Types for my app should be generated without me having to change my workflow in any way: I edit the query and save the file, and it updates. I edit my Studio schema, and it updates the typings. If I remove a field, my IDE should tell me right away if it'll break my app. If I change the type of a field, it should offer to refactor my code to match the change.

It's still too early to tell if what I'm describing above is even possible, or if we'll have to make some compromises.
I hope this sheds some light as to why we're a bit quiet on this while we're deep down the rabbit holes 🙂

@stipsan even though it's quite quiet here these days... I like your summary of your "dream" workflow. Your mentioned approach would be pretty awesome.

In case you missed the announcement on the new TypeGen feature: https://www.sanity.io/blog/introducing-sanity-typegen

Since this issue started, our ecosystem has evolved. There is no longer a single client.fetch call for all your GROQ: if you're using Sanity Presentation, you're probably calling loadQuery and useQuery instead.
In light of this, it doesn't make sense to enforce runtime checks (like with zod) in @sanity/client's API.

Thanks for the discourse here and feel free to continue the discussion on the #typescript channel on our Slack