pksunkara / alpaca

Given a web API, Generate client libraries in node, php, python, ruby

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Problem with definition of parameters vs documentation

jrpereira opened this issue · comments

Hi,

First, great project, great idea! :)

I do have an issue with the ease of keeping documentation text in a separate doc.js file, which means there is the need to stay synchronized with the "class" portion of the api.js and the order of params defined in there.

Currently, params are defined as an array. This means it's not possible to add code-relevant information to each param (eg, "type" for typed/hinted languages such as PHP), but also means that if the order of parameters is changed, the documentation becomes incorrect.

I would suggest changing api.js so that instead of this:

    "create_update": {
            ...
            "params": ["text", "profile_ids"]
     }

You'd instead write the params field as an object, and could also provide info immediately in there. I also suggest adding a type field, which if used can generate type hints for the method signatures. So, in the api.js file you'd write:

    "create_update": {
            ...
            "params": {
                "text": {
                     "type": "string",  // optional
                     "desc": "This is the status update text.",  // optional
                     "value": "This is an example update"  // optional
                 },
                 "profile_ids": {}, // don't define anything for this field, will be done only in doc.js
            }
    }

The doc.js file would then become:

    "create_update": {
            ...
            "params": {
                "text": {
                     "type": "string",
                     "desc": "Here you'd write the text for the status update.",
                     "value": "This is my status!"
                 },
                 "profile_ids": {
                     "type": "array",
                     "desc": "An array of profile id's that the status update should be sent to. Invalid profile_id's will be silently ignored.",
                     "value": ["4eb854340acb04e870000010", "4eb9276e0acb04bb81000067"]
                 },
            }
    }

With this change you no longer need to worry about the order of parameters. This also means that in the future, you could allow multi-language documentation to be generated by having multiple files (eg, doc_EN.js, doc_DE.js), with different people doing the writing and the programmers only caring about api.js.

If you'd like to keep these formats backwards-compatible, you can check whether params is an array or a hash (object), but personally, I'd suggest breaking backwards compatibility. It's a young project, it's too early to be burdened with backwards compatibility, and it's an easy change for people to do. :)

Thanks you writing your thoughts about this. I have been thinking about including a type for params since the beginning of this project. But finally decided on not doing it for the initial release.

But with the generation of typed languages such as java, golang, scala in the roadmap, I think this needs to be discussed more and then implemented.

Great! The "type" is relevant, but really my main gripe is with params being an array rather than a hash. If the programmer changes the order of params in the api.js file, either he'll have to go in and change it in docs.api too, or it'll break. This is even more important if you're supporting multiple languages or working in a team.

If you decide to change params from being an array and make it a hash, then you're solving a bunch of issues with one simple change, and opening the road for further expansion like having a "type" field. Another potentially useful indicator is an "optional" field. All of these make sense in the "params" definition in api.js, so it'd seem that making it a hash is almost mandatory. Which is fine, it's still easy to just use {} if you don't want to define anything.

I agree with params needed to be a hash.

I don't see the use of optional field though. Except maybe to use in docs of the generated libraries. Am I missing something?

No, you're right. The "optional" field would mainly be used to generate documentation, indicating whether a parameter is required or not.

Maybe a better approach would be to have a "default" field, and anything with a default would then be optional? This is how most languages work, so it seems like a natural choice.

How about mentioning optional params only in doc.json? I dont want to clutter up api.json

Actually, How about the following idea?

Leave the current api.json as it is with params as an array of required arguments. But allow params in doc.json to be a hash with fields desc, value. Type can be inferred from value and optional can be inferred too.

That's a very good idea, and I'd suggest taking it just one step further.

First, if you're working on a project with multiple people, it's very often different people in charge of doing the programming and writing the documentation. This means that one or more people know how the API operates and are in charge of setting up the parameters, and others are in charge of writing the documentation.

In my mind, the "api.js" should describe the "signature" of the API methods. This means that anything that ties directly into the programming such as param type and default values should go in there.

This is even more relevant if you have multiple languages for the documentation. The text may change due to different language, but the param type won't. So, you may have many doc.js files, repeating "type" across them would open the way for inconsistency.

How about an intermediary approach?

  • Change doc.js to be a hash - I think we all agree that makes sense.
  • In api.js allow both formats:
    • If it's an array, you assume none of the parameters has any params, it's just a convenience format.
    • If it's a hash, then it can have params

One could then easily write:

    "params": ["param1", "param2", "param3"]

But if more control is needed:

"params": {
   "param1": {},
   "param2": {default: 0},
   "param3": {type: "array"}
}

Hmm... In my mind, api.json and doc.json combinedly is the signature of the api. Where as api.json just has those value which are needed for the important parts of the code.

I tried defining the params in the api.json in the early stages of the project. It didnt work out too well.

There's metadata about each param, that belongs to the definition of the signature of the API method. Those belong together, so that programmers know where to change them. You may choose not to have "type" or "default" parameters, but if you do, they belong in api.js, seeing as they are properties of the parameters.

The documentation is a different concern, and I agree it should be separate, particularly because you can have documentation in various languages. But while obviously a description of a parameter will change across languages, its type won't, so it doesn't make sense for that to be in doc.js.

Maybe the best for now is to follow your approach, of simply having parameters in doc.api be a hash. With that we'd be happy to start using this in our own up-and-coming project, and possibly (if you so wish) help out anyway we can. :)

After a bit of thinking, I decided to change the params in api.json to a hash with support for type and required keys. While the doc.json params will be changed to a hash with support for desc and value keys.

It's because inferring types from example values while generating code will take a bit of time and I didnt want that extra complexity.

Sounds great!

What about a default field too? For instance, let's say there is a param "max" that defines the maximum numbers of results. Obviously it's not required, but it's still important to know how many will be returned by default (eg: 100 results).

This would cover the 3 possible scenarios for a param:

  • required is set to true: you must provide a value for the param
  • default set to some value: if you don't provide a value, the default will be used
  • neither default or required are set: the param is optional

Isn't the meaning of default value in API means the default on the server side? If yes. Then shouldn't that be doc.json ?

Good comparison, that link sort of portrays most properties of a parameter. And yes, default would be server side, and yes it's something that is there to be presented in the documentation. But then to an extent, everything is, even the method name and method argument names.

The point here is that if it's something programmatic, it's not really documentation. It's metadata about the API method. A programmer would write api.js to describe that metadata, and once it's done, he'd write doc.js about how the function operates.

Now, consider the possibility of multiple doc.js, one for each supported language. You would NOT want to have to copy metadata to each, which does not need to be translated, and could potentially get out of sync.

You can approach this in a similar way to how translation systems normally work, where on you define everything about the logical properties of the function in the source code (api,js), and then the translation files (doc.js) you write plain text descriptions, examples, etc. It's a commonly used pattern.

Put this in another perspective.

Consider how large the doc.js file may become, if you're writing full-text on it, explaining the behavior of the functions, its expected parameters and output.

Now put yourself in a programmer's shoes. Let's say you change the default value of a function from 50 to 100 on your code. Would it make more sense for you to make a change in the definition of the method (api.js) or in the documentation of the method (doc.js)? Personally, I wouldn't hesitate on picking the first.

Hey,

I worked on it today. It's in params-hash branch. You should check the readme on the new format.

https://github.com/pksunkara/alpaca/tree/params-hash#apijson
https://github.com/pksunkara/alpaca/tree/params-hash#docjson

in api.js, params is still an array? Why not an hash like in docs.js ? Would probably be more coherent...

@jrpereira Golang do not guarantee the order of keys in a hash. One time, the params might print as $text, $profile_ids and the next time as $profile_ids, $text. This might lead to all sorts of problems.

That is why I chose array in the first place. (I just forgot why)

Yeah, that's a typical issue, but why would it be a problem? I mean, in api.js we're defining named parameters to the methods, so the order doesn't really matter there. It's just metadata definition, right? So it would seem to me that in this file, a simpler approach would be:

params: {
    text: {},
    profile_ids: {}
}

Notice that using YAML (which I saw in your roadmap), the natural approach for api.js would be this:

params:
   text: 
       required: true
   profile_ids:
       type: array

So, in this case you won't be able to define order either.

However, order may matter for printing out documentation, right? In that case, you could indeed use the array version in doc.js:

params: [
    {
         name: "text",
         description: "..."
    },
    {
          name: "profile_ids",
          description: "...",
    }
]

And yes, I know this is just moving the problem to another place. But it does seem to me that we'd be better off keeping api.js simpler and more obvious.

What do you think?

@jrpereira I think you are missing the point. I need order for function signatures too. Which should be probably done by the technical people. Right?

@jrpereira

If you look here, https://github.com/alpaca-api/buffer-php#create-a-social-update-post-updatescreate

You can see that method needs 2 arguments. If I use a hash instead of array, the next time I generate the library, those two might have reversed in the function signature which would break the api lib.

I understand you need order in so you can create API in languages that do not support named parameters, such as PHP or even JS (node). In the later you could pass a "parameters" hash and work around lack of named parameters support (like most libs do), but the same would be odd in PHP. You'd expect a clean "createUpdate" function call.

My concern here is that we're forcing people to write things in a more convoluted way. And this is just a result of the fact we're using JSON. In other words, this is a result of the medium we're using to express the information.

Of course, I understand your concern. And mind you, this is just me trying to push further to see if we can reach a more elegant solution.

We can take a page out of other people going through the same trouble. Ruby people using YAML for defining database structures for instance, and getting those structures in a random way. Until Ruby 1.9 came out (which does keep order of hash keys), you'd need to use an OrderedHash rather than a standard Hash (which in YAML resulted in having to include a !!omap line). (disclaimer, I'm not a Ruby fan)

At the end of the day, this is not something you can easily resolve while still using JSON. But it's a pitty, because in terms of expressive power, the information (order) is there, and the inability to actually keep that order is a side-effect of the technology we're using.

Anyway, your new version DOES solve the problem, even if it's not as elegant in terms of descriptive power. So I guess until there's work on the YAML version, there's nothing we can do. ;)

@jrpereira I agree with your point regarding the medium. But it's one of those things we need to compromise on. :(

I will release the v0.2.0 soon.

Thank you discussing with me and helping me put things in perspective. I have been mulling over this problem for 2 months.

It is my pleasure!

We're working on an API soon, and hopefully we'll be able to use Alpaca to publish libs for various languages. If that happens, I'll let you know.

Regards,

Release v0.2.0