normalizing text and json apis for 0.7

Question

dgreisen opened this issue 11 years ago · comments

I think that when two parts of the API do the same thing, they should use the same structure as much as possible. I would like to see the text and JSON APIs normalized. There is no reason that, in order to delete a substring, you call remove in the text API and del in the JSON API. Same for get and getText. I would also like to see the arguments normalized.

In order to do this I propose the following changes:

JSON API
- Paths can go all the way into a string. Thus, if you have the following json object: {"s": "hello world"}, the path ["s", 6] would point to "w".
- Paths can be written in dot notation: "hello.world.100" == ["hello", "world", 100]. 100 == "100" == [100]. This would simplify reading and writing paths in code. It would also mean that wherever the text API uses a position, the JSON API could use a path, with no need for different arguments.
- remove(path, [len], callback) removes a substring (just like text's remove), a range within a list (new functionality), or an entire subdoc (current functionality).
- deprecate del

TEXT API
- change getText to get; deprecate getText
- add a set method that replaces the entire document with another string - same functionality as json's set
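
To make the proposal concrete, here is a minimal sketch of a path that reaches into a string and of a unified remove. It operates on plain snapshots only; `getAt` and `removeAt` are hypothetical names invented for illustration and are not part of the existing API, and the real methods would also take a callback and generate ops rather than return a new snapshot.

```javascript
// Hypothetical helper: follow a path through objects, lists, and strings.
// Indexing a string with a number yields a single character.
function getAt(snapshot, path) {
  let node = snapshot;
  for (const key of path) {
    node = node[key];
  }
  return node;
}

// Hypothetical helper: remove `len` items at `path` and return a new snapshot.
// - inside a string: removes a substring
// - inside a list: removes a range of elements
// - inside an object: removes the whole value at that key (len is ignored)
function removeAt(snapshot, path, len = 1) {
  // Wrap a deep copy so that a bare string snapshot (a text doc) also has a holder.
  const root = { doc: JSON.parse(JSON.stringify(snapshot)) };
  const full = ['doc', ...path];
  const last = full[full.length - 1];
  const parent = getAt(root, full.slice(0, -1));

  if (typeof parent === 'string') {
    // Strings are immutable, so write the shortened string back into its holder.
    const holder = getAt(root, full.slice(0, -2));
    const key = full[full.length - 2];
    holder[key] = parent.slice(0, last) + parent.slice(last + len);
  } else if (Array.isArray(parent)) {
    parent.splice(last, len);
  } else {
    delete parent[last];
  }
  return root.doc;
}

console.log(getAt({ s: 'hello world' }, ['s', 6]));        // the character at index 6 of "hello world"
console.log(removeAt({ s: 'hello world' }, ['s', 5], 6));   // substring removal inside a JSON doc
console.log(removeAt('hello world', [0], 6));               // same call shape on a text doc
console.log(removeAt({ list: [0, 1, 2, 3] }, ['list', 1], 2)); // list range removal
```

The point of the sketch is that a text document position and a JSON path ending in a string offset go through the same code path.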

I have some time to work on this this week, if you approve, so your feedback is appreciated as quickly as possible.

Thanks,

David

Seph Gentle · Answer 1 · Wed Aug 21 2013 09:38:35 GMT+0800 (China Standard Time)

I like.

Totally agree with the text API changes.
JSON API:
- Paths going all the way into the string ... If you want. I've never needed that. It would be nice to be able to use the list insert operations to edit strings (it's almost the same logic), but the JSON OT code doesn't work like that at the moment.
- The paths-as-dotted-strings idea has floated around several times - @nornagon and I talked about it a whole bunch when we were initially designing the JSON API. The problem is that path strings are slow and ambiguous - it's not clear if "foo.123" means foo["123"] or foo[123]. These are the same thing in JavaScript, but OT semantics are different. In the case of the JSON API, we can figure out what you want from actually looking at the document snapshot, but in general the ambiguity makes me nervous.
- I like the change to remove. In general, I like the idea of treating string edits & list edits the same way.
- I can't find del in the JSON API.

Leave paths as strings for now, but the other changes sound good. And ping the mailing list talking about what you're doing to keep everyone in the loop.

Seph Gentle · Answer 2 · Wed Aug 21 2013 09:47:17 GMT+0800 (China Standard Time)

Long term we should be able to make a better API on top of Object.observe or a polyfill which would let people just edit the snapshot directly.

David Greisen · Answer 3 · Wed Aug 21 2013 10:11:27 GMT+0800 (China Standard Time)

I misspoke. It is not called del, it is called deleteText.

Paths into text will be necessary if we combine deleteText and remove into a single function.

The reason I want paths as strings and paths into text is so that if I have a JSON subdoc that points to a string, I can edit it with the exact same calls as if I were using a document with the text API. If we don't have paths as strings, then the text document position is an integer, but the JSON document path is a list. If we don't have paths into strings, then the API must take a path and a position, not just a single path/position.

I don't think the ambiguity with dotted paths is a deal-breaker. Paths can be specified as either a dotted string or an array. Internally, paths will always be represented as arrays, so we don't have to worry about ambiguity there. While the string parsing might be a bit slow, it will only happen once, since all internal representations are still arrays. When paths are generated programmatically, a coder would have to be foolish to represent them as anything but an array, so again there is no ambiguity to worry about. Basically, the only time strings are used is when a human is hard-coding a path into his/her code. The rules for parsing a string would be clearly defined:

1. split on dots
2. convert any digit-only string into an integer

There are only two cases where this will not work: when "100" should be ["100"] instead of [100], and when "a.b" should be ["a.b"] and not ["a", "b"]. If the path the human is coding will not be parsed correctly, then she will simply have to code the path as an array. Basically, the dotted string is a bonus representation with faster coding and improved readability that will work for 98% of all paths, but can always fall back to the array representation when dotted won't work.
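
The two parsing rules, plus the array fallback, can be sketched in a few lines. `parsePath` is a hypothetical name used only for illustration; nothing here is taken from the existing code base.

```javascript
// Hypothetical normalizer: accept a dotted string, a bare number, or an array,
// and always return the array form used internally.
function parsePath(path) {
  if (Array.isArray(path)) return path;         // arrays are taken as-is, never re-parsed
  if (typeof path === 'number') return [path];  // 100 -> [100]
  return String(path)
    .split('.')                                 // rule 1: split on dots
    .map(function (part) {
      // rule 2: convert any digit-only string into an integer
      return /^\d+$/.test(part) ? parseInt(part, 10) : part;
    });
}

console.log(parsePath('hello.world.100')); // array form of the dotted path
console.log(parsePath('100'));             // digit-only string becomes a list index
console.log(parsePath(['100']));           // array input keeps the string key "100"
console.log(parsePath(['a.b']));           // array input keeps the key containing a dot
```

Because array input is returned untouched, the two cases the string form cannot express (a string key made of digits, and a key containing a dot) remain available through the array form.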

I have dozens and dozens of hard-coded paths in my code. I find them tiresome to write, and hard to quickly scan and grok. They are entirely optional, but can significantly improve code readability.

Stephan Seidt · Answer 4 · Wed Aug 21 2013 16:12:37 GMT+0800 (China Standard Time)

> The problem is that path strings are slow and ambiguous - its not clear if "foo.123" means foo["123"] or foo[123]. These are the same thing in javascript, but OT semantics are different.

@josephg Could you please elaborate on the OT difference of foo["123"] and foo[123]? We've been seeing an array of objects [anObject] being turned into an object {"0": anObject} and we have no idea where that's coming from. Could this be connected?

Ted Young · Answer 5 · Thu Aug 22 2013 11:32:42 GMT+0800 (China Standard Time)

+1 on Text and JSON APIs having the same method names.

The dot notation stuff makes me a little nervous; it seems like this only saves a couple of keystrokes. But if it doesn't complicate the use of the array notation, there's no harm in trying.

Object.observe on the snapshot looks like an awesome interface.

Slightly OT: has there been any thought on handling partial loads of large data sets? In the long run this would be awesome for ethersheet.

Seph Gentle · Answer 6 · Thu Aug 22 2013 13:13:49 GMT+0800 (China Standard Time)

@tedsuo We're talking about using middleware to filter documents & show views on them in a way that's invisible to the client, but we're not planning on supporting switching views anytime soon. If you can figure out a good way to do it, I'd love to see it.

David Greisen · Answer 7 · Thu Aug 22 2013 21:00:36 GMT+0800 (China Standard Time)

@tedsuo: I've been thinking about partial views as well. I think the easiest way to do this is to create a partial document that simply points to a location within another JSON document. The partial doc would be defined by:

1. the whole doc id
2. the path into the whole doc
3. the most recently requested operation from the whole doc
4. whether the partial doc exposes the text or json api

Any operations on the partial doc would be modified to include the full path and submitted to the whole doc.
Any operations on the whole doc would be modified to remove the path.
The partial doc would store snapshots and ops just like the full doc.
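
A rough sketch of the path rewriting in both directions might look like the following. It assumes each op component carries its path in a `p` array; the function names and the `si` payload in the example are illustrative assumptions, not existing code.

```javascript
// Hypothetical: prefix is the partial doc's path into the whole doc.

// Partial doc -> whole doc: prepend the prefix so the op targets the full path.
function toWholeDocOp(prefix, op) {
  return Object.assign({}, op, { p: prefix.concat(op.p) });
}

// Whole doc -> partial doc: strip the prefix, or return null when the op
// touches a part of the whole doc that the partial doc does not cover.
function toPartialDocOp(prefix, op) {
  const underPrefix =
    op.p.length >= prefix.length &&
    prefix.every(function (key, i) { return op.p[i] === key; });
  if (!underPrefix) return null;
  return Object.assign({}, op, { p: op.p.slice(prefix.length) });
}

const prefix = ['notes', 'body'];
const partialOp = { p: [5], si: 'x' };           // assumed op shape, for illustration only
const wholeOp = toWholeDocOp(prefix, partialOp);
console.log(wholeOp.p);                          // full path into the whole doc
console.log(toPartialDocOp(prefix, wholeOp).p);  // back to the partial doc's own path
console.log(toPartialDocOp(prefix, { p: ['other'], si: 'y' })); // not under the prefix
```

This sketch ignores ops that hit an ancestor of the prefix (for example, deleting the whole branch), which a real implementation would need to handle.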

The cool thing about this (and one of the reasons I wanted to rationalize the text and json apis) is that there is no reason a partial doc that points to a string couldn't be operated on like a text doc, transforming the ops on the fly.

This would also make sharing between individuals much easier. Most people don't want to share all their stuff - just some of it. Anything they want to share, they could put in a branch of a JSON object and share just that branch by creating a new partial document of it. The partial document would hide the path to the branch, protecting privacy. The next bit of infrastructure to complete sharing would be to allow symlinks within JSON structures so an entire document could be inserted into a JSON structure. Then, when you share a partial document with me, it would be inserted into my JSON structure completely transparently.

Seph Gentle · Answer 8 · Fri Aug 23 2013 06:20:03 GMT+0800 (China Standard Time)

@ehd Briefly yes - if you have a list, do list insert/deletes. If you have an object, do object insert/deletes. List edits also splice in/out values (so if you have [0,1,2] and remove the 1, you get [0,2]). With an object {"0":0, "1":1, "2":2}, if you object remove the "1", you still have {"0":0, "2":2}. But let's not talk about it here. Use the mailing list.
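
The difference between the two kinds of removal can be shown with plain JavaScript values. This is only an illustration of the semantics described above, with hypothetical helper names; it is not the OT code itself.

```javascript
// List removal splices: later elements shift down by one index.
function listRemove(list, index) {
  const copy = list.slice();
  copy.splice(index, 1);
  return copy;
}

// Object removal deletes one key: the other keys keep their names.
function objectRemove(obj, key) {
  const copy = Object.assign({}, obj);
  delete copy[key];
  return copy;
}

const list = listRemove([0, 1, 2], 1);
const obj = objectRemove({ '0': 0, '1': 1, '2': 2 }, '1');

console.log(list);      // the value 2 has moved from index 2 to index 1
console.log(obj);       // the value 2 is still stored under key "2"
console.log(list[1]);   // 2
console.log(obj['1']);  // undefined
```

This is why "foo.123" is ambiguous as a path: the same-looking index refers to a position that shifts in a list but to a fixed key in an object.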

Seph Gentle · Answer 9 · Fri Aug 23 2013 06:22:39 GMT+0800 (China Standard Time)

@dgreisen - @tedsuo's use case is quite different. I assume he wants partial docs so that a user doesn't get updates about sections of a spreadsheet that they aren't currently looking at. As you pan around the spreadsheet, your view's bounds should change. Doing that by constantly making views on the server would be harder - you want the server to not send you new snapshots when you change 'view'.

David Greisen · Answer 10 · Fri Aug 23 2013 06:49:55 GMT+0800 (China Standard Time)

@josephg, thanks for the clarification. I had not understood @tedsuo's use case.

Ted Young · Answer 11 · Sat Aug 24 2013 09:52:48 GMT+0800 (China Standard Time)

@josephg is there something like a doc.exportData() to go with doc.injestData(), so we could make temporary caches locally and avoid sending unnecessary snapshots? Would be useful for populating data on pageload.

Seph Gentle · Answer 12 · Sat Aug 24 2013 23:43:01 GMT+0800 (China Standard Time)

Your question is not related to the issue. Use the mailing list or open a new issue.

Ted Young · Answer 13 · Sun Aug 25 2013 08:01:02 GMT+0800 (China Standard Time)

Whoops sorry! All looks the same in my email. :)

David Greisen · Answer 14 · Thu Sep 05 2013 03:21:15 GMT+0800 (China Standard Time)

Resolved by #241.