LinkedDataFragments / Client.js

[DEPRECATED] A JavaScript client for Triple Pattern Fragments interfaces.

Home Page: http://linkeddatafragments.org/


Need some help finding my way into the code

cecton opened this issue

For a project, I've been asked to build a variant of the server and the client that uses relative paths instead of complete URLs.

So, on the server, instead of returning ids like http://data.example.org/dbpedia-sparql#dataset it will return #dataset. The client then needs to remember the base URL of the server (http://data.example.org/dbpedia-sparql) and prefix it to every relative URL it queries.

I did the change on the server part here: https://github.com/YourDataStories/LinkedDataFragment-Server.js (tests are ok).
But I'm struggling to do the change on the client part: with all the asynchronous generators, I just get lost in the code, to be honest.

So that's why I'm here seeking some advice on how I can implement that. For now, here are my changes:

diff --git a/lib/extractors/ControlsExtractor.js b/lib/extractors/ControlsExtractor.js
index d5d73f5..82ec3d4 100644
--- a/lib/extractors/ControlsExtractor.js
+++ b/lib/extractors/ControlsExtractor.js
@@ -53,6 +55,7 @@ ControlsExtractor.prototype._extract = function (metadata, tripleStream, callbac
   tripleStream.on('end', function () {
     var controls = Object.create(DEFAULT_CONTROLS);
     controls.fragment = metadata.fragmentUrl;
+    var baseUrl = metadata.fragmentUrl.replace(/\?.*/, '')
 
     // Parse the links
     LINK_TYPES.forEach(function (property) {
@@ -95,7 +101,7 @@ ControlsExtractor.prototype._extract = function (metadata, tripleStream, callbac
         variables[mappings[rdf.RDF_SUBJECT]]   = triplePattern.subject;
         variables[mappings[rdf.RDF_PREDICATE]] = triplePattern.predicate;
         variables[mappings[rdf.RDF_OBJECT]]    = triplePattern.object;
-        return searchTemplate.expand(variables);
+        return baseUrl + searchTemplate.expand(variables);
       };
     }
     callback(null, controls);

It works at the HTTP query level because my URLs are now valid. But the final iterator is empty for some reason and I just can't find why.

Thanks for your help

Hi @cecton,

So, on the server, instead of returning ids like http://data.example.org/dbpedia-sparql#dataset it will return #dataset.

If I may ask, why? I can't imagine any circumstance in which this is a good idea.

But I'm struggling to do the change on the client part

The internal libraries, in particular N3.js, work with full IRIs, because the RDF model does.
The notion of a relative URL only exists in serializations, not in memory.

I.e., you can write #dataset in a document http://example.org/my/doc. However, that URL will not mean <#dataset> but <http://example.org/my/doc#dataset>, because relative URLs need to be resolved against a base URL before they can enter the RDF model.

So what you want is not possible, because there is no such thing as the notion of a relative URL in the RDF model; triples have full IRIs.
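To make the resolution step above concrete, here is a minimal sketch using the WHATWG `URL` class that is available globally in Node.js and browsers. The helper name `resolve` is hypothetical and not part of Client.js or N3.js; the URLs are the examples used in this thread.

```javascript
// Resolve a relative reference against the URL of the document it appears in.
// `resolve` is a hypothetical helper for illustration only.
function resolve(reference, documentUrl) {
  return new URL(reference, documentUrl).href;
}

console.log(resolve('#dataset', 'http://example.org/my/doc'));
// → http://example.org/my/doc#dataset
console.log(resolve('#dataset', 'https://example.org/my/path'));
// → https://example.org/my/path#dataset
```

The same relative reference yields a different full IRI depending on the document it is read from, which is why only the full IRI can enter the RDF model.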

Thanks a lot for your answer @RubenVerborgh

The reason why we would need relative paths is because the service will possibly be served both over http and https, and possibly on multiple domains and the path may vary. I don't know the use cases so I can't really explain more than that.

For the hostname and port I know I can override the Host HTTP header. It would be better though if I had the possibility to use the "de facto" standard HTTP headers for forwarding: X-Forwarded-Host and X-Forwarded-Proto ( https://en.wikipedia.org/wiki/List_of_HTTP_header_fields#Common_non-standard_request_fields ) because then I can make the scheme vary between http and https depending on the origin of the request.

That will solve at least part of the issue. The only problem remaining is for the variable path... I suppose I could invent an X-Forward-Location header (or something like that) to address it.

So...

  1. What do you think about implementing the use of X-Forwarded-Host and X-Forwarded-Proto? ExpressJS does something similar: http://expressjs.com/en/guide/behind-proxies.html
  2. Add an X-Forward-Location header (this one is even less standard and maybe should go in a fork of this repository instead of here)

The reason why we would need relative paths is because the service will possibly be served both over http and https

That's a good use case indeed!

We'll only need to change the server code though, the client will automatically follow.
E.g., if the client sees #dataset on http://example.org/my/path, it will read http://example.org/my/path#dataset, and if it sees the same on https://example.org/my/path, it will read https://example.org/my/path#dataset.

It would be better though if I had the possibility to use the "de facto" standard HTTP headers

Even better if you could use the actual standard Forwarded header.

The only problem remaining is for the variable path...

I don't have an answer to this either, except for making a datasource available at multiple paths.

What do you think about implementing the use of X-Forwarded-Host and X-Forwarded-Proto?

I'm all for Forwarded. We could make it so on the server that the Forwarded header is preferred over Host.

Okay, then I will see what I can do to implement the Forwarded behavior.

For the varying path I will check that on my side.

Great, pull requests to the server are welcome. If you're stuck, let me know.

BTW I've done something similar in the Node.js Solid server recently, in case you want inspiration. (It's not rocket science though, but just FYI.)

@RubenVerborgh I just checked the "Forwarded" HTTP header because that was the first time I heard about it but I think there is a problem with it. It implements the For, the Host and the Proto parts of the X-Forwarded-* headers. Host and Proto are really what we want, I suppose we can't do anything with the For part. Unfortunately the specification of the Forwarded header for the "for=" part is a bit complicated: in case of IPv6, you need to parse double-quoted strings. This makes things more complicated for the implementation here (unless you want something more approximate).
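For reference, the quoted-string handling mentioned above can be sketched in a few lines. This is a simplified illustration, not the code of any library discussed in this thread: the function name `parseForwarded` is hypothetical, and the sketch handles quotes and backslash escapes but does not validate tokens against the full RFC 7239 grammar.

```javascript
// Parse a Forwarded header value into one object per proxy hop.
// Simplified sketch: parameter names are lowercased, quoted strings are
// unquoted, and malformed input is not rejected.
function parseForwarded(header) {
  const hops = [];
  let hop = {}, name = '', value = '';
  let inValue = false, inQuotes = false;

  // Store the pair collected so far and reset the pair state
  const endPair = () => {
    const key = name.trim().toLowerCase();
    if (key) hop[key] = value.trim();
    name = ''; value = ''; inValue = false;
  };
  // Store the hop collected so far and start a new one
  const endHop = () => {
    endPair();
    if (Object.keys(hop).length > 0) hops.push(hop);
    hop = {};
  };

  for (let i = 0; i < header.length; i++) {
    const char = header[i];
    if (inQuotes) {
      // Inside a quoted string, ';' and ',' are ordinary characters
      if (char === '\\' && i + 1 < header.length) value += header[++i];
      else if (char === '"') inQuotes = false;
      else value += char;
    }
    else if (char === '"' && inValue) inQuotes = true;
    else if (char === '=' && !inValue) inValue = true;
    else if (char === ';') endPair();
    else if (char === ',') endHop();
    else if (inValue) value += char;
    else name += char;
  }
  endHop();
  return hops;
}

const hops = parseForwarded(
  'for="[2001:db8:cafe::17]:4711";host=example.org;proto=https, for=192.0.2.43');
console.log(hops[0].host, hops[0].proto); // → example.org https
console.log(hops[0].for);                 // → [2001:db8:cafe::17]:4711
```

A server that only needs `host` and `proto` can ignore the `for` entries after parsing.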

Also the X-Forwarded-For is implemented in Nginx, they provide a variable for that: http://nginx.org/en/docs/http/ngx_http_proxy_module.html Unfortunately they don't seem to provide the same kind of variable for Forwarded. I suppose many people are gonna need to use the X-Forwarded-* headers anyway. Therefore we will need to implement both headers (Forwarded and X-Forwarded-* as a fallback).

For now I propose that we only implement X-Forwarded-Host/Proto headers in this service. It will already be an upgrade compared to what we have currently.

I actually found a library that handles the parsing of the Forwarded HTTP header: https://github.com/lpinca/forwarded-parse

I can do it too if you want.

I just checked the "Forwarded" HTTP header because that was the first time I heard about it but I think there is a problem with it.

Let's hope not, it is a standardized header 😄

It implements the For, the Host and the Proto parts of the X-Forwarded-* headers.

That's right, it's the standardized form of the X-Forwarded-* headers and, hence, the way forward.

Host and Proto are really what we want, I suppose we can't do anything with the For part. Unfortunately the specification of the Forwarded header for the "for=" part is a bit complicated:

That doesn't matter, the for part is optional. You only need to add what you want to use.

Also the X-Forwarded-For is implemented in Nginx, they provide a variable for that: http://nginx.org/en/docs/http/ngx_http_proxy_module.html Unfortunately they don't seem to provide the same kind of variable for Forwarded.

That's not needed, we can just set the Forwarded header:

server {
  server_name echo.verborgh.org;

  location / {
    proxy_pass http://127.0.0.1:1234$request_uri;
    proxy_set_header Forwarded 'host=$http_host;proto=$scheme';
    proxy_pass_header Server;
  }
}

I suppose many people are gonna need to use the X-Forwarded-* headers anyway.

Not in the long term, hopefully. X-* headers ought to disappear.

Therefore we will need to implement both headers (Forwarded and X-Forwarded-* as a fallback).

Yes, fallback is a good idea.
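The preference order discussed in this thread (Forwarded first, then the X-Forwarded-* headers, then Host) could look roughly like the sketch below. It is not the server's actual implementation: the names `readParam` and `getPublicOrigin` are hypothetical, the `headers` object is assumed to use lowercase keys as Node.js's `http` module provides them, and the Forwarded parsing is deliberately naive (it splits on commas without honoring quoted strings). Such headers should only be trusted when they come from a proxy you control.

```javascript
// Read one parameter (e.g. "host") from a single Forwarded element.
// Naive: assumes the value contains no ';' or '"' characters.
function readParam(element, name) {
  const pattern = new RegExp('(?:^|;)\\s*' + name + '="?([^";]*)"?', 'i');
  const match = pattern.exec(element);
  return match ? match[1].trim() : undefined;
}

// Determine the scheme and host the client used to reach the server.
function getPublicOrigin(headers, defaultProto = 'http') {
  // With several proxies, the first list entry is the one closest to the client
  const first = value => (value || '').split(',')[0].trim();
  const element = first(headers['forwarded']);

  const proto = readParam(element, 'proto') ||
                first(headers['x-forwarded-proto']) ||
                defaultProto;
  const host = readParam(element, 'host') ||
               first(headers['x-forwarded-host']) ||
               headers['host'];
  return proto + '://' + host;
}

console.log(getPublicOrigin({
  'forwarded': 'host=data.example.org;proto=https',
  'host': '127.0.0.1:1234',
}));
// → https://data.example.org
console.log(getPublicOrigin({ 'host': '127.0.0.1:1234' }));
// → http://127.0.0.1:1234
```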

For now I propose that we only implement X-Forwarded-Host/Proto headers in this service.

We're really committed to standards and sustainability, so I prefer to have the standardized option first.