w3c / rdf-canon

RDF Dataset Canonicalization (deliverable of the RCH working group)

Home Page: https://w3c.github.io/rdf-canon/spec/

What is a reasonable way of defending against poison graphs?

iherman opened this issue

This is more a question for those who have practical experience with implementations of the algorithm out there, i.e., @davidlehn, @dlongley, and maybe @gkellogg.

As protection against poison graphs, what I do in my implementation is as follows:

  1. I have set a maximum limit on the recursion in step 5.4.5.1, i.e., if the "level" of the recursion reaches that limit, I stop the algorithm (see the sketch below). At the moment, the limit is set to 50, but I must admit this is more a gut feeling than anything else.
  2. The user has the possibility, via a separate call in the interface, to set a lower maximum limit (but cannot exceed the built-in one).

Is this a reasonable approach? If yes, is 50 indeed a reasonable limit?

An alternative would be to let the system-wide limit be set at positive infinity (more exactly, the maximum integer representable in JavaScript), and the user can set a limit by herself.
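For concreteness, here is a minimal sketch of the guard described in point 1, with a built-in ceiling and a user-settable lower value (the names are illustrative only, not my actual API):

```typescript
// Sketch only: a recursion-depth guard with a built-in ceiling that the
// user may lower but never raise. Names are illustrative, not a real API.
const ABSOLUTE_MAX_RECURSION = 50;

class RecursionGuard {
  private maxRecursion = ABSOLUTE_MAX_RECURSION;

  // Users may lower the limit via the interface, but cannot exceed the ceiling.
  setMaxRecursion(limit: number): void {
    this.maxRecursion = Math.min(limit, ABSOLUTE_MAX_RECURSION);
  }

  // Called at the top of the recursive step in 5.4.5.1 with the current depth.
  checkDepth(depth: number): void {
    if (depth > this.maxRecursion) {
      throw new Error(
        `Hash N-Degree Quads recursion exceeded ${this.maxRecursion} levels; ` +
        'aborting (possible poison graph).'
      );
    }
  }
}
```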

Any advice/experience is welcome...

@iherman,

In our implementation we have decided to set a default limit of 1 recursion per blank node, rather than a global recursion limit. This allows for the processing of cases where any two blank nodes may look the same according to their immediate surroundings (direct relationships), but it requires that any differences be discoverable through one level of indirect relationships. We think that if two blank nodes cannot be distinguished by then, you are almost always dealing with a poison graph, and the default setting would have to be changed manually to allow such inputs to be processed.
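A minimal sketch of how such a per-blank-node budget could be tracked (illustrative only, not our actual code):

```typescript
// Sketch only: a per-blank-node recursion budget (default 1), aborting when
// any single blank node is recursed into more often than allowed.
class RecursionBudget {
  private counts = new Map<string, number>();

  constructor(private readonly maxPerBlankNode = 1) {}

  // Called each time Hash N-Degree Quads recurses for the given blank node.
  enter(blankNodeId: string): void {
    const used = (this.counts.get(blankNodeId) ?? 0) + 1;
    if (used > this.maxPerBlankNode) {
      throw new Error(
        `Blank node ${blankNodeId} exceeded its recursion budget of ` +
        `${this.maxPerBlankNode}; aborting (possible poison graph).`
      );
    }
    this.counts.set(blankNodeId, used);
  }
}
```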

@iherman,

To address your specific questions:

Is this a reasonable approach? If yes, is 50 indeed a reasonable limit?

If this is experimentally arrived at, i.e., with this limit every call to your implementation is sufficiently responsive, it's probably ok. If it's arbitrarily chosen, I can't really comment on it. I think the right way to think of it is that an attacker will always try to use inputs that will come in just under whatever limit you set. I've offered our reasoning for a default setting for our implementation above.

An alternative would be to let the system-wide limit be set at positive infinity (more exactly, the maximum integer representable in JavaScript), and the user can set a limit by herself.

For security purposes, I don't think this is a good idea. I think a default limit should be set. It's obviously better to expose the ability to set a limit than not at all, but users should not be expected to have to set anything most of the time. Most users will not know what to set, should be protected from attackers by default, and they should only need to consider changing the setting if they actually have some useful data that is failing to be processed because of it.

So, setting some default that allows most useful data to be processed and yet does not allow for an attacker to use too many resources is best. This is the same messaging I tried to convey in the security considerations section as well.

@iherman,

To address your specific questions:

Is this a reasonable approach? If yes, is 50 indeed a reasonable limit?

If this is experimentally arrived at, i.e., with this limit every call to your implementation is sufficiently responsive, it's probably ok. If it's arbitrarily chosen, I can't really comment on it.

I do not have any experience with this, hence my question. What I saw, when I explicitly printed things out, is that the maximum recursion depth in our test suite is 7 (or 8? I do not remember). So 50 seemed to be a good number to cover all reasonable usage out there. (I first set it to 20, actually... Setting this value is a black art.)

The speed on my machine (a 2012 MacBook Pro) is instantaneous for our tests; I do not yet have an example that would really shake the code in this respect.

I am also trying to get the library running with deno, which can be used to compile TypeScript into machine code (and hence increase the speed compared to node.js), but deno changes so frequently that what used to work does not work anymore :-(

I think the right way to think of it is that an attacker will always try to use inputs that will come in just under whatever limit you set. I've offered our reasoning for a default setting for our implementation above.

An alternative would be to let the system-wide limit be set at positive infinity (more exactly, the maximum integer representable in JavaScript), and the user can set a limit by herself.

For security purposes, I don't think this is a good idea. I think a default limit should be set.

Thanks. This was my thinking, but I needed feedback on that.

It's obviously better to expose the ability to set a limit than not at all, but users should not be expected to have to set anything most of the time. Most users will not know what to set, should be protected from attackers by default, and they should only need to consider changing the setting if they actually have some useful data that is failing to be processed because of it.

So, setting some default that allows most useful data to be processed and yet does not allow for an attacker to use too many resources is best. This is the same messaging I tried to convey in the security considerations section as well.

We are on the same page...

Maybe I should leave it as is, and experience (if this code is picked up by the user community, that is...) will point to the right number...

Would it be a good idea to have some example "poison" graphs, or could this lead to problems? Not in the spec, but in some examples directory in the repo.

@gkellogg,

I imagine it could lead to problems, as most things can :P. We shouldn't rule out adding such things in the future if we find that it's helpful.

Discussed 2023-06-21. Ended with a resolution https://www.w3.org/2023/06/21-rch-minutes.html#r01 to add a normative statement to the effect that implementations MUST abort if the algorithm exceeds a certain computational threshold (decided by the implementation). At least one example will be available in the test suite.

At least one example will be available in the test suite.

After small experiments, I found that all the test cases except for test 044-046 succeeded with a limit of one recursion per blank node (= @dlongley's default limit), whereas test 044-046 require at least twelve recursions per blank node. (@davidlehn probably mentioned these cases in the meeting)
Can't we say that the latter tests are already available as examples that exceed a "certain computational threshold"?

In my tests, I find that the only existing test that recurses deeper than 4 levels is test054, so we may want to compare run logs. For reference, see my log for test044.

In looking for poison graphs, I've found a reference to clique graphs, which Aidan's paper identifies as being problematic. I've found that a graph of seven vertices which is maximally interconnected (shown below) will run for 13 seconds on my MacBook Air without recursing deeper than 2 levels. A greater number of vertices takes exponentially longer without reaching deep recursion levels.

_:e0 <http://example.com/p> _:e0 .
_:e0 <http://example.com/p> _:e1 .
_:e0 <http://example.com/p> _:e2 .
_:e0 <http://example.com/p> _:e3 .
_:e0 <http://example.com/p> _:e4 .
_:e0 <http://example.com/p> _:e5 .
_:e0 <http://example.com/p> _:e6 .
_:e1 <http://example.com/p> _:e0 .
_:e1 <http://example.com/p> _:e1 .
_:e1 <http://example.com/p> _:e2 .
_:e1 <http://example.com/p> _:e3 .
_:e1 <http://example.com/p> _:e4 .
_:e1 <http://example.com/p> _:e5 .
_:e1 <http://example.com/p> _:e6 .
_:e2 <http://example.com/p> _:e0 .
_:e2 <http://example.com/p> _:e1 .
_:e2 <http://example.com/p> _:e2 .
_:e2 <http://example.com/p> _:e3 .
_:e2 <http://example.com/p> _:e4 .
_:e2 <http://example.com/p> _:e5 .
_:e2 <http://example.com/p> _:e6 .
_:e3 <http://example.com/p> _:e0 .
_:e3 <http://example.com/p> _:e1 .
_:e3 <http://example.com/p> _:e2 .
_:e3 <http://example.com/p> _:e3 .
_:e3 <http://example.com/p> _:e4 .
_:e3 <http://example.com/p> _:e5 .
_:e3 <http://example.com/p> _:e6 .
_:e4 <http://example.com/p> _:e0 .
_:e4 <http://example.com/p> _:e1 .
_:e4 <http://example.com/p> _:e2 .
_:e4 <http://example.com/p> _:e3 .
_:e4 <http://example.com/p> _:e4 .
_:e4 <http://example.com/p> _:e5 .
_:e4 <http://example.com/p> _:e6 .
_:e5 <http://example.com/p> _:e0 .
_:e5 <http://example.com/p> _:e1 .
_:e5 <http://example.com/p> _:e2 .
_:e5 <http://example.com/p> _:e3 .
_:e5 <http://example.com/p> _:e4 .
_:e5 <http://example.com/p> _:e5 .
_:e5 <http://example.com/p> _:e6 .
_:e6 <http://example.com/p> _:e0 .
_:e6 <http://example.com/p> _:e1 .
_:e6 <http://example.com/p> _:e2 .
_:e6 <http://example.com/p> _:e3 .
_:e6 <http://example.com/p> _:e4 .
_:e6 <http://example.com/p> _:e5 .
_:e6 <http://example.com/p> _:e6 .

Based on this, a limit on the total number of algorithm cycles, or a runtime limit scaled to the size of the input dataset, may also be necessary.

Aidan also references Miyazaki graphs, but I haven't found a concrete example to use.

If anyone has example problematic graphs of different types to add to our test suite, please reference them.
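For experimentation, an n-clique like the one above can be generated with a short script; here is a sketch (the predicate IRI, labels, and output file name are illustrative):

```typescript
// Sketch: generate a maximally interconnected n-clique over blank nodes
// (including self-loops, as in the example above) in N-Quads form.
import { writeFileSync } from 'node:fs';

function makeClique(n: number): string {
  const lines: string[] = [];
  for (let i = 0; i < n; i++) {
    for (let j = 0; j < n; j++) {
      lines.push(`_:e${i} <http://example.com/p> _:e${j} .`);
    }
  }
  return lines.join('\n') + '\n';
}

// A 7-node clique (49 quads), as above; larger n makes run times grow rapidly.
writeFileSync('clique-7.nq', makeClique(7));
```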

Based on this, a limit on the total number of algorithm cycles, or a runtime limit scaled to the size of the input dataset, may also be necessary.

But that is obviously extremely difficult, because it is even harder to measure than the recursion level.

I've found that a graph of seven vertices which is maximally interconnected will run for 13 seconds on my MacBook Air without recursing deeper than 2 levels.

I have run the same graph on my machine, and it ran in about 2 seconds, which also included loading the file itself (on a MacBook Pro).

At least one example will be available in the test suite.

After small experiments, I found that all the test cases except for test 044-046 succeeded with a limit of one recursion per blank node (= @dlongley's default limit), whereas test 044-046 require at least twelve recursions per blank node. (@davidlehn probably mentioned these cases in the meeting) Can't we say that the latter tests are already available as examples that exceed a "certain computational threshold"?

Note that, on the other hand, 044-046 may need a higher recursion limit, but the algorithm runs without further ado and without any real computational load. Which means, to me, that these tests should not be considered as exceeding a reasonable threshold. (I do not remember the exact recursion numbers, but it was by looking at our tests that I decided to put that magic threshold relatively high, i.e., at 50. I think we need a test that would require at least that much recursion.)

I have run the same graph on my machine, and it ran in about 2 seconds, which also included loading the file itself (on a MacBook Pro).

Different runtimes will vary, but the point is that a graph which exhibits low recursion can still be poison, and guarding against deep recursion is only one way to detect it (although deep recursion is more likely to cause memory pressure than CPU pressure).

Note that, on the other hand, 044-046 may need a higher recursion limit, but the algorithm runs without further ado and without any real computational load. Which means, to me, that these tests should not be considered as exceeding a reasonable threshold. (I do not remember the exact recursion numbers, but it was by looking at our tests that I decided to put that magic threshold relatively high, i.e., at 50. I think we need a test that would require at least that much recursion.)

As I mentioned above, I did not see that deep a recursion level on 044, and I referenced the log from my run above. I did notice a deeper level (6) on 054.

Based on the discussion above, I believe that we should limit the number of calls to the Hash N-Degree Quads (HNDQ) algorithm, not the recursion depth. (It seems that Dave's implementation is already doing this.)

The reason for this is that there exist poison graphs, like the clique that @gkellogg showed above, that keep the recursion depth shallow but exponentially increase the number of HNDQ executions. The cause lies in Step 5.4 of the Hash N-Degree Quads Algorithm, where a loop is run for every permutation of the blank node list. This leads to the HNDQ algorithm being executed a number of times that is factorial in the number of blank nodes in the worst cases, such as cliques.
In fact, the above 7-clique was processed in 0.75 s by my implementation on a ThinkPad, but the 8-clique took 8.32 s, the 9-clique 99.17 s, and the 10-clique an outrageous 1307.20 s. Yet these cliques still only need 2 levels of recursion.

(Details) When canonicalizing a clique composed of $n$ blank nodes ($n$-clique), the level 1 HNDQ is executed inside the loop in Step 5 of the Canonicalization Algorithm, which is executed $n$ times, the number of blank nodes. Within each of these $n$ loops, the level 2 HNDQ is executed inside the loop in Step 5.4 of the Hash N-Degree Quads Algorithm. This loop is repeated for the number of permutations of the blank node list, i.e., $(n-1)!$, and within the loop, HNDQ is executed $(n-1)$ times for each permutation. As a result, level 2 HNDQ is executed $(n-1)! * (n-1)$ times. In the case of a clique, since all blank nodes appear at level 2, no further recursion occurs, and the recursion depth is limited to 2. However, the total number of HNDQ executions still amounts to $n * (n-1)! * (n-1)$ times, i.e., $(n-1) * n!$ times, which diverges faster than an exponential function.
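For a concrete sense of scale, a back-of-the-envelope evaluation of the $(n-1) * n!$ count of level 2 HNDQ executions derived above (the $n$ level 1 executions add only $n$ more):

  • $n=7$: $6 * 5040 = 30240$
  • $n=8$: $7 * 40320 = 282240$
  • $n=9$: $8 * 362880 = 2903040$
  • $n=10$: $9 * 3628800 = 32659200$

which is roughly in line with the growth of the run times reported above.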

@yamdan this makes sense, and I am happy to modify the algorithm accordingly. The original question is, nevertheless, still valid: what is the (initial and maximal) number of calls that an implementation should set to keep things within reasonable bounds? At this time, I have only used my gut feeling for this, which is hardly good engineering practice... :-)

@iherman This is just an approximation, but since there are about 30,000 HNDQ calls in the 7-clique mentioned above, we can estimate the limit as $30000 T/t$, where $t$ is the time it took to canonicalize the 7-clique, and $T$ is the upper limit on the time that can be allocated for canonicalization.
As $t=0.75$ in my environment, if for instance the upper limit of the canonicalization time is set at 0.1s, a good value for the limit seems to be around 4000.
If we consider a more general environment ($t=10$) and stricter requirements ($T=0.01$), a safe default value might be around 30.

Great analysis @yamdan. It seems to me that the number of allowed calls should probably scale with the number of blank nodes in the dataset, which we know in advance. I suspect that almost all cases would be handled with at most 2n, where n is the total number of blank nodes in the dataset. Even 20n would execute fairly quickly.

Such a limitation would be informative, but it's worth noting the ways in which the algorithm can blow up, and strategies to mitigate them.

@yamdan:

As t=0.75 in my environment, if for instance the upper limit of the canonicalization time is set at 0.1s, a good value for the limit seems to be around 4000.
If we consider a more general environment (t=10) and stricter requirements (T=0.01), a safe default value might be around 30.

@gkellogg:

I suspect that almost all cases would be handled with at most 2n, where n is the total number of blank nodes in the dataset. Even 20n would execute fairly quickly.

Whether we suggest the approach of @gkellogg or that of @yamdan (at this moment, I am tempted to go the way proposed by @gkellogg), I believe the right approach is to:

  1. Set an initial limit very generously (i.e., set a value of 4000 (or 5000) for the former and, say, 50 for the latter approach)
  2. Allow the user to set a lower value for the purpose of a specific graph.

Implementations should not be too "strict" by default, that is.

Actually, we can also "prune" the approach of @gkellogg a bit. Instead of using the number of blank nodes in the graph, we can consider the number of blank nodes in the hash to blank nodes map when entering step 5. After all, these are the problematic blank nodes; all those that have been properly identified before this step are well behaved.

  1. Allow the user to set a lower value for the purpose of a specific graph.

Experience has shown that the user (or administrator, for multi-user systems) should always be given the ability to adjust such values (limits) in either direction. We should not be in the business of telling people how much compute-power or wait-time they can throw at their chosen task. It is entirely valid to warn them of potential ill effects of setting some limit too high (or too low), but it should be left to them to decide whether they are willing to let the computation run as long as it may take.

Instead of using the number of blank nodes in the graph, we can consider the number of blank nodes in the hash to blank nodes map when entering step 5.

I would think implementations could use this to set a limit that would cause the algorithm to behave as if it had an acceptable approximate worst-case time complexity. You could say that if there are n blank nodes in that map, then your limit could be, e.g., n, n^2, or n * log(n) iterations / recursive calls to the Hash N-Degree Quads sub-algorithm. I'd think you'd still want a max cap on top of that (either that, or a max cap on n).
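A rough sketch of that combination, deriving an HNDQ call limit from the number of still-undistinguished blank nodes and capping it (the names, the default 20n policy, and the 4000 cap are illustrative only, borrowed from the figures discussed above):

```typescript
// Sketch only: derive a work limit from the number of "difficult" blank nodes
// (those still in the hash to blank nodes map at step 5), with a hard cap,
// and count Hash N-Degree Quads calls against it.
function hndqCallLimit(
  difficultBlankNodes: number,
  policy: (n: number) => number = n => 20 * n, // e.g. n, n * n, or n * Math.log2(n)
  hardCap = 4000
): number {
  return Math.min(policy(difficultBlankNodes), hardCap);
}

class HndqCallCounter {
  private calls = 0;
  constructor(private readonly limit: number) {}

  // Invoked on every entry into Hash N-Degree Quads.
  record(): void {
    if (++this.calls > this.limit) {
      throw new Error(
        `Exceeded ${this.limit} Hash N-Degree Quads calls; aborting ` +
        '(possible poison dataset).'
      );
    }
  }
}

// With 12 difficult blank nodes and the default 20n policy, the
// canonicalizer would abort after 240 HNDQ calls.
const counter = new HndqCallCounter(hndqCallLimit(12));
```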

  1. Allow the user to set a lower value for the purpose of a specific graph.

Experience has shown that the user (or administrator, for multi-user systems) should always be given the ability to adjust such values (limits) in either direction. We should not be in the business of telling people how much compute-power or wait-time they can throw at their chosen task. It is entirely valid to warn them of potential ill effects of setting some limit too high (or too low), but it should be left to them to decide whether they are willing to let the computation run as long as it may take.

There is a difference between what a system administrator can do and what a lambda user accessing the library can do. In practice, what I do is set a maximum value in a constant in the code, and I also offer an API for the end user to set a lower value. The former is fine for someone who has access to the code itself (this is open source, after all), the latter for the lambda user who uses it, e.g., via an online service. Opening up the API call to set any value whatsoever may lead to security problems, I presume.

this is open source after all

Nonetheless, such settings should be settings, not values that are hard-coded in the source. We should strive to not perpetuate the Priesthood of the Coder.

Yes, the sysadmin should be able to set limits which make sense for their overall environment, and the lambda user should only be able to adjust limits within the admin-set range, optimally with clear notice to "contact the administrator of ToolX" if they want/need a different active setting.

All that said, we are writing a standard which may stick around for years or decades, during which compute power will certainly increase and may turn what we see today as a reasonable limit on power-users into training wheels that treat power-users like novices.

I do not believe we should set such limits at all.

Advising deployers and/or implementers that "actions that seem innocuous may lock up your system, return Cartesian products, etc., if you set this value above x or below y" is entirely reasonable.

The normative statement will be about generally guarding against runaway resource use. Everything else would be in an informative note. The tests (still looking for more problematic graphs) will clarify the specific limits expected, at least enough that the poison graph examples should fail for an implementation with reasonable defaults.

@TallTed I just want to understand what you are proposing. My reading of your proposal is below. (The word "limit" refers to a number yet to be defined.)

  • By default, there should be no limit set in the code.
  • There should be a way for the system administrator to set the limit. That can be a configuration file somewhere.
  • The user can set a limit within that administrator limit or, in the absence thereof, can set any limit whatsoever.

Is this what you propose?

Is this what you propose?

Yes, you've got it.

Is this what you propose?

Yes, you've got it.

Well then... I am not sure I agree with it. I believe there should be a limit set in the code for any given installation. Otherwise, an installation could be left open to attack.

@iherman,

Is this what you propose?

Yes, you've got it.

Well then... I am not sure I agree with it. I believe there should be a limit set in the code for any given installation. Otherwise, an installation could be left open to attack.

+1 ... I agree that implementations should try to be "safe by default" (for modern day computers), but allow configurability for specific use. I think we have agreement that this doesn't mean the spec algorithm text itself has to be prescriptive, but that we will have tests for implementations to use to exercise this.

An installation could also be opened up to attack by a remotely or locally built malicious binary in which the hard-coded limit has been modified.

There are plenty of other tools on most machines that could be used as attack vectors. I don't see us advocating their removal nor hard-coded limits.

I'm sure we can come up with near infinite other potential/possible vulns. I don't believe preventing every such potentiality is worth the pain such preventions create. I also believe that preventing every such potentiality would effectively terminate the usefulness of any tool.

Setting root ownership and RWX permissions on the config file I've suggested should be sufficient protection for virtually any deployment environment, and overkill for many if not most.

A possibly viable compromise would be to hard code a limit that takes effect if there's no config file, and is ignored if there is a config file with a higher limit.

A possibly viable compromise would be to hard code a limit that takes effect if there's no config file, and is ignored if there is a config file with a higher limit.

I can be o.k. with that, although I am not sure how to handle the notion of a 'config file' in an operating system independent way. It may become a bit convoluted...

Config file, environment variable, or similar... I'm sure we can come up with a reasonably straightforward way to guide this.

@TallTed + me:

A possibly viable compromise would be to hard code a limit that takes effect if there's no config file, and is ignored if there is a config file with a higher limit.

I can be o.k. with that, although I am not sure how to handle the notion of a 'config file' in an operating system independent way. It may become a bit convoluted...

@TallTed

Config file, environment variable, or similar... I'm sure we can come up with a reasonably straightforward way to guide this.

I hear you @TallTed, and I ended up adding the "usual" approach to my implementation: configuration files are possible in the user's $HOME and in the local directory (i.e., $PWD), as are environment variables set at startup. And there are built-in values. These are merged using the "usual" priority order to yield the final values (I also use this mechanism for setting the hash function), with the built-in values as fallbacks.
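A rough sketch of that kind of precedence chain, with built-ins overridden by a config file in $HOME, then one in $PWD, then an environment variable (the file name, variable name, and keys here are hypothetical, not the ones my implementation actually uses):

```typescript
// Sketch only: built-in defaults < $HOME config < $PWD config < environment.
// The file name, environment variable, and keys are hypothetical.
import { existsSync, readFileSync } from 'node:fs';
import { join } from 'node:path';
import { homedir } from 'node:os';

interface CanonConfig {
  complexityLimit: number;
  hashAlgorithm: string;
}

const BUILT_IN: CanonConfig = { complexityLimit: 50, hashAlgorithm: 'sha256' };

function readConfigFile(dir: string): Partial<CanonConfig> {
  const path = join(dir, '.rdf_c14n.json'); // hypothetical file name
  return existsSync(path) ? JSON.parse(readFileSync(path, 'utf-8')) : {};
}

function resolveConfig(): CanonConfig {
  const fromEnv: Partial<CanonConfig> = process.env.RDF_C14N_COMPLEXITY_LIMIT
    ? { complexityLimit: Number(process.env.RDF_C14N_COMPLEXITY_LIMIT) }
    : {};
  // Later sources override earlier ones; built-ins serve as the fallback.
  return {
    ...BUILT_IN,
    ...readConfigFile(homedir()),
    ...readConfigFile(process.cwd()),
    ...fromEnv,
  };
}
```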

I am not sure whether it is worth/necessary/proper to add a reference to this type of mechanism into the document, though.

While discussion of configuration files and other limits is fine, I'm not sure how it bears on the specification itself. Apologies for not having a PR in on this yet (and I'm just taking off for a week again today), but really I think the normative requirement is to simply require that implementations halt on any input, with a more vaguely worded note on how they might do this.