ContentSquare / chproxy

Open-Source ClickHouse http proxy and load balancer

Home Page:https://www.chproxy.org/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

[Feature] Configuration option to allow passing all query parameters

genzgd opened this issue · comments

Is your feature request related to a problem? Please describe.
Currently CHProxy severely restricts the query parameters that can be passed to ClickHouse server for "security reasons". This was the source of a major incompatibility with the official Python client (ClickHouse/clickhouse-connect#191), and also cripples a lot of other behavior for HTTP based ClickHouse clients. The workaround by changing settings at the user/params_group level is somewhat clunky and incomplete.

This sort of security issue seems better handled at the client and not the proxy level in any case.

Describe the solution you'd like
Checking the allowed parameters list here should be optional and controlled by a configuration setting.

Describe alternatives you've considered
Setting user level parameters in parameter groups is insufficient, since it reduces flexibility and doesn't work with, for example, query parameters used for ClickHouse server side binding. Another possibility might be adding another configuration option to dynamically add to the "allowed parameter names", but this again adds additional complexity and overhead with little apparently "security" benefit.

Hi, we've discussed a bit with the maintainers.
Since we're not the original maintainers, it's difficult to know all the reasons behind this limitation.
But, we know that with the current codebase, if we allow people to put the parameters they want, it could mess with the caching mechanism and people might get results they don't expect.
Indeed, the key used to store the result of the query is based on the query and specific params, cf https://github.com/ContentSquare/chproxy/blob/master/cache/key.go#L61. If we let people add the param they want, they could create a situation were:

  • user A does query X with param W, the query is sent to clickhouse then the result is cached
  • user B does query X with param Y, chproxy used the cached result
  • If param Y alters the result given by query X or its(for example if the param is distributed_product_mode, output_format_write_statistics ... ) user B will have a wrong result.

It will take time to see if there are other side effects with the params and if we can avoid the issue with the cache by reworking the way the key is generated.

In the meantime, if the topic is urging, you can create a PR by adding an allow_unsafe_params field in the config (default value to false) to bypass this behavior.

Thanks for looking into it! I understand the cache key concern and I'm not sure what the right balance is without adding a lot of complexity. I'll see about doing a fairly simple PR as you suggested for allow_unsafe_params.

one advice, you should wait before the PR #330 is merged (should be merge in max a few days) because it will be in conflict with some parts of the code you need to modify

FYI: I have merged #330 so this should no longer block you.