hexpm / hex_core

Reference implementation of Hex specifications.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

High-level API

wojtekmach opened this issue · comments

In #26 we made the API lower level, exposing HTTP details, and unsurprisingly we received some feedback that it's leaky: aseigo/hexagon@5991766#commitcomment-30019283 (thanks @aseigo!)

This issue is to collect more feedback and see if and how we want to tackle this.

@aseigo for Hexagon (e.g. here), I'd consider something like this:

defmodule Hexagon.Hex do
  @moduledoc false

  @doc """
  Get all non-retired versions of packages.
  """
  @spec get_active_versions() :: {:ok, %{String.t() => [Version.t()]}} | {:error, term()}
  def get_active_versions() do
    case :hex_repo.get_versions(config()) do
      {:ok, {200, _, %{packages: packages}}} ->
        {:ok, Enum.into(packages, %{}, &{&1.name, non_retired_versions(&1)})}

      {:ok, other} -> {:error, other}
      {:error, _} = error -> error
    end
  end

  defp non_retired_versions(package) do
    package.versions
    |> Enum.with_index()
    |> Enum.filter(fn {_, index} -> index not in package.retired end)
    |> Enum.map(&Version.parse!(elem(&1, 0)))
  end

  @spec get_active_versions!() :: %{String.t() => [Version.t()]}
  def get_active_versions!() do
    {:ok, versions} = get_versions()
    versions
  end

  defp config() do
    :hex_core.default_config()
    |> Map.put(:http_user_agent_fragment, "(hexagon/0.1.0) (httpc)")
    |> Map.put(:api_key, System.get_env("HEXAGON_HEX_KEY"))
  end
end

this way all code that talks to Hex is encapsulated in one module so if you need to add retries, caching, circuit breakers and whatnot (including any API changes to the hex_core itself) - that's the place. WDYT?

At the same time, we do want to provide a great out of the box experience so we're definitely keeping this issue open.

That looks like a great start! I think every heavy user of the library would build this themselves, so having it in the library itself would be very useful. IMHO this should be the preferred API with the lower-level API marked as possibly subject to change so caveat emptor if used directly in applications.

Cheers! :)

+1 to this. It would be great to not have to deal with http anything and for hex_core to follow the format_error convention so when an error is returned we can print a useful error message to the user.

Hm, now I'm thinking it may be that if decisions around caching and retrying is up to the user of hex_core then it may be the proper level of abstraction.

what "it" do you mean? HTTP codes, or a high-level API?

Retrying could be a configuration for the HTTP adapter I think, otherwise I think it's hard to know exactly when to retry.

The caching should be implemented by the user (where to store the cached files, how to maintain the local index etc.) but the high-level API can expose options, such as {etag, binary()} and return {ok, Package} | cached | {error, Reason}.

@aseigo by "it" I meant the current api exposing http details.

@tsloughter What is inherent to raw http protocol information that allows for caching and retry, or even makes it easier?

All it does is tie any decisions to caching and retry strategies (among other things) to the implementation details of both the protocol in question (which is an implementation detail the service, and not the service itself) and the implementation choices of the service (e.g. which http headers are set using which keys/values)

For example, you know if the codes are in 5xx range that the error is not on the client's side; it is generally safe to retry any operation multiple times in that case, particularly if the errors are due to things like gateways timing out. If you get a 4xx however, then the request itself is wrong, and there's usually no amount of retrying that will let you do anything. You also know you can do cheaper bandwidth checks for cache/version validity with HEAD requests for example.

Regarding caching, well HTTP has full blown cache policies. We can get around many of them in hex because we assume packages are immutable, but you could, for example, ask to not cache a package more than 1h when you are querying it in its first published hour, which puts a limit on its mutability.

Right, so the developer needs to know what 5xx and 4xx means. Not what it means to the service, but what HTTP codes mean and then map that knowledge to meaning in the service.

There are multpiple issues with that. First, is that the implication is that the service is forever and always tied to HTTP. Perhaps that is realistic, but it is also unecessary to bake into the library. Second, to use the library the application developer needs to know about HTTP codes, and speficially be able to accurately map from HTTP codes to service-specific meanings. That implies some information about the service itself. That is the entire point of a library: to provide an API to all those implementation details, so that one may learn the domain (hex.pm) and not the various tools it is built on.

There is nothing at all that prevents the library from mapping the HTTP codes, which are an implementation detail (literally), to the concepts of caching.

You give a very good example in your second paragraph: manipulating the HTTP cache to put a time limit on assumed immutability. If that is a valid use case, why would this not be codified in a straight-forward manner (e.g. a function which sets the caching, or even the mutatibility gaurantees, settings on a given package or response) rather than force a developer to understand the vaguaries of HTTP and how hex.pm uses it?

Web developers need to get their head out of HTTP as an API, because it is not one, and consider the point and purpose of their library. Every other domain of libraries does this. Can you, for instance, tell me the message encoding for Redis commands? Postgres? Few Redis or Postres users can, though, every single Redis and Postgres library developer can. That is the point of libraries.

Yeah that's all fair. I was just saying that those are inherent things to the HTTP protocol (as defined in its spec)

Is there anything currently holding up development on this? I think it's about time to make this happen. My argument for a high level api vs the caller having to deal with http status codes and so forth are :

  • The caller's code becomes defensive
  • The caller's code becomes buggy in part because it must be defensive but also because hex is still a moving target
  • Related to point 1 and 2... No one knows better than hex_core itself what is and isn't success or an error.
  • As others have said, everyone will just end up building a high level API (something similar has already been started in rebar3_hex).

I don't think this requires an overhaul to the entire lib, it could be as simple as providing hex_core_client. I know @wojtekmach and @ericmj were talking about API changes that are coming soon, perhaps this was part of it?

Is there anything currently holding up development on this?

No, nothing is holding it up. The first step would be to propose an API, I think it would be good to have it fairly fleshed out before starting to work on it so that we can discuss around it and make sure that everyone can use it (mix, rebar3, hexpm, etc.)

I know @wojtekmach and @ericmj were talking about API changes that are coming soon, perhaps this was part of it?

No, those changes are part of an overhaul of the current low-level API to make it more consistent across all modules.

Gotcha, well might as well flesh it out in rebar3_hex a bit since it's internal.