haskell / cabal

Official upstream development repository for Cabal and cabal-install

Home Page: https://haskell.org/cabal

Document the relationship between ComponentId, UnitId, MungedPackageId, PackageId etc.

phadej opened this issue · comments

There are these types: ComponentId, UnitId, MungedPackageId, PackageId etc., which all seem to be "same same but different". There should be a technical note in the code describing their relationship, with

  • links to issues describing planned cleanups (and what's blocking them, if they cannot be done yet)
  • a statement of whether the types are intended to be used in public text-based APIs
    • with remarks on the current state, as in the previous point
  • haddocks of these types pointing to the note.

Honestly, I have no idea what the subtle differences between these types are, and I won't read through GitHub issues to find out. A note in the code would be the right place to refer to.


As an example, ComponentId is documented as

For non-Backpack components, this corresponds one to one with
the 'UnitId', which serves as the basis for install paths,
linker symbols, etc.

And there are no remarks on how things differ in the Backpack case.


Related issue #4761

There is a table, "Mapping from semantic objects in this thesis (Fig. 3.5 and Fig. 6.1) to their definitions in GHC", in Edward's thesis. Something similar but for Cabal types would be great to have too.

FWIW, if someone points out what is the missing documentation for parsec parser implementation, I'd be glad to answer those questions in a note too.

@ezyang that wiki page doesn't mention MungedPackageId

I think you want this comment in MungedPackageName:

-- | Computes the package name for a library.  If this is the public
-- library, it will just be the original package name; otherwise,
-- it will be a munged package name recording the original package
-- name as well as the name of the internal library.
--
-- A lot of tooling in the Haskell ecosystem assumes that if something
-- is installed to the package database with the package name 'foo',
-- then it actually is an entry for the (only public) library in package
-- 'foo'.  With internal packages, this is not necessarily true:
-- a public library as well as arbitrarily many internal libraries may
-- come from the same package.  To prevent tools from getting confused
-- in this case, the package name of these internal libraries is munged
-- so that they do not conflict the public library proper.  A particular
-- case where this matters is ghc-pkg: if we don't munge the package
-- name, the inplace registration will OVERRIDE a different internal
-- library.
--
-- We munge into a reserved namespace, "z-", and encode both the
-- component name and the package name of an internal library using the
-- following format:
--
--      compat-pkg-name ::= "z-" package-name "-z-" library-name
--
-- where package-name and library-name have "-" ( "z" + ) "-"
-- segments encoded by adding an extra "z".
--
-- When we have the public library, the compat-pkg-name is just the
-- package-name, no surprises there!
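
To make the encoding rule above concrete, here is a minimal sketch in plain Haskell. It is not Cabal's actual implementation: `splitDash`, `zEncode` and `mungeName` are hypothetical names, and names are plain `String`s rather than Cabal's own types. It assumes that only segments enclosed by "-" on both sides and consisting solely of "z" characters get the extra "z".

```haskell
module Main where

import Data.List (intercalate)

-- Split a string on '-' (empty segments are kept).
splitDash :: String -> [String]
splitDash s = case break (== '-') s of
  (seg, [])       -> [seg]
  (seg, _ : rest) -> seg : splitDash rest

-- Add an extra 'z' to every segment that lies between two dashes and
-- consists only of 'z' characters (hypothetical helper, not Cabal's).
zEncode :: String -> String
zEncode s = intercalate "-" (zipWith enc [0 :: Int ..] segs)
  where
    segs   = splitDash s
    lastIx = length segs - 1
    enc i seg
      | i > 0 && i < lastIx && not (null seg) && all (== 'z') seg = 'z' : seg
      | otherwise = seg

-- Public library: the plain package name.
-- Internal library: "z-" package-name "-z-" library-name.
mungeName :: String -> Maybe String -> String
mungeName pkg Nothing    = pkg
mungeName pkg (Just lib) = "z-" ++ zEncode pkg ++ "-z-" ++ zEncode lib

main :: IO ()
main = do
  putStrLn (mungeName "haddock-library" Nothing)
  putStrLn (mungeName "haddock-library" (Just "attoparsec"))
  putStrLn (zEncode "a-z-b")
```

Under these assumptions, the internal library `attoparsec` of `haddock-library` is rendered as `z-haddock-library-z-attoparsec`, and a name such as `a-z-b` is escaped to `a-zz-b` so that it cannot be confused with the `-z-` separator.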

But that's not true. With a package environment containing haddock-library-1.6.0:

λ> :show packages
active package flags:
  -package-id transformers-0.5.5.0
  -package-id containers-0.5.11.0
  -package-id array-0.5.2.0
  -package-id deepseq-1.4.3.0
  -package-id bytestring-0.10.8.2
  -package-id haddock-library-1.6.0-455b3b98c686fb127ac7d8c6fdd26ff2d06393b02f0e9642988bdc7e5a70fffc
  -package-id haddock-library-1.6.0-f24a0b3744bcfc55f2e3f40d91d735f5c9082e7cc5db16397cbfa2eb9edf9868
  -package-id integer-gmp-1.0.2.0
  -package-id ghc-prim-0.5.2.0
  -package-id rts
  -package-id base-4.11.1.0

Shouldn't the other haddock-library-1.6.0 entry be z-haddock-library? Something doesn't match.

Did the multiple public libraries patch change something?

Or does something use the wrong type?

Yeah, that doesn't look too good. Maybe something is broken. Will have to look later.

... though ghc-pkg --package-db ... list shows

z-haddock-library-z-attoparsec-1.6.0

Does :show packages show unit IDs, which are yet again different from MungedPackageId? But the flag is still named -package-id. I'm confused. I need a table. It doesn't need to be complete from the beginning, but it can be updated as people (at the moment, me) ask questions.

Why didn't we teach ghc-pkg to differentiate between "public" and "internal" libs? That is, is MungedPackageId technical debt that exists because ghc-pkg doesn't know about components? Is it impossible to change, or is the change the right thing to do but nobody has had time to do it?

To your first comment: That would make sense. But I was under the impression that unit IDs included component names, and that's not what I see above. So it may also be a rendering bug on GHC's part.

The right thing to do is teach ghc-pkg to know about components, but since Cabal supports older versions of ghc-pkg, it still needs to do the MungedPackageId workaround.

@ezyang So, is MungedPackageId a way to name library components (public and internal ones) outside of Cabal, i.e. the counterpart of ComponentId for library components? Are those semantically equivalent?

@ezyang Can we change the MungedPackageName representation to actually be data MungedPackageName = MungedPackageName PackageName (Maybe UnqualComponentName), and push the zdashcode stuff into the Pretty / Parsec instances? That way the structure of the type would be closer to its semantics, and the string representation would become an implementation detail of those instances.
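
A rough sketch of what that proposal could look like, using stand-in types rather than Cabal's real ones. Everything here is an assumption made for illustration: the `PackageName` and `UnqualComponentName` newtypes are simplified to wrap `String`, `render` and `parse` are hypothetical stand-ins for the Pretty / Parsec instances, and the extra-"z" escaping of segments is omitted for brevity.

```haskell
module Main where

import Data.List (stripPrefix)

-- Simplified stand-ins for Cabal's types (hypothetical, String-backed).
newtype PackageName = PackageName String deriving (Eq, Show)
newtype UnqualComponentName = UnqualComponentName String deriving (Eq, Show)

-- The structured representation proposed in the comment above.
data MungedPackageName
  = MungedPackageName PackageName (Maybe UnqualComponentName)
  deriving (Eq, Show)

-- Stand-in for a Pretty instance: the "z-" encoding lives only here.
render :: MungedPackageName -> String
render (MungedPackageName (PackageName p) Nothing) = p
render (MungedPackageName (PackageName p) (Just (UnqualComponentName l))) =
  "z-" ++ p ++ "-z-" ++ l

-- Split a string at the first occurrence of "-z-".
splitSep :: String -> Maybe (String, String)
splitSep = go []
  where
    go _ [] = Nothing
    go acc rest@(c : cs) = case stripPrefix "-z-" rest of
      Just after -> Just (reverse acc, after)
      Nothing    -> go (c : acc) cs

-- Stand-in for a Parsec instance: anything that does not match the
-- "z-" pkg "-z-" lib shape is treated as a plain (public) package name.
parse :: String -> MungedPackageName
parse s = case stripPrefix "z-" s >>= splitSep of
  Just (p, l)
    | not (null p), not (null l) ->
        MungedPackageName (PackageName p) (Just (UnqualComponentName l))
  _ -> MungedPackageName (PackageName s) Nothing

main :: IO ()
main = do
  let internal = MungedPackageName (PackageName "haddock-library")
                                   (Just (UnqualComponentName "attoparsec"))
  putStrLn (render internal)
  print (parse (render internal) == internal)
```

With this shape, code that needs to know whether a name refers to an internal library can pattern match on the `Maybe` field instead of inspecting the string.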

That sounds good to me!

-- A lot of tooling in the Haskell ecosystem assumes that if something
-- is installed to the package database with the package name 'foo',
-- then it actually is an entry for the (only public) library in package
-- 'foo'.  With internal packages, this is not necessarily true:
-- a public library as well as arbitrarily many internal libraries may
-- come from the same package.

Aaaand I think there is more: In Backpack we talk about instantiations of packages. I am speaking about the case where there is a Backpack package foo instantiated with implementation bar and the same foo instantiated with implementation baz. Technically they are different packages under the same name.

My current hunch is that we need to extend our MungedPackageName encoding to include instantiations as well. I haven't gotten around to putting together a small reproducer yet.