GoogleContainerTools / container-diff

container-diff: Diff your Docker containers

pip differ reports invalid package names with dashes in them

weakcamel opened this issue

container-diff reports package names that contain dashes with the dashes replaced by underscores.

I believe the cause is this line of code:

packageDir := regexp.MustCompile("^([a-z|A-Z|0-9|_]+)-(([0-9]+?\\.){2,3})(dist-info|egg-info)$")

container-diff assumes that a package name is always a substring of the installation path
(the .egg-info or .dist-info directory name), which isn't always the case.
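To illustrate the assumption, here is a minimal Go sketch that applies the pattern quoted above (copied verbatim, written as a raw string) to the egg-info directory name from this report. The helper `packageNameFromDir` is hypothetical and not part of container-diff; it simply returns the first capture group, which is whatever precedes the version in the directory name.

```go
package main

import (
	"fmt"
	"regexp"
)

// packageDir is the pattern quoted in this issue, as a Go raw string.
// Note: inside a character class, '|' is a literal pipe, not alternation.
var packageDir = regexp.MustCompile(`^([a-z|A-Z|0-9|_]+)-(([0-9]+?\.){2,3})(dist-info|egg-info)$`)

// packageNameFromDir (hypothetical helper) returns the first capture group,
// i.e. the part of the directory name before "-<version>.".
func packageNameFromDir(dir string) (string, bool) {
	m := packageDir.FindStringSubmatch(dir)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	// As the pip listing in this report shows, the egg-info directory uses
	// '_' where the distribution name has '-', so the path-derived name is
	// strict_rfc3339 rather than strict-rfc3339.
	name, ok := packageNameFromDir("strict_rfc3339-0.7.egg-info")
	fmt.Println(name, ok) // strict_rfc3339 true
}
```

Because the name is taken from the directory name alone, any escaping applied when that directory was created leaks straight into the report.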

Ideally, package names would come from pip, pkg_resources, or similar; however, I understand that may not be possible without tampering with the container too much (or complicating the analysis). If pip cannot be used to report that information back to container-diff, the second-best method would likely be to read the package metadata file, for example:

# a package name consistent with what pip reports can be retrieved from PKG-INFO, e.g.:

root@c76f09d5f006:/usr/local/lib/python3.5/dist-packages/strict_rfc3339-0.7.egg-info# pip show -f strict-rfc3339
Name: strict-rfc3339
Version: 0.7
Summary: Strict, simple, lightweight RFC3339 functions
Home-page: http://www.danielrichman.co.uk/libraries/strict-rfc3339.html
Author: Daniel Richman, Adam Greig
Author-email: main@danielrichman.co.uk
License: GNU General Public License Version 3
Location: /usr/local/lib/python3.5/dist-packages
Requires: 
Required-by: sm-adapters
Files:
  __pycache__/strict_rfc3339.cpython-35.pyc
  strict_rfc3339-0.7.egg-info/PKG-INFO
  strict_rfc3339-0.7.egg-info/SOURCES.txt
  strict_rfc3339-0.7.egg-info/dependency_links.txt
  strict_rfc3339-0.7.egg-info/top_level.txt
  strict_rfc3339.py

root@c76f09d5f006:/usr/local/lib/python3.5/dist-packages/strict_rfc3339-0.7.egg-info# grep Name /usr/local/lib/python3.5/dist-packages/strict_rfc3339-0.7.egg-info/PKG-INFO
Name: strict-rfc3339
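Reading the `Name:` header, as the `grep` above does, could look roughly like this in Go. This is a sketch under stated assumptions: the file uses the email-header layout of PKG-INFO/METADATA, the header block ends at the first blank line, and `nameFromMetadata` is a hypothetical helper, not an existing container-diff function. The sample input is an abbreviated, illustrative PKG-INFO containing only the two fields shown by `pip show` above.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// nameFromMetadata (hypothetical) returns the value of the first "Name:"
// header in PKG-INFO/METADATA content. Scanning stops at the first blank
// line, which separates the headers from the description body.
func nameFromMetadata(content string) (string, bool) {
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		line := sc.Text()
		if strings.TrimSpace(line) == "" {
			break // end of the header block
		}
		if strings.HasPrefix(line, "Name:") {
			return strings.TrimSpace(strings.TrimPrefix(line, "Name:")), true
		}
	}
	return "", false
}

func main() {
	// Illustrative, abbreviated PKG-INFO content.
	pkgInfo := "Name: strict-rfc3339\nVersion: 0.7\n\nLong description text...\n"
	name, ok := nameFromMetadata(pkgInfo)
	fmt.Println(name, ok) // strict-rfc3339 true
}
```

Stopping at the blank line keeps text in the description body that happens to start with "Name:" from being mistaken for the header.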

Expected behavior

A package with a dash in its name, e.g.
https://pypi.org/project/strict-rfc3339/,
should be reported as strict-rfc3339.

Actual behavior

The package name is reported with an underscore instead: the package at https://pypi.org/project/strict-rfc3339/
is reported as strict_rfc3339.

Information

  • container-diff version: v0.14
  • Operating system: macOS

Steps to reproduce the behavior

  1. Create a Dockerfile with the following content:
FROM python:3.7.2
RUN pip install strict-rfc3339
  2. Build your image:
$ docker build -t img:foo .
Sending build context to Docker daemon  2.048kB
Step 1/2 : FROM python:3.7.2
 ---> 7c5fd2af3815
Step 2/2 : RUN pip install strict-rfc3339
 ---> Running in f45a86b79e1e
Collecting strict-rfc3339
  Downloading https://files.pythonhosted.org/packages/56/e4/879ef1dbd6ddea1c77c0078cd59b503368b0456bcca7d063a870ca2119d3/strict-rfc3339-0.7.tar.gz
Building wheels for collected packages: strict-rfc3339
  Running setup.py bdist_wheel for strict-rfc3339: started
  Running setup.py bdist_wheel for strict-rfc3339: finished with status 'done'
  Stored in directory: /root/.cache/pip/wheels/bb/af/c9/b6e9fb5f9b2470e4ed2a7241c9ab3a8cdd3bc8555ae02ca2e6
Successfully built strict-rfc3339
Installing collected packages: strict-rfc3339
Successfully installed strict-rfc3339-0.7
Removing intermediate container f45a86b79e1e
 ---> ff9dfbe06148
Successfully built ff9dfbe06148
Successfully tagged img:foo
  3. Run container-diff and observe the output:
$ container-diff analyze -t pip daemon://img:foo

-----Pip-----

Packages found in img:foo:
NAME                   VERSION        SIZE         INSTALLATION
-configobj             5.0.6          87.5K        /usr/lib/python2.7/dist-packages
-mercurial             4.0            5.5M         /usr/lib/python2.7/dist-packages
-pip                   18.1           6M           /usr/local/lib/python3.7/site-packages
-setuptools            40.6.3         1.5M         /usr/local/lib/python3.7/site-packages
-six                   1.10.0         29.4K        /usr/lib/python2.7/dist-packages
-strict_rfc3339        0.7            6K           /usr/local/lib/python3.7/site-packages
-wheel                 0.32.3         77.2K        /usr/local/lib/python3.7/site-packages

$ container-diff diff -t pip daemon://python:3.7.2 daemon://img:foo

-----Pip-----

Packages found only in python:3.7.2: None

Packages found only in img:foo:
NAME                   VERSION        SIZE
-strict_rfc3339        0.7            6K

Version differences: None

For comparison, pip inside the container lists:

$ docker run -it --entrypoint sh  img:foo 
# pip list 
Package        Version
-------------- -------
pip            18.1   
setuptools     40.6.3 
strict-rfc3339 0.7    
wheel          0.32.3 
# 

@weakcamel thanks again for this issue. As I said in #281, there are definitely a few issues here. I opened a PR that tries to use top_level.txt for egg modules, which seems pretty reliable, but I agree that the METADATA file can also give us the information we want.

I'll work on a fix that tries to use both of those files to more reliably get the package name.

@nkubala Fantastic, thank you - and I'll be very happy to at least test the changes.