tensorflow / data-validation

Library for exploring and validating machine learning data

Dependency Issues

mukeshmithrakumar opened this issue

Hi,

Problem:

  • I am unable to use tfdv with poetry because its dependencies cannot be resolved. For simplicity and debugging purposes, below are the steps to reproduce the issue:

Steps:

  • Create a requirements.in file with tensorflow-data-validation as a requirement
  • Run pip-compile requirements.in

Output:

  • This results in a protobuf dependency conflict, which is a small example of a larger dependency problem:
There are incompatible versions in the resolved dependencies:
  protobuf>=3.6.0 (from tensorflow-serving-api==2.10.0->tfx-bsl==1.10.1->tensorflow-data-validation==1.10.0->-r tfdv-requirements.in (line 1))
  protobuf<4,>=3.13 (from tensorflow-data-validation==1.10.0->-r tfdv-requirements.in (line 1))
  protobuf<4,>=3.12.2 (from apache-beam[gcp]==2.41.0->tensorflow-data-validation==1.10.0->-r tfdv-requirements.in (line 1))
  protobuf<4.0.0dev,>=3.19.0 (from google-cloud-bigquery-storage==2.13.2->apache-beam[gcp]==2.41.0->tensorflow-data-validation==1.10.0->-r tfdv-requirements.in (line 1))
  protobuf<5.0.0dev,>=3.19.0 (from google-cloud-recommendations-ai==0.7.1->apache-beam[gcp]==2.41.0->tensorflow-data-validation==1.10.0->-r tfdv-requirements.in (line 1))
  protobuf<4.0.0dev,>=3.12.0 (from google-cloud-bigquery==2.34.4->apache-beam[gcp]==2.41.0->tensorflow-data-validation==1.10.0->-r tfdv-requirements.in (line 1))
  protobuf<5.0.0dev,>=3.20.1 (from google-api-core==2.10.0->apache-beam[gcp]==2.41.0->tensorflow-data-validation==1.10.0->-r tfdv-requirements.in (line 1))
  protobuf<4.0.0dev (from google-cloud-datastore==1.15.5->apache-beam[gcp]==2.41.0->tensorflow-data-validation==1.10.0->-r tfdv-requirements.in (line 1))
  protobuf<5.0.0dev,>=3.19.0 (from proto-plus==1.22.1->apache-beam[gcp]==2.41.0->tensorflow-data-validation==1.10.0->-r tfdv-requirements.in (line 1))
  protobuf<4.0.0dev (from google-cloud-videointelligence==1.16.3->apache-beam[gcp]==2.41.0->tensorflow-data-validation==1.10.0->-r tfdv-requirements.in (line 1))
  protobuf<4.0.0dev (from google-cloud-bigtable==1.7.2->apache-beam[gcp]==2.41.0->tensorflow-data-validation==1.10.0->-r tfdv-requirements.in (line 1))
  protobuf<3.20,>=3.9.2 (from tensorboard==2.10.0->tensorflow==2.10.0->tensorflow-data-validation==1.10.0->-r tfdv-requirements.in (line 1))
  protobuf<4,>=3.13 (from tensorflow-metadata==1.10.0->tensorflow-data-validation==1.10.0->-r tfdv-requirements.in (line 1))
  protobuf<5.0.0dev,>=3.19.0 (from google-cloud-pubsub==2.13.6->apache-beam[gcp]==2.41.0->tensorflow-data-validation==1.10.0->-r tfdv-requirements.in (line 1))
  protobuf<4.0.0dev (from google-cloud-spanner==1.19.3->apache-beam[gcp]==2.41.0->tensorflow-data-validation==1.10.0->-r tfdv-requirements.in (line 1))
  protobuf<3.21,>=3.13 (from tfx-bsl==1.10.1->tensorflow-data-validation==1.10.0->-r tfdv-requirements.in (line 1))
  protobuf<3.20,>=3.9.2 (from tensorflow==2.10.0->tensorflow-data-validation==1.10.0->-r tfdv-requirements.in (line 1))
  protobuf<4.0.0dev (from google-cloud-vision==1.0.2->apache-beam[gcp]==2.41.0->tensorflow-data-validation==1.10.0->-r tfdv-requirements.in (line 1))
  protobuf<4.0.0dev (from google-cloud-language==1.3.2->apache-beam[gcp]==2.41.0->tensorflow-data-validation==1.10.0->-r tfdv-requirements.in (line 1))
  protobuf<5.0.0dev,>=3.19.0 (from google-cloud-dlp==3.9.0->apache-beam[gcp]==2.41.0->tensorflow-data-validation==1.10.0->-r tfdv-requirements.in (line 1))
  protobuf<5.0.0dev,>=3.15.0 (from googleapis-common-protos==1.56.4->tensorflow-metadata==1.10.0->tensorflow-data-validation==1.10.0->-r tfdv-requirements.in (line 1))
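
To see which two requirements in the report above actually collide, the bounds can be intersected by hand. The following is a minimal sketch, not what pip-compile or poetry do internally: `parse_version` and `tightest_bounds` are hypothetical helpers, only `>=` and `<` clauses are handled, and a trailing `dev` marker is ignored. The dictionary holds a subset of the lines copied from the report.

```python
import re


def parse_version(text):
    """Turn '3.20.1', '4' or '4.0.0dev' into a 3-part tuple of ints.

    Simplification: a trailing 'dev' marker is dropped, so '4.0.0dev'
    is treated like '4.0.0'.
    """
    parts = [int(part) for part in re.findall(r"\d+", text)]
    return tuple(parts + [0] * (3 - len(parts)))


def tightest_bounds(requirements):
    """Return ((lower, source), (upper, source)) across all requirements.

    Lower bounds are inclusive ('>='), upper bounds exclusive ('<').
    """
    lower = ((0, 0, 0), None)
    upper = None
    for source, spec in requirements.items():
        for clause in spec.split(","):
            clause = clause.strip()
            if clause.startswith(">="):
                version = parse_version(clause[2:])
                if version > lower[0]:
                    lower = (version, source)
            elif clause.startswith("<") and not clause.startswith("<="):
                version = parse_version(clause[1:])
                if upper is None or version < upper[0]:
                    upper = (version, source)
            else:
                raise ValueError(f"unsupported clause: {clause!r}")
    return lower, upper


# Subset of the protobuf requirements listed in the report above.
PROTOBUF_REQUIREMENTS = {
    "tensorflow==2.10.0": "<3.20,>=3.9.2",
    "tensorboard==2.10.0": "<3.20,>=3.9.2",
    "tfx-bsl==1.10.1": "<3.21,>=3.13",
    "tensorflow-data-validation==1.10.0": "<4,>=3.13",
    "apache-beam[gcp]==2.41.0": "<4,>=3.12.2",
    "google-api-core==2.10.0": "<5.0.0dev,>=3.20.1",
    "proto-plus==1.22.1": "<5.0.0dev,>=3.19.0",
}

lower, upper = tightest_bounds(PROTOBUF_REQUIREMENTS)
# A version can exist only if the highest lower bound is below the lowest upper bound.
satisfiable = lower[0] < upper[0]
print("highest lower bound:", lower)
print("lowest upper bound:", upper)
print("satisfiable:", satisfiable)
```

Under these assumptions the highest lower bound comes from google-api-core==2.10.0 and the lowest upper bound from tensorflow==2.10.0, so no protobuf version satisfies both.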

If you just run pip install on the requirements, this isn't a problem, since pip doesn't care about dependencies, but for anything in production that uses poetry or any other dependency management tool, this is a problem.

You can see the same with the absl-py package as well.

There are incompatible versions in the resolved dependencies:
  absl-py==1.2.0 (from -r tfdv-requirements.in (line 1))
  absl-py<0.13,>=0.9 (from tensorflow-data-validation==1.5.0->-r tfdv-requirements.in (line 176))
  absl-py<0.13,>=0.9 (from tfx-bsl==1.5.0->-r tfdv-requirements.in (line 182))
  absl-py>=1.0.0 (from tensorflow==2.10.0->-r tfdv-requirements.in (line 175))
  absl-py<0.13,>=0.9 (from tensorflow-metadata==1.5.0->-r tfdv-requirements.in (line 179))
  absl-py>=0.4 (from tensorboard==2.10.0->-r tfdv-requirements.in (line 172))

So, here is my question and, well, maybe a request.

Request:

  1. Can we add a validation step to the build process that makes sure the package dependencies can be resolved, at least among the tensorflow packages, before a major release?
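
One possible shape for such a validation step, as a rough sketch only: install the release candidate into a clean environment and run `pip check`, which exits non-zero when installed packages declare requirements the environment does not satisfy. The function names here are hypothetical and this is not part of the project's actual build process. The `run` parameter exists only so the command can be stubbed out in tests.

```python
import subprocess
import sys


def check_installed_dependencies(run=subprocess.run):
    """Run `pip check` in the current interpreter and return (ok, report)."""
    result = run(
        [sys.executable, "-m", "pip", "check"],
        capture_output=True,
        text=True,
    )
    # pip check prints one line per broken requirement and exits non-zero.
    return result.returncode == 0, result.stdout.strip()


def main():
    """Return a process exit code suitable for a CI step."""
    ok, report = check_installed_dependencies()
    if not ok:
        print(report)
    return 0 if ok else 1
```

A CI job could call `sys.exit(main())` after installing the packages under test. Note that this only inspects an already-installed environment. It does not run a full resolver the way pip-compile or poetry do.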

Question:

  1. Other than moving away from poetry and just using pip, is there any resolution for this?

@mukeshmithrakumar,

On the official website, it is recommended to install data validation from PyPI, build with Docker (if using Linux), or build from source.
You can also refer to compatible versions for package version compatibility. Hope this helps. Thank you!

A new version of google-api-core has been published that allows protobuf>=3.19.5. This should resolve the current dependency conflict. The protobuf<3.20 and protobuf<3.21 constraints in the tensorflow packages should be removed to prevent future conflicts.

https://pypi.org/project/google-api-core/2.10.2/
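
As a quick sanity check of why the relaxed lower bound helps, the sketch below tests one candidate version against the relevant constraints. `satisfies` is a hypothetical helper that understands only `>=` and `<` clauses. The `>=3.19.5` bound is taken from the comment above, and any other clauses the new google-api-core release may declare are left out as an assumption.

```python
def as_tuple(version):
    """Turn '3.20' or '3.19.5' into a 3-part tuple of ints."""
    parts = [int(part) for part in version.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def satisfies(version, spec):
    """True if `version` meets every '>=' / '<' clause in `spec`."""
    candidate = as_tuple(version)
    for clause in spec.split(","):
        if clause.startswith(">="):
            if candidate < as_tuple(clause[2:]):
                return False
        elif clause.startswith("<") and not clause.startswith("<="):
            if candidate >= as_tuple(clause[1:]):
                return False
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


TENSORFLOW = "<3.20,>=3.9.2"   # tensorflow==2.10.0, from the report
TFX_BSL = "<3.21,>=3.13"       # tfx-bsl==1.10.1, from the report
OLD_API_CORE = ">=3.20.1"      # lower bound of google-api-core==2.10.0, from the report
NEW_API_CORE = ">=3.19.5"      # lower bound stated in the comment above

candidate = "3.19.5"
works_before = all(satisfies(candidate, s) for s in (TENSORFLOW, TFX_BSL, OLD_API_CORE))
works_after = all(satisfies(candidate, s) for s in (TENSORFLOW, TFX_BSL, NEW_API_CORE))
print("before:", works_before, "after:", works_after)
```

With the old lower bound the candidate is rejected. With the relaxed one it fits under tensorflow's `<3.20` cap.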

@mukeshmithrakumar,

Can you please try the google-api-core version that allows protobuf>=3.19.5 and see if it resolves your dependency issue, as mentioned in the comment above?
Thank you!

Awesome, thanks @singhniraj08, that helps.