gitpython-developers / GitPython

GitPython is a python library used to interact with Git repositories.

Home Page:http://gitpython.readthedocs.org

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Unclear rationale for pipe usage in `output_stream` doc

EliahKagan opened this issue · comments

The Git.execute method docstring includes this in the description of its output_stream parameter:

GitPython/git/cmd.py

Lines 990 to 991 in 1e044ea

This feature only has any effect if `as_process` is False. Processes will
always be created with a pipe due to issues with subprocess.

I think this is referring to the subprocess module, which could be clarified by changing subprocess to :mod:`subprocess`. But I'm not sure what issues this is referring to, or what information it seeks to impart to the reader.

To be clear, I'm not arguing that output_stream shouldn't use a pipe. Instead, I think this part of the docstring should be clarified (and I am unsure how to do so).

If this intends to be general, and vagueness is acceptable, yet it is considered better to have this "due to" clause than simply remove it, and the issue really is in subprocess itself, then it could be changed to:

due to limitations of :mod:`subprocess`

But I hope something more specific can be said.

The full context is:

GitPython/git/cmd.py

Lines 987 to 994 in 1e044ea

:param output_stream:
If set to a file-like object, data produced by the git command will be
output to the given stream directly.
This feature only has any effect if `as_process` is False. Processes will
always be created with a pipe due to issues with subprocess.
This merely is a workaround as data will be copied from the
output pipe to the given output stream directly.
Judging from the implementation, you shouldn't use this parameter!

Thanks for bringing this to my attention.

After reading the context, I take most offence from Judging from the implementation, you shouldn't use this parameter! just because I don't find it particularly helpful and think it can be removed without substitute.

From what I read, subprocesses cannot have an output stream which isn't an actual open file (and thus have a file-descriptor) or which isn't a pipe. Thus, if output_stream isn't anything compatible, it will have to perform an inefficient copy from the end of a pipe to whatever stream-like object the user passed.

I think most users wouldn't care about this detail, as long as output is streamed to output_stream which should always work. Further, maybe by now it's possible to pass all kinds of output_stream implementations to processes directly (and Python abstracts over that by itself internally), so said 'issues' are no more.

In any case, I believe that everything starting from Processes will always… is unsubstantial and can be removed.

I'm wondering if the motivation for the extra text is to prevent users from being misled by the phrase "will be output to the given stream directly," especially by its use of "directly," and if some rephrasing, or added note, should accompany the removal of that text.

(This is sort of conceptually related to #1762, but improving this docstring probably need not wait on that.)

I think this will work:

         :param output_stream:
             If set to a file-like object, data produced by the git command will be
-            output to the given stream directly.
-            This feature only has any effect if `as_process` is False. Processes will
-            always be created with a pipe due to issues with subprocess.
-            This merely is a workaround as data will be copied from the
-            output pipe to the given output stream directly.
-            Judging from the implementation, you shouldn't use this parameter!
+            copied to the given stream instead of being returned as a string.
+            This feature only has any effect if `as_process` is False.

Wonderful! Thank you.