numpy / numpy

The fundamental package for scientific computing with Python.

Home Page: https://numpy.org

Type hinting / annotation (PEP 484) for ndarray, dtype, and ufunc

InonS opened this issue

Feature request: Organic support for PEP 484 with Numpy data structures.

Has anyone implemented type hinting for the specific numpy.ndarray class?

Right now, I'm using typing.Any, but it would be nice to have something more specific.

For instance, the numpy people could add a type alias for their array_like object class. Better yet, implement support at the dtype level, so that other objects would be supported, as well as ufunc.

original SO question

I don't think anyone's thought about it. Perhaps you would like to? :-)

I'm also going to suggest that if you want to follow up on this, we close the GitHub issue and move the discussion to the mailing list, since it's better suited to open-ended design discussions.

After getting this answer on SO, I've decided to close the issue.

To be clear, we don't actually have any objection to supporting cool new Python features or anything (rather the opposite); it's just that we're a volunteer-run project without many resources, so stuff only happens if someone who's interested steps up to do it.

The mailing list is usually the best place if you're trying to start working on something or hoping to recruit some other interested folks to help.

Thanks, @njsmith. I decided to start here because of the more orderly issue-tracking, as opposed to an unstructured mailing list (I was looking for a 'feature request' tag, among other features...)

Since the guy who answered me on SO got back to me with a viable solution, I decided to leave the matter.
Maybe the Numpy documentation should be updated to include his answer (please make sure to give him credit if you do).

Thanks, again!

hello guys! I was just kindly wondering if there had been any progress on this issue. Thanks.

There is some discussion about it on the mailing list here.

I'm reopening this issue for those who are interested in discussing it further.

I think this would certainly be desirable for NumPy, but there are indeed a few tricky aspects of the NumPy API for typing to sort through, such as how NumPy currently accepts arbitrary objects in the np.array constructor (though we want to clean this up, see #5353).

Some good work is being done here: https://github.com/machinalis/mypy-data

There's discussion about whether to push the work upstream to numpy or typeshed: machinalis/mypy-data#16

This really would be a great addition to NumPy. What would be the next steps to push this up to typeshed or NumPy? Even an incomplete stub would be useful, and I'm happy to help given a bit of direction.

@henryJack The best place to start would probably be tooling: figure out how we can integrate basic type annotations into the NumPy repository (and ideally test them) in a way that works with mypy and supports adding them incrementally.

Then, start with extremely minimal annotations and we can go from there. In particular, I would skip dtype annotations for now since we don't have a good way to specify them (i.e., only do ndarray, not ndarray[int]).

If it's helpful, I have an alternative version of annotations that I've written for use at Google and could open source. But we have our own unique build system and do type checking with pytype, so there would likely be quirks porting it to upstream.

I suppose the only way to test annotations is to actually run mypy on sample code snippets and check the output?

Would it be better to have the annotations integrated with the code or as separate stubs?

I suppose we should also learn from Dropbox and pandas and start with the leaves of the codebase rather than the core data structures?

@shoyer figure out how we can integrate basic type annotations
Wouldn't just putting https://github.com/machinalis/mypy-data/blob/master/numpy-mypy/numpy/__init__.pyi in the numpy module base directory do exactly that, in an experimental version of some kind at least?

Would it be better to have the annotations integrated with the code or as separate stubs?

Integrated with the code would be lovely, but I don't think it's feasible for NumPy yet. Even with the comment string version of type annotations, we would need to import from typing on Python 2, and adding dependencies to NumPy is pretty much off the table.

Also, most of the core data structures and functions (things like ndarray and array) are defined in extension modules, so we'll need to use stubs there anyway.

Wouldn't just putting https://github.com/machinalis/mypy-data/blob/master/numpy-mypy/numpy/__init__.pyi in the numpy module base directory do exactly that, in an experimental version of some kind at least?

Yes, I think that would be enough for external code. But how does mypy handle libraries with incomplete type annotations?

If possible, we might annotate numpy.core.multiarray directly, rather than just at the top level. (multiarray is the extension module where NumPy's core types like ndarray are defined.) I think this would allow NumPy itself to make use of type checking for some of its pure-Python modules.

I'm curious, what is the type of np.empty(shape=(5, 5), dtype='float32')?

What is the type of np.linalg.svd?

It looks like types are parametrized; is this by their dtype? Is it also feasible to parametrize them by their dimension or shape? How much sophistication does Python's typing module support?

Yea they are parameterized by their dtype. I'm no expert on the typing module but I think you could just have the ndarray type inherit Generic[dtype, int] to parameterize on ndim. I believe that's what Julia does. I'm not sure if you could easily parameterize on shape. Nor am I sure of what benefits that would bring or why it wasn't done that way in the first place.

You can use numpy dtypes; we just need to define them. That was done with floating for np.std here:

https://github.com/kjyv/mypy-data/blob/24ea87d952a98ef62680e812440aaa5bf49753ae/numpy-mypy/numpy/__init__.pyi#L198

I'm not sure; I don't think it's possible. I don't think you can modify the output type based on an argument's value. I think the best we can do is overload the function with all the type specializations we would care about.

https://docs.python.org/3/library/typing.html#typing.overload

Another option might be to introduce some strict-typed aliases, so np.empty[dtype] is a function with signature (ShapeType) -> ndarray[dtype].

There's already some precedent for this with the unusual np.cast[dtype](x) function.
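
For illustration, a runtime version of such an alias could be built with __class_getitem__ on modern Python. This is only a sketch of the idea; empty_alias is my own name, not a proposed API:

from typing import Tuple

import numpy as np

class empty_alias:
    """empty_alias[dtype] returns a (shape) -> ndarray factory, np.cast-style."""
    def __class_getitem__(cls, dtype):
        def make(shape: Tuple[int, ...]) -> np.ndarray:
            return np.empty(shape, dtype=dtype)
        return make

arr = empty_alias[np.float64]((3, 4))  # like np.empty((3, 4), dtype=np.float64)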

@jwkvam OK, so maybe dtype annotations are doable -- I was just suggesting starting simple and going from there.

I think TypeVar could possibly be used instead of overloads, maybe:

D = TypeVar('D', np.float64, np.complex128, np.int64, ...)  # every numpy generic type
def empty(dtype: Type[D]) -> ndarray[Type[D]]: ...

If I understand this correctly, this would imply empty(np.float64) -> ndarray[np.float64].

It would also be awesome to be able to type check shape and dimensionality information, but I don't think current type checkers are up to that task. Generic[int] is an error, for example -- the arguments to Generic are required to be instances of TypeVar:
https://github.com/python/cpython/blob/868710158910fa38e285ce0e6d50026e1d0b2a8c/Lib/typing.py#L1131-L1133

We would also need to express signatures involving dimensions. For example, np.expand_dims maps ndim -> ndim+1.

I suppose one approach that would work is to define classes for each non-negative integer, e.g., Zero, One, Two, Three, ... and then define overloads for each. That would get tiring very quickly.
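
To make the tedium concrete, here is a sketch of that encoding (all types hypothetical, for illustration only):

from typing import Generic, TypeVar, overload

class Zero: ...
class One: ...
class Two: ...
class Three: ...

N = TypeVar("N", Zero, One, Two, Three)

class NDArray(Generic[N]): ...  # hypothetical ndim-parameterized array

# np.expand_dims maps ndim -> ndim + 1, so every ndim needs its own overload:
@overload
def expand_dims(a: NDArray[Zero]) -> NDArray[One]: ...
@overload
def expand_dims(a: NDArray[One]) -> NDArray[Two]: ...
@overload
def expand_dims(a: NDArray[Two]) -> NDArray[Three]: ...
def expand_dims(a):
    raise NotImplementedError  # runtime implementation omitted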

In TensorFlow, tf.Dimension() and tf.TensorShape() let you statically express shapes. But it's not something that is done in the type system. Rather, each function has a helper associated with it that determines the static shape of outputs from the shape of inputs and any non-tensor arguments. I think we would need something similar if we hoped to do this with NumPy, but there's nothing in Python's typing system that suggests this sort of flexibility is coming.

@shoyer I see, yea that's disappointing. I was able to hack the following

_A = TypeVar('_A')
_B = TypeVar('_B', int, np.int64, np.int32)

class Abs(Generic[_A, _B]):
    pass

class Conc(Abs[_A, int]):
    pass

But I don't think that's leading anywhere...

It seems like your example works! It seemed to work better without the type constraints: I could test dtypes like str. I had to remove the default argument; I couldn't figure out how to get that to work.

D = TypeVar('D')
def empty(shape: ShapeType, dtype: Type[D], order: str='C') -> ndarray[D]: ...

and code

def hello() -> np.ndarray[int]:
    return np.empty(5, dtype=float)

I get

error: Argument 2 to "empty" has incompatible type Type[float]; expected Type[int]

I'm a little confused because if I swap the types:

def hello() -> np.ndarray[float]:
    return np.empty(5, dtype=int)

I get no error. Even though I don't think anything is marked as covariant.

Even though the type system isn't as sophisticated as we'd like, do you think it's still worth it? One benefit I would appreciate is better code completion through jedi.

I'm a little confused because if I swap the types:

I believe the issue here is that int instances are implicitly considered valid for float annotations. See the notes on the numeric tower in the typing PEP:
https://www.python.org/dev/peps/pep-0484/#the-numeric-tower

I think this could be avoided if we insist on NumPy scalar types instead of generic Python types for annotations, e.g., np.ndarray[np.integer] rather than np.ndarray[int].

This is actually a little easier than I thought because TypeVar has a bound argument. So revising my example:

D = TypeVar('D', bound=np.generic)
def empty(dtype: Type[D]) -> ndarray[D]: ...

I had to remove the default argument; I couldn't figure out how to get that to work.

I'm not quite sure what you were getting at here?

I just tried to encode the default value of dtype in the stub. They did that in the mypy-data repo.

def empty(shape: ShapeType, dtype: DtypeType=float, order: str='C') -> ndarray[Any]: ...

from https://github.com/kjyv/mypy-data/blob/master/numpy-mypy/numpy/__init__.pyi#L523

Following your example, I wasn't able to get mypy to work with a default argument for dtype. I tried dtype: Type[D]=float and dtype: Type[D]=Type[float].

I think dtype also needs to become a generic type, and then you need to set the default value to a numpy generic subclass like np.float64 rather than float, e.g.,

# totally untested!
D = TypeVar('D', bound=np.generic)

class dtype(Generic[D]):
    @property
    def type(self) -> Type[D]: ...

class ndarray(Generic[D]):
    @property
    def dtype(self) -> dtype[D]: ...

DtypeLike = Union[dtype[D], D]  # both are coercible to a dtype
ShapeLike = Tuple[int, ...]

def empty(shape: ShapeLike, dtype: DtypeLike[D] = np.float64) -> ndarray[D]: ...

That's not right. D == type(dtype.type) == type, so your type parameterization is useless, as the only parameter used is D = type.

@eric-wieser oops, I think it's fixed now.
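
For illustration, with stubs along those lines one would expect mypy inference like this (a sketch, assuming the classes above; reveal_type is a mypy-only construct, hence the TYPE_CHECKING guard):

from typing import TYPE_CHECKING

import numpy as np

x = np.empty((2, 3), dtype=np.float64)

if TYPE_CHECKING:
    reveal_type(x)        # expected: ndarray[numpy.float64]
    reveal_type(x.dtype)  # expected: dtype[numpy.float64]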

There has been some related discussion on mypy's issue tracker (python/mypy#3540 mostly). There, we perceive the main issue to be that numpy arrays conceptually include their dimensions in their type, and the current type system doesn't really support that. If the mypy or typeshed projects can help in any way in getting typing working for numpy, please let us know!

I could imagine encoding more or less information in parametrized types here. For example an array like np.empty((2, 3)) could be of any of the following types:

  1. Array[float64, (2, 3)]
  2. Array[float64, (n, m)]
  3. Array[float64, ndim=2]
  4. Array[float64]
  5. Array

@JelleZijlstra what is your opinion here on what tools like mypy will likely be able to handle? How sophisticated can we get?

It seems pretty clear that significant work in the type system would be required to support shapes and dimensionality. I would welcome that (and just wrote down a bunch of ideas in python/mypy#3540), but for now let's call that out of scope for NumPy. Just getting ndarray[float64] working seems hard enough, given numpy's complex type hierarchy and the challenges of generic types.

Yes, I think that the first step would be just to get basic typing support for numpy (and Pandas and sklearn) in, without taking into consideration shapes and other extra constraints on those types.

The issue with those extra constraints is that it is not enough just to describe a shape (e.g., (5, 6)); there has to be a language to describe constraints on that shape. You can imagine wanting to define a function which accepts only square shapes as inputs, or one where one dimension has to be 2x the other.

Something like that was done in the contracts project.

I also think that PEP 472 would be great to support here, because then one could really do things like Array[float64, ndim=2].

Indeed, PEP 472 would be nice for typing, though it would probably be one of the easier fixes to make this happen! (Please ping me if you are interested in restarting discussion around it, as I think there are also compelling use cases for named dimensions in indexing.)

I am not sure how I can contribute, but I definitely think it would be an awesome feature for multiple reasons. But if we are going in that direction, then it seems like [] just becomes a different way to call an object. So object(*args, **kwargs) does something, object[*args, **kwargs] something else, and then we can even generalize and also have object{*args, **kwargs} and object<*args, **kwargs>. ;-)

@mitar: Looking at it the other way, perhaps we should just be annotating with something like ndarray[float].constrain(ndim=2). We have plenty of available syntax already, and unlike decorators, annotations have no restrictions.

I in fact tried the following syntax: ndarray[float](ndim=2), overloading __call__ on the generic so that it returns a class again rather than an instance of a class. But it became tricky for types which are not generics.

I think the main issue is with ndarray[float] support, because ndarray[float] is not something which really exists in ndarray; one would have to change ndarray itself, and I am not sure changing upstream code to support better typing is a good general principle.

One other approach could be to have new type of type variables, ConstrainedTypeVar, where you could do something like ConstrainedTypeVar('A', bound=ndarray, dtype=float, ndim=2) or something like that, and then you would use A as a var in the function signature. But this becomes very verbose.

I wrote up a doc with some ideas for what typing array shapes could look like with broadcasting and a notion of dimension identity.

The core ideas include:

  1. Adding a DimensionVar primitive that allows for symbolic identities for array dimensions
  2. Recognizing ... (Ellipsis) as indicating array broadcasting.

For example, to type np.matmul/@:

from typing import DimensionVar, NDArray, overload

I = DimensionVar('I')
J = DimensionVar('J')
K = DimensionVar('K')

@overload
def matmul(a: NDArray[..., I, J], b: NDArray[..., J, K]) -> NDArray[..., I, K]: ...

@overload
def matmul(a: NDArray[J], b: NDArray[..., J, K]) -> NDArray[..., K]: ...

@overload
def matmul(a: NDArray[..., I, J], b: NDArray[J]) -> NDArray[..., I]: ...

These would be enough to allow for typing generalized ufuncs. See the doc for more details and examples.

A possible solution to supporting both dtypes and shapes, if we're already choosing to keep NDArray and ndarray distinct:

NDArray[float].shape[I, J, K]
NDArray[float]
NDArray.shape[I, J, K]

Just a thought, would it make sense to also have a shortcut like this?

NDArray.ndim[2]  # NDArray.shape[..., ...]
NDArray[float].ndim[2]  # NDArray[float].shape[..., ...]

— which could simplify a number of signatures, especially in downstream code.

@aldanor I think you mean NDArray.shape[:, :] (... means "zero or more dimensions", which isn't quite right in this context). But yes, that looks reasonable.


Quick update on typing for dtypes: I wrote a toy module using the approach I described above that uses np.generic subclasses with Generic for parameterized ndarray/dtype types.

This mostly seems to work with mypy as I would expect, including type inference with the equivalent of np.empty(..., dtype=np.float32). It does fail to catch one of my intentional type errors involving a Union type (I'll file a bug report later).

I think this would probably be good enough for dtypes. Without typing support for literal values, we couldn't do type inference with dtype specified as strings (dtype='float32'). Perhaps more problematically, it also doesn't handle type inference from Python types like dtype=float. But these types can be ambiguous (e.g., dtype=int maps to np.int64 on Linux and np.int32 on Windows), so it's probably better to use explicit generic types anyway. It's OK if type inference doesn't work in every possible case, as long as specifications like dtype=float are inferred as a dtype of Any rather than raising an error.
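
A stub-style sketch of that graceful fallback (my illustration; ndarray here is again a hypothetical generic stand-in, not the real class):

from typing import Any, Generic, Tuple, Type, TypeVar, Union, overload

import numpy as np

D = TypeVar("D", bound=np.generic)

class ndarray(Generic[D]): ...

@overload
def empty(shape: Tuple[int, ...], dtype: Type[D]) -> ndarray[D]: ...
@overload
def empty(shape: Tuple[int, ...], dtype: Union[str, type]) -> ndarray[Any]: ...
def empty(shape, dtype=float):
    return np.empty(shape, dtype=dtype)  # runtime just delegates to NumPy

x = empty((2, 2), np.float32)  # inferred as ndarray[np.float32]
y = empty((2, 2), 'float32')   # falls back to ndarray[Any]
z = empty((2, 2), float)       # also ndarray[Any], not an error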

But these types can be ambiguous (e.g., dtype=int maps to np.int64 on Linux and np.int32 on Windows)

That's not ambiguous - in all cases, that maps to np.int_, which is the C long type.

I've written the mailing list to gain consensus on writing type-stubs for NumPy in a separate package:
https://mail.python.org/pipermail/numpy-discussion/2017-November/077429.html

Amazing, thanks @shoyer !

Per the consensus on the mailing list, I'd like to declare https://github.com/numpy/numpy_stubs open for business!

We'll start with basic annotations (no dtype support). If anyone wants to put together a basic PR to add the PEP 561 scaffolding for the repo, that would be appreciated!
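
For anyone picking this up, a minimal sketch of what that scaffolding might look like, following PEP 561's stub-only package convention (the details here are illustrative, not the repo's actual setup):

# setup.py -- minimal PEP 561 stub-only package (illustrative sketch).
# Assumed layout:
#   numpy-stubs/
#       __init__.pyi        # top-level numpy annotations
#       core/
#           multiarray.pyi  # stubs for the extension module
from setuptools import setup

setup(
    name="numpy-stubs",
    version="0.0.1",
    packages=["numpy-stubs"],
    package_data={"numpy-stubs": ["*.pyi", "*/*.pyi"]},
    zip_safe=False,  # type checkers need real files on disk
)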

YES, YES, 1000X YES!

Heads up for anyone following this issue: I've opened two issues on the python/typing tracker:

What is the expected release time for the typing feature?
Is there any reason to attempt to maintain 2.7 compatibility?
An early comment mentioned difficulty in integrating with Python 2. Since then, it seems that numpy has changed its stance.

Things are moving targets, I know, but would it make sense to target something like Python 3.4-3.6?

What is the expected release time for the typing feature?

There were several discussions about this (integer generics, a.k.a. simple dependent types) at PyCon. I will write a proto-PEP based on these discussions and the original doc written by @shoyer soon. My target is to get the PEP written, implemented in mypy, and accepted in time for Python 3.8 beta 1 (a subsequent backport of the new types in typing for Python 2 is also highly likely).

@hmaarrfk as for writing type annotations for NumPy itself, we've started doing that in a separate repository: https://github.com/numpy/numpy-stubs. You should be able to install and use those stubs in the current state (with the latest version of mypy), but they are far from complete. Help would be appreciated!

Sure, I'm glad to help where I can, and I saw the repository. I just know that these things take time.
I saw the repo and noticed a commit mentioned 2.7 compatibility, which is why I asked.

Python 3.8's beta release is due mid-2019. Numpy mentioned that they would stop new features at the end of 2018.

Typing seems to be a "nice-to-have" feature for numpy as opposed to a "must-have". As such, targeting two language versions seems a little hard, especially if the feature will start to appear well beyond numpy's own support deadline.

I'll be interested in reading what @ilevkivskyi has to say in the PEP.

@hmaarrfk You raise a good point about Python 2.7 support. To be honest, I haven't thought it through fully yet. I do expect that we will eventually drop it, but probably not before mypy itself drops Python 2.7 support, given that a major use-case for typing is writing Python 2/3 compatible code.

For now, it doesn't seem to require many compromises to support Python 2 in our type annotations, so I'm happy to leave it in, especially given that it came from a contributor who was evidently interested in it.

I wanted to poke this again to see whether discussions have progressed, especially regarding type hinting shape information, which would be particularly useful in a lot of my applications. Is there a status tracker or is this not a high enough priority to have any resources dedicated toward it?

In transonic, a project to generalize numpy accelerators, we have a type-hint syntax as an alternative to Pythran annotations, which use comments. It does not work well with mypy right now, but I wonder if it is useful. See an example: https://transonic.readthedocs.io/en/latest/examples/type_hints.html

In case it's useful to this issue, I'll mention that I've made a tool for converting docstrings to type comments: https://pypi.org/project/doc484/

I use this with pre-commit in several projects to keep docstrings in sync with type comments.

You'll still need to convert the types in your docstrings to be PEP484 compliant.

Hello everyone,

I wanted to do my part, so I forked the repo and started to add type hints. My idea was to work bottom-up: start with the "simple" functions and work upwards from there (starting with the low-hanging fruit).

For example, in _string_helpers.py, I added type hints to some variables and functions.

LOWER_TABLE: str = "".join(_all_chars[:65] + _ascii_lower + _all_chars[65 + 26:])
UPPER_TABLE: str = "".join(_all_chars[:97] + _ascii_upper + _all_chars[97 + 26:])

def english_lower(s: str) -> str:
    """ Apply English case rules to convert ASCII strings to all lower case.
   ...
    """
    lowered = s.translate(LOWER_TABLE)
    return lowered

What do you think about this?

I'd recommend doing a little bit and opening a PR to get comments. numpy is targeting older pythons (3.5 introduced annotations, IIRC) and this would break those builds, so maybe look into writing .pyi files or check the mypy docs to see if there's a bit more guidance on best practices.

@bsamuel-ui numpy currently requires Python 3.5+, and the NEP-29 [1] states it should be ok to bump it to 3.6+
[1] https://numpy.org/neps/nep-0029-deprecation_policy.html

Annotations (for function args and return types) are actually supported in all Python 3 versions; 3.6 only introduced variable annotations. In early Python 3 version (<3.5) you have to use a backport of the typing module.
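
Concretely (my own example, not NumPy code):

def english_lower(s: str) -> str:  # function annotations: any Python 3
    return s.lower()

LOWER_TABLE: str = "abc"  # variable annotation (PEP 526): Python 3.6+ only

# On older versions, the comment form works everywhere:
UPPER_TABLE = "ABC"  # type: str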

I have added a pull request with my first .pyi file. It needs some work, but it would be nice if you guys could take a look at it so I can get some initial feedback.

As mentioned in gh-14905, we have the beginnings of a stub library in https://github.com/numpy/numpy-stubs. It would be great to get that going with proper tests and then we could decide how best to package it or merge it into numpy/numpy proper.

My bad @mattip. I will remove the pull request from numpy and add a new one to numpy-stubs

It's still open, but I believe that numpy already supports this in the master version.

Hi,
I am trying to define a type alias for a 3D vector, i.e., a numpy array of shape (3,) with dtype int32.

(I know I can type hint with np.ndarray, but how do I get more specific? I read everything here and didn't get it; I also searched for a tutorial on how to use numpy types for typing in Python but didn't find anything.)

For example, it's possible to write:

from typing import Tuple
VectorType = Tuple[int, int, int]

I tried to do:

VectorType = np.ndarray(shape=(3,), dtype=np.int32)

Is this the correct way to do it?

Can someone here point me a tutorial or an example please ?

Also, I found this repo, which provides "Type hints for Numpy": https://github.com/ramonhagenaars/nptyping

Will Numpy integrate this ?
@ramonhagenaars

@mattip

As mentioned in gh-14905, we have the beginnings of a stub library in https://github.com/numpy/numpy-stubs.

It looks like this has been merged into the main repo. Has this been released, or is it on the roadmap? Trying to decide if we should explore something third party like https://github.com/ramonhagenaars/nptyping or (ideally) wait for/use officially supported type hints.

Thanks.

We have merged much of numpy-stubs into the development branch. You can follow the progress by looking for the static typing label. Hopefully this will be part of the next release. You can try out what is currently merged by using a HEAD version of numpy. We are always looking for contributors: constructive review, documentation, and comments on the issues and pull requests are a few ways to help.

(I know I can type hint with np.ndarray, but how do I get more specific? I read everything here and didn't get it; I also searched for a tutorial on how to use numpy types for typing in Python but didn't find anything.)

There's lots of interest in this area, but more specific typing (dtypes and dimensions) for NumPy arrays isn't supported yet.
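
In the meantime, a common stopgap (my suggestion, not an official NumPy pattern) is a plain alias plus a runtime check:

import numpy as np

Vector3Int = np.ndarray  # intent: shape (3,), dtype int32 -- not enforced statically

def length_squared(v: Vector3Int) -> int:
    assert v.shape == (3,) and v.dtype == np.int32  # enforced at runtime only
    return int((v * v).sum())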

@shoyer

There's lots of interest in this area, but more specific typing (dtypes and dimensions) for NumPy arrays isn't supported yet.

Hey community,
to encourage you to do this, I am trying to show my interest in using this feature by kindly posting some nostalgic (and harmless) links. I vaguely remember this kind of type from school, when we played around a bit (one exercise only) with vectors and matrices in the Lean theorem prover. It looked like something in the 11th paragraph of this section, where it talks about vectors ("Vector operations are handled similarly:"), but really more like the solutions of exercises 3 and 4 in the same documentation text. I know it is overkill, but I am posting just for inspiration.

Funny you should mention Lean; I've been working with it solidly for the last few months. While interesting in its own right, my impression is that the heavy dependent typing used by Lean would be a significant challenge for mypy to adopt, and arguably not a worthwhile one: at a certain point, these things are better as language features. For the case of numpy, there are plenty of weaker type systems which are good enough role models.

Have we looked at using PEP 593 to improve numpy typing? For instance we could use Annotated[np.ndarray, Shape[3, N, 5], DType[np.int64]] etc.
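
A sketch of that idea (Shape and DType are hypothetical markers I'm defining for illustration; static checkers would simply see np.ndarray, but a runtime checker could introspect the metadata):

from typing import Annotated, Generic, TypeVar  # Annotated needs Python 3.9+

import numpy as np

T = TypeVar("T")

class Shape:
    def __class_getitem__(cls, item):  # Shape[3, None, 5] just carries metadata
        return (cls, item)

class DType(Generic[T]): ...

Array3xNx5 = Annotated[np.ndarray, Shape[3, None, 5], DType[np.int64]]

def f(x: Array3xNx5) -> None: ...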

For anyone still following along here: one blocker for this has been variadic generics, which we're trying to make some progress on with PEP 646, which is currently in review by the Python steering council.

Some other links that might be of interest are:

  • TensorAnnotations, a library of shape-aware stubs for TensorFlow and JAX, with NumPy support hopefully coming soon (disclaimer: I'm the main dev)
  • tsanley which does something similar but with a runtime checker.
  • torchtyping ditto, but only for PyTorch right now.
  • PyContracts ditto, but general-purpose and much more flexible.

Have we looked at using PEP 593 to improve numpy typing?

This is definitely one option, and it'd be pretty cool to see a runtime checker which employed these kinds of annotations. The reason I'm personally gunning for the approach suggested in PEP 646 is that it would allow existing tooling like existing static type checkers to verify the kinds of typing things we care about, with (relatively) little extra effort. (OK, we'd have to implement support for 646 in e.g. Mypy, but that's probably simpler than writing a static analysis tool from scratch.)

Another quick update: Pradeep Kumar Srinivasan and I will be giving a talk on the approach we've been experimenting with over the past 6 months at the PyCon 2021 Typing Summit next week, Catching Tensor Shape Errors Using the Type Checker: https://us.pycon.org/2021/summits/typing/ We'll be discussing how it works, what it looks like in practice, and a few of its current limitations. Hope to see you there!

Hi all,

Some time ago, back in #17719, dtype support was introduced for np.ndarray (including a placeholder slot for shapes). As a followup, earlier today a new PR was submitted (#18935) adding a runtime-subscriptable alias for np.ndarray[Any, np.dtype[~Scalar]], providing a convenient (and compact) way of annotating arrays with a given dtype and unspecified shape:

>>> import numpy as np
>>> import numpy.typing as npt

>>> print(npt.NDArray)
numpy.ndarray[typing.Any, numpy.dtype[~ScalarType]]

>>> print(npt.NDArray[np.float64])
numpy.ndarray[typing.Any, numpy.dtype[numpy.float64]]

>>> NDArrayInt = npt.NDArray[np.int_]
>>> a: NDArrayInt = np.arange(10)

>>> def func(a: npt.ArrayLike) -> npt.NDArray[Any]:
...     return np.array(a)
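
For example, these aliases let a checker catch dtype mismatches (a small usage sketch with the API above; exact mypy error wording varies):

import numpy as np
import numpy.typing as npt

def as_float(a: npt.ArrayLike) -> npt.NDArray[np.float64]:
    return np.asarray(a, dtype=np.float64)

x: npt.NDArray[np.float64] = as_float([1, 2, 3])  # OK
y: npt.NDArray[np.int_] = as_float([1, 2, 3])     # mypy: incompatible types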

@BvB93 This is awesome. I've been using your change since numpy 1.21 came out. Are you planning on adding runtime subscripting like np.ndarray[Any, Any], np.integer[Any], np.floating[Any], np.dtype[Any]? MyPy complains if I don't put Any since I have the flag for no implicit generics, and I end up having to work around numpy's missing __class_getitem__.

@NeilGirdhar excellent idea. As a workaround in the meantime, you could try using from __future__ import annotations, which should make it work at runtime (since annotations are then no longer evaluated at runtime).

@rggjan Yup, thanks! That works in many cases except when you want to do set aliases like this. Also, pylint rightly complains that these type objects are not subscriptable.

Is it possible to annotate structured arrays and if so how? I tried ndarray[Any, [('i', np.int16), ('q', np.uint16)]] but got Bracketed expression "[...]" is not valid as a type (likewise for other failed attempts).

Is it possible to annotate structured arrays and if so how?

I believe that as of Numpy 1.21 it is not yet possible.

Is it possible to annotate structured arrays and if so how? I tried ndarray[Any, [('i', np.int16), ('q', np.uint16)]] but got Bracketed expression "[...]" is not valid as a type (likewise for other failed attempts).

Unfortunately not, and I very much doubt that list-of-tuples-syntax will ever be something that mypy will understand (not without some serious plugin magic, at least).

As for structured arrays in general, there are two main challenges here:

  1. How to type the necessary structure into the np.void dtype.

    Ideally we'd make it generic w.r.t. something like TypedDict, so field dtypes can be assigned to each key (i.e. field names). Making np.void generic is however complicated by its flexibility, as it can be used for representing opaque byte sequences ("V10"), dtypes with a field size ((np.float64, 8)), and structured dtypes with a set of keys and matching field dtypes ([("a", np.float64)] or [("a", np.float64, 8)]).

    Only the last category can reasonably be expressed via a TypedDict, so this raises the question of what to do about the other two. Make it generic w.r.t. a ternary Union? Treat all three categories as type-check-only subclasses? This is very much an open question.

  2. How to let ndarray.__getitem__ and __setitem__ access the named fields.

    Letting ndarray access and use the fields encoded within the dtype will be a challenge of its own. Namely, the only two types that currently deal with arbitrary named fields (NamedTuple and TypedDict) have, in my experience, proven to be less than cooperative when dispatching with the help of protocols. This is very much a mypy bug, but in this context it's probably going to be a detrimental one.

    For example:

    from typing import TypedDict, Protocol, TypeVar, TYPE_CHECKING
    
    KT = TypeVar("KT")
    VT = TypeVar("VT")
    KT_contra = TypeVar("KT_contra", contravariant=True)
    VT_co = TypeVar("VT_co", covariant=True)
    
    class SupportsGetItem(Protocol[KT_contra, VT_co]):
        def __getitem__(self, key: KT_contra, /) -> VT_co: ...
    
    class TestDict(TypedDict):
        a: int
        b: str
    
    def getitem(dct: SupportsGetItem[KT, VT], key: KT) -> VT: ...
    
    test_dict: TestDict
    if TYPE_CHECKING:
        reveal_type(getitem(test_dict, "a"))  # Revealed type is "builtins.object*"
        reveal_type(getitem(test_dict, "b"))  # Revealed type is "builtins.object*"
        reveal_type(getitem(test_dict, "c"))  # Revealed type is "builtins.object*"

@NeilGirdhar there is currently a PR up for making number, dtype and ndarray runtime-subscriptable (#19879),
though note that this functionality has a hard dependency on Python >= 3.9.

The hope is to wrap things up before the next 1.22 release.

That works in many cases except when you want to do set aliases like this.

@NeilGirdhar The workaround I've been using in that case is to put quotes around the subscripted numpy type:

    RealArray = npt.NDArray["np.floating[Any]"]

Is it possible to annotate structured arrays and if so how?

Sorry for the random reply, but is this now supported now that we are beyond 1.22? If so, the numpy typing documents have not made it clear, in either the dev or the stable release.