Usage of Asserts Breaks Code

Question

Usage of Asserts Breaks Code

dsanders11 opened this issue 5 years ago · comments

This project uses asserts in a lot of places (I found about 100 cases) where they should not be used. They're being used to validate arguments to functions, where the normal Python way would be to use a TypeError or a ValueError. Assertions are essentially debug code, not code for validating arguments.

Any usage of optimize mode or Python (-O or -OO, or PYTHONOPTIMIZE) strips out assertions, making this code useless. That's very problematic if you're generating schema dynamically, as you won't get errors and this library will generate an invalid GraphQL schema.

At this point it's probably too late to change the uses of assert to a proper TypeError or ValueError, without a major version bump as dependent code in other projects may be catching AssertionError.

I'm proposing all instances of assert foo, "Error" be replaced with:

if not foo:
    raise AssertionError("Error")  # Should be a TypeError

This will at least make the code work in optimize modes where assertions are stripped.

I'll gladly make a PR doing so, but it's hard to tell if this project is still actively developed.

Christoph Zwerschke · Answer 1 · Fri Apr 19 2019 00:37:34 GMT+0800 (China Standard Time)

@dsanders11, this project is still active and has maintainers (again). In graphql-core-next, which will become the next version of graphq-core, I'm using TypeErrors and ValueErrors for the same reasons. Your proposed PR for graphql-core sounds reasonable to me.

Eran Kampf · Answer 2 · Sat Apr 20 2019 01:05:42 GMT+0800 (China Standard Time)

I actually disagree with disallowing asserts.
The asserts are not here to test anything that might actually happen in runtime, and should be fun to be removed in production environment (meaning they shouldn't be used for things like user data validation etc)

If any assert does not match the statement above we should fix it...

I would use the following guidelines on when not to use assert (blatantly stolen from https://github.com/emre/notes/blob/master/python/when-to-use-assert.md):

Never use them for testing user-supplied data, or for anything where the check must take place under all circumstances.

Don't use assert for checking anything that you expect might fail in the ordinary use of your program. Assertions are for extraordinary failure conditions. Your users should never see an AssertionError; if they do, it's a bug to be fixed.

In particular, don't use assert just because it's shorter than an explicit test followed by a raise. Assert is not a shortcut for lazy coders.

Don't use them for checking input arguments to public library functions (private ones are okay) since you don't control the caller and can't guarantee that it will never break the function's contract.

Don't use assert for any error which you expect to recover from. In other words, you've got no reason to catch an AssertionError exception in production code.

Don't use so many assertions that they obscure the code.

David Sanders · Answer 3 · Sun Apr 21 2019 01:05:11 GMT+0800 (China Standard Time)

The asserts are not here to test anything that might actually happen in runtime

I disagree with that.

My use-case involves generating schemas dynamically at runtime, based on inputs. With assertions disabled, a bad schema can be created instead of an error occurring. An existing project example of this is graphene-django. It uses assert_valid_name from this project to handle cases of invalid names when generating enums from a model. If the enum values aren't valid GraphQL enum names (starts with a number, is just a number, etc) then it will add an A_ prefix to try to make a valid name.

When assertions are disabled assert_valid_name does nothing (it's a simple assert). So if you're using graphene-django, your code will generate an invalid schema in production (with optimize turned on) when it was working fine in development.

From the section you quoted:

Don't use assert for checking anything that you expect might fail in the ordinary use of your program. Assertions are for extraordinary failure conditions. Your users should never see an AssertionError; if they do, it's a bug to be fixed.

graphql-core is using asserts for many such cases. Checking arguments to a function using assertions is wrong. That's normal usage. Python can be run from an interpreter. Invalid arguments to a function is ordinary use, and by no means would be classified as an 'extraordinary failure condition'.

From the link there's a more appropriate part to quote (emphasis mine):

Many people use asserts as a quick and easy way to raise an exception if an argument is given the wrong value. But this is wrong, dangerously wrong, for two reasons. The first is that AssertionError is usually the wrong error to give when testing function arguments. You wouldn't write code like this...you'd raise TypeError instead. "assert" raises the wrong sort of exception.

But, and more dangerously, there's a twist with assert: it can be compiled away and never executed, if you run Python with the -O or -OO optimization flags, and consequently there is no guarantee that assert statements will actually be run. When using assert properly, this is a feature, but when assert is used inappropriately, it leads to code that is completely broken when running with the -O flag.

Eran Kampf · Answer 4 · Wed Apr 24 2019 05:26:36 GMT+0800 (China Standard Time)

@dsanders11 if you create a schema dynamically you still shouldn't reach most of the cases I see in your PR.
Like create a Connection type with Meta definition? Call NodeField with an object that is not a Node ?

A good example for something that's not an assert is in your PR where you fixed the Date(Scalar) serialize\deserialize methods.
Those checks should not be asserts, but they should throw a proper exception and not AssertionError.

David Sanders · Answer 5 · Wed Apr 24 2019 07:35:28 GMT+0800 (China Standard Time)

@ekampf, I think we're going to continue to have fundamental differences of opinion here.

Asserts shouldn't be used to validate input to a function or method. That's what TypeError and ValueError are for. From the official Python documentation:

Passing arguments of the wrong type (e.g. passing a list when an int is expected) should result in a TypeError, but passing arguments with the wrong value (e.g. a number outside expected boundaries) should result in a ValueError.

What asserts should be used for is to validate assumptions and post-conditions. Assertions triggering should mean there's a bug in your code. Someone passing you bad input is not a bug in your code, it's in theirs. Some good examples from Django:

        else:
            # resolve_expression has already validated the output_field so this
            # assert should never be hit.
            assert False, "Tried to Extract from an invalid type."

            while remaining != 0:
                assert remaining > 0, 'remaining bytes to read should never go negative'

While there's certainly cases of bad (IMO) uses of assert in Django, doing some grepping shows they use asserts less than 100 times and TypeError and ValueError more than 750.

Using assertions for input validation is dangerous since they can be disabled by using optimize mode. Having code that acts differently in development and production is a recipe for bad times. If the code won't work with that input it needs to throw an actual exception, not silently fail.

I run my code during development with -OO for this reason, because it's what it will be run as in production. With that on graphql-core is more than happy to generate invalid schemas silently since asserts are bypassed. The silent failure is the real problem, as it makes bugs slip past. Well, I could simply develop without -OO, right? Another effect of optimize mode is that docstrings are stripped. Developing without -OO leads to production bugs when it turns out that code depends on docstrings. So that's why graphql-core puts me between a rock and a hard place by is its use of asserts.

I didn't open this issue out of overzealous concern for coding style, I did so because it affects my real world usage of graphql-core. That's all I care about at the end of the day, I'm trying to get stuff done, and the usage of assert is breaking that. @Cito already chimed in and mentioned he'd avoided assert due to the same issues.