Python’s “Type Hints” are a bit of a disappointment to me

2022-04-21

Preface

You are reading version 2.0 of this blog post.

Readers shared this link on Hacker News and lobsters, which unexpectedly blew up and sparked many heated discussions. I’ve incorporated some of this feedback into this revised version.

(Some time later, there was also a discussion about this article on The Real Python Podcast.)

Introduction

Over the course of several Python 3.x versions, “type hints” were introduced. You can now annotate functions:

def greeting(name: str) -> str:
    return 'Hello ' + name

And variables:

foo: str = greeting('penguin')

To many people, myself included, static type hints are very useful and allow you to catch errors early on.

There are some surprises and limitations, though. At least there were for me. Let’s explore them.

Not enforced at runtime

The documentation says:

The Python runtime does not enforce function and variable type annotations. They can be used by third party tools such as type checkers, IDEs, linters, etc.

So, this executes perfectly fine:

foo: int = 'hello'
print(foo)

By “execute”, I mean calling python foo.py. As the documentation says, to perform the actual type checks (that is, the checks of these static type hints), you must use some other program.

Comparison to comments and Hungarian Notation

You sometimes see this:

# returns str
def greeting(name):
    return 'Hello ' + name

Since the language’s syntax itself has no concept of static typing, people resort to adding a comment which states the return type. Similar to type hints, the runtime does not verify their correctness. People are aware of this, because they’re only comments.

Hungarian Notation, e.g. strName, is another way of reminding the reader about the type of a variable. Again, their correctness is not verified, which is something that people know as well.

Type hints, on the other hand, are a bit special. They are part of the official language and “part of the code”. As a result, they create the impression that they are “binding” or “authoritative”. As we saw above, though, they are not verified by the runtime, either. Instead, they are only metadata that some other tool may or may not use. Whether type hints carry any meaning or are just random comment-like bytes in the file, depends on whether (and how) a type checker is being run or not.
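To make the “only metadata” point concrete, here is a small sketch. The function double is made up for illustration. The hints end up in the function’s __annotations__ attribute, and nothing looks at them when the function is called:

```python
def double(value: int) -> int:
    return value * 2


# The hints are stored as plain metadata on the function object.
print(double.__annotations__)

# Nothing consults that metadata at call time, so a str is accepted
# and simply gets repeated instead of being rejected.
print(double('ab'))
```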

The “break” between runtime and type checker

So, you must run an external program. This “break” or “gap” between these two components can break people’s assumptions or expectations. It at least broke mine and still continues to do so. Checking static types is something that is usually done implicitly when you run/compile a program, but not so in Python.

It gets a little more confusing, because Python does indeed check types during runtime:

>>> 1 + '1'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'int' and 'str'

This is just completely decoupled from the type hints that you manually put in your code. I think that’s a bit unfortunate and really not what I expected. It’s not just a break of expectations, but it also has consequences, as we’ll see later on.

Basically, I would have hoped the Python runtime does something along the lines of isinstance() behind the scenes, whenever I assign a new value to a variable that has a static type hint. It very well knows the true types at runtime, so please go check if that matches what I wrote in the source code.
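A rough sketch of what I had in mind, written as a decorator. This is not something Python does for you. The names enforce_hints and add are hypothetical, and only hints that are plain classes (int, str, …) are checked. Generic hints like List[int] or Union[...] are skipped here:

```python
import functools
import inspect
from typing import get_type_hints


def enforce_hints(func):
    """Hypothetical helper: check arguments and return value via isinstance()."""
    hints = get_type_hints(func)
    signature = inspect.signature(func)

    def check(label, value, expected):
        # Only plain classes are checked; anything else is silently skipped.
        if isinstance(expected, type) and not isinstance(value, expected):
            raise TypeError(
                f'{label} must be {expected.__name__}, '
                f'got {type(value).__name__}'
            )

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            check(name, value, hints.get(name))
        result = func(*args, **kwargs)
        check('return value', result, hints.get('return'))
        return result

    return wrapper


@enforce_hints
def add(a: int, b: int) -> int:
    return a + b


print(add(1, 2))

try:
    # Without the decorator, this would quietly return '12'.
    add('1', '2')
except TypeError as exc:
    print(f'rejected: {exc}')
```

This obviously costs time on every call, which is probably one reason why the runtime doesn’t do it.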

Alright, so you have to bite the bullet and run an external tool. I’ll be using mypy here, but there are others.

One option is to add a mypy step to your CI pipeline.

Now, in my experience, this might not be as simple as it sounds. (Your experience may differ, of course.) Running mypy is an optional step. When this silently fails, it will probably go unnoticed. This is different from traditional compiled languages: When the compilation step (which implicitly checks static types) fails, you don’t get an executable program at all. You will notice that.

If the Python runtime enforced type hints, then this could somewhat mitigate this problem: You probably have a test suite, so your code is being run and that might then expose errors in type hints, too. (Sure, running a test suite also suffers from being optional.)

So why would a type checker not run? Or maybe not correctly?

Disabled “temporarily” and then forgotten. This shouldn’t happen, but it does.
Misconfiguration. Maybe you’re not using a flag that you should use.
Maybe it runs and prints errors, but doesn’t cause the build to actually fail. This could also creep in over time.
Maybe there never was a step in your CI to run it and instead people rely on local checkers or their IDE – and then ignore or miss their warnings.
Outdated, incomplete, buggy, …
…

Overall, you get additional maintenance burden and all of this is a somewhat fragile process, since checking static type hints is not tightly integrated into “getting the program to run”.
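For the “prints errors, but doesn’t fail the build” case, the important bit is that the checker’s exit code reaches the CI system. A minimal sketch, assuming mypy is installed in the same environment. The function name run_type_check and the path src are placeholders of mine:

```python
import subprocess
import sys
from typing import Sequence


def run_type_check(command: Sequence[str]) -> int:
    """Run the checker and return its exit code unchanged."""
    completed = subprocess.run(command)
    return completed.returncode


# Hypothetical invocation; "src" stands for wherever your code lives.
MYPY_COMMAND = [sys.executable, '-m', 'mypy', '--strict', 'src']

# In a CI script, pass the exit code on so that type errors fail the build:
#     sys.exit(run_type_check(MYPY_COMMAND))
```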

My personal takeaway from this is to not trust Python’s type hints, just because they’re in the code. Only when I see a type checker really do its job and only when I have inspected which flags that checker uses, do I start paying attention to type hints. This is a very different experience compared to other languages and contributes to my disappointment.

But it also depends a lot on your tooling: If you always use heavy machinery, which immediately runs a type checker, before you’ve even read a single letter with your eyes, then this probably isn’t as much of a problem for you. I like to go lightweight, though. In other words, if using such heavy machinery was mandatory to write meaningful code, then this would be another point that I’m not too happy with.

The `Any` type

Python is a dynamically typed language. By definition, you don’t know the real types of variables until runtime. It is not always possible to statically annotate each variable in the source code.

So, naturally, there now is an Any type. The following program passes mypy validation:

from random import randint
from typing import Any


def foo() -> Any:
    if randint(0, 1) == 0:
        return 42
    else:
        return 'foo,bar,baz'


bar = foo()
print(bar.split(','))

Of course, in 50% of the cases you get an exception, which is fine and matches my expectations.

Any goes both ways, though, which was pretty surprising to me:

A special kind of type is Any. A static type checker will treat every type as being compatible with Any and Any as being compatible with every type.

So, this is valid code and passes mypy:

from typing import Any


def foo() -> int:
    bar: Any = 'hello'
    return bar


result = foo()
print(result)

foo() does not return an integer, though.

Mypy has an option to turn this into an error:

$ mypy --warn-return-any foo.py
foo.py:9: error: Returning Any from function declared to return "int"
Found 1 error in 1 file (checked 1 source file)

I guess they cannot turn this on by default, because, well, the Python docs say it’s valid. And, sure, Python is not a brand-new language, so adoption of type hints takes lots of time, which is why they are probably not willing to enforce each and every check right from the start. (It’s been a few years, but it’s still young.)

Mypy has a --strict option, which turns on “all optional error checking flags”. It’s probably best to turn this on by default and then add only those inverse flags that you need, like mypy --strict --no-warn-return-any foo.py.

Any can sneak in through libraries as well. What you then have to do is configure your type checker to ignore certain things just for that library. This means you’ll have to keep an eye on said library over time, so you can find out if/when you can enable these checks again.

I’m not very happy that you have to twiddle with those flags. It contributes to the maintenance burden and fragility. But it is what it is and probably hard to avoid.

In general, steer clear of Any in your own code. It should be a last resort. Its usage is surprisingly frequent, though, which is why I tripped over it many times. People have commented that you can do nasty things like this in many languages. Sure, I was just surprised to see it happen so often here with Any. Could be just bad luck.

The docs mention a different approach, which is to use object instead:

from random import randint


def foo() -> object:
    if randint(0, 1) == 0:
        return 42
    else:
        return 'foo,bar,baz'


bar = foo()
if isinstance(bar, str):
    print(bar.split(','))
else:
    print(f'bar is something unexpected: {type(bar)}')
    # ... handle this situation gracefully ...

It still allows you to create “dynamic” objects during runtime and to pass them around, but as soon as you want to do something with them, you have to narrow it down to a concrete type – and this is clear from the code.

(In that particular example, Union[int, str] could be a better choice.)
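For comparison, here is the same example with Union[int, str] as the return type. This is just a sketch of that variant. The difference to object is that the checker knows the else branch can only be an int:

```python
from random import randint
from typing import Union


def foo() -> Union[int, str]:
    if randint(0, 1) == 0:
        return 42
    else:
        return 'foo,bar,baz'


bar = foo()
if isinstance(bar, str):
    # Narrowed to str in this branch.
    print(bar.split(','))
else:
    # Only int remains, so arithmetic is fine here.
    print(bar + 1)
```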

Note that Any is not supposed to be an alternative to object. They are basically the opposite of each other. The docs put it this way:

Use object to indicate that a value could be any type in a typesafe manner. Use Any to indicate that a value is dynamically typed.

In other words, Any throws you back to “untyped” Python while still making your type checker happy. (The wording “throws you back” implies that “untyped” Python, i.e. dynamically typed Python without static type hints, is a bad thing. In the context of this blog post, it kind of is. In general, though, it’s simply your choice.)

Any is also a way to introduce type hints to your project step by step – and then you risk it staying there. It’s tough.

Duck type compatibility

This is just a minor issue, but still a bit baffling.

Example:

foo: int = 123
bar: float = foo

if isinstance(foo, int):
    print('foo is an int')
if isinstance(foo, float):
    print('foo is a float')

if isinstance(bar, int):
    print('bar is an int')
if isinstance(bar, float):
    print('bar is a float')

Passes validation:

$ mypy --strict numeric.py
Success: no issues found in 1 source file

Unexpected result:

$ python numeric.py
foo is an int
bar is an int

How can I “declare” bar as float, but then it accepts an int and actually is an int at runtime (so it’s not like it’s being converted automatically – of course not, the runtime does not care about type hints)?

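The reason is duck type compatibility: for a few built-in numeric types, the type checker accepts an int wherever a float is expected, even though int is not a subclass of float.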

This is probably not a big deal in Python, though. When you divide two integers, 1 / 3, the result is a float and not accidentally an int, like in some other languages. And luckily this is limited to just a few built-in types.

And yet … It says bar: float but it’s an int. They could have just called it number, if it’s ambiguous anyway.
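Here is a small sketch of a case where this can bite. The function describe is made up. As far as I can tell, mypy accepts both calls, because an int is allowed where a float is declared, but int does not have all the methods that float has:

```python
def describe(value: float) -> str:
    # float.hex() exists, so the checker accepts this line.
    return value.hex()


print(describe(1.5))

try:
    # An int is accepted for a float parameter, but int has no hex() method.
    print(describe(1))
except AttributeError as exc:
    print(f'failed at runtime: {exc}')
```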

Most projects need third-party type hints

Most Python projects out there pre-date type hints, so they don’t contain any type hints at all. This is unavoidable. If you do want to make use of type hints when using such libraries, well, you have to add the hints. typeshed contains hints for a bunch of popular projects.

This means: The library itself and its type hints are maintained separately and can get out of sync. When a type checker does not report any errors for your code, what does that mean? Do you actually call that library correctly?

This will hopefully get better over time. Type hints are an optional thing, though, so there is a chance that we will always have to deal with this.

Dataclasses could integrate better with type hints

So we were making a client for a REST API. Traditionally, we would have built dicts and then POSTed them:

payload = {
    'cars': [
        {
            'name': 'toy yoda',
            'wheels': 4,
        },
    ],
    'salad': 'potato',
    'version': 8,
}

Code like this can get really messy really fast. Python 3.7 introduced dataclasses. Together with type hints, it might look like this:

from dataclasses import dataclass
from typing import List, Literal


@dataclass
class Car:
    name: str
    wheels: int


@dataclass
class APIRequest:
    cars: List[Car]
    salad: Literal['potato', 'italian']
    version: Literal[8]


payload = APIRequest(
    cars=[
        Car(
            name='toy yoda',
            wheels=4,
        ),
    ],
    salad='potato',
    version=8,
)

I’d argue that this is better code. Dataclasses and type hints allow the reader to know how an API request is supposed to be composed.

If you now turned wheels=4 into wheels='4', mypy would report an error.

But how much of your code handles static data like this? Isn’t it much more likely that this '4' is actually a variable? The big question then becomes where this variable comes from and whether it’s covered by (correct) type hints. If it isn’t (or if it’s Any), then your build is green, but your code might be wrong and will fail somewhere down the road (just as if you didn’t have type hints in the first place).

It would have been really nice if dataclasses automatically honored their type hints and raised errors on mismatches. One way to do this would be to have the runtime enforce type hints – but as that’s not the case right now, dataclasses could maybe perform appropriate isinstance() calls in their constructors.
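A minimal sketch of such a check, done by hand in __post_init__. The helper check_fields is hypothetical and only handles fields annotated with plain classes. Hints like List[Car] or Literal[8] are skipped:

```python
from dataclasses import dataclass, fields


def check_fields(instance) -> None:
    """Hypothetical helper: isinstance() check for plainly annotated fields."""
    for field in fields(instance):
        value = getattr(instance, field.name)
        # Generic hints are not classes, so they are not checked here.
        if isinstance(field.type, type) and not isinstance(value, field.type):
            raise TypeError(
                f'{field.name} must be {field.type.__name__}, '
                f'got {type(value).__name__}'
            )


@dataclass
class Car:
    name: str
    wheels: int

    def __post_init__(self) -> None:
        # Called by the generated __init__ after all fields are set.
        check_fields(self)


print(Car(name='toy yoda', wheels=4))

try:
    Car(name='toy yoda', wheels='4')
except TypeError as exc:
    print(f'rejected: {exc}')
```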

pydantic was brought up as an alternative to plain dataclasses. It’s worth a closer look. I’m a bit worried when I look at the example on their page, though, because they “declare” friends: List[int] which then happily accepts 'friends': [1, 2, '3']. Maybe this can be tweaked or maybe it is intentional, I don’t know yet.

I’d still argue that it’s better to use dataclasses (or pydantic) than to compose large dicts with lots of different types, but this is probably a personal preference of mine. Many Python programmers praise the language for not having to use classes like this.

A bit of a cultural clash

Let’s have a look at another dict. Consider this:

foo = {
    'hello': 'world',
    'bar': ['baz'],
}

foo['bar'].append('potato')

print(foo)

It executes fine:

$ python bar.py
{'hello': 'world', 'bar': ['baz', 'potato']}

Mypy is not happy with it:

$ mypy --strict bar.py
bar.py:6: error: "Sequence[str]" has no attribute "append"
Found 1 error in 1 file (checked 1 source file)

This is to be expected. The code doesn’t contain type hints, so mypy is forced to guess, and sometimes this goes wrong.

So, how would we annotate this correctly? It could look like this:

from typing import Dict, List, Union


foo: Dict[str, Union[str, List[str]]] = {
    'hello': 'world',
    'bar': ['baz'],
}

foo['bar'].append('potato')

print(foo)

The annotations are “correct” now. Mypy still complains, though:

$ mypy --strict bar.py
bar.py:9: error: Item "str" of "Union[str, List[str]]" has no attribute "append"
Found 1 error in 1 file (checked 1 source file)

We couldn’t properly express that only bar is a list and hello is a string. We now first have to resolve the type ambiguity of this Union, so that we’re making sure that what we’re dealing with really is a list:

if isinstance(foo['bar'], list):
    foo['bar'].append('potato')

I think this is great. This is the first step towards better code: It tells you that, whoops, you’re dealing with something that might not be what you think it is. The same happens when you use Optional and don’t check if it’s None. And all of this can happen long before you actually run this thing.
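The Optional case looks roughly like this. It is a sketch with a made-up function find_wheels. The checker wants to see the None check before you do anything with the value:

```python
from typing import Dict, Optional


def find_wheels(cars: Dict[str, int], name: str) -> Optional[int]:
    # dict.get() returns None when the key is missing.
    return cars.get(name)


wheels = find_wheels({'toy yoda': 4}, 'toy yoda')

# Using "wheels + 1" without this check would be reported by a checker,
# because wheels may be None.
if wheels is not None:
    print(wheels + 1)
```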

When you reflect on this for a moment, you will notice that Dict[str, Union[str, List[str]]] is kind of a crazy type. These things can get really massive as you expand your dict with other things. I have seen Dict[str, Union[Dict[str, Union[int, str, bool]], List[int]]] in the wild, which is very hard to understand and provides little value.

In my opinion, when you have to write types like this, it should be a sign to you that you should rethink your data structures. Wouldn’t it be better if you moved your data to proper classes which could then have easier to understand type hints? Wouldn’t it be easier to understand how your structure is composed? Just writing such a dict in one go might be fine, but using it later on or composing it with lots of if and update() can get confusing quickly.

I think that type hints are doing a great job here: They expose overly complicated data structures. I call these things “chaotic dicts”.
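As a sketch of the alternative, this is the dict from above moved into a dataclass. The class name Foo is just a placeholder. Each field has its own simple type, so the Union and the isinstance() check go away:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Foo:
    hello: str
    bar: List[str] = field(default_factory=list)


foo = Foo(hello='world', bar=['baz'])

# bar is known to be a list of strings, so no isinstance() check is needed.
foo.bar.append('potato')

print(foo)
```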

Now, in discussions with other people, I have found that this is mostly personal preference or even a cultural clash. As mentioned above, people often like using “chaotic dicts” very much, because they value the flexibility that this offers. Moving to classes or “structs” is frowned upon.

Thus, what happens in real life is often this:

from typing import Dict


foo: Dict = {
    'hello': 'world',
    'bar': ['baz'],
}

foo['bar'].append('potato')

print(foo)

And now the type checker is happy – but that’s only because foo now has the type Dict[Any, Any], which, as we’ve seen above, can have bad consequences.
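To illustrate the consequences, here is a sketch with a deliberate mistake: it appends to the wrong key. With the bare Dict, every value is Any, so a checker has nothing to complain about and the error only shows up at runtime. (As far as I know, mypy’s --disallow-any-generics flag, which is part of --strict, is meant to reject the bare Dict annotation itself.)

```python
from typing import Dict


foo: Dict = {
    'hello': 'world',
    'bar': ['baz'],
}

try:
    # 'hello' maps to a str, which has no append(). A checker cannot
    # see that, because the values of a bare Dict are treated as Any.
    foo['hello'].append('potato')
except AttributeError as exc:
    print(f'failed at runtime: {exc}')
```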

My criticism regarding type hints here is that it’s too easy to use Any. I have no idea how, but I’d love it if it were more involved to declare something as Any. Or at least don’t call it Any, because that sounds too innocent and legitimate. :-) Maybe UnsafeAny or even CodeSmellAny or … something, that nudges the programmer into using it as little as possible. (I understand that Any is legitimate as a concept, but it really does undermine the point of type hints when you intentionally introduce it to new code.)

Exceptions are not covered

Exceptions are out of scope of type hints: You cannot annotate a function and say, “this might throw a ValueError”.

This is a bit unfortunate in my opinion. In Python, exceptions can be very surprising, because you never know when they are being thrown. In, say, Java, exceptions are part of a function’s signature, and this has helped me a lot in the past.
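A small sketch of what that means in practice. The function parse_port is made up. The signature only says str in, int out. The only place to record the ValueError is the docstring, which no type checker verifies:

```python
def parse_port(text: str) -> int:
    """Convert text to a TCP port number.

    Raises:
        ValueError: if text is not an integer between 1 and 65535.
    """
    port = int(text)  # int() itself raises ValueError for non-numeric text
    if not 1 <= port <= 65535:
        raise ValueError(f'port out of range: {port}')
    return port


print(parse_port('8080'))

try:
    parse_port('http')
except ValueError as exc:
    print(f'not visible in the signature: {exc}')
```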

It has been commented that exceptions are intentionally not covered. The commenter couldn’t provide a source (it’s supposed to be in a talk by Guido himself) and I couldn’t find it either. If I do, I’ll add it.

Still, aren’t type hints better than nothing?

That’s the big question.

Honestly, it has been my experience over the past years that Python’s type hints do more harm than good, due to their deceptive and misleading aspects and overall fragility. In real life (or at least in my real life at work), they have been pretty disappointing. As a result, at the moment, I am very skeptical, I hardly trust that system, and I’m not overly motivated to use them in my projects.

The previous version of this blog post said: “No, they are not worth it.” The reason being that the net sum was negative in my experience (“more harm than good”). It also had a section that questioned whether you don’t want to simply use one of the traditional compiled languages instead.

On the other hand (and this is where we diverge a lot from the original version):

Are Python’s type hints completely broken and thus utterly useless? I don’t think so. And they can be beneficial. So, overall, do they do more harm than good, then? This probably depends very much on your particular projects – there is no universal answer.

Give them a try. Decide for yourself.

Still work in progress

One reader mentioned Meta’s Cinder, of which “Static Python” is a part.

I quote:

It does type checking at bytecode compilation time (so byte-compiling your code will fail if the type annotations are wrong) and at boundaries between code that you’ve opted in to this compilation and code that hasn’t, it automatically inserts runtime checks, so the type annotations in Static Python compiled code can always be trusted, even if it has to interact with untyped code.

This is much more what I originally expected.

I haven’t checked this out in detail yet and Cinder today is not meant to be used as a general replacement for CPython. There is a chance that this might be available as some sort of pip package at some point in the future, the reader said.

Another reader said that there was a discussion about making CPython respect type hints at runtime after all – we couldn’t dig up the source for this, though. (– edit: Link to said discussion – it’s open ended, but has lots of “no” in it.)

It means that there is still work going on in this area. Things might look very different in a few years.
