Python is great, but stuff like this just drives me up the wall

@renzev@lemmy.world · edit-2 2 months ago

macniel · 2 months ago

Well yeah, just because they kinda mean the same thing doesn’t mean that they are the same. I can wholly understand why they won’t “fix” your inconvenience.

@wizardbeard@lemmy.dbzer0.com · 2 months ago

Unless I’m missing something big here, saying they “kinda mean the same thing” is a hell of an understatement.

nickwitha_k (he/him) · 2 months ago

They are two different data types with potentially different in-memory representations.

Ephera · 2 months ago

Well, yeah, but they do mean the exact same thing, hopefully: true or false

Although thinking about it, someone above mentioned that the numpy bool_ is an object, so I guess that is really: true or false or null/None

nickwitha_k (he/him) · 2 months ago

In an abstract sense, they do mean the same thing but, in a technical sense, the one most relevant to programming, they do not.

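The standard Python bool type is a subclass of the integer type. In CPython that means True and False are full integer objects with an object header, not bare 4-byte or 8-byte machine integers; sys.getsizeof(True) reports 28 bytes on a typical 64-bit build.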
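If you want to check the subclass relationship and the object size on your own interpreter, something like this should do. It is only a sketch: the size it prints depends on the CPython version and platform, so treat that number as implementation-specific.

```python
import sys

# bool is a subclass of int, so True and False behave like 1 and 0 in arithmetic.
is_subclass = issubclass(bool, int)
true_plus_true = True + True

# True and False are singletons: bool() always hands back the same object.
same_object = bool(1) is True

# Size of the whole Python object in bytes (implementation-specific).
bool_size = sys.getsizeof(True)

print(is_subclass, true_plus_true, same_object, bool_size)
```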

The numpy.bool_ type is something closer to a native C boolean and is stored in 1 byte.
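A small sketch of what that looks like in practice, assuming NumPy is installed. The array name `flags` is just for illustration.

```python
import numpy as np

flags = np.array([True, False, True])

# Each element of a boolean array occupies one byte.
itemsize = flags.dtype.itemsize
total_bytes = flags.nbytes

# Indexing returns a NumPy scalar, not a built-in Python bool.
element = flags[0]
is_numpy_bool = isinstance(element, np.bool_)
is_python_bool = isinstance(element, bool)

# It compares equal to True, but it is not the True singleton.
equal_to_true = bool(element == True)
identical_to_true = element is True

print(itemsize, total_bytes, is_numpy_bool, is_python_bool,
      equal_to_true, identical_to_true)
```

So equality checks work fine, while `is True` and `isinstance(x, bool)` checks are where the two types visibly part ways.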

So, memory-wise, the two are not interchangeable. If numpy handed back Python bools, every element of a boolean array would have to become a reference to a full Python object instead of a single byte, which throws away the compact, contiguous layout that numpy’s compiled code relies on. Going the other way, a Python bool can’t just be dropped into a numpy.bool_ slot as-is: the value has to be converted to the 1-byte representation first.

What about converting on the fly? Well, that can be done, but it comes at a performance cost, since every function that can accept a numpy.bool_ now has to perform additional type checking, validation, and conversion on every single call. That adds up quickly when processing data at the scales where numpy is called for.
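For what it’s worth, doing the conversion explicitly at the boundary is cheap to write. A minimal sketch, assuming NumPy is installed; `to_py_bool` is a made-up helper for illustration, not part of NumPy.

```python
import numpy as np

flags = np.array([True, False, True])

# .item() turns one NumPy scalar into the matching built-in Python object.
first = flags[0].item()

# .tolist() does the same for a whole array, creating a plain Python list.
as_list = flags.tolist()

# Hypothetical helper (not a NumPy API): normalise either kind of
# boolean to a built-in bool before handing it to code that checks types.
def to_py_bool(value):
    return bool(value)

print(type(first), as_list, to_py_bool(np.bool_(False)))
```

This way the cost is paid once, where you choose to pay it, instead of inside every numpy call.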