@o11c

o11c@programming.dev · 1 year ago

It’s because unicode was really broken, and a lot of the obvious breakage was when people mixed the two. So they did fix some of the obvious breakage, but they left a lot of the subtle breakage (in addition to breaking a lot of existing correct code, and introducing a completely nonsensical bytes class).

o11c@programming.dev · 1 year ago

Python 2 had one mostly-working str class, and a mostly-broken unicode class.

Python 3, for some reason, got rid of the one that mostly worked, leaving no replacement. The closest you can get is to spam surrogateescape everywhere, which is both incorrect and has a significant performance cost, and that still leaves several APIs unavailable.
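For anyone unfamiliar with the workaround, here is a minimal sketch of the surrogateescape round trip. The Latin-1 bytes are an arbitrary stand-in for "historical data". Undecodable bytes become lone surrogates. Those survive a round trip, but a plain strict encode rejects them.

```python
# Bytes that are valid Latin-1 but not valid UTF-8 (a stand-in for legacy data).
raw = b"caf\xe9"

# surrogateescape smuggles each undecodable byte through as a lone surrogate.
text = raw.decode("utf-8", errors="surrogateescape")

# Round-tripping with the same handler recovers the original bytes exactly.
restored = text.encode("utf-8", errors="surrogateescape")

# ...but a plain strict encode fails on the smuggled surrogate.
try:
    text.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
```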

Simply removing str indexing would’ve fixed the common user mistake if that was really desirable. It’s not like unicode indexing is meaningful either, and now large amounts of historical data can no longer be accessed from Python.
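Here is a quick illustration of why code-point indexing isn't meaningful either. One user-perceived character can span several code points, and indexing splits it. Normalization happens to help in this particular case, but not in general. Emoji sequences, for example, have no precomposed form.

```python
import unicodedata

# "é" written as "e" followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = "e\u0301"

length = len(decomposed)   # counts code points, not characters
first = decomposed[0]      # indexing splits the accent off the letter
composed = unicodedata.normalize("NFC", decomposed)  # precomposed U+00E9

# Bytes indexing in Python 3 returns an int, not a 1-byte string.
first_byte = "\u00e9".encode("utf-8")[0]
```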

o11c@programming.dev · 1 year ago

The problem is that there’s a severe hole in the ABCs: there is no distinction between “container whose elements are mutable” and “container whose elements and size are mutable”.

(related, there’s no distinction for supporting slice operations or not, e.g. deque)
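To make the gap concrete, consider a hypothetical fixed-size buffer (the FixedBuffer name is made up). Its elements are mutable but its length isn't, so the only honest ABC is Sequence. Nothing in collections.abc expresses "Sequence plus __setitem__". Meanwhile, deque is registered as a MutableSequence even though it rejects slices.

```python
from collections import deque
from collections.abc import MutableSequence, Sequence


class FixedBuffer(Sequence):
    """Hypothetical container: items can be replaced, but length is fixed."""

    def __init__(self, size: int) -> None:
        self._data = [0] * size

    def __len__(self) -> int:
        return len(self._data)

    def __getitem__(self, index):
        return self._data[index]

    def __setitem__(self, index, value) -> None:
        # Element mutation is supported; insert/__delitem__ deliberately are not.
        self._data[index] = value


buf = FixedBuffer(3)
buf[0] = 42

d = deque([1, 2, 3])
try:
    d[0:2]
    deque_slices = True
except TypeError:
    deque_slices = False
```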

o11c@programming.dev · 1 year ago

And I already explained that Union is a thing.

o11c@programming.dev · 1 year ago

That still doesn’t explain why duck typing is ever a thing beyond “I’m too lazy to write extends BaseClass”. There’s simply no reason to want it.

o11c@programming.dev · 1 year ago

Then, ignoring dunders that have weird rules, what, pray tell, is the point of protocols other than backward compatibility with historical fragile ducks (at the cost of future backward compatibility)? Why are people afraid of using real base classes?

The fact that it is possible to subclass a Protocol is useless since you can’t enforce subclassing, which is necessary for maintainable software refactoring, unless it’s a purely internal interface (in which case the Union approach is probably still better).
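Here is a minimal sketch of the "can't enforce subclassing" point, with made-up class names. A runtime_checkable Protocol accepts any object that happens to have the right attribute, whether or not its class ever subclassed the Protocol.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Closeable(Protocol):
    def close(self) -> None: ...


class Explicit(Closeable):
    """Opts in by subclassing the Protocol."""

    def close(self) -> None:
        pass


class Duck:
    """Never mentions Closeable; it just happens to have close()."""

    def close(self) -> None:
        pass


explicit_ok = isinstance(Explicit(), Closeable)
duck_ok = isinstance(Duck(), Closeable)  # structural match, no subclassing needed
duck_subclassed = Closeable in type(Duck()).__mro__
```

The runtime check only looks for the attribute's presence, not its signature.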

That PEP link includes broken examples so it’s really not worth much as a reference.

(for that matter, the Sequence interface is also broken in Python, in case you need another historical example of why protocols are a bad idea).

o11c@programming.dev · 1 year ago

Aside: note that requests is sloppy there; it should use either raise ... from e to make the cause explicit, or raise ... from None to hide it. Default (implicit) chaining, which prints “During handling of the above exception, another exception occurred”, is supposed to imply that the second exception was unexpected.
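For reference, here is a small sketch of the three spellings (FetchError and the lookup functions are made up). from e sets __cause__. from None leaves __cause__ empty and sets __suppress_context__, so the traceback hides the original. A bare raise inside except only records __context__.

```python
class FetchError(Exception):
    """Hypothetical wrapper exception."""


def with_cause(data):
    try:
        return data["key"]
    except KeyError as e:
        raise FetchError("missing key") from e  # explicit cause


def hide_cause(data):
    try:
        return data["key"]
    except KeyError:
        raise FetchError("missing key") from None  # original hidden in traceback


def implicit(data):
    try:
        return data["key"]
    except KeyError:
        raise FetchError("missing key")  # implicit chaining only


def chain_info(func):
    """Return (cause type, suppress_context flag, context type) for func({})."""
    try:
        func({})
    except FetchError as exc:
        cause = type(exc.__cause__).__name__ if exc.__cause__ is not None else None
        context = type(exc.__context__).__name__ if exc.__context__ is not None else None
        return cause, exc.__suppress_context__, context
    return None
```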

o11c@programming.dev · 1 year ago

In practice, Protocols are a way to make “superclasses” that you can never add features to (for example, readinto, despite being critical for performance, is utterly broken in Python). This should normally be avoided at almost all costs, but for some reason people hate real base classes?
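The following sketch illustrates the "add features later" point with hypothetical names (Reader, BytesSource and DuckSource are made up, not io classes). A default readinto added to a real base class reaches every existing subclass. A class that merely looks like a reader gets nothing.

```python
import abc


class Reader(abc.ABC):
    """Hypothetical base class; only read() is required."""

    @abc.abstractmethod
    def read(self, n: int = -1) -> bytes: ...

    # Imagine this default was added in a later release: every existing
    # subclass picks it up without changing a line.
    def readinto(self, buf: bytearray) -> int:
        data = self.read(len(buf))
        buf[: len(data)] = data
        return len(data)


class BytesSource(Reader):
    def __init__(self, payload: bytes) -> None:
        self._payload = payload

    def read(self, n: int = -1) -> bytes:
        if n < 0:
            n = len(self._payload)
        chunk, self._payload = self._payload[:n], self._payload[n:]
        return chunk


class DuckSource:
    """Structurally a 'reader', but never inherits from Reader."""

    def read(self, n: int = -1) -> bytes:
        return b""


buf = bytearray(4)
count = BytesSource(b"hello").readinto(buf)
duck_has_readinto = hasattr(DuckSource(), "readinto")
```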

If you really want to do something like the original article, where there’s a C-implemented class that you can’t change, you’re best off using a (named) Union of two similar types, not a Protocol.
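Here is a minimal sketch of the named-Union alternative. It assumes the fixed C types are bytes, bytearray and memoryview, chosen purely for illustration. The accepted set is closed and spelled out, and the function dispatches explicitly on the one member that behaves differently.

```python
from typing import Union

# Named Union: a closed, explicit set of accepted types.
Buffer = Union[bytes, bytearray, memoryview]


def byte_length(data: Buffer) -> int:
    # memoryview's len() counts items, which differs from the byte count
    # for multi-byte formats; nbytes is always the size in bytes.
    if isinstance(data, memoryview):
        return data.nbytes
    return len(data)
```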

I suppose they are useful for operator overloading but that’s about it. But I’m not sure if type checkers actually implement that properly anyway; overloading is really nasty in a dynamically-typed language.

o11c@programming.dev · 1 year ago

All of these can be done with raw strings just fine.

For the first pathlib bug case, PATH-like lookup is common, not just for binaries but also data and conf files. If users explicitly request ./foo they will be very upset if your program instead looks at /defaultpath/foo. Also, God forbid you dare pass a Path("./--help") to some program. If you’re using os.path.dirname this works just fine.
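Here is a small sketch of the lookup rule being described, using a hypothetical needs_search helper: search the default paths only when the name has no directory part. posixpath is used explicitly so the result doesn't depend on the host OS.

```python
import posixpath
from pathlib import PurePosixPath


def needs_search(name: str) -> bool:
    """Hypothetical helper: only bare names go through PATH-like lookup."""
    return posixpath.dirname(name) == ""


user_arg = "./foo"
direct = needs_search(user_arg)                           # dirname is "."
via_pathlib = needs_search(str(PurePosixPath(user_arg)))  # pathlib dropped "./"
option_like = str(PurePosixPath("./--help"))              # now looks like an option
```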

For the second pathlib bug case, dir/ is often written so that you get an explicit error if there’s a file by that name. Also, there are programs like rsync where the trailing slash outright changes the meaning of the command. Again, os.path APIs give you the correct result.
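Along the same lines, the sketch below shows pathlib dropping the trailing slash while os.path keeps it. It also shows why the slash matters: on POSIX, stat() of "file/" fails with ENOTDIR when "file" is a regular file. The rsync argv is only built, never run, and slash_rejects_file is a made-up name.

```python
import os
import posixpath
import tempfile
from pathlib import PurePosixPath

src = "src/"
pathlib_form = str(PurePosixPath(src))    # trailing slash is gone
os_path_form = posixpath.join("src", "")  # os.path keeps (here: adds) it

# With rsync, "src/" means "the contents of src" but "src" means "src itself".
argv = ["rsync", "-a", pathlib_form, "dest/"]


def slash_rejects_file() -> bool:
    """On POSIX, a trailing slash turns a regular file into an explicit error."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "dir")
        open(path, "w").close()
        try:
            os.stat(path + "/")
        except NotADirectoryError:
            return True
        return False
```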

For the article mistake, backslash is a perfectly legal character in non-Windows filenames and should not be treated as a directory component separator. Thankfully, pathlib at least doesn’t make this mistake. OTOH, / is reasonable to treat as a directory component separator on Windows (and some native APIs already handle it, though normalization is always a problem).
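The distinction is easy to check with the pure path classes, which need no filesystem access. On POSIX a backslash is just part of the name, while the Windows flavour accepts both separators.

```python
from pathlib import PurePosixPath, PureWindowsPath

posix_parts = PurePosixPath("some\\path").parts     # a single component
windows_parts = PureWindowsPath("some/path").parts  # "/" accepted as a separator
windows_str = str(PureWindowsPath("some/path"))     # rendered with backslashes
```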

I also just found that the pathlib.Path constructor ignores extra kwargs. But Python has never bothered much with safety anyway, and this is minor compared to the outright bugs the other issues cause.

o11c@programming.dev · 1 year ago

The default handling is pretty important.

What I find more interesting are 1. the two-argument form of iter, and 2. the __getitem__ auto-implementation that causes there to be two incompatible definitions of Iterable.
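Both are easy to see in a few lines (Legacy is a made-up class). Two-argument iter() calls a function until it returns the sentinel. The __getitem__ fallback makes iter() work on an object that collections.abc.Iterable does not recognize.

```python
import io
from collections.abc import Iterable

# Two-argument iter(): keep calling until the sentinel b"" comes back.
stream = io.BytesIO(b"abcdefg")
chunks = list(iter(lambda: stream.read(3), b""))


class Legacy:
    """Only __getitem__, no __iter__: the old sequence protocol."""

    def __getitem__(self, index):
        if index >= 3:
            raise IndexError(index)  # signals the end of iteration
        return index * 10


items = list(Legacy())                       # iter() falls back to __getitem__
abc_says_iterable = isinstance(Legacy(), Iterable)  # checks __iter__ only
```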

(btw your comments are using accidental formatting; use backticks: __next__)

o11c@programming.dev · 1 year ago

The problem with pathlib is that it normalizes away critical information, so it can’t be used in many situations.

./path should not be the same as path, which should not be the same as path/.

Also the article is wrong about “Path('some\\path') becomes some/path on Linux/Mac.”

o11c@programming.dev · 1 year ago

Honestly you probably should think about how to translate them. Python at least rolls its own .mo parser so it can support multiple languages in a single process; it’s much more difficult in C unless you push it to the clients (which requires pushing the parameterization as well).
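Here is a sketch of the multiple-languages-per-process point using the standard gettext module. To stay self-contained, it builds tiny .mo catalogs in memory with a hypothetical build_mo helper (ASCII only, no metadata header, no plural forms, no hash table). Real projects would compile .po files with msgfmt instead.

```python
import gettext
import io
import struct


def build_mo(catalog: dict) -> bytes:
    """Hypothetical helper: build a minimal little-endian .mo file."""
    keys = sorted(catalog)
    n = len(keys)
    originals_offset = 7 * 4                  # header is seven uint32 fields
    translations_offset = originals_offset + 8 * n
    strings_offset = translations_offset + 8 * n

    blob = b""
    table = []  # (length, offset) pairs: all originals first, then translations
    for text in keys + [catalog[k] for k in keys]:
        data = text.encode("ascii")
        table.append((len(data), strings_offset + len(blob)))
        blob += data + b"\0"                  # strings are NUL-terminated

    header = struct.pack(
        "<7I",
        0x950412DE,           # magic number
        0,                    # file format revision
        n,                    # number of strings
        originals_offset,     # offset of the original-strings table
        translations_offset,  # offset of the translated-strings table
        0,                    # hash table size (none)
        strings_offset,       # hash table offset (unused)
    )
    entries = b"".join(struct.pack("<II", length, offset) for length, offset in table)
    return header + entries + blob


# Two languages loaded side by side in one process.
french = gettext.GNUTranslations(io.BytesIO(build_mo({"Hello": "Bonjour"})))
german = gettext.GNUTranslations(io.BytesIO(build_mo({"Hello": "Hallo"})))
```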

Non-.pot-based internationalization formats are almost always braindead and should be avoided.

o11c@programming.dev · 1 year ago

Note that by messing with a particular module’s __path__ you can turn it into a “package” that loads from arbitrary directories.
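Here is a minimal sketch of the __path__ trick using a throwaway directory (fakepkg and extra are made-up names). The same thing works by appending a directory to an existing package's __path__ list.

```python
import importlib
import os
import sys
import tempfile
import types

with tempfile.TemporaryDirectory() as plugin_dir:
    # Some arbitrary directory that is not on sys.path.
    with open(os.path.join(plugin_dir, "extra.py"), "w") as f:
        f.write("VALUE = 42\n")

    # A bare module object becomes a "package" once it has a __path__.
    pkg = types.ModuleType("fakepkg")  # hypothetical name
    pkg.__path__ = [plugin_dir]
    sys.modules["fakepkg"] = pkg
    try:
        importlib.invalidate_caches()
        extra = importlib.import_module("fakepkg.extra")
        value = extra.VALUE
    finally:
        # Clean up so the demo leaves no trace in sys.modules.
        sys.modules.pop("fakepkg.extra", None)
        sys.modules.pop("fakepkg", None)
```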

o11c@programming.dev · edit-2 1 year ago

No. Duck types (including virtual subclasses) considered harmful; use real inheritance if your language doesn’t provide anything strictly better.

It is incomparably convenient to be able to retroactively add “default implementations” to interface functions (consider for example how broken readinto is in Python). Some statically-typed languages let you do that without inheritance, but no dynamically-typed language can.
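Here is a concrete case of the virtual-subclass problem, using collections.abc.Sequence (the class names are made up). register() makes isinstance() succeed but does not provide the mixin methods. Real subclassing does.

```python
from collections.abc import Sequence


class Virtual:
    """Registered with Sequence, not inherited from it."""

    def __len__(self) -> int:
        return 2

    def __getitem__(self, index):
        if index >= 2:
            raise IndexError(index)
        return index


class Real(Sequence):
    """Inherits from Sequence, so it gets index(), count(), etc."""

    def __len__(self) -> int:
        return 2

    def __getitem__(self, index):
        if index >= 2:
            raise IndexError(index)
        return index


Sequence.register(Virtual)
```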

This reads more as a rant against inheritance (without any explanation whatsoever) than a legitimate argument.

o11c@programming.dev · 1 year ago

You should have part of your test harness perform a separate import of every module. If your module is idempotent (most good code is) you could do this in a single process by cleaning sys.modules I guess … but it still won’t be part of your pytest process.
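Here is a minimal sketch of that harness step (import_failures is a hypothetical name). Each module is imported in a fresh interpreter, so import-time side effects and ordering dependencies can't hide behind modules pytest has already loaded. How you hook it into a test is up to your suite.

```python
import subprocess
import sys


def import_failures(module_names):
    """Import each module in a fresh interpreter; return {name: stderr} for failures."""
    failures = {}
    for name in module_names:
        result = subprocess.run(
            [sys.executable, "-c",
             "import importlib, sys; importlib.import_module(sys.argv[1])", name],
            capture_output=True,
            text=True,
        )
        if result.returncode != 0:
            failures[name] = result.stderr
    return failures
```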

Static analyzers can only detect some cases, so they can’t be fully trusted.

I’ve also found there are a lot of cases where performant Python code has to be implemented in a distinct way from what the type-checker sees. You can do this with aggressive # type: ignore comments, but I often find it cleaner to use separate if blocks.
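One way to read "separate if blocks" (an interpretation, not necessarily the exact pattern meant) relies on typing.TYPE_CHECKING, which is False at runtime. That lets the checker and the interpreter each see their own definition. fast_sum is a made-up name.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # What the type checker sees: a precise, documented signature.
    def fast_sum(values: list[int]) -> int: ...
else:
    # What actually runs: bind the C-implemented builtin directly,
    # skipping a Python-level wrapper call.
    fast_sum = sum
```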

o11c@programming.dev · 1 year ago

from __future__ import annotations
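For context, that import (PEP 563) stores annotations as strings instead of evaluating them at definition time. A class can therefore refer to itself without quoting. The Node class below is just an illustration.

```python
from __future__ import annotations


class Node:
    # Without the __future__ import (on versions that evaluate annotations
    # eagerly), "Node | None" would be evaluated while the class body runs,
    # before the name Node exists, and raise NameError.
    def __init__(self, value: int, next: Node | None = None) -> None:
        self.value = value
        self.next = next


head = Node(1, Node(2))
```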

o11c@programming.dev · 1 year ago

I think the with approach would work if you use the debugger to change the current line.

I don’t understand why this stopped using ASTs in favor of buggy regexes; you’re allowed to do whatever you want during the codec …

Don’t forget to handle the increment before continue.

The main time I miss C-style for loops is when dealing with linked lists and when manipulating the current iteration.

The former should be easy enough - make the advancement provide __getattr__ expressions.

The latter already works since it is in fact being transformed into a while loop. It’s impossible if you try to use for, though.
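Here is a sketch of the translation being discussed, applied to a linked list (Node and even_values are illustrative names). A C-style for becomes a while, and every continue has to advance the cursor itself. The last lines show why the same trick is impossible with for: rebinding the loop variable has no effect on the next iteration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    value: int
    next: Optional["Node"] = None


def even_values(head: Optional[Node]) -> list:
    # C equivalent:
    #   for (p = head; p != NULL; p = p->next) {
    #       if (p->value % 2) continue;
    #       out.append(p->value);
    #   }
    out = []
    p = head
    while p is not None:
        if p.value % 2:
            p = p.next  # the "increment" has to run before continue, too
            continue
        out.append(p.value)
        p = p.next
    return out


# Rebinding the loop variable inside a for loop does not affect the next iteration.
seen = []
for i in range(3):
    seen.append(i)
    i += 10
```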

o11c@programming.dev · 1 year ago

I’m increasingly convinced that Python/JS-style duck typing is always a mistake, since you can’t do default function impls for traits. Just use inheritance.

Rust’s enums are even weirder, since they mix structuring with discrimination. You end up having to write everything twice most of the time. Again, use inheritance, though you’ll have to choose between if chains and virtual function calls.
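For the inheritance alternative, here is a hypothetical Shape hierarchy showing both dispatch styles mentioned: a virtual method call, and an isinstance if chain over the same classes.

```python
import math


class Shape:
    def area(self) -> float:
        raise NotImplementedError


class Circle(Shape):
    def __init__(self, radius: float) -> None:
        self.radius = radius

    def area(self) -> float:  # virtual dispatch: each subclass overrides
        return math.pi * self.radius ** 2


class Square(Shape):
    def __init__(self, side: float) -> None:
        self.side = side

    def area(self) -> float:
        return self.side ** 2


def describe(shape: Shape) -> str:
    # if-chain dispatch: one function that switches on the concrete type.
    if isinstance(shape, Circle):
        return f"circle r={shape.radius}"
    if isinstance(shape, Square):
        return f"square s={shape.side}"
    raise TypeError(type(shape).__name__)
```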

Python’s pathlib has a major footgun in that ./foo collapses to foo, negating the main point of writing it that way in the first place.

o11c@programming.dev · 1 year ago

The comments in the code explain exactly what the problem is and how to fix it. It would take less than 30 minutes to implement and test (assuming you don’t just grab someone else’s version; there’s a reason I know the time), but nobody has committed it in … at least 13 years since the comment was written. I’m not digging any further through imported SVN history.

o11c@programming.dev · edit-2 1 year ago

It’s worth noting that the http.server module is based on socketserver.BaseServer.serve_forever, which is atrocious.

It uses a busy loop with a delay, so it both burns CPU and is unresponsive.
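Here is a self-contained sketch of that behaviour (OkHandler and run_and_stop are hypothetical names). serve_forever() wakes up every poll_interval seconds to check for a shutdown request, so shutdown() can block for up to that long. The poll_interval argument trades extra wakeups against responsiveness.

```python
import http.client
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer


class OkHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def run_and_stop(poll_interval: float):
    """Serve one request, then time how long shutdown() blocks."""
    server = HTTPServer(("127.0.0.1", 0), OkHandler)
    thread = threading.Thread(
        target=server.serve_forever,
        kwargs={"poll_interval": poll_interval},
        daemon=True,
    )
    thread.start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
        conn.request("GET", "/")
        body = conn.getresponse().read()
        conn.close()
    finally:
        start = time.monotonic()
        server.shutdown()  # waits until the loop's next poll notices the request
        elapsed = time.monotonic() - start
        thread.join()
        server.server_close()
    return body, elapsed
```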

(The fact that Python has had broken signal handling since 3.5, when PEP 475 made blocking calls automatically retry on EINTR, also hurts: EINTR should never be ignored from blocking calls.)