

birdman

Member Since 16 Jan 2016
Offline Last Active 16 Oct 2023 17:59
-----

Posts I've Made

In Topic: RC 9.0 - Problems with Windows filenames that contain Umlauts

16 October 2023 - 12:08

You need to read this in the context of the problem; there is no point in having an abstract, and therefore irrelevant, discussion.

I agree. But if the problem is that the model is an incorrect view of reality, then the model is the problem and it is the model that has to change.
 

In E2, the files in a directory are enumerated in a list of serviceref objects, in which the path is a std::string, which is set from readdir() output. This is how the original sequence of bytes from the filesystem ends up in a Python str object.

That is what is wrong. The original sequence of bytes has to end up in a Python bytes object, because it cannot always be converted into a str.
 

This creates a sort of catch-22:

  • the string can't be handled in python without causing a crash, in case the string doesn't contain utf-8
  • it is not easy to convert the string if you don't know the original encoding (although you can guess using chardet, which OpenVIX has done, and I also implemented, but not committed yet)
  • you can't alter the path in the serviceref itself, as that is also used to access the file, any conversion of that variable causes file access to fail

It's only a catch-22 if you try to make it a str. If you leave it as bytes the problem goes away (apart from displaying it, but that is a simpler issue).
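A minimal sketch of that idea, assuming a hypothetical FileEntry class (this is not Enigma2's serviceref, just an illustration): the raw bytes are kept untouched for file access, and a str is derived only when something has to be shown on screen. Undecodable bytes are replaced with U+FFFD in the label, which is one possible display policy among several.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileEntry:
    """Hypothetical container: raw bytes for access, str only for display."""

    raw_path: bytes  # exactly what the filesystem returned; never converted

    @property
    def label(self) -> str:
        # Display-only text. Bytes that are not valid UTF-8 become U+FFFD,
        # so this can never raise, and raw_path is left untouched.
        return self.raw_path.decode("utf-8", errors="replace")


entry = FileEntry(b"/media/hdd/M\xfcller.ts")  # ISO-8859-1 byte 0xFC for "ü"
print(entry.label)     # safe to show in a UI
print(entry.raw_path)  # still usable with open() / os.stat()
```

The label is lossy by design, so it should never be fed back into file access; only raw_path should be.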


In Topic: RC 9.0 - Problems with Windows filenames that contain Umlauts

15 October 2023 - 13:44

 

The method is clear:
if it is not real ASCII, convert it.

 

Wrong.

 

If it is a filename it should always be a byte string.


In Topic: RC 9.0 - Problems with Windows filenames that contain Umlauts

15 October 2023 - 13:38

Read above... Umlauts in filenames crash enigma...

No. Bytes that are not valid UTF-8 sequences crash enigma2.

An umlaut (Unicode U+0308, 0xCC 0x88, or U+00A8, 0xC2 0xA8) would be OK. An ISO-8859-1 character that includes an umlaut in its display would not be.
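The distinction can be checked directly in Python. This is only an illustrative sketch; is_valid_utf8 is a helper name made up for the example.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes as strict UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


combining_diaeresis = "\u0308".encode("utf-8")   # b"\xcc\x88"
spacing_diaeresis = "\u00a8".encode("utf-8")     # b"\xc2\xa8"
u_umlaut_utf8 = "\u00fc".encode("utf-8")         # b"\xc3\xbc"
u_umlaut_latin1 = "\u00fc".encode("iso-8859-1")  # b"\xfc"

for raw in (combining_diaeresis, spacing_diaeresis,
            u_umlaut_utf8, u_umlaut_latin1):
    print(raw, is_valid_utf8(raw))
```

Only the last value, the single ISO-8859-1 byte 0xFC, fails strict UTF-8 decoding; the same letter encoded as UTF-8 is fine.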


In Topic: RC 9.0 - Problems with Windows filenames that contain Umlauts

15 October 2023 - 13:33

And it is nothing new. Windows uses the ISO-8859-1 character set (at least in the Latin part of the world), Linux (and most of the rest of the world) uses UTF-8. Which means that if you have non-ASCII text (in files, or file names), it needs to be converted.

 

That is incorrect.

 

Linux filenames are (in Python terms) bytes, not str. Any Python code dealing with filenames has to treat the filename as bytes, otherwise it can fall over.

 

A filename can be any series of bytes (except NUL and '/') in any order.

The fact that this may be displayed as though it is a UTF-8 string is application dependent.
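As a rough sketch of what "treat the filename as bytes" can look like in Python: passing a bytes path to os.listdir() makes it return bytes names, so nothing is ever decoded. The helper name list_names_as_bytes and the sample filename are made up for the example, and the fallback branch exists only because some filesystems refuse names that are not valid UTF-8.

```python
import os
import tempfile
from typing import List


def list_names_as_bytes(directory: bytes) -> List[bytes]:
    """Return directory entries as raw bytes, never decoding them."""
    return sorted(os.listdir(directory))


with tempfile.TemporaryDirectory() as tmp:
    tmp_bytes = os.fsencode(tmp)
    raw_name = b"M\xfcller.ts"  # ISO-8859-1 bytes, not valid UTF-8
    try:
        with open(os.path.join(tmp_bytes, raw_name), "wb"):
            pass
    except (OSError, ValueError):
        # Assumption: this platform or filesystem rejects such a name,
        # so fall back to a plain ASCII name to keep the sketch runnable.
        raw_name = b"plain.ts"
        with open(os.path.join(tmp_bytes, raw_name), "wb"):
            pass

    names = list_names_as_bytes(tmp_bytes)
    print(names)
```

Because the names stay bytes from listing through to open(), no step can raise a decoding error, whatever the bytes happen to be.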