
RC 9.0 - Problems with Windows filenames that contain Umlauts


121 replies to this topic

Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #41 DimitarCC

  • PLi® Contributor
  • 1,498 posts

+53
Good

Posted 14 October 2023 - 17:17

In addition, C++ has std::wstring, which is specifically for wide-character (Unicode) strings. It has c_str() as well, which could do the trick.

Of course, if there is a better C++ expert here, let him give his opinion ;)


Vu+DUO4KSE, DM920UHD, Vu+Uno4KSE, SF8008Mini, 2xPulse4K, Vu+Solo2, Dreambox DM500HD, Triax 78 (7E,9E,13E,19.2E,23.5E) & 2xTriax 78 (39E)


Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #42 WanWizard

  • PLi® Core member
  • 69,866 posts

+1,781
Excellent

Posted 14 October 2023 - 19:56

I'm afraid I have to pass it on, as I don't get it.

 

Python does

self.picload.startDecode(self.filelist[self.index])

and I checked,

self.filelist[self.index]

is an <str> object.

 

However, startDecode() is defined as

RESULT startDecode(const char *filename, int x=0, int y=0, bool async=true);

which wants bytes or a bytearray.

 

However, if I call it with a bytes value, it crashes with an argument error saying it expects a "char *".

 

So this is clearly above my paygrade... :(


Currently in use: VU+ Duo 4K (2xFBC S2), VU+ Solo 4K (1xFBC S2), uClan Usytm 4K Ultimate (S2+T2), Octagon SF8008 (S2+T2), Zgemma H9.2H (S2+T2)

Due to my bad health, I will not be very active at times and may be slow to respond. I will not read the forum or PM on a regular basis.

Many answers to your question can be found in our new and improved wiki.


Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #43 DimitarCC

  • PLi® Contributor
  • 1,498 posts

+53
Good

Posted 14 October 2023 - 23:52

However, startDecode() is defined as

RESULT startDecode(const char *filename, int x=0, int y=0, bool async=true);
which wants bytes or a bytearray.

const char *filename is not bytes by definition. char is a type in C++ that represents a 1-byte character. With a UTF-8 character you pass in a 2-byte character, and of course it doesn't like it.



Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #44 Huevos

  • PLi® Contributor
  • 4,589 posts

+160
Excellent

Posted 15 October 2023 - 09:28

Isn't your problem that you read or copy the files from a Windows system without proper characterset conversion?

Are you serious that a filename conversion is needed?

 

IMS reported this months ago. https://forums.openp...evelop-crashes/

 

At the time I gave a working solution for this problem, but it seems no one was interested.


Edited by Huevos, 15 October 2023 - 09:28.


Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #45 luisteraar

  • Senior Member
  • 2,480 posts

+24
Neutral

Posted 15 October 2023 - 09:52

https://forums.openp...evelop-crashes/

 

I have no permission to open that forum.

What is the problem?



Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #46 DimitarCC

  • PLi® Contributor
  • 1,498 posts

+53
Good

Posted 15 October 2023 - 10:25

Read above... Umlauts in filenames crash enigma...



Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #47 luisteraar

  • Senior Member
  • 2,480 posts

+24
Neutral

Posted 15 October 2023 - 10:44

https://forums.openp...evelop-crashes/

 

I have no permission to open that forum

 what is the problem ?

 

Read above... Umlauts in filenames crash enigma...

Easy to correct.

Copy the name byte by byte into a new string, checking whether each byte is real ASCII.

If it is not real ASCII, replace the umlaut with the ASCII letter without the umlaut and add an "e" after it.
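
The byte-by-byte replacement described above could be sketched roughly as follows. This is only an illustration: the helper name ascii_label and the mapping table are hypothetical, and it assumes the non-ASCII bytes are ISO-8859-1. The result is a changed name, so it would only be usable as a label and no longer matches the name stored on disk.

```python
# Hypothetical helper, not part of enigma2.
# Assumption: bytes >= 0x80 in the name are ISO-8859-1 characters.
_UMLAUT_MAP = {
    0xC4: "Ae", 0xD6: "Oe", 0xDC: "Ue",   # uppercase umlauts
    0xE4: "ae", 0xF6: "oe", 0xFC: "ue",   # lowercase umlauts
    0xDF: "ss",                           # sharp s (no trailing "e")
}


def ascii_label(raw_name: bytes) -> str:
    """Build an ASCII-only label from a raw filename."""
    parts = []
    for byte in raw_name:  # iterating over bytes yields ints
        if byte < 0x80:
            parts.append(chr(byte))                   # real ASCII, copy as-is
        else:
            parts.append(_UMLAUT_MAP.get(byte, "?"))  # unknown byte becomes "?"
    return "".join(parts)


print(ascii_label(b"M\xfcnchen.jpg"))  # Muenchen.jpg
```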


Edited by luisteraar, 15 October 2023 - 10:48.


Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #48 DimitarCC

  • PLi® Contributor
  • 1,498 posts

+53
Good

Posted 15 October 2023 - 11:06

The question is whether, done this way, the patch is going to be correct and whether enigma is still going to find the file.



Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #49 luisteraar

  • Senior Member
  • 2,480 posts

+24
Neutral

Posted 15 October 2023 - 11:37

https://forums.openp...evelop-crashes/

 

I have no access to that forum.

Mod, please give me access so I can check the patch.

I'm C-oriented, and opening such a file is no problem.


Edited by luisteraar, 15 October 2023 - 11:38.


Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #50 Dimitrij

  • PLi® Core member
  • 10,186 posts

+346
Excellent

Posted 15 October 2023 - 12:18

# filename fixes in eServiceReference and MovieList (fixes enigma crash)
https://github.com/OpenViX/enigma2/commit/2c74d42983c6969c0d2ec87b3c48622ba0ff3a45
https://github.com/OpenViX/enigma2/commit/6efb844aba07357c3285e120bde4dc048e2bbad8
https://github.com/OpenViX/enigma2/commit/93fa129f5555bef5e24442d1654d28656359f5e2
https://github.com/OpenViX/enigma2/commit/d9f180b06300586bf7f1e9fa34eed310937e2bef
https://github.com/OpenViX/enigma2/commit/69bb2ba48b5376c5f303b46fd399d8bf53cf28c7

# similar fixes for file_eraser and Trashcan
https://github.com/OpenViX/enigma2/commit/29a92e82ce71df88b6d48af0968df2833cab022b
https://github.com/OpenViX/enigma2/commit/ab86bf81616e75aa3928a82be90d6a91bc5aa220
https://github.com/OpenViX/enigma2/commit/9bb758215a1d7996be4c4a4f880ed9ab1d5758ab
https://github.com/OpenViX/enigma2/commit/69bb2ba48b5376c5f303b46fd399d8bf53cf28c7
https://github.com/OpenViX/enigma2/commit/9624cf8a1df2eca5a11b6810b63f9b972e408076

 


GigaBlue UHD Quad 4K /Lunix3-4K/Duo 4K


Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #51 luisteraar

  • Senior Member
  • 2,480 posts

+24
Neutral

Posted 15 October 2023 - 12:39

OK thanks

if (PyBytes_Check($input)) {
        $1 = PyBytes_AsString($input);
} else {
        $1 = PyBytes_AsString(PyUnicode_AsEncodedString($input, "utf-8", "surrogateescape"));
}

The method is clear: if it is not real ASCII, convert it.
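
For readers more at home in Python than in the C API, the fragment quoted above can be mirrored roughly like this. It is a sketch only; to_c_filename is a hypothetical name and not a function in enigma2. Bytes pass through untouched, and a str is encoded as UTF-8 with the "surrogateescape" error handler, which restores any raw byte that was escaped when the str was created.

```python
def to_c_filename(value) -> bytes:
    """Python mirror of the typemap logic (hypothetical helper)."""
    if isinstance(value, bytes):
        return value                                   # already raw bytes
    return value.encode("utf-8", "surrogateescape")    # str -> bytes


# Decoding a non-UTF-8 name with surrogateescape keeps the odd byte as a
# lone surrogate (0xFC becomes "\udcfc") instead of raising an error.
name = b"M\xfcnchen.jpg".decode("utf-8", "surrogateescape")

# Encoding with the same handler gives back exactly the original bytes.
print(to_c_filename(name) == b"M\xfcnchen.jpg")
```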


Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #52 birdman

  • Senior Member
  • 25 posts

+1
Neutral

Posted 15 October 2023 - 13:33

And it is nothing new. Windows uses the ISO-8859-1 characterset (at least in the Latin part of the world), Linux (and most of the rest of the world) uses UTF-8. Which means that if you have non-ASCII text (in files, or file names), it needs to be converted.

 

That is incorrect.

 

Linux filenames are (in Python terms) bytes, not str. Any Python code dealing with filenames has to treat the filename as bytes, otherwise it can fall over.

 

A filename can be any series of bytes (except NUL) in any order.

The fact that this may be displayed as though it is a utf8-string is application dependent.
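
As an illustration of treating filenames as bytes in Python, here is a minimal sketch. The two helper names are hypothetical. It relies on the standard behaviour that os.listdir() returns bytes names when it is given a bytes path, and that open() accepts a bytes path.

```python
import os


def list_raw_names(directory: str) -> list:
    """Return the directory entries as bytes, exactly as stored on disk."""
    # A bytes path in means bytes names out; nothing gets decoded.
    return sorted(os.listdir(os.fsencode(directory)))


def read_by_raw_name(directory: str, raw_name: bytes) -> bytes:
    """Open a file through its raw bytes name."""
    path = os.path.join(os.fsencode(directory), raw_name)
    with open(path, "rb") as handle:
        return handle.read()
```

Because the name never passes through a str, it does not matter whether the bytes happen to be valid UTF-8.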


Edited by birdman, 15 October 2023 - 13:33.


Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #53 birdman

  • Senior Member
  • 25 posts

+1
Neutral

Posted 15 October 2023 - 13:38

Read above... Umlauts in filenames crash enigma...

No. Bytes that are not valid UTF-8 sequences crash enigma2.

An umlaut (Unicode U+0308, 0xCC 0x88, or U+00A8, 0xC2 0xA8) would be OK. An ISO-8859-1 character that includes an umlaut in its display would not be.
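
The distinction can be checked with a few lines of Python. This is a small sketch with a hypothetical helper name (is_valid_utf8). It shows that the two umlaut code points mentioned above encode to valid UTF-8, while the single ISO-8859-1 byte for "ü" does not decode as UTF-8.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes form a valid UTF-8 sequence."""
    try:
        data.decode("utf-8")  # strict error handling by default
        return True
    except UnicodeDecodeError:
        return False


combining = "u\u0308".encode("utf-8")    # "u" + U+0308 -> b"u\xcc\x88"
spacing = "\u00a8".encode("utf-8")       # U+00A8 -> b"\xc2\xa8"
latin1 = "\u00fc".encode("iso-8859-1")   # "ü" as one Latin-1 byte -> b"\xfc"

print(is_valid_utf8(combining), is_valid_utf8(spacing), is_valid_utf8(latin1))
```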


Edited by birdman, 15 October 2023 - 13:41.


Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #54 birdman

  • Senior Member
  • 25 posts

+1
Neutral

Posted 15 October 2023 - 13:44

 

The method is clear: if it is not real ASCII, convert it.

 

Wrong.

 

If it is a filename it should always be a bytes stream.



Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #55 WanWizard

  • PLi® Core member
  • 69,866 posts

+1,781
Excellent

Posted 15 October 2023 - 15:44

At the time I gave a working solution for this problem, but it seems no one was interested.

 

As I wrote they are not a solution.

 

OpenVIX crashes on the file in question too, so it isn't fixed there either...
 




Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #56 WanWizard

  • PLi® Core member
  • 69,866 posts

+1,781
Excellent

Posted 15 October 2023 - 16:09

Linux filenames are (in Python terms) bytes, not str. Any Python code dealing with filenames has to treat the filename as bytes, otherwise it can fall over.

 

A filename can be any series of bytes (except NUL) in any order.

The fact that this may be displayed as though it is a utf8-string is application dependent.

 

You need to read this in the context of the problem, no point having an abstract and therefore not relevant discussion.

 

In E2, the files in a directory are enumerated in a list of serviceref objects, in which the path is a std::string that is set from readdir() output. This is how the original byte sequence from the filesystem ends up in a Python str object.

 

This creates a sort of catch-22:

  • the string can't be handled in python without causing a crash, in case the string doesn't contain utf-8
  • it is not easy to convert the string if you don't know the original encoding (although you can guess using chardet, which OpenVIX has done, and I also implemented, but not committed yet)
  • you can't alter the path in the serviceref itself, as that is also used to access the file, any conversion of that variable causes file access to fail
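
One way to picture a way out of the three constraints above is to keep the untouched bytes for file access and derive a separate name purely for display. The sketch below is an assumption-laden illustration, not E2 code: FileEntry, raw_path, display_name and the example path are hypothetical. Instead of guessing the encoding with chardet, it swaps in a simpler fixed fallback to ISO-8859-1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileEntry:
    raw_path: bytes  # untouched bytes from the filesystem, used for access

    @property
    def display_name(self) -> str:
        """Best-effort label for the UI; never used to open the file."""
        try:
            return self.raw_path.decode("utf-8")
        except UnicodeDecodeError:
            # Assumption: non-UTF-8 names come from an ISO-8859-1 system.
            return self.raw_path.decode("iso-8859-1")


entry = FileEntry(b"/media/hdd/M\xfcnchen.jpg")  # hypothetical path
print(entry.display_name)  # readable label
print(entry.raw_path)      # still the original bytes
```

The design choice is that conversion only ever happens on the copy shown to the user, so the path used to open the file is never altered.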

ePicLoad() accepts the path as a char pointer, so imho the entire charset business should be transparent for that part of the code.

 

From a Python perspective, it crashes on

self.picload.startDecode(self.filelist[self.index])

where

self.filelist[self.index]

is an str object containing the binary representation of the filename (with the non-utf8 code point for the u umlaut).

 

If this is converted to bytes (without loss), the startDecode() call triggers an exception

in method 'ePicLoad_startDecode', argument 2 of type 'char const *'

which means passing the string to startDecode() isn't the problem, something in the C code of ePicLoad is, as this call never returns.
 




Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #57 Huevos

  • PLi® Contributor
  • 4,589 posts

+160
Excellent

Posted 15 October 2023 - 19:14

 

https://forums.openp...evelop-crashes/

 

I have no permission to open that forum

 what is the problem ?

 

Read above... Umlauts in filenames crash enigma...

Easy to correct.

Copy the name byte by byte into a new string, checking whether each byte is real ASCII.

If it is not real ASCII, replace the umlaut with the ASCII letter without the umlaut and add an "e" after it.

 

That is not going to work because once you modify the name you won't be able to access it on the filesystem.



Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #58 Huevos

  • PLi® Contributor
  • 4,589 posts

+160
Excellent

Posted 15 October 2023 - 19:18

I don't know what part of what Birdman says is not understandable. You just tell python to read the file from disk as a bytestream.



Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #59 Huevos

  • PLi® Contributor
  • 4,589 posts

+160
Excellent

Posted 15 October 2023 - 19:25

 

Linux filenames are (in Python terms) bytes, not str. Any Python code dealing with filenames has to treat the filename as bytes, otherwise it can fall over.

 

A filename can be any series of bytes (except NUL) in any order.

The fact that this may be displayed as though it is a utf8-string is application dependent.

 

You need to read this in the context of the problem, no point having an abstract and therefore not relevant discussion.

 

In E2, the files in a directory are enumerated in a list of serviceref objects, in which the path is a std::string that is set from readdir() output. This is how the original byte sequence from the filesystem ends up in a Python str object.

 

This creates a sort of catch-22:

  • the string can't be handled in python without causing a crash, in case the string doesn't contain utf-8
  • it is not easy to convert the string if you don't know the original encoding (although you can guess using chardet, which OpenVIX has done, and I also implemented, but not committed yet)
  • you can't alter the path in the serviceref itself, as that is also used to access the file, any conversion of that variable causes file access to fail

ePicLoad() accepts the path as a char pointer, so imho the entire charset business should be transparent for that part of the code.

 

From a Python perspective, it crashes on

self.picload.startDecode(self.filelist[self.index])

where

self.filelist[self.index]

is an str object containing the binary representation of the filename (with the non-utf8 code point for the u umlaut).

 

If this is converted to bytes (without loss), the startDecode() call triggers an exception

in method 'ePicLoad_startDecode', argument 2 of type 'char const *'

which means passing the string to startDecode() isn't the problem, something in the C code of ePicLoad is, as this call never returns.
 

 

And that is exactly what our fix for movie selection handled. In the cpp code.



Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #60 WanWizard

  • PLi® Core member
  • 69,866 posts

+1,781
Excellent

Posted 15 October 2023 - 19:49

This one you mean?

https://github.com/OpenViX/enigma2/commit/2c74d42983c6969c0d2ec87b3c48622ba0ff3a45

Isn't that a bit of a hack, calling a Python function from C instead of fixing it on the Python side so that no invalid data is passed in the first place?

 

I was looking for a cleaner solution, but my C knowledge is too limited.

 

BTW: if you have fixed it, why does OpenVIX still crash on this image? It doesn't give a green screen, but there is some exception somewhere, as you need to kill E2 to get control of the box back after you've pressed "ok" on the image in the movie list.

