


RC 9.0 - Problems with Windows filenames that contain Umlauts


121 replies to this topic

Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #81 littlesat

  • PLi® Core member
  • 57,181 posts

+698
Excellent

Posted 27 February 2024 - 15:53

I agree... the trick here is to 'target the core'... and not work around it in multiple places...

 

https://github.com/O...vice/iservice.h

 

So if I understand WanWizard correctly, line 58 here in the OpenVix code needs 'adaptations'...


WaveFrontier 28.2E | 23.5E | 19.2E | 16E | 13E | 10/9E | 7E | 5E | 1W | 4/5W | 15W


Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #82 ocean04

  • Member
  • 19 posts

0
Neutral

Posted 27 February 2024 - 18:45

why does OpenVIX still crash on this image? It doesn't give a green screen, but there is some exception somewhere, as you need to kill E2 to get control of the box back after you've pressed "ok" on the image in the movie list.

 

Tested munchen.jpg in OpenVix and it failed like this:

<  8616.965827> 19:14:13.2016 [Skin] Processing screen 'Pic_Full_View', position=(0, 0), size=(1920 x 1080) for module 'Pic_Full_View'.
<  8616.968872> 19:14:13.2047 [ePicLoad] setPara max-X=1920 max-Y=1080 aspect_ratio=1,000000 cache=0 resize=1 bg=#FF000000 auto_orient=0
<  8616.969360> 19:14:13.2052 [MovieSelection] Cannot display in method 'ePicLoad_startDecode', argument 2 of type 'char const *'
Additional information:
Wrong number or type of arguments for overloaded function 'ePicLoad_startDecode'.
  Possible C/C++ prototypes are:
    ePicLoad::startDecode(char const *,int,int,bool)
    ePicLoad::startDecode(char const *,int,int)
    ePicLoad::startDecode(char const *,int)
    ePicLoad::startDecode(char const *)

This is easy to fix, simply by adding the following to picload.h:

#ifdef SWIG
public:
%typemap(in) (const char *filename) {
    if (PyBytes_Check($input)) {
        $1 = PyBytes_AsString($input);
    } else {
        $1 = PyBytes_AsString(PyUnicode_AsEncodedString($input, "utf-8", "surrogateescape"));
    }
}
#endif
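For readers less familiar with SWIG, the effect of this typemap on the filename argument can be sketched in plain Python. The helper name `to_c_filename` is made up for illustration and is not part of enigma2; the sketch only assumes that a str arriving here may carry lone surrogates for bytes that were not valid UTF-8.

```python
def to_c_filename(value):
    """Mirror the typemap: hand bytes through, encode str losslessly."""
    if isinstance(value, bytes):
        # Already raw bytes: pass them to C++ unchanged.
        return value
    # str: bytes that were not valid UTF-8 are carried as lone surrogates
    # (U+DC80..U+DCFF); surrogateescape turns them back into the raw bytes.
    return value.encode("utf-8", "surrogateescape")


print(to_c_filename("M\udcfcnchen.jpg"))   # b'M\xfcnchen.jpg'
print(to_c_filename("M\u00fcnchen.jpg"))   # b'M\xc3\xbcnchen.jpg'
```

Either way the C++ side receives the same bytes that are stored on disk, so the file can be opened; nothing here decides how the name should be displayed.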

Now it tries to show picture, but crashes somewhere later..

<  8678.683173> 19:15:16.8506 [Skin] Processing screen 'Pic_Full_View', position=(0, 0), size=(1920 x 1080) for module 'Pic_Full_View'.
<  8678.686546> 19:15:16.8540 [ePicLoad] setPara max-X=1920 max-Y=1080 aspect_ratio=1,000000 cache=0 resize=1 bg=#FF000000 auto_orient=0
<  8678.687121> 19:15:16.8545 [ePicLoad] thread failed to modify scheduling priority (Function not implemented)
<  8678.687313> 19:15:16.8547 [ePicLoad] decode picture... /media/hdd/movie6/M nchen.jpg
<  8678.687412> 19:15:16.8548 [EXIF] getting exif from JPEG
<  8678.687875> 19:15:16.8553 [Skin] Processing screen 'SimpleSummary' from list 'Pic_Full_View_summary, SimpleSummary', position=(0, 0), size=(132 x 64) for module 'ScreenSummary'.
<  8678.743598> 19:15:16.9110 PC: b63e2b24
<  8678.743711> 19:15:16.9111 Fault Address: 00000000
<  8678.743775> 19:15:16.9112 Error Code: 519
<  8678.744119> 19:15:16.9115 Backtrace:
<  8678.744586> 19:15:16.9120 /tmp/enigma2(_Z17handleFatalSignaliP9siginfo_tPv) [0x78550]
<  8678.744737> 19:15:16.9121 /lib/libc.so.6(__default_rt_sa_restorer) [0xB5D7C070]
<  8678.744800> 19:15:16.9122 -------FATAL SIGNAL (11)
<  8682.714688> 19:15:20.8821 [gRC] Warning: Main thread is busy, displaying spinner!
Killed

I guess the only way forward would be to add more debug lines and find out exactly where it crashes.



Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #83 WanWizard

  • PLi® Core member
  • 70,542 posts

+1,812
Excellent

Posted 27 February 2024 - 19:49

That is not a fix, that is another workaround.

 

The source isn't utf-8, so that "easy fix" doesn't produce a u-umlaut in utf-8.

 

The root cause remains, also here in ePicLoad, that a sequence of bytes (char *) is saved as a python string, and it shouldn't be, it should be bytes or bytearray.

 

This is "old py2" code, where String was the same as Bytes.


Currently in use: VU+ Duo 4K (2xFBC S2), VU+ Solo 4K (1xFBC S2), uClan Usytm 4K Ultimate (S2+T2), Octagon SF8008 (S2+T2), Zgemma H9.2H (S2+T2)

Due to my bad health, I will not be very active at times and may be slow to respond. I will not read the forum or PM on a regular basis.

Many answers to your question can be found in our new and improved wiki.


Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #84 ocean04

  • Member
  • 19 posts

0
Neutral

Posted 28 February 2024 - 09:02

That is not a fix, that is another workaround.

Are you talking about adding that typemap?

Currently there are no other options to fix this.

Anyway, I found the cause of the "fatal signal" crash in post #82, and munchen.jpg is now loading fine here (OpenVix).



Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #85 WanWizard

  • PLi® Core member
  • 70,542 posts

+1,812
Excellent

Posted 28 February 2024 - 13:14

And displayed in the movie list with an u-umlaut, as it should?

 

I can't test VIX here, they don't make images for my test box.




Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #86 littlesat

  • PLi® Core member
  • 57,181 posts

+698
Excellent

Posted 28 February 2024 - 14:38

That it does not crash anymore... or is shown correctly... does not mean that the underlying issue is 'resolved'... It just means it is at least worked around....

 

This is not to 'say' that the suggestions Huevos showed a few posts earlier are not interesting..

 

More important here is to 'trigger' this thread when we lose attention to it.


Edited by littlesat, 28 February 2024 - 14:39.



Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #87 ocean04

  • Member
  • 19 posts

0
Neutral

Posted 28 February 2024 - 18:16

I'd say there are 3 main things to consider.
1. Data going from Python to C++
2. Data coming from C++ to Python.
3. Displaying data in GUI.

It's not only simple variables that are passed between Python and C++, but also fields inside data structures.

Also need to consider where data is coming from, filesystem, EPG, ?

For me it works, but there are probably many more issues hidden. But I would be surprised if this gets fixed differently any time soon, if ever.

This is of course wrong forum, but I'll attach my OpenVix changes related to this anyway.

Fatal signal was caused by filename in wrong format in picinfo inside PictureData. picinfo is not really useful but added "is_valid_utf8" function check (found from net)

Attached Files



Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #88 WanWizard

  • PLi® Core member
  • 70,542 posts

+1,812
Excellent

Posted 28 February 2024 - 18:37

Filenames or paths should always be in bytes, no matter whether it's 1 or 2.

 

Only for 3 the byte sequence should be converted to string, at which time encoding can be checked so the correct string is displayed.

 

First 1 and 2 need to be fixed, before 3 becomes relevant.
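As a rough illustration of that split, the sketch below keeps the raw bytes for every filesystem or C++ call and derives a separate string only for display. `MovieEntry`, `make_entry` and the ISO-8859-1 fallback are assumptions made up for this example, not existing enigma2 code.

```python
from typing import NamedTuple


class MovieEntry(NamedTuple):
    raw_path: bytes  # used for every filesystem / C++ call (points 1 and 2)
    label: str       # used only for the GUI (point 3)


def make_entry(raw_path: bytes, fallback: str = "iso-8859-1") -> MovieEntry:
    """Keep the original bytes and derive a display string from them."""
    try:
        label = raw_path.decode("utf-8")
    except UnicodeDecodeError:
        # Assumed fallback; a real implementation might detect the encoding.
        label = raw_path.decode(fallback)
    return MovieEntry(raw_path, label)


entry = make_entry(b"/media/hdd/movie/M\xfcnchen.jpg")
print(entry.label)     # /media/hdd/movie/München.jpg
print(entry.raw_path)  # b'/media/hdd/movie/M\xfcnchen.jpg'
```

Because `raw_path` is never re-encoded, opening the file does not depend on whether the guess made for `label` was right.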

 

That it does not crash anymore... or is shown correctly... does not mean that the underlying issue is 'resolved'... It just means it is at least worked around...

 

That is my point.

 

Nothing is fixed in that code, only some errors have been captured.




Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #89 ocean04

  • Member
  • 19 posts

0
Neutral

Posted 29 February 2024 - 06:23


Nothing is fixed in that code, only some errors have been captured.
 

Well, I disagree about that. Checking picinfo in C++ should be done differently, but I'm not spending more time finding out where or how it was used later on.

About using bytes, how are you planning to start making that change? You still need the typemap from post #82. Btw, that map already accepts bytes as well.

Other option would be to modify all C++ functions or add new functions.
 



Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #90 littlesat

  • PLi® Core member
  • 57,181 posts

+698
Excellent

Posted 29 February 2024 - 08:40

You are trying to resolve the symptom, not the cause... So you work around the place where the crash occurred and not the place where the cause is introduced. Usually, when you take more time and target the cause, you come up with easier solutions. So when the cause is a Windows format, I would aim for something that makes it arrive in a correct format, instead of fixing it in many, many places where the symptom occurs. So better to stand back and target the cause. I understand users need to wait longer for a fix... but it would also be helpful if we all kept searching for it instead of stopping at the point where the symptom is resolved.

And you explained already need to modify all the swig stuff…. So your solution is likely a bit patch…. So maybe better search for something different?

Edited by littlesat, 29 February 2024 - 08:41.



Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #91 ocean04

  • Member
  • 19 posts

0
Neutral

Posted 29 February 2024 - 12:07

So maybe better search for something different?

Already spent some time looking for solutions in 2022, but did not find anything else. Nothing has changed or happened since.

I guess openatv is still supporting py2 in 6.4. Same problem is there too with py3 images.

I'm a little surprised there are not more reports of this. In Windows 11, FAT32 is still the default for USB drives, which means an OEM codepage, at least for my locale. You are likely to have issues if you plug a USB drive formatted in Windows into your box.

OpenVix has similar fixes already and Huevos can test and include these too if he wishes. That ascii should probably be replaced with latin-1. Added that because it's better to have some default value for encoding, chardet detect can also return None.
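The point about a default encoding can be sketched as follows. The dicts below only imitate the shape of a chardet `detect()` result (a dict with an `encoding` key that may be None); `choose_encoding` is a hypothetical helper name, not something taken from the OpenVix patches.

```python
def choose_encoding(detection, default="latin-1"):
    """Return the detected encoding, or `default` when detection gave None."""
    encoding = (detection or {}).get("encoding")
    return encoding or default


# Dicts shaped like a detect() result; chardet itself is not imported here.
print(choose_encoding({"encoding": None, "confidence": 0.0}))          # latin-1
print(choose_encoding({"encoding": "ISO-8859-1", "confidence": 0.7}))  # ISO-8859-1

raw = b"M\xfcnchen.jpg"
print(raw.decode(choose_encoding({"encoding": None})))  # München.jpg
```

One reason to prefer latin-1 over ascii as the default is that latin-1 maps every possible byte to a character, so the decode cannot fail, whereas an ascii decode raises on 0xfc.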

I'm already building images for my own devices, since I need other improvements too, so it's no problem for me to include these too.
 


Edited by ocean04, 29 February 2024 - 12:07.


Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #92 WanWizard

  • PLi® Core member
  • 70,542 posts

+1,812
Excellent

Posted 29 February 2024 - 12:15

Yes, it is indeed a Python3 specific issue, as in Python2 a string is a sequence of bytes. A lot of changes to the code were needed because of this; most of those in Python have been addressed, but the ones in the C code haven't.

 

My point is that your code doesn't fix the encoding, with which I mean that "München" in iso-8859-1 becomes "München" in utf-8, so retaining the umlaut.

 

It just makes sure Python doesn't crash anymore by stripping anything that isn't utf-8, which imho isn't a fix, it is a workaround, as it doesn't fix the root cause, which is "the variable is a string object with an invalid encoding".

 

Which should be fixed by making that variable not a string, but a bytes object or bytearray. After this fix, you can (try to) detect the encoding in Python, and convert it to a string in the correct encoding, retaining all relevant data (in this case the ü).

 

You don't hear about it because most, if not all, image builders have implemented some workaround to prevent a crash. We for example simply skip files with an invalid encoding, so in OpenPLi you don't see a file called "München" in the movielist...




Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #93 ocean04

  • Member
  • 19 posts

0
Neutral

Posted 29 February 2024 - 13:52

It just makes sure Python doesn't crash anymore by stripping anything that isn't utf-8
 
Nothing is stripped in any of the patches (Other than removing problematic filename from picinfo)

Typemap makes the conversion and no information is lost. surrogateescape handles non UTF-8 characters inside UTF-8 string.

Of course it has to be this way, otherwise filename would not work anymore if something was stripped.
 


Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #94 WanWizard

  • PLi® Core member
  • 70,542 posts

+1,812
Excellent

Posted 29 February 2024 - 14:10

surrogateescape doesn't convert the iso-8859-1 codepoint for u-umlaut to the equivalent utf-8 codepoint. It just adds "\udc" to make it a valid utf-8 codepoint, that can be converted back if needed. But it doesn't print ü.

 

In iso-8859-1, the u-umlaut is a one-byte character, 0xfc. In utf-8, it is a two-byte character, 0xc3bc.
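These byte values, and what surrogateescape does with the single ISO-8859-1 byte, can be checked in a few lines of plain Python (nothing here is enigma2-specific):

```python
u_umlaut = "\u00fc"  # ü

print(u_umlaut.encode("iso-8859-1").hex())  # fc
print(u_umlaut.encode("utf-8").hex())       # c3bc

# Decoding the single ISO-8859-1 byte as UTF-8 with surrogateescape does not
# yield "ü"; it yields the lone surrogate U+DCFC, which merely remembers 0xfc.
escaped = b"\xfc".decode("utf-8", "surrogateescape")
print(ascii(escaped))       # '\udcfc'
print(escaped == u_umlaut)  # False
```

So the escaped string preserves the byte for later use, but it is not yet a displayable ü; that still needs a decode with the right source encoding.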




Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #95 ocean04

  • Member
  • 19 posts

0
Neutral

Posted 29 February 2024 - 14:32

But it doesn't print ü.

Yes, but now we are talking about displaying in GUI.

That is the chardet detect part, it makes the conversion, nothing stripped.

Conversion only happens after first checking whether the (UTF-8) string has surrogates.
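A minimal sketch of that check-then-convert idea, assuming a fixed ISO-8859-1 source encoding in place of a detection step; `has_surrogates` and `display_text` are illustrative names, not the functions from the attached patches.

```python
def has_surrogates(text: str) -> bool:
    """True if `text` cannot be encoded as strict UTF-8 (lone surrogates)."""
    try:
        text.encode("utf-8")
    except UnicodeEncodeError:
        return True
    return False


def display_text(text: str, assumed: str = "iso-8859-1") -> str:
    """Re-decode only strings that carry escaped bytes; leave others alone."""
    if not has_surrogates(text):
        return text
    # Recover the original bytes, then decode them with the assumed encoding.
    raw = text.encode("utf-8", "surrogateescape")
    return raw.decode(assumed)


print(display_text("M\udcfcnchen.jpg"))  # München.jpg
print(display_text("M\u00fcnchen.jpg"))  # München.jpg (unchanged)
```

One limitation of this simple version: a name that mixes valid UTF-8 sequences with escaped bytes would have its valid parts re-decoded as ISO-8859-1 too, so a real implementation would need to handle that case more carefully.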
 



Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #96 WanWizard

  • PLi® Core member
  • 70,542 posts

+1,812
Excellent

Posted 29 February 2024 - 14:42

"now"?

 

The entire point of this discussion is about that, to be able to access files with non-utf-8 names, and display them properly.

 

We know we can simply ignore the incorrect encoding, suppress any errors, and display something Python has come up with as an alternative, but we don't do that, we want this fixed, not worked around.

 

So if this file is called "München.png" at source (i.e. how and where the file was created), it must be displayed in the movie list as "München.png".

 

Which means at some point proper encoding detection and conversion to utf-8 needs to happen, and for that you need to be able to access the variable containing the filename. Which is impossible as long as it is created as "utf-8 string object" in C, with data that isn't utf-8.

 

You can't use chardet, as it needs a Bytes or Bytearray object, and what you have is an inaccessible String, that is the entire point.




Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #97 Huevos

  • PLi® Contributor
  • 4,663 posts

+163
Excellent

Posted 29 February 2024 - 16:33

"now"?

 

The entire point of this discussion is about that, to be able to access files with non-utf-8 names, and display them properly.

 

We know we can simply ignore the incorrect encoding, suppress any errors, and display something Python has come up with as an alternative, but we don't do that, we want this fixed, not worked around.

 

So if this file is called "München.png" at source (i.e. how and where the file was created), it must be displayed in the movie list as "München.png".

 

Which means at some point proper encoding detection and conversion to utf-8 needs to happen, and for that you need to be able to access the variable containing the filename. Which is impossible as long as it is created as "utf-8 string object" in C, with data that isn't utf-8.

 

You can't use chardet, as it needs a Bytes or Bytearray object, and what you have is an inaccessible String, that is the entire point.

If it is impossible why does it work?

Attached Files

  • Attached File  1.jpg   70.23KB   0 downloads


Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #98 ocean04

  • Member
  • 19 posts

0
Neutral

Posted 29 February 2024 - 16:47

You can't use chardet, as it needs a Bytes or Bytearray object, and what you have is an inaccessable String, that is the entire point.

What are you talking about? If you have UTF8 string you can always convert it to bytes.

Python 3.11.4 (tags/v3.11.4:d2340ef, Jun  7 2023, 05:45:37) [MSC v.1934 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> s = '\udcfc'
>>> b = s.encode('UTF-8', 'surrogateescape')
>>> from chardet import detect
>>> e = detect(b)['encoding']
>>> print(b.decode(e))
ü
>>>

And movielist is already working too.

 



Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #99 WanWizard

  • PLi® Core member
  • 70,542 posts

+1,812
Excellent

Posted 29 February 2024 - 18:05

That is not what is happening.

 

If you have a filename in iso-8859-1, created for example in Windows, the string object returned by serviceref.getPath() will be a utf-8 object (i.e. in its internals, it is defined as utf-8), but the contents will not be utf-8, they will be iso-8859-1.

 

Unlike in your example, where 's' contains valid utf-8.

 

So:

root@sf8008:~# ls -l /media/nas/Test\ files/M*
-rw-r--r--    1 1024     users        19597 Oct 10 13:20 /media/nas/Test files/M?nchen.png

is displayed on the box with a ? instead of the ü, because of the iso-8859-1 encoding.

 

If you do what you suggest, this is the result:

    path = serviceref.getPath().encode("UTF-8", "surrogateescape")
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 27: invalid start byte
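Note that the message is a decode error, so it is presumably raised while the raw bytes are being turned into a str, before the encode call gets a chance to run. The same error can be reproduced in plain Python; the path below is hypothetical and simply contains the ISO-8859-1 byte 0xfc.

```python
raw = b"/media/hdd/M\xfcnchen.png"  # hypothetical path, ISO-8859-1 bytes

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    # Strict UTF-8 decoding rejects the lone 0xfc byte.
    print(exc.reason, "at byte offset", exc.start)

# With surrogateescape the decode succeeds and the byte survives a round trip.
text = raw.decode("utf-8", "surrogateescape")
print(text.encode("utf-8", "surrogateescape") == raw)  # True
```

Whether the str that reaches Python was produced with strict decoding or with surrogateescape therefore decides whether a later `.encode("UTF-8", "surrogateescape")` can be reached at all.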

 

 

 

 




Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #100 ocean04

  • Member
  • 19 posts

0
Neutral

Posted 29 February 2024 - 20:22

But you have not applied the OpenVix patches to eServiceReference? It was in 2022, but I don't remember getPath() needing a fix in Python. A getItemDisplayNameText function was added to movielist.



