


RC 9.0 - Problems with Windows filenames that contain Umlauts


121 replies to this topic

Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #21 WanWizard

  • PLi® Core member
  • 71,236 posts

+1,842
Excellent

Posted 11 October 2023 - 15:06

I have now tested on another box, the Xtrend ET7500. It crashes too.

 

If it was related to the SOC it would crash also with Rel.8.3.

I suspect, the issue is how python3 passes the filename to the C code.

 

Possible, but the fact remains it works fine on a HiSilicon box.

 

Does it depend on locale settings?

 

I can't imagine, UTF-8 is not influenced by locale settings. It might have been related to how I unzipped your zip file (on my Laptop and then FTP'd to the box).

 

I can try again with the tarball.


Currently in use: VU+ Duo 4K (2xFBC S2), VU+ Solo 4K (1xFBC S2), uClan Usytm 4K Ultimate (S2+T2), Octagon SF8008 (S2+T2), Zgemma H9.2H (S2+T2)

Due to my bad health, I will not be very active at times and may be slow to respond. I will not read the forum or PM on a regular basis.

Many answers to your question can be found in our new and improved wiki.


Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #22 WanWizard

  • PLi® Core member
  • 71,236 posts

+1,842
Excellent

Posted 11 October 2023 - 15:12

Done so, unpacked the tarball on the box itself, and now it crashes too (on a HiSilicon box). So FTP apparently does some translation.

 

The question now is why, as there is no trace of a crash in the C code.




Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #23 Stan

  • Senior Member
  • 529 posts

+6
Neutral

Posted 11 October 2023 - 15:29

I believe the filename is encoded in Latin-1.

root@hd51:/tmp# python -c "import sys; print(repr(sys.argv[1]))" M▒nchen.png
'M\udcfcnchen.png'

where the character ü is represented as \udcfc
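A minimal sketch of what that repr shows, assuming the box runs Python 3 with a UTF-8 locale and the file name on disk really is Latin-1 (the byte string below is an illustration, not taken from the actual file):

```python
# Raw filename bytes as they might sit on disk: "München.png" in Latin-1.
raw = b"M\xfcnchen.png"

# With a UTF-8 locale, Python 3 decodes OS-supplied names as UTF-8 using the
# "surrogateescape" error handler: every undecodable byte 0xNN becomes the
# lone surrogate U+DCNN, so 0xFC turns into \udcfc.
as_str = raw.decode("utf-8", "surrogateescape")
print(repr(as_str))      # 'M\udcfcnchen.png'

# The mapping is reversible, so the original bytes can be recovered ...
recovered = as_str.encode("utf-8", "surrogateescape")

# ... and decoded with the encoding they were (presumably) written in.
readable = recovered.decode("latin-1")
print(ascii(readable))   # 'M\xfcnchen.png'
```

So \udcfc is not a character of its own but a placeholder for the single byte 0xFC, which is ü in Latin-1.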



Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #24 WanWizard

  • PLi® Core member
  • 71,236 posts

+1,842
Excellent

Posted 11 October 2023 - 16:01

Which is logical if the file name was Windows encoding, ISO-8859-1 is Latin-1.

 

I now have a short error message:

terminate called after throwing an instance of 'std::logic_error'
  what():  basic_string::_M_construct null not valid

which unfortunately doesn't tell me a lot more.
 




Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #25 Abu Baniaz

  • PLi® Contributor
  • 2,534 posts

+65
Good

Posted 11 October 2023 - 16:08

@wanwizard @frenske can you please have a look at these and see what is beneficial/acceptable to PLi

 

https://forums.openp...s/#entry1534332



Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #26 littlesat

  • PLi® Core member
  • 57,659 posts

+709
Excellent

Posted 11 October 2023 - 16:43

Sounds like it was indeed an issue we were working on and somehow forgot about.... and in the meantime it was resolved at ViX in multiple patches.

 

I think this indeed likely resolved it... but I'm not convinced yet if this is the best method to resolve it... But at this moment we do not have something else.... (All triggered by the py3 conversion and this time in the swig part).

 

I also think that we should keep it simple and adapt it for py3 only. Now it looks like it is still backwards compatible with py2.... but I'm not fully sure.... The descriptions are not that clear...

 

I'm only convinced that this indeed most likely resolved 'decode' related issues.

# filename fixes in eServiceReference and MovieList (fixes enigma crash)
https://github.com/OpenViX/enigma2/commit/2c74d42983c6969c0d2ec87b3c48622ba0ff3a45
https://github.com/OpenViX/enigma2/commit/6efb844aba07357c3285e120bde4dc048e2bbad8
https://github.com/OpenViX/enigma2/commit/93fa129f5555bef5e24442d1654d28656359f5e2
https://github.com/OpenViX/enigma2/commit/d9f180b06300586bf7f1e9fa34eed310937e2bef
https://github.com/OpenViX/enigma2/commit/69bb2ba48b5376c5f303b46fd399d8bf53cf28c7

# similar fixes for file_eraser and Trashcan
https://github.com/OpenViX/enigma2/commit/29a92e82ce71df88b6d48af0968df2833cab022b
https://github.com/OpenViX/enigma2/commit/ab86bf81616e75aa3928a82be90d6a91bc5aa220
https://github.com/OpenViX/enigma2/commit/9bb758215a1d7996be4c4a4f880ed9ab1d5758ab
https://github.com/OpenViX/enigma2/commit/69bb2ba48b5376c5f303b46fd399d8bf53cf28c7
https://github.com/OpenViX/enigma2/commit/9624cf8a1df2eca5a11b6810b63f9b972e408076
 

Edited by littlesat, 11 October 2023 - 16:47.

WaveFrontier 28.2E | 23.5E | 19.2E | 16E | 13E | 10/9E | 7E | 5E | 1W | 4/5W | 15W


Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #27 WanWizard

  • PLi® Core member
  • 71,236 posts

+1,842
Excellent

Posted 11 October 2023 - 17:21

Great.

 

I've looked at them, but I'm not sure this is correct.

 

The root cause of the problem is that when the directory entries are enumerated into eServiceReference objects, no checks happen. So once you have an object for the offending image, E2 crashes as soon as you try to do something with the result of getName().

 

So the most logical place to fix it is at the point of creation of the eServiceReference object, not at every point in the code that object might be used...

 

i.e.

print("[MovieList] getName")
name = info.getName(serviceref)
print("[MovieList] print its type")
print(type(name))
print("[MovieList] print it")
print(name)

results in

[MovieList] getName
[MovieList] print its type
<class 'str'>
[MovieList] print it
PC: b56806f8
Fault Address: 00000004
Error Code: 23
Backtrace:
enigma2(_Z17handleFatalSignaliP9siginfo_tPv) [0x77844]
/lib/libc.so.6(__default_rt_sa_restorer) [0xB51B3000]
-------FATAL SIGNAL 11
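For debugging output like this, a small sketch of a safer way to log such a name, assuming `name` holds a lone surrogate like the value shown in post #23. The helper `safe_repr` is hypothetical. In plain CPython a string like this cannot be encoded to UTF-8 (which is why printing it normally raises `UnicodeEncodeError`); why enigma2 ends in a SIGSEGV instead is not established here.

```python
def safe_repr(name: str) -> str:
    """Return an ASCII-only rendering of `name` that is always printable."""
    return ascii(name)

name = "M\udcfcnchen.png"  # same value as the repr() output in post #23

try:
    name.encode("utf-8")    # what a UTF-8 console or log file has to do
    encodable = True
except UnicodeEncodeError:
    encodable = False       # lone surrogates are not valid UTF-8

print("[MovieList] name is", safe_repr(name), "encodable:", encodable)
```

Logging the escaped form at least shows which byte is the offending one without touching the console encoding.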

 

 




Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #28 WanWizard

  • PLi® Core member
  • 71,236 posts

+1,842
Excellent

Posted 11 October 2023 - 22:53

Can't be fixed in eServiceReference, as the python code expects to receive a "valid" path, i.e. something that can be stat().

 

So the only solution is to find all occurrences of getPath() and deal with it on the display side. This is going to be complex, as there are 90 occurrences in the base code alone...




Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #29 littlesat

  • PLi® Core member
  • 57,659 posts

+709
Excellent

Posted 12 October 2023 - 07:12

And modify getPath instead of fixing it at the occurrences? That was exactly the gut feeling huevos’s patch gives me. Most times the real thing is not resolved.
Another thing is why I do not experience it.

Edited by littlesat, 12 October 2023 - 07:13.



Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #30 WanWizard

  • PLi® Core member
  • 71,236 posts

+1,842
Excellent

Posted 12 October 2023 - 13:39

The problem is that getPath() needs to return the real path/filename, as lots of code in Python expects that, so it can do stat(), isdir(), etc.

 

So it can only be converted when the path is going to be used for display. I fixed the movielist yesterday, which moved the problem to the picture player, where ePicLoad now crashes on it.
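A rough sketch of such a display-only conversion, assuming the str carries surrogate escapes as in post #23. The helper name `display_name`, the Latin-1 fallback and the `/media/hdd/` path are illustrative assumptions, not what was committed to the movielist.

```python
def display_name(path: str, fallback: str = "latin-1") -> str:
    """Return a printable label for `path`; never use the result to open files."""
    # Undo surrogateescape to get back the bytes as stored on disk.
    raw = path.encode("utf-8", "surrogateescape")
    try:
        return raw.decode("utf-8")              # valid UTF-8: label equals path
    except UnicodeDecodeError:
        return raw.decode(fallback, "replace")  # a guess, for display only

real_path = "/media/hdd/M\udcfcnchen.png"   # keep this for stat(), isdir(), open()
label = display_name(real_path)             # show this in the list
```

The fallback encoding is a guess by design, so the real path is left untouched and only the label is recoded.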




Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #31 Stan

  • Senior Member
  • 529 posts

+6
Neutral

Posted 12 October 2023 - 15:03

[... there are 90 occurrences in the base code alone...]

Then maybe the filename conversion should be outsourced to a module.



Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #32 WanWizard

  • PLi® Core member
  • 71,236 posts

+1,842
Excellent

Posted 12 October 2023 - 21:07

Difficult as you have to be 100% sure the filename is no longer used as "filename", because after recoding it can no longer be used to access the file.

 

To be clear, getPath() returns a <str> instance that is supposed to hold UTF-8, but the underlying data is not valid UTF-8. So as soon as the object is accessed, it crashes because of the invalid data.
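To illustrate why a recoded name can no longer be used to reach the file, here is a small sketch with a made-up Latin-1 file name (the byte string is an assumption for illustration):

```python
on_disk = b"M\xfcnchen.png"                         # bytes the filesystem stores
path = on_disk.decode("utf-8", "surrogateescape")   # what Python code passes around
label = on_disk.decode("latin-1")                   # recoded for display

# Turning each str back into bytes, as any file operation must:
bytes_from_path = path.encode("utf-8", "surrogateescape")
bytes_from_label = label.encode("utf-8", "surrogateescape")

print(bytes_from_path == on_disk)   # True
print(bytes_from_label)             # b'M\xc3\xbcnchen.png'
```

The label encodes to a different byte sequence than the one on disk, so a stat() or open() on it would look for a file that does not exist.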




Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #33 WanWizard

  • PLi® Core member
  • 71,236 posts

+1,842
Excellent

Posted 13 October 2023 - 22:41

Just tested OpenATV: they simply don't show the file at all. OpenViX does display the file, with the umlaut, thanks to their autodetection of the encoding (which I didn't add as it hides the problem), but it also crashes in the PicturePlayer.

 

So nothing fixed there either...




Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #34 Dimitrij

  • PLi® Core member
  • 10,423 posts

+355
Excellent

Posted 14 October 2023 - 07:24

So PicturePlayer needs to be fixed.


GigaBlue UHD Quad 4K /Lunix3-4K/Duo 4K


Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #35 WanWizard

  • PLi® Core member
  • 71,236 posts

+1,842
Excellent

Posted 14 October 2023 - 12:40

It crashes in ePicLoad, on this line: https://github.com/O...ayer/ui.py#L532, and my C++ is probably worse than my Python.

 

I had a brief look at it, startDecode() needs a char pointer as argument. With my limited knowledge, shouldn't that be bytes instead of str?

 

If so, then we're back to square one: I haven't found any way in Python to convert an str object to bytes, byte for byte in binary, without doing any encoding. Which is needed because encoding alters the filename and renders it unusable.
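One candidate for such a byte-for-byte conversion is the standard library pair `os.fsencode()` / `os.fsdecode()`. The sketch below assumes a POSIX system and that the str from getPath() carries surrogate escapes the same way `sys.argv` does in post #23, which has not been verified for the SWIG layer. The wrapper names are hypothetical.

```python
import os

def path_to_bytes(path: str) -> bytes:
    """str -> on-disk bytes, with no guessing about the original encoding."""
    return os.fsencode(path)

def bytes_to_path(raw: bytes) -> str:
    """On-disk bytes -> str, undecodable bytes kept as lone surrogates."""
    return os.fsdecode(raw)

name = "M\udcfcnchen.png"
raw = path_to_bytes(name)
print(raw)   # b'M\xfcnchen.png' on a POSIX system
```

On POSIX both functions use the filesystem encoding with the surrogateescape error handler, so the round trip does not depend on knowing which legacy encoding the name was written in.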




Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #36 DimitarCC

  • PLi® Contributor
  • 1,661 posts

+85
Good

Posted 14 October 2023 - 14:06

The problem here is that the C++ declaration accepts const char *filename, which is a standard ASCII char[] pointer. So UTF-8 encoded chars cannot be added (or at least will not be parsed correctly), I think.

For that, std::string would have to be used in C++. Or in Python the UTF string would first have to be converted to an ASCII one, but that will probably result in file not found.


Edited by DimitarCC, 14 October 2023 - 14:10.

Vu+DUO4KSE, DM920UHD, Vu+Uno4KSE, SF8008Mini, 2xPulse4K, Vu+Solo2, Dreambox DM500HD, Triax 78 (7E,9E,13E,19.2E,23.5E) & 2xTriax 78 (39E)


Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #37 DimitarCC

  • PLi® Contributor
  • 1,661 posts

+85
Good

Posted 14 October 2023 - 14:49

According to the specs, if you compile at C++20 level you need to use char8_t instead of char* to be able to use Unicode chars in strings.




Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #38 WanWizard

  • PLi® Core member
  • 71,236 posts

+1,842
Excellent

Posted 14 October 2023 - 15:20

The problem here is that the C++ declaration accepts const char *filename, which is a standard ASCII char[] pointer. So UTF-8 encoded chars cannot be added (or at least will not be parsed correctly), I think.

For that, std::string would have to be used in C++. Or in Python the UTF string would first have to be converted to an ASCII one, but that will probably result in file not found.

 

You want it the other way around: leave the declaration a char pointer, as a filename at the filesystem level is just bytes without any notion of encoding. Changing ePicLoad to std::string will open a new can of worms further down the line.

 

Which means the str object we have in python needs to be converted to bytes. Problem is that I haven't found a way to do that yet without loss of data, because the encoding is unknown, so encode() can't be used.

 

According to the specs, if you compile at C++20 level you need to use char8_t instead of char* to be able to use Unicode chars in strings.

 

This needs to be avoided, as all file operations in C need a c_str().




Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #39 WanWizard

  • PLi® Core member
  • 71,236 posts

+1,842
Excellent

Posted 14 October 2023 - 16:09

Double checked, and indeed, serviceFS uses

filename.c_str()

to execute all file operations, so we need the equivalent function in Python. Or an alternative to getPath() that returns a c_str instead of an std::string.




Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #40 DimitarCC

  • PLi® Contributor
  • 1,661 posts

+85
Good

Posted 14 October 2023 - 16:42

Well, I am no expert in C++, but I think c_str just returns a char array from a string, with a null terminator at the end. But every char of this array is only 1 byte. The difference between a Unicode char and an ASCII char is the number of bytes required to store the data for a char...
Which differs from system to system. But normally one UTF char is 2 bytes.
I think the char type in C++ cannot hold the data for UTF chars since there is not enough space in it. That's why there are wchar and char8_t.
Also, I think even if you provide a byte array from Python, C++ won't know how to handle it since it will not know what the source encoding is.
Also, I believe wchar can hold non-UTF char data too, but it will reserve double the space in memory.
So in my opinion the C++ part has to be changed and not the Python one.
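For reference, a quick sketch of how many 1-byte units a character takes in UTF-8, and whether an encoded name fits in a NUL-terminated char array. The sample characters are arbitrary, and this does not settle how ePicLoad should be declared.

```python
samples = {"u": "u", "u-umlaut": "\u00fc", "euro": "\u20ac"}

# Number of 1-byte units each character needs in UTF-8 (1 to 4 in general).
utf8_sizes = {label: len(ch.encode("utf-8")) for label, ch in samples.items()}
print(utf8_sizes)   # {'u': 1, 'u-umlaut': 2, 'euro': 3}

# A whole UTF-8 filename is just a run of non-zero bytes, so it fits in a
# NUL-terminated char array as-is.
encoded = "M\u00fcnchen.png".encode("utf-8")
print(encoded, 0 in encoded)   # b'M\xc3\xbcnchen.png' False
```

In UTF-8 a non-ASCII character is spread over several consecutive chars rather than stored in one wider char, which is why a plain char pointer can carry it.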



