


RC 9.0 - Problems with Windows filenames that contain Umlauts


121 replies to this topic

Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #61 WanWizard

  • PLi® Core member
  • 70,562 posts

+1,813
Excellent

Posted 15 October 2023 - 19:59

Applying that commit doesn't fix it, it still crashes.


Currently in use: VU+ Duo 4K (2xFBC S2), VU+ Solo 4K (1xFBC S2), uClan Usytm 4K Ultimate (S2+T2), Octagon SF8008 (S2+T2), Zgemma H9.2H (S2+T2)

Due to my bad health, I will not be very active at times and may be slow to respond. I will not read the forum or PM on a regular basis.

Many answers to your question can be found in our new and improved wiki.


Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #62 birdman

  • Senior Member
  • 25 posts

+1
Neutral

Posted 16 October 2023 - 12:08

> You need to read this in the context of the problem; there is no point in having an abstract, and therefore irrelevant, discussion.

I agree. But if the problem is that the model is an incorrect view of reality, then it is the model that is the problem, and it is the model that has to change.
 

> In E2, the files in a directory are enumerated in a list of serviceref objects, in which the path is an std::string that is set from the readdir() output. This is how the original sequence of bytes from the filesystem ends up in a Python str object.

That is what is wrong. The original sequence of bytes has to end up in a Python bytes object, as it is not convertible into a str.
 

> This creates a sort of catch-22:
>
>   • the string can't be handled in Python without causing a crash if it doesn't contain valid UTF-8
>   • it is not easy to convert the string if you don't know the original encoding (although you can guess using chardet, which OpenVIX has done, and which I have also implemented but not yet committed)
>   • you can't alter the path in the serviceref itself, as that is also used to access the file; any conversion of that variable causes file access to fail

It's only a catch-22 if you try to make it a str. If you leave it as bytes the problem goes away (apart from displaying it, but that is a simpler issue).
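A minimal Python 3 sketch of the bytes-first approach, assuming the path arrives as a bytes object (the directory and file name below are only examples). `os` and `os.path` accept bytes paths directly, so file access never needs a decode; decoding happens only when building a label for display.

```python
import os.path

# Example values only: the byte 0xFC is "ü" in iso8859-1/cp1252 but is not
# valid UTF-8 on its own, so this name cannot be decoded strictly as UTF-8.
raw_dir = b"/media/hdd/mymovies"
raw_name = b"M\xfcnchen.png"

# File access: stay in bytes end to end; os.path works on bytes as well as str.
full_path = os.path.join(raw_dir, raw_name)
stem, ext = os.path.splitext(raw_name)

# Display only: decode at the last moment, substituting U+FFFD for bad bytes.
# This str is meant for the GUI and is never used to open the file.
label = raw_name.decode("utf-8", errors="replace")

print(full_path)  # b'/media/hdd/mymovies/M\xfcnchen.png'
print(ext)        # b'.png'
# label == 'M\ufffdnchen.png'
```

The replacement character makes the label lossy, which is acceptable for display precisely because the bytes, not the label, are what is kept for opening, deleting or moving the file.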



Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #63 WanWizard

  • PLi® Core member
  • 70,562 posts

+1,813
Excellent

Posted 16 October 2023 - 14:24

 

>> In E2, the files in a directory are enumerated in a list of serviceref objects, in which the path is an std::string that is set from the readdir() output. This is how the original sequence of bytes from the filesystem ends up in a Python str object.
>
> That is what is wrong. The original sequence of bytes has to end up in a Python bytes object, as it is not convertible into a str.

 

 

Agreed that fixing the root cause is the best way forward. But that might have a lot of impact further down the line.




Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #64 WanWizard

  • PLi® Core member
  • 70,562 posts

+1,813
Excellent

Posted 16 October 2023 - 15:35

For those more in the know than I am:

 

Wouldn't it be a lot simpler to add

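%begin %{
// Make SWIG return C strings (char *) to Python 3 as bytes instead of str
#define SWIG_PYTHON_STRICT_BYTE_CHAR
%}

// Example from the SWIG manual: with the define above, charstring(b"hi")
// returns b'hi'; %inline emits the body into the generated wrapper as well
%inline %{
char *charstring(char *s) {
  return s;
}
%}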

which would cause all strings to be returned as bytes objects, like in Py2? That would keep the interface compatible, and any UTF-8 handling could then be done in Python.




Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #65 Dutchdude

  • Senior Member
  • 69 posts

+1
Neutral

Posted 24 February 2024 - 18:32

Not a Python expert, but I noticed the same behaviour in R9, with exactly the same info in the crashlog as mentioned earlier in this thread.

I was wondering if this issue is still on the backlog for R9.1.


VuDuo4K, Zgemma H2H, DM8000


Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #66 WanWizard

  • PLi® Core member
  • 70,562 posts

+1,813
Excellent

Posted 24 February 2024 - 18:53

AFAIK the problem was never addressed, just worked around (i.e. files with an invalid encoding are simply not shown).
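To illustrate the effect of such a workaround (a hypothetical Python sketch, not the actual OpenPLi code): a directory listing taken as bytes is split into names that decode as UTF-8, which get shown, and names that don't, which are silently hidden. That is why the file appears to have "disappeared" rather than causing a crash.

```python
def split_listing(raw_names):
    """Hypothetical helper: separate displayable names from the ones a
    UTF-8-only workaround would hide."""
    shown, hidden = [], []
    for raw in raw_names:
        try:
            shown.append(raw.decode("utf-8"))
        except UnicodeDecodeError:
            # Not valid UTF-8 (e.g. a cp1252 name written by Windows): skipped.
            hidden.append(raw)
    return shown, hidden


listing = [b"news.ts", b"M\xc3\xbcnchen.png", b"p\xf6ll\xf6.mp4"]
shown, hidden = split_listing(listing)
# shown  -> ['news.ts', 'München.png']
# hidden -> [b'p\xf6ll\xf6.mp4']
```

Note that a correctly UTF-8 encoded name with an umlaut is listed normally; only the name stored in a Windows code page is dropped.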




Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #67 athoik

  • PLi® Core member
  • 8,458 posts

+327
Excellent

Posted 25 February 2024 - 14:26

Have you tried this one: https://tr.opensuse...._FAT_Partitions


Wavefield T90: 0.8W - 1.9E - 4.8E - 13E - 16E - 19.2E - 23.5E - 26E - 33E - 39E - 42E - 45E on EMP Centauri DiseqC 16/1
Unamed: 13E Quattro - 9E Quattro on IKUSI MS-0916

Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #68 WanWizard

  • PLi® Core member
  • 70,562 posts

+1,813
Excellent

Posted 25 February 2024 - 14:36

You can't deal with it at the mount level, as the problem is with a single filename; all the others could be perfectly valid UTF-8.

The root cause of this issue is that a variable holding raw filename data should be BYTE_CHAR (a bytes object) in Py3, so that it can be converted to a string using (in this case) some encoding guesswork.

Right now it is a string containing invalid data, and any attempt to access that variable causes a Python exception.




Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #69 athoik

  • PLi® Core member
  • 8,458 posts

+327
Excellent

Posted 25 February 2024 - 14:43

Maybe on 8.x the mount options for FAT were iso8859-1.

Now on 9.x maybe the default mount options changed to utf8.

So ü is causing that crash.

I believe the problem comes from the default iocharset: ISO vs UTF-8 nowadays.

 

FYI https://github.com/s...arset&type=code


Edited by athoik, 25 February 2024 - 14:45.


Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #70 WanWizard

  • PLi® Core member
  • 70,562 posts

+1,813
Excellent

Posted 25 February 2024 - 15:07

It has nothing to do with FAT, or with mount options. The problem is that the filename isn't in the encoding the device says it should be in.

I've tested with a filename encoded in iso8859-1, stored on my NAS (where everything is UTF-8).

That is easy to do: for example, export the same storage to your box in UTF-8, and to your Windows PC via CIFS. All files stored by Windows will have Windows encoding.


Edited by WanWizard, 25 February 2024 - 15:09.



Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #71 WanWizard

  • PLi® Core member
  • 70,562 posts

+1,813
Excellent

Posted 25 February 2024 - 15:22

The issue here is that the files in a directory are enumerated by C++ code in ServiceReference, and that code returned a string in Py2, and should return a bytearray in Py3.

 

But it doesn't: it returns a string that does not contain a validly encoded sequence of bytes, which makes Python throw an exception on any attempt to access the string.

 

So

path = serviceref.getPath()

crashes when Python tries to access the return value from the C code.
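A generic Python 3 illustration of why merely accessing such a value fails. This is not a claim about how E2's SWIG binding builds its str; it only shows standard Python behaviour for a name that is not valid UTF-8. A strict decode is refused outright, while the `surrogateescape` error handler (the one Python itself uses for undecodable file names on POSIX) produces a str that looks fine until something encodes it strictly, for example GUI text or a UTF-8 log.

```python
RAW = b"M\xfcnchen.png"  # "München.png" in iso8859-1; 0xFC is not valid UTF-8 here


def try_decode_strict(raw: bytes) -> bool:
    """Return True if raw is valid UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def try_encode_strict(text: str) -> bool:
    """Return True if text can be encoded as UTF-8 (fails on lone surrogates)."""
    try:
        text.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


# 1. A strict UTF-8 decode refuses the bytes outright.
strict_ok = try_decode_strict(RAW)                         # False

# 2. surrogateescape smuggles the bad byte into the str as a lone surrogate...
smuggled = RAW.decode("utf-8", errors="surrogateescape")   # 'M\udcfcnchen.png'

# 3. ...but the first strict UTF-8 encode of that str fails.
encode_ok = try_encode_strict(smuggled)                    # False

# 4. Encoding with the same handler restores the exact original bytes,
#    so such a str can still be turned back into a usable path.
round_trip = smuggled.encode("utf-8", errors="surrogateescape")
assert round_trip == RAW
```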
 




Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #72 athoik

  • PLi® Core member
  • 8,458 posts

+327
Excellent

Posted 25 February 2024 - 19:58

root@osmio4kplus:~# mount | grep DOSFAT

/dev/sdb1 on /media/DOSFAT type vfat (rw,relatime,gid=6,fmask=0007,dmask=0007,allow_utime=0020,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro)

root@osmio4kplus:~# ls /media/DOSFAT

München.png

Attached File: picture.jpg (81.54 KB)

 

I cannot reproduce it. How should the filename be stored on disk?



Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #73 Huevos

  • PLi® Contributor
  • 4,680 posts

+166
Excellent

Posted 26 February 2024 - 00:49

> root@osmio4kplus:~# mount | grep DOSFAT
>
> /dev/sdb1 on /media/DOSFAT type vfat (rw,relatime,gid=6,fmask=0007,dmask=0007,allow_utime=0020,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro)
>
> root@osmio4kplus:~# ls /media/DOSFAT
>
> München.png
>
> Attached File: picture.jpg
>
> I cannot reproduce it. How should the filename be stored on disk?

 

This bug shows up in different places.

 

Here is an example from the ViX forum. The problem file is created on an NTFS file system and then FTP'd to the box.

> Created folder "mymovies" and copied some mp4 files from Win10 to HDD. But now E2 crashes when I try to access folder /media/hdd/mymovies

> Easy to reproduce. In WINDOWS: Create folder "mymovies", add new empty file "pöllö.mp4" and FTP the folder to HDD. Now try to access the folder.

https://www.world-of...indows-encoding


Edited by Huevos, 26 February 2024 - 01:07.


Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #74 WanWizard

  • PLi® Core member
  • 70,562 posts

+1,813
Excellent

Posted 26 February 2024 - 11:58

With FTP you have to be careful, because quite a few clients perform conversions, and in text (rather than binary) mode even convert the file contents.

 

What I did was zip the file on Windows, transfer the zip file (to make sure no conversion happened) and unzipped it on the box.

 

This is the original I tested with:

Attached Files




Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #75 WanWizard

  • PLi® Core member
  • 70,562 posts

+1,813
Excellent

Posted 26 February 2024 - 12:01

> root@osmio4kplus:~# mount | grep DOSFAT
>
> /dev/sdb1 on /media/DOSFAT type vfat (rw,relatime,gid=6,fmask=0007,dmask=0007,allow_utime=0020,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro)
>
> root@osmio4kplus:~# ls /media/DOSFAT
>
> München.png
>
> Attached File: picture.jpg
>
> I cannot reproduce it. How should the filename be stored on disk?

 

 

You still don't seem to understand the problem.

 

Here you mount the entire device with iso8859-1 encoding. But the point is that the device is mounted with UTF-8 encoding, like for example the local hard disk, and there is one file on it whose name is encoded in iso8859-1, because it was created on Windows and not converted when it was made accessible to Enigma.
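For reproducing this without Windows, FTP or ZIP in between, a minimal Python 3 sketch. Assumptions: it runs on the box or any Linux machine, on a filesystem such as ext4 or tmpfs that stores names as raw bytes (a vfat mount with a UTF-8 iocharset may refuse the invalid name), and it uses a temporary directory; call `create_test_files()` on a real directory such as one under /media/hdd to test Enigma2 itself. It writes two files whose names both read "München.png": one encoded in UTF-8 as the mount expects, one in iso8859-1 as Windows would store it.

```python
import os
import tempfile

VISIBLE = "München.png"
UTF8_NAME = VISIBLE.encode("utf-8")        # b'M\xc3\xbcnchen.png': what a UTF-8 mount expects
LATIN1_NAME = VISIBLE.encode("iso8859-1")  # b'M\xfcnchen.png': what Windows wrote


def create_test_files(directory: bytes) -> list:
    """Create both variants in directory and return the raw names found there."""
    for raw in (UTF8_NAME, LATIN1_NAME):
        # Bytes paths bypass Python's own filename encoding, so the exact
        # byte sequence ends up on disk.
        with open(os.path.join(directory, raw), "wb"):
            pass
    return sorted(os.listdir(directory))


def is_valid_utf8(raw: bytes) -> bool:
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


with tempfile.TemporaryDirectory() as tmp:
    names = create_test_files(os.fsencode(tmp))

report = {name: is_valid_utf8(name) for name in names}
# {b'M\xc3\xbcnchen.png': True, b'M\xfcnchen.png': False}
```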
 




Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #76 WanWizard

  • PLi® Core member
  • 70,562 posts

+1,813
Excellent

Posted 26 February 2024 - 12:02

> This bug shows up in different places.

 

There are two issues here:

The first one is that the movielist enumerates files via the C++ ServiceReference code, which returns the path in a ServiceReference object as a string instead of bytes or bytearray, which makes it inaccessible if the string has an illegal encoding.

The second one is that when you have a path or filename in Python in a bytes or bytearray variable (no matter the origin), it needs a check whether it is in UTF-8, and if not, it needs to be converted.




Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #77 ocean04

  • Member
  • 19 posts

0
Neutral

Posted 26 February 2024 - 20:53

> BTW: if you have fixed it, why does OpenVIX still crash on this image?

OpenVix patches fixed the most obvious problems, but not all of them. The remaining issues could be fixed in a similar way.

Everything was discussed in the ViX forum at the time. These patches are not the best/correct way, but there were no other suggestions.

One thing is clear. A filename or path should always be kept in bytes and only converted when displaying it, in the GUI for example.

The current implementation assumes paths/filenames are always UTF-8, but the filesystem is not limited to UTF-8 characters.
 



Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #78 WanWizard

  • PLi® Core member
  • 70,562 posts

+1,813
Excellent

Posted 26 February 2024 - 21:15

> One thing is clear. Filename or path should always be kept in bytes and only converted when displaying it in GUI, for example.

 

Agreed, and that is not the case in service enumeration, which is the root cause of the problems.

 

As long as this isn't addressed, it simply is impossible to use python code to detect and convert the encoding, as Python crashes as soon as the code tries to access the variable.

 

Once this is fixed, a generic "convertToUtf8()" function can be added that accepts a string in any encoding, or a bytes or bytearray variable, and returns a UTF-8 string, which can then be used to display it in the GUI.
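A minimal sketch of such a helper in Python 3. The name `convert_to_utf8` and the fallback order (UTF-8, then cp1252 as the usual Windows code page, then iso8859-1, which accepts every byte) are assumptions; chardet-style detection, as mentioned earlier in the thread, could replace the fixed list. A str input is assumed to carry any undecodable bytes as `surrogateescape` surrogates.

```python
from typing import Union

# Assumed fallback order: strict UTF-8 first, then cp1252 (Windows "ANSI").
# iso8859-1 is the last resort because it maps every byte to a character.
FALLBACK_ENCODINGS = ("utf-8", "cp1252")


def convert_to_utf8(value: Union[str, bytes, bytearray]) -> str:
    """Hypothetical helper: turn a raw path/filename into a str for the GUI."""
    if isinstance(value, str):
        # Assumes undecodable bytes were carried as surrogateescape
        # surrogates; this restores the original byte sequence.
        value = value.encode("utf-8", errors="surrogateescape")
    raw = bytes(value)
    for encoding in FALLBACK_ENCODINGS:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    return raw.decode("iso8859-1")  # cannot fail: every byte is defined


# convert_to_utf8(b"M\xfcnchen.png")               -> 'München.png'
# convert_to_utf8(bytearray(b"p\xf6ll\xf6.mp4"))   -> 'pöllö.mp4'
# convert_to_utf8("München.png".encode("utf-8"))   -> 'München.png'
```

Trying UTF-8 first is a heuristic: cp1252 names with accented letters are rarely also valid UTF-8, but a name can in principle decode cleanly under the wrong encoding. The result is for display only; file access should keep using the original bytes.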




Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #79 Huevos

  • PLi® Contributor
  • 4,680 posts

+166
Excellent

Posted 27 February 2024 - 00:58

I think these were the patches that fixed the majority of the bugs/issues. Like Ocean says, they can probably be improved.

 

Movielist
https://github.com/O...de4dc048e2bbad8
https://github.com/O...54d28656359f5e2

eServiceReference
https://github.com/O...c48622ba0ff3a45

file eraser
https://github.com/O...68df2833cab022b

Trashcan
https://github.com/O...63f9b972e408076
https://github.com/O...399d8bf53cf28c7

 



Re: RC 9.0 - Problems with Windows filenames that contain Umlauts #80 WanWizard

  • PLi® Core member
  • 70,562 posts

+1,813
Excellent

Posted 27 February 2024 - 12:06

None of these fix the original problem that exists in OpenPLi, which is that

serviceref.getPath()

crashes, because the value returned to Python is a string with invalid encoding.

 

As long as that isn't fixed, doing anything in Python is pointless.




