Jump to content


Erik Slagter

Member Since 3 Oct 2008
Offline Last Active 03 Sep 2023 16:47
****-

Posts I've Made

In Topic: Enigma 2 freezes "[eFilePushThreadRecorder] thread could not be stopped...

3 September 2023 - 16:46

Yes indeed. I would suggest that anyone having recording issues, for the moment disable timeshifting (either automatic or manual). If the error is gone then, we know where to search.

 

Also, the huge amounts of interrupted reads seem to be a sign that the stopping, later on, will not succeed. Can you confirm that? That during a normal, succesful recording, you do not get the huge amount of interrupted reads? That would also be an interesting pointer.


In Topic: Enigma 2 freezes "[eFilePushThreadRecorder] thread could not be stopped...

27 August 2023 - 16:20

box got just stuck, trying to start playback of a recording.
box was in timeshift, but no signs of a record stopping in the debug log:

2023-08-27T16:41:56+02:00 stb1.swabian.net enigma2: [Network] Getting attribute:  ip  for adapter:  eth0
2023-08-27T16:41:56+02:00 stb1.swabian.net enigma2: [Network] Getting attribute:  netmask  for adapter:  eth0
2023-08-27T16:42:11+02:00 stb1.swabian.net enigma2: [Network] Getting attribute:  ip  for adapter:  eth0
2023-08-27T16:42:11+02:00 stb1.swabian.net enigma2: [Network] Getting attribute:  netmask  for adapter:  eth0
2023-08-27T16:42:14+02:00 stb1.swabian.net enigma2: [RDS] radiotext str: (+++ Merz sieht Deutschland vor großen Herausforderungen)
2023-08-27T16:42:14+02:00 stb1.swabian.net enigma2: [Screen] Showing screen 'RdsInfoDisplaySummary'.
2023-08-27T16:42:24+02:00 stb1.swabian.net enigma2: [ActionMap] Keymap 'OkCancelActions' -> Action = 'ok'.
2023-08-27T16:42:24+02:00 stb1.swabian.net enigma2: [eDVBServicePlay] timeshift
2023-08-27T16:42:24+02:00 stb1.swabian.net enigma2: [Screen] Showing screen 'InfoBarSummary'.
2023-08-27T16:42:24+02:00 stb1.swabian.net enigma2: [Screen] Showing screen 'InfoBar'.
2023-08-27T16:42:24+02:00 stb1.swabian.net enigma2: [eDVBServicePlay] timeshift
2023-08-27T16:42:24+02:00 stb1.swabian.net enigma2: [eDVBServicePlay] timeshift
2023-08-27T16:42:24+02:00 stb1.swabian.net enigma2: [Skin] Parsing embedded skin '<embedded-in-Screensaver>'.
2023-08-27T16:42:24+02:00 stb1.swabian.net enigma2: [Skin] Processing screen '<embedded-in-Screensaver>', position=(0, 0), size=(1920 x 1080) for module 'Screensaver'.
2023-08-27T16:42:24+02:00 stb1.swabian.net enigma2: [Skin] Parsing embedded skin '<embedded-in-HideVBILine>'.
2023-08-27T16:42:24+02:00 stb1.swabian.net enigma2: [Skin] Processing screen '<embedded-in-HideVBILine>', position=(0, 0), size=(1920 x 7) for module 'HideVBILine'.
2023-08-27T16:42:24+02:00 stb1.swabian.net enigma2: [Screen] Showing screen 'HideVBILine'.
2023-08-27T16:42:24+02:00 stb1.swabian.net enigma2: resolved to PLAY
2023-08-27T16:42:24+02:00 stb1.swabian.net enigma2: [eDVBServicePlay] unpause
2023-08-27T16:42:24+02:00 stb1.swabian.net enigma2: [Skin] Processing screen 'VideoMode', position=(127, 67), size=(600 x 75) for module 'VideoMode'.
2023-08-27T16:42:24+02:00 stb1.swabian.net enigma2: [Skin] Processing screen 'PVRState', position=(192, 172), size=(225 x 45) for module 'PVRState'.
2023-08-27T16:42:24+02:00 stb1.swabian.net enigma2: [HelpActionMap] removed duplicity: InfobarSeekActions ('jumpPreviousMark', 'Jump to previous marked position')
2023-08-27T16:42:24+02:00 stb1.swabian.net enigma2: [HelpActionMap] removed duplicity: InfobarSeekActions ('jumpNextMark', 'Jump to next marked position')
2023-08-27T16:42:24+02:00 stb1.swabian.net enigma2: RemovePopup, id = ZapError
2023-08-27T16:42:24+02:00 stb1.swabian.net enigma2: [Navigation] playing:  1:0:0:0:0:0:0:0:0:0:/media/hdd/movie/20230824 2125 - NGC HD - Air Crash Investigation.ts
2023-08-27T16:42:24+02:00 stb1.swabian.net enigma2: [eFilePushThreadRecorder] stopping thread
2023-08-27T16:42:26+02:00 stb1.swabian.net enigma2: [gRC] main thread is non-idle! display spinner!

strace available as well
(by the way, disabled deep standby by change in /etc/rc0.d/S90halt - box will just reboot.)

Can you change the setting "show zap errors" to "disable" (just to check)?

 

Also can you, when this happens, login to the receiver and check whether enigma is waiting for the buffer cache to flush?

You can do that by writing a chunk of data to the same disk (local, afaik) as enigma is writing it's recordings to and see if it returns in a reasonable amount of time. Something like

 

dd count=1024 if=/dev/zero of=/hdd/testfile status=progress; sync

 

You can remove the file afterwards. I am suspecting harddisk can't keep up with the data, for whatever reason, data is filling memory, and at some moment, the kernel decides there is no more memory available for buffering and blocks all writing threads until enough data has been written and cleared from the buffer cache.

 

That could also explain why we (Wanwizard and I) are not experiencing this, as we're recording to a NAS (using NFS).

 

There is also another difference between recording on local disk and a NAS: signals can interrupt I/O system calls and if that happens, the calling process gets a status "this system call has been interrupted" (EINTR) and how much of the data has actually been written or received. The process can then handle the signal and continue where it left. The premisse in the code is that this works. There is a caveat though. For devices that are deemed to be always available and quick to write (and a local disk is one of them), the I/O system calls cannot be interrupted. Not even a ^C or sigterm or sigkill will abort the system call. That means the code does not 100% works like it's designed; as long as data is ready to write (and not written), the thread will not gain control, it will remain in the system call until it's finished, however many signals are sent. As most of the time, a write system call to harddisk is quick to return, that is not necessarily a problem, but if the disk is slow, for whatever reason, the system call can be long to complete and maybe, just maybe, a blocked signal can get lost in that situation.

 

There are a few other paths I am exploring here:

 

- it's very interesting that the read() gets interrupted many times, even though all signals besides sigusr1 are blocked. What is that signal we can't block (as system calls can only be interrupted by signals). If that is a sigusr1 from another source (for whatever reason), we can have the situation where one spurious and one proper signal is to be delivered and at that moment the blocking is still active (at the start). Blocked signals always only have one instance queued, so the proper signal might get lost in that case. Maybe we should try using another signal, like sigusr2, as enigma is using sigusr1 at other locations as well. Better be safe than sorry.

- if we really don't get this working, I am willing to try a construction where we're not using signals at all, that is possible.

 

In the meantime I now have an up-and-running OE environment again, so I can try and test some things.


In Topic: Enigma 2 freezes "[eFilePushThreadRecorder] thread could not be stopped...

27 August 2023 - 16:19

ran a 5 minute strace, afer box got stuck this morning.

don't know what exactly to look for.

it's mostly repeating:

[pid  1530] 10:55:53.097054 ioctl(8, _IOC(_IOC_NONE, 0, 0x22, 0), 0x12) = 0
[pid  1530] 10:55:53.116929 futex(0x343be0, FUTEX_WAKE_PRIVATE, 1) = 0
[pid  1530] 10:55:53.117073 futex_time64(0x343c20, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, {tv_sec=1692953753, tv_nsec=502132989093597259}, FUTEX_BITSET_MATCH_ANY) = -1 ENOSYS (Function not implemented)
[pid  1530] 10:55:53.117215 futex(0x343c20, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, {tv_sec=1692953753, tv_nsec=216911947}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT (Connection timed out)

attachicon.gif strace1.txt

 

maybe I will run enigma2 with strace tonight, to see what happens meanwhile the box gets stuck again.

 

Interessing that in this trace, there are no reads or writes whatsoever. There are some poll()'s, but I can't relate them to filepush.c.


In Topic: Enigma 2 freezes "[eFilePushThreadRecorder] thread could not be stopped...

27 August 2023 - 16:13

ran a 5 minute strace, afer box got stuck this morning.

don't know what exactly to look for.

it's mostly repeating:

[pid  1530] 10:55:53.097054 ioctl(8, _IOC(_IOC_NONE, 0, 0x22, 0), 0x12) = 0
[pid  1530] 10:55:53.116929 futex(0x343be0, FUTEX_WAKE_PRIVATE, 1) = 0
[pid  1530] 10:55:53.117073 futex_time64(0x343c20, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, {tv_sec=1692953753, tv_nsec=502132989093597259}, FUTEX_BITSET_MATCH_ANY) = -1 ENOSYS (Function not implemented)
[pid  1530] 10:55:53.117215 futex(0x343c20, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, {tv_sec=1692953753, tv_nsec=216911947}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT (Connection timed out)

attachicon.gif strace1.txt

 

maybe I will run enigma2 with strace tonight, to see what happens meanwhile the box gets stuck again.

 

I will have look. Please in the future exclude calls to futex (-e !futex) because they tend to fill up the log, without adding any information.


In Topic: Enigma 2 freezes "[eFilePushThreadRecorder] thread could not be stopped...

22 August 2023 - 17:33

At this moment I am suspecting the read system call to not be interrupted by the signal. According to the manpage system calls are only interrupted by signals when they're expected to take "some time". There is no hard specification, but they mention some examples, e.g. reads/writes to/from a local disk are expected to be a "quick" operation and cannot be interrupted. Read/writes to/from a remote file system (like NFS and SMB) can be considered to be a "slow operation" and can probably be interrupted.

 

At the same time I am seeing that I don't have problems, while I am using an NFS mount. The same goes for Wanwizard. Maybe the people that are experiencing issue are using a local disk, which could explain the difference. The exact behaviour of read/write can also be influenced by glibc, which, hey, happens to be updated when we moved to OE Zeus, from which point we are seeing the issues.

 

So I think the next step really should be debugging this. I'd like to ask anyone who has a frozen enigma, to login to the box, start strace on enigma, and check if any thread is blocking on a read(). If that turns out to be difficult, I could add some debugging code to show this.