box got just stuck, trying to start playback of a recording.
box was in timeshift, but no signs of a record stopping in the debug log:
2023-08-27T16:41:56+02:00 stb1.swabian.net enigma2: [Network] Getting attribute: ip for adapter: eth0
2023-08-27T16:41:56+02:00 stb1.swabian.net enigma2: [Network] Getting attribute: netmask for adapter: eth0
2023-08-27T16:42:11+02:00 stb1.swabian.net enigma2: [Network] Getting attribute: ip for adapter: eth0
2023-08-27T16:42:11+02:00 stb1.swabian.net enigma2: [Network] Getting attribute: netmask for adapter: eth0
2023-08-27T16:42:14+02:00 stb1.swabian.net enigma2: [RDS] radiotext str: (+++ Merz sieht Deutschland vor großen Herausforderungen)
2023-08-27T16:42:14+02:00 stb1.swabian.net enigma2: [Screen] Showing screen 'RdsInfoDisplaySummary'.
2023-08-27T16:42:24+02:00 stb1.swabian.net enigma2: [ActionMap] Keymap 'OkCancelActions' -> Action = 'ok'.
2023-08-27T16:42:24+02:00 stb1.swabian.net enigma2: [eDVBServicePlay] timeshift
2023-08-27T16:42:24+02:00 stb1.swabian.net enigma2: [Screen] Showing screen 'InfoBarSummary'.
2023-08-27T16:42:24+02:00 stb1.swabian.net enigma2: [Screen] Showing screen 'InfoBar'.
2023-08-27T16:42:24+02:00 stb1.swabian.net enigma2: [eDVBServicePlay] timeshift
2023-08-27T16:42:24+02:00 stb1.swabian.net enigma2: [eDVBServicePlay] timeshift
2023-08-27T16:42:24+02:00 stb1.swabian.net enigma2: [Skin] Parsing embedded skin '<embedded-in-Screensaver>'.
2023-08-27T16:42:24+02:00 stb1.swabian.net enigma2: [Skin] Processing screen '<embedded-in-Screensaver>', position=(0, 0), size=(1920 x 1080) for module 'Screensaver'.
2023-08-27T16:42:24+02:00 stb1.swabian.net enigma2: [Skin] Parsing embedded skin '<embedded-in-HideVBILine>'.
2023-08-27T16:42:24+02:00 stb1.swabian.net enigma2: [Skin] Processing screen '<embedded-in-HideVBILine>', position=(0, 0), size=(1920 x 7) for module 'HideVBILine'.
2023-08-27T16:42:24+02:00 stb1.swabian.net enigma2: [Screen] Showing screen 'HideVBILine'.
2023-08-27T16:42:24+02:00 stb1.swabian.net enigma2: resolved to PLAY
2023-08-27T16:42:24+02:00 stb1.swabian.net enigma2: [eDVBServicePlay] unpause
2023-08-27T16:42:24+02:00 stb1.swabian.net enigma2: [Skin] Processing screen 'VideoMode', position=(127, 67), size=(600 x 75) for module 'VideoMode'.
2023-08-27T16:42:24+02:00 stb1.swabian.net enigma2: [Skin] Processing screen 'PVRState', position=(192, 172), size=(225 x 45) for module 'PVRState'.
2023-08-27T16:42:24+02:00 stb1.swabian.net enigma2: [HelpActionMap] removed duplicity: InfobarSeekActions ('jumpPreviousMark', 'Jump to previous marked position')
2023-08-27T16:42:24+02:00 stb1.swabian.net enigma2: [HelpActionMap] removed duplicity: InfobarSeekActions ('jumpNextMark', 'Jump to next marked position')
2023-08-27T16:42:24+02:00 stb1.swabian.net enigma2: RemovePopup, id = ZapError
2023-08-27T16:42:24+02:00 stb1.swabian.net enigma2: [Navigation] playing: 1:0:0:0:0:0:0:0:0:0:/media/hdd/movie/20230824 2125 - NGC HD - Air Crash Investigation.ts
2023-08-27T16:42:24+02:00 stb1.swabian.net enigma2: [eFilePushThreadRecorder] stopping thread
2023-08-27T16:42:26+02:00 stb1.swabian.net enigma2: [gRC] main thread is non-idle! display spinner!
strace available as well
(by the way, disabled deep standby by change in /etc/rc0.d/S90halt - box will just reboot.)
Can you change the setting "show zap errors" to "disable" (just to check)?
Also can you, when this happens, login to the receiver and check whether enigma is waiting for the buffer cache to flush?
You can do that by writing a chunk of data to the same disk (local, afaik) as enigma is writing it's recordings to and see if it returns in a reasonable amount of time. Something like
dd count=1024 if=/dev/zero of=/hdd/testfile status=progress; sync
You can remove the file afterwards. I am suspecting harddisk can't keep up with the data, for whatever reason, data is filling memory, and at some moment, the kernel decides there is no more memory available for buffering and blocks all writing threads until enough data has been written and cleared from the buffer cache.
That could also explain why we (Wanwizard and I) are not experiencing this, as we're recording to a NAS (using NFS).
There is also another difference between recording on local disk and a NAS: signals can interrupt I/O system calls and if that happens, the calling process gets a status "this system call has been interrupted" (EINTR) and how much of the data has actually been written or received. The process can then handle the signal and continue where it left. The premisse in the code is that this works. There is a caveat though. For devices that are deemed to be always available and quick to write (and a local disk is one of them), the I/O system calls cannot be interrupted. Not even a ^C or sigterm or sigkill will abort the system call. That means the code does not 100% works like it's designed; as long as data is ready to write (and not written), the thread will not gain control, it will remain in the system call until it's finished, however many signals are sent. As most of the time, a write system call to harddisk is quick to return, that is not necessarily a problem, but if the disk is slow, for whatever reason, the system call can be long to complete and maybe, just maybe, a blocked signal can get lost in that situation.
There are a few other paths I am exploring here:
- it's very interesting that the read() gets interrupted many times, even though all signals besides sigusr1 are blocked. What is that signal we can't block (as system calls can only be interrupted by signals). If that is a sigusr1 from another source (for whatever reason), we can have the situation where one spurious and one proper signal is to be delivered and at that moment the blocking is still active (at the start). Blocked signals always only have one instance queued, so the proper signal might get lost in that case. Maybe we should try using another signal, like sigusr2, as enigma is using sigusr1 at other locations as well. Better be safe than sorry.
- if we really don't get this working, I am willing to try a construction where we're not using signals at all, that is possible.
In the meantime I now have an up-and-running OE environment again, so I can try and test some things.
Edited by Erik Slagter, 27 August 2023 - 16:38.