Hello,
I was watching a movie in the media player.
I tried to lower the volume, but I guess the remote sent some other command, and now my subtitles are corrupted. I tried restarting the STB, but that didn't help!
How do I fix this?
Posted 11 October 2015 - 15:47
If you play this same movie on the PC (using VLC for example), are the subtitles correct there?
Currently in use: VU+ Duo 4K (2xFBC S2), VU+ Solo 4K (1xFBC S2), uClan Usytm 4K Pro (S2+T2), Octagon SF8008 (S2+T2), Zgemma H9.2H (S2+T2)
Due to my bad health, I will not be very active at times and may be slow to respond. I will not read the forum or PM on a regular basis.
Many answers to your question can be found in our new and improved wiki.
Posted 11 October 2015 - 15:54
Posted 11 October 2015 - 15:59
Everything was OK before I touched the remote. (I have to say that my remote is kind of failing: even if I only press the volume down button, it may send some other command.)
I always use .srt subtitles converted with Notepad++,
but until now I never converted them to UTF-8 BOM, only to UTF-8, and I never had problems with UTF-8 before.
I will try the BOM version now.
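For reference, the difference between Notepad++'s "UTF-8" and "UTF-8 BOM" options is three signature bytes (EF BB BF) at the start of the file; the subtitle text after them is identical. A minimal C sketch of checking for and skipping that signature, assuming the whole file is already in memory (the function names are hypothetical, not taken from any player or library):

```c
#include <assert.h>
#include <stddef.h>

/* The three bytes a "UTF-8 BOM" file starts with (U+FEFF encoded as UTF-8). */
static const unsigned char UTF8_BOM[3] = {0xEF, 0xBB, 0xBF};

/* Returns 1 if buf begins with the UTF-8 BOM, else 0. */
int has_utf8_bom(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    return len >= 3 && buf[0] == UTF8_BOM[0] && buf[1] == UTF8_BOM[1]
        && buf[2] == UTF8_BOM[2];
}

/* Returns a pointer to the text after the BOM (if there is one) and
 * stores the remaining length in *text_len. The text is not modified. */
const unsigned char *skip_utf8_bom(const unsigned char *buf, size_t len,
                                   size_t *text_len)
{
    size_t skip = has_utf8_bom(buf, len) ? 3 : 0;
    assert(text_len != NULL);
    *text_len = len - skip;
    return skip ? buf + skip : buf;
}
```

A reader that finds the signature knows for certain the file is UTF-8; without it, the reader has to guess, which is exactly what the rest of this thread is about.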
Posted 11 October 2015 - 17:23
Posted 12 October 2015 - 18:27
Athoik, can you explain to me what the actual problem is? I can't see it. UTF-8 = UTF-8; there is no further "encoding", because UTF-8 IS the encoding. The only thing I can think of is byte order, or even bit order. Byte order would surprise me, because UTF-8 has no notion of "words"; it's just a FIFO, one byte at a time. And machines with reversed bit order, do they still exist? So?
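The byte-oriented nature of UTF-8 is easy to check: each code point is written as a fixed sequence of 1 to 4 bytes, most significant bits first, so there is nothing to reorder. The "BOM" is simply the code point U+FEFF run through the same encoding, which yields EF BB BF and can only serve as a signature. A minimal C encoder sketch (the function name is hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-8 into out (at least 4 bytes).
 * Returns the number of bytes written, or 0 for surrogates and values
 * above U+10FFFF, which are not valid code points to encode. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);
    if (cp < 0x80) {                        /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                       /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)       /* UTF-16 surrogates */
        return 0;
    if (cp < 0x10000) {                     /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                   /* 11110xxx + 3 continuation bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

With it, `utf8_encode(0xFEFF, buf)` writes EF BB BF, and `utf8_encode(0x03B1, buf)` (Greek small alpha) writes CE B1, i.e. two bytes per Greek letter.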
* Wavefrontier T90 with 28E/23E/19E/13E via SCR switches 2 x 2 x 6 user bands
I don't read PM -> if you have something to ask or to report, do it in the forum so others can benefit. I don't take freelance jobs.
Posted 12 October 2015 - 19:15
static gchar *
detect_encoding (const gchar * str, gsize len)
{
  if (len >= 3 && (guint8) str[0] == 0xEF && (guint8) str[1] == 0xBB
      && (guint8) str[2] == 0xBF)
    return g_strdup ("UTF-8");
  ....
  ....

self->detected_encoding = detect_encoding ((gchar *) map.data, map.size);
....
....
static gchar *
convert_encoding (GstSubParse * self, const gchar * str, gsize len,
    gsize * consumed)
{
  ....
  /* First try any detected encoding */
  if (self->detected_encoding) {
    ret = gst_convert_to_utf8 (str, len, self->detected_encoding, consumed, &err);
  ...
  /* Otherwise check if it's UTF8 */
  if (self->valid_utf8) {
    if (g_utf8_validate (str, len, NULL)) {
      GST_LOG_OBJECT (self, "valid UTF-8, no conversion needed");
      *consumed = len;
      return g_strndup (str, len);
    }

Also, we need to consume only the data that g_utf8_validate reports as valid. For example, if we have 12 Greek characters (24 bytes) but the buffer contains only 23 bytes, only the first 22 bytes (11 characters) are valid UTF-8. The next time we fill the buffer, the 1 byte left over from the previous run plus 1 more byte from the new read form a valid Greek character (2 bytes).
From: "Reynaldo H. Verdejo Pinochet" <reynaldo@osg.samsung.com>
Date: Fri, 28 Nov 2014 13:26:13 -0300
Subject: [PATCH] subparse: avoid false negatives dealing with UTF-8

g_utf8_validate() chokes at any NUL among max_len bytes so we should
avoid passing null character terminators if present. Additionally,
only part of the available data might be valid UTF-8. For example a
byte at the end might be the start of a valid UTF-8 run (ie: d0) but
not be a valid UTF-8 character by itself. In this case, we consume
only the valid portion of the run.

https://bugzilla.gnome.org/show_bug.cgi?id=740784
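The two fixes described in the patch (don't hand trailing NUL terminators to the validator, and consume only the valid prefix) can be sketched in plain C without GLib. This is a standalone illustration, not the actual subparse code: utf8_scan() and usable_utf8_len() are hypothetical names, and utf8_scan() only imitates g_utf8_validate() in that it reports whether the whole buffer is valid, where the valid data ends, and treats a NUL byte as invalid.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for g_utf8_validate(): returns 1 if all len bytes
 * are well-formed UTF-8, else 0, and stores in *end the offset where valid
 * data stops. Like the GLib function, a NUL byte counts as invalid. */
int utf8_scan(const unsigned char *s, size_t len, size_t *end)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        unsigned char lo = 0x80, hi = 0xBF; /* allowed range of 2nd byte */
        size_t n, k;
        if (c == 0x00)
            break;                      /* NUL stops validation */
        if (c < 0x80) {                 /* ASCII */
            i++;
            continue;
        }
        if (c >= 0xC2 && c <= 0xDF) {
            n = 2;
        } else if (c >= 0xE0 && c <= 0xEF) {
            n = 3;
            if (c == 0xE0) lo = 0xA0;   /* reject overlong forms */
            if (c == 0xED) hi = 0x9F;   /* reject UTF-16 surrogates */
        } else if (c >= 0xF0 && c <= 0xF4) {
            n = 4;
            if (c == 0xF0) lo = 0x90;   /* reject overlong forms */
            if (c == 0xF4) hi = 0x8F;   /* reject code points above U+10FFFF */
        } else {
            break;                      /* not a valid lead byte */
        }
        if (n > len - i)
            break;                      /* sequence cut off at buffer end */
        if (s[i + 1] < lo || s[i + 1] > hi)
            break;
        for (k = 2; k < n; k++)
            if ((s[i + k] & 0xC0) != 0x80)
                break;                  /* bad continuation byte */
        if (k < n)
            break;
        i += n;
    }
    if (end)
        *end = i;
    return i == len;
}

/* Sketch of the patched approach: drop trailing NUL terminators, then
 * return how many leading bytes are valid UTF-8 and can be consumed now.
 * The caller keeps the remaining bytes for the next read. */
size_t usable_utf8_len(const unsigned char *buf, size_t len)
{
    size_t end = 0;
    assert(buf != NULL || len == 0);
    while (len > 0 && buf[len - 1] == 0x00)
        len--;
    utf8_scan(buf, len, &end);
    return end;
}
```

With 12 Greek letters (24 bytes) of which only 23 have arrived, usable_utf8_len() returns 22; the leftover lead byte is kept and, together with the next byte read, forms a complete 2-byte character. A buffer ending in a NUL terminator is no longer rejected as a whole.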
Edited by athoik, 12 October 2015 - 19:17.
Posted 13 October 2015 - 13:44
Athoik, I still don't understand. What "encoding" are we talking about? Enigma and GStreamer always assume UTF-8 unless overridden, right? Once UTF-8 is established, there is no further encoding, byte ordering (byte order mark?), etc.
Posted 13 October 2015 - 19:11
g_object_class_install_property (object_class, PROP_ENCODING,
    g_param_spec_string ("subtitle-encoding", "subtitle charset encoding",
        "Encoding to assume if input subtitles are not in UTF-8 or any other "
        "Unicode encoding. If not set, the GST_SUBTITLE_ENCODING environment "
        "variable will be checked for an encoding to use. If that is not set "
        "either, ISO-8859-15 will be assumed.",
        DEFAULT_ENCODING, G_PARAM_READWRITE | G_PARAM_STATIC_STRINGS));

1. By default, everything is assumed to be UTF-8:

self->valid_utf8 = TRUE;

2. Unless a BOM is detected. If a BOM is detected, the encoding detected from the BOM is used, and the buffers are converted from that encoding without being validated:

self->detected_encoding = detect_encoding ((gchar *) map.data, map.size);
...
/* First try any detected encoding */
if (self->detected_encoding) {
  ret = gst_convert_to_utf8 (str, len, self->detected_encoding, consumed, &err);

3. If no BOM is detected, every buffer is validated as UTF-8:

/* Otherwise check if it's UTF8 */
if (self->valid_utf8) {
  if (g_utf8_validate (str, len, NULL)) {
    GST_LOG_OBJECT (self, "valid UTF-8, no conversion needed");
    *consumed = len;
    return g_strndup (str, len);
  }
  GST_INFO_OBJECT (self, "invalid UTF-8!");
  self->valid_utf8 = FALSE;
}

4. If validation fails once, it never tries to validate UTF-8 again (valid_utf8 = FALSE) and goes straight to the fallback:

/* Else try fallback */
encoding = self->encoding;
if (encoding == NULL || *encoding == '\0') {
  encoding = g_getenv ("GST_SUBTITLE_ENCODING");
}
if (encoding == NULL || *encoding == '\0') {
  /* if local encoding is UTF-8 and no encoding specified
   * via the environment variable, assume ISO-8859-15 */
  if (g_get_charset (&encoding)) {
    encoding = "ISO-8859-15";
  }
}
ret = gst_convert_to_utf8 (str, len, encoding, consumed, &err);
if (err) {
  GST_WARNING_OBJECT (self, "could not convert string from '%s' to UTF-8: %s",
      encoding, err->message);
  g_clear_error (&err);
  /* invalid input encoding, fall back to ISO-8859-15 (always succeeds) */
  ret = gst_convert_to_utf8 (str, len, "ISO-8859-15", consumed, NULL);
}
GST_LOG_OBJECT (self,
    "successfully converted %" G_GSIZE_FORMAT " characters from %s to UTF-8"
    "%s", len, encoding, (err) ? " , using ISO-8859-15 as fallback" : "");

So, if we have UTF-8 without a BOM and g_utf8_validate fails even once (e.g. because the buffer contains a NUL terminator or ends in the middle of a multi-byte character), every following buffer, even perfectly valid UTF-8, is converted from the fallback encoding (ISO-8859-15 by default), and the non-ASCII characters come out garbled.
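The decision order in steps 1 to 4 can be modeled as a small state machine. In this hypothetical sketch, struct subparse_state and source_encoding() are illustrative names, the validation result is passed in as a flag rather than computed, and the last fallback is simplified: the quoted code only picks ISO-8859-15 when the locale charset is UTF-8 and otherwise uses the locale charset.

```c
#include <assert.h>
#include <stdlib.h>   /* getenv */

/* Hypothetical, simplified model of the state kept by subparse. */
struct subparse_state {
    const char *detected_encoding; /* set when a BOM was found, else NULL */
    int valid_utf8;                /* starts as 1, cleared on first failure */
    const char *encoding;          /* "subtitle-encoding" property, may be NULL */
};

static const char utf8_name[] = "UTF-8";

/* Returns the encoding a chunk would be converted FROM. chunk_is_utf8
 * stands in for the result of g_utf8_validate() on that chunk. */
const char *source_encoding(struct subparse_state *st, int chunk_is_utf8)
{
    const char *enc;
    assert(st != NULL);
    if (st->detected_encoding)          /* step 2: BOM wins, no validation */
        return st->detected_encoding;
    if (st->valid_utf8) {               /* step 3: validate each chunk */
        if (chunk_is_utf8)
            return utf8_name;
        st->valid_utf8 = 0;             /* step 4: never validated again */
    }
    enc = st->encoding;                 /* fallback: property, then env */
    if (enc == NULL || *enc == '\0')
        enc = getenv("GST_SUBTITLE_ENCODING");
    if (enc == NULL || *enc == '\0')
        enc = "ISO-8859-15";            /* simplified: ignores locale charset */
    return enc;
}
```

Feeding it a valid chunk, then an invalid one, then a valid one again reproduces the problem: the third chunk is still routed to the fallback encoding because valid_utf8 stays cleared, whereas a state with a BOM-detected encoding never looks at validity at all.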
Posted 14 October 2015 - 18:22
These sound like very dirty workarounds for very dirty practices. I agree that you can't really determine the encoding of an SRT file (external or embedded); that is a major flaw in the design. So the problem is actually not UTF-8, but the use of encodings other than UTF-8. I mean, if everybody used UTF-8, there would be no problem. I guess running every SRT through iconv to convert it to UTF-8 is not a user-friendly option?
I really wish all these encodings would be gone once and for all. Everyone (including Windows, please) should be using UTF-8.
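Running a legacy SRT through iconv (e.g. `iconv -f ISO-8859-15 -t UTF-8`) and the subparse fallback apply the same byte mapping; the only question is whether the input really was ISO-8859-15. A pure-C sketch of that mapping (function names are hypothetical; a real tool should use iconv), relying on the fact that ISO-8859-15 matches ISO-8859-1 except at eight positions:

```c
#include <assert.h>
#include <stddef.h>

/* Map one ISO-8859-15 byte to its Unicode code point. */
static unsigned latin9_to_ucs(unsigned char b)
{
    switch (b) {
    case 0xA4: return 0x20AC; /* EURO SIGN */
    case 0xA6: return 0x0160; /* LATIN CAPITAL LETTER S WITH CARON */
    case 0xA8: return 0x0161; /* LATIN SMALL LETTER S WITH CARON */
    case 0xB4: return 0x017D; /* LATIN CAPITAL LETTER Z WITH CARON */
    case 0xB8: return 0x017E; /* LATIN SMALL LETTER Z WITH CARON */
    case 0xBC: return 0x0152; /* LATIN CAPITAL LIGATURE OE */
    case 0xBD: return 0x0153; /* LATIN SMALL LIGATURE OE */
    case 0xBE: return 0x0178; /* LATIN CAPITAL LETTER Y WITH DIAERESIS */
    default:   return b;      /* same as ISO-8859-1: U+0000..U+00FF */
    }
}

/* Convert len ISO-8859-15 bytes to UTF-8. out must have room for
 * 3 * len bytes (worst case: all euro signs). Returns bytes written. */
size_t latin9_to_utf8(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t o = 0;
    assert((in != NULL && out != NULL) || len == 0);
    for (size_t i = 0; i < len; i++) {
        unsigned cp = latin9_to_ucs(in[i]);
        if (cp < 0x80) {
            out[o++] = (unsigned char)cp;
        } else if (cp < 0x800) {
            out[o++] = (unsigned char)(0xC0 | (cp >> 6));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else {              /* only U+20AC needs 3 bytes here */
            out[o++] = (unsigned char)(0xE0 | (cp >> 12));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        }
    }
    return o;
}
```

Given the genuine ISO-8859-15 byte 0xA4, it produces the euro sign (E2 82 AC). Given the UTF-8 bytes of Greek alpha (CE B1), it produces "Î±" (C3 8E C2 B1), which is the kind of garbling you get when valid UTF-8 ends up on the fallback path.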