Subtitle and Italic
libolibo 25 Jan 2011
Hi
Let's say I have a mymovie.mkv and a mymovie.srt
In this srt I have some text marked as <i>italic</i>
How can it be that the sub parser/render actually renders the <i> tag on screen?
I have looked into it and in /usr/share/enigma2/skin_subtitles.xml actually we do have settings for Subtitle_Italic. Is it gst-plugin-subparse rendering it wrong?
I am wondering if you experience the same problem...
littlesat 26 Jan 2011
Currently the subtitles cannot be rendered in italic, as this has not (yet) been coded correctly by DMM... We did make a lot of changes to the teletext subtitles.
What should be done is that in eSubtitle.cpp the font is changed to a proper italic font, or, as another possible improvement, all the tags are simply removed.
Do you have an example of those subtitles (srt file)?
littlesat 26 Jan 2011
Do you really see the <i> in the subtitle? When the <i> and </i> tags are at the beginning and at the end of the line it should work fine. When they are in the middle, it seems the line is cut at the beginning and at the end, and then we see the <i>'s...
See the code in /lib/gui/esubtitle.cpp
262     face = Subtitle_Regular;
263     ePangoSubtitlePageElement &element = m_pango_page.m_elements[i];
264     std::string text = element.m_pango_line;
265     std::string::size_type loc = text.find("<", 0 );
266     if ( loc != std::string::npos )
267     {
268         switch (char(text.at(1)))
269         {
270         case 'i':
271             face = Subtitle_Italic;
272             break;
273         case 'b':
274             face = Subtitle_Bold;
275             break;
276         }
277         text = text.substr(3, text.length()-7);
278     }
libolibo 26 Jan 2011
This is rendered on screen as <i>
Let me know if you want the entire .srt file to try it yourself
23
00:01:14,648 --> 00:01:16,248
<i>Move it.</i>
<i>you dare fall.</i>
libolibo 26 Jan 2011
Anyway I am quite confused. What do we use to render subtitles?
1)the code in /lib/gui/esubtitle.cpp in the Enigma2 repository
or
2) the code in gst/subparse/gstsubparse.c in the gst-plugins-base repository (gstreamer) (git://anongit.freedesktop.org/gstreamer/gst-plugins-base)
If we use both, remember that gstsubparse escapes the <i> into "&lt;i&gt;"
So the code at line 265 in /lib/gui/esubtitle.cpp will never match < as the char at position 0, since it's &.
I refer to this line:
std::string::size_type loc = text.find("<", 0 );
This is stated in gst/subparse/gstsubparse.c at line 719, in the comment above the function subrip_unescape_formatting
/* we want to escape text in general, but retain basic markup like
 * <i></i>, <u></u>, and <b></b>. The easiest and safest way is to
 * just unescape a white list of allowed markups again after
 * escaping everything (the text between these simple markers isn't
 * necessarily escaped, so it seems best to do it like this) */
static void subrip_unescape_formatting (gchar * txt)
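To make the idea in that comment concrete, here is a rough C++ sketch of "escape everything, then unescape a white list". It is not the GStreamer code (that is C and works in place on a gchar buffer); the names replaceAll, escapeAll and unescapeWhitelist are made up for illustration.

```cpp
#include <string>

// Hypothetical helper: replace every occurrence of `from` in `s` with `to`.
static std::string replaceAll(std::string s, const std::string &from, const std::string &to)
{
    if (from.empty())
        return s;
    for (std::string::size_type pos = 0;
         (pos = s.find(from, pos)) != std::string::npos;
         pos += to.length())
        s.replace(pos, from.length(), to);
    return s;
}

// Step 1: escape everything that could be read as markup.
// '&' goes first so the entities added afterwards are not escaped twice.
static std::string escapeAll(std::string s)
{
    s = replaceAll(s, "&", "&amp;");
    s = replaceAll(s, "<", "&lt;");
    s = replaceAll(s, ">", "&gt;");
    return s;
}

// Step 2: turn only the white-listed tags (i, u, b) back into real markup.
static std::string unescapeWhitelist(std::string s)
{
    const char *tags[] = { "i", "u", "b" };
    for (const char *tag : tags)
    {
        const std::string t(tag);
        s = replaceAll(s, "&lt;" + t + "&gt;", "<" + t + ">");
        s = replaceAll(s, "&lt;/" + t + "&gt;", "</" + t + ">");
    }
    return s;
}
```

With this sketch, unescapeWhitelist(escapeAll("<i>Tom & Jerry</i> <font>")) keeps the <i> pair as markup while the ampersand and the <font> tag stay escaped.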
pieterg 27 Jan 2011
you might be on to the problem, but your conclusion is not entirely correct;
std::string::size_type loc = text.find("<", 0 );
finds the location of the first occurrence of '<' in the text. It doesn't just check the first character of the string.
littlesat 27 Jan 2011
But afterwards it removes the first three characters and the last 4 (7-3=4)... seems to be a bit ghosting this code ;-)
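A small sketch of what that substr call does, assuming (this is a guess, not checked against the renderer) that both lines of a cue can arrive in one string. The function name trimLikeExcerpt is made up; the length guard is an addition, because text.length()-7 wraps around for strings shorter than 7 characters.

```cpp
#include <string>

// Same trimming as in the excerpt: drop 3 characters at the front and
// 4 at the end, no matter what those characters actually are.
static std::string trimLikeExcerpt(const std::string &text)
{
    if (text.length() < 7)   // guard added for this sketch only
        return text;
    return text.substr(3, text.length() - 7);
}
```

For "<i>Move it.</i>" this gives "Move it.", but for "<i>Move it.</i>\n<i>you dare fall.</i>" only the outermost tags go away, and the </i> and <i> around the line break stay in the text.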
libolibo 27 Jan 2011
@pieterg
the std::string::size_type loc is used in the next line for an if statement so actually it's not just a find.
std::string::size_type loc = text.find("<", 0 );
if ( loc != std::string::npos )
{
    switch (char(text.at(1)))
    {
    case 'i':
        face = Subtitle_Italic;
so if the string starts with an & the switch is never triggered and the string is never parsed for italic or bold
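Both points can be tried out in isolation with a stand-alone copy of the detection logic. This is only a sketch: the Face enum stands in for Subtitle_Regular, Subtitle_Italic and Subtitle_Bold, detectFace is a made-up name, and the length check is an addition so that at(1) cannot throw.

```cpp
#include <string>

enum Face { Regular, Italic, Bold };  // stand-ins for the Subtitle_* faces

// Mirrors the excerpt: find() looks for '<' anywhere in the string,
// but the switch then inspects index 1 regardless of where '<' was found.
static Face detectFace(const std::string &text)
{
    Face face = Regular;
    std::string::size_type loc = text.find("<", 0);
    if (loc != std::string::npos && text.length() > 1)
    {
        switch (text.at(1))
        {
        case 'i':
            face = Italic;
            break;
        case 'b':
            face = Bold;
            break;
        }
    }
    return face;
}
```

An escaped line such as "&lt;i&gt;Move it.&lt;/i&gt;" contains no '<' at all, so it stays Regular. A line like "Hi <b>there</b>" does pass the find, but index 1 is the 'i' of "Hi", so it comes out as Italic by accident.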
pieterg 27 Jan 2011
ok, your conclusion seemed to be about the find, I hadn't looked at the remainder of the code yet
littlesat 27 Jan 2011
So in short you mean the < has already been replaced by an &-code earlier on... further down in the code some &-codes are converted back... probably we should do that conversion above this code, and in addition add something to get the < and / into a proper format as well.
Also removing from the beginning and the end is strange... we could afterwards remove <i>, </i>, <b> and </b> from the complete string
But then still the whole line is italic or bold or normal. We cannot do (yet) a part of it.
libolibo 27 Jan 2011
We could do some regexp matching, trying to match &lt;i&gt; in addition to finding < at position 0.
In this way we can handle escaped and not escaped strings.
I am quite rusty at C++ and quite spoiled with Ruby.. so I am not sure how to write it myself :-)
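For what it is worth, a regexp version could look roughly like the sketch below. It uses std::regex from C++11, which is an assumption about the toolchain and not something Enigma2 is known to use; TagKind and leadingTag are made-up names.

```cpp
#include <regex>
#include <string>

enum class TagKind { None, Italic, Bold };

// Matches an opening i or b tag at the start of the line, written either
// literally (<i>) or in escaped form (&lt;i&gt;).
static TagKind leadingTag(const std::string &text)
{
    static const std::regex re("^(?:<|&lt;)([ib])(?:>|&gt;)");
    std::smatch m;
    if (!std::regex_search(text, m, re))
        return TagKind::None;
    return m[1].str() == "i" ? TagKind::Italic : TagKind::Bold;
}
```

This only decides the face for the whole line; the tags themselves would still have to be removed from the text afterwards.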
littlesat 27 Jan 2011
Just as suggestion....
But somewhere else the <, > etc. are changed into &-codes... Why not remove that manipulation there? Then all the replace_all calls could be removed here. I have not yet found where the characters are changed into &-codes.
In addition I would replace 'text = text.substr(3, text.length()-7);' by just removing <i>, <b>, </i> and </b>...
text = replace_all(text, "&apos;", "'");
text = replace_all(text, "&quot;", "\"");
text = replace_all(text, "&amp;", "&");
text = replace_all(text, "&lt;", "<");
text = replace_all(text, "&gt;", ">");
std::string::size_type loc = text.find("<", 0 );
if ( loc != std::string::npos )
{
    switch (char(text.at(1)))
    {
    case 'i':
        face = Subtitle_Italic;
        break;
    case 'b':
        face = Subtitle_Bold;
        break;
    }
    text = text.substr(3, text.length()-7);
}
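Combining this with the idea of removing the tags from the complete string could look like the stand-alone sketch below. It is not a patch against esubtitle.cpp: replaceAll here is a local stand-in for replace_all, SubtitleFace, StyledLine and parseLine are made-up names, and, unlike the snippet above, &amp; is converted last so that a literal "&amp;lt;" in the text does not end up as a '<'.

```cpp
#include <string>

enum SubtitleFace { FaceRegular, FaceItalic, FaceBold };  // stand-ins

struct StyledLine
{
    SubtitleFace face;
    std::string text;
};

// Local stand-in for replace_all: replace every `from` in `s` with `to`.
static std::string replaceAll(std::string s, const std::string &from, const std::string &to)
{
    if (from.empty())
        return s;
    for (std::string::size_type pos = 0;
         (pos = s.find(from, pos)) != std::string::npos;
         pos += to.length())
        s.replace(pos, from.length(), to);
    return s;
}

static StyledLine parseLine(std::string text)
{
    // Undo the escaping first; &amp; goes last (see the note above).
    text = replaceAll(text, "&apos;", "'");
    text = replaceAll(text, "&quot;", "\"");
    text = replaceAll(text, "&lt;", "<");
    text = replaceAll(text, "&gt;", ">");
    text = replaceAll(text, "&amp;", "&");

    // The first opening tag, wherever it is, decides the face of the whole line.
    StyledLine out;
    out.face = FaceRegular;
    const std::string::size_type i = text.find("<i>");
    const std::string::size_type b = text.find("<b>");
    if (i != std::string::npos && (b == std::string::npos || i < b))
        out.face = FaceItalic;
    else if (b != std::string::npos)
        out.face = FaceBold;

    // Remove the tags from the complete string instead of cutting fixed offsets.
    const char *tags[] = { "<i>", "</i>", "<b>", "</b>" };
    for (const char *tag : tags)
        text = replaceAll(text, tag, "");

    out.text = text;
    return out;
}
```

The whole line still gets a single face, so a cue that is only partly italic would be drawn fully italic.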
littlesat 27 Jan 2011
This was just a brainstorm... but I think it is better to get rid of the &-codes completely, as they are not there in the original file either. Sometimes a quick fix is a dirty fix.
pieterg 27 Jan 2011
Just as suggestion....
yes, that looks like the quickest workaround.
Did not find eString html conversion routines, most likely they only existed in e1's eString, as e2 doesn't do html in the c++ part.
But somewhere else the <, > etc. are changed into &-codes... Why not remove that manipulation there? Then all the replace_all calls could be removed here. I have not yet found where the characters are changed into &-codes.
No, that's probably a gstreamer element. We'd rather not patch that.
In addition I would replace 'text = text.substr(3, text.length()-7);' by just removing <i>, <b>, </i> and </b>...
yes, the current approach is very error-prone. But a proper pango parser would require a lot more code.
On the other hand, coding a slightly better parser would allow us to introduce \i \b escape codes, just like we do with colors...
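As a rough idea of what such a slightly better parser might do, the sketch below splits a line into spans that each carry their own style. How the spans would then be turned into \i or \b escape codes (or drawn with separate faces) is left open, since that format does not exist yet; Style, Span and splitSpans are made-up names, nested tags are not handled, and the input is assumed to be unescaped already.

```cpp
#include <string>
#include <vector>

enum Style { StyleRegular, StyleItalic, StyleBold };

struct Span
{
    Style style;
    std::string text;
};

// Walk the line once; each <i>, <b>, </i> or </b> closes the current span
// and switches the style for the text that follows.
static std::vector<Span> splitSpans(const std::string &text)
{
    std::vector<Span> spans;
    Style current = StyleRegular;
    std::string buf;
    std::string::size_type pos = 0;
    while (pos < text.length())
    {
        Style next = current;
        std::string::size_type skip = 0;
        if (text.compare(pos, 3, "<i>") == 0) { next = StyleItalic; skip = 3; }
        else if (text.compare(pos, 3, "<b>") == 0) { next = StyleBold; skip = 3; }
        else if (text.compare(pos, 4, "</i>") == 0 ||
                 text.compare(pos, 4, "</b>") == 0) { next = StyleRegular; skip = 4; }

        if (skip == 0)
        {
            buf += text[pos++];   // ordinary character
            continue;
        }
        if (!buf.empty())
        {
            spans.push_back({current, buf});
            buf.clear();
        }
        current = next;
        pos += skip;
    }
    if (!buf.empty())
        spans.push_back({current, buf});
    return spans;
}
```

For "Don't <i>you</i> dare" this yields three spans: regular "Don't ", italic "you" and regular " dare".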