Subtitle and Italic

libolibo 25 Jan 2011

Hi

Let's say I have a mymovie.mkv and a mymovie.srt.
In this .srt, some text is marked as <i>italic</i>.

Why does the subtitle parser/renderer actually display the <i> tag on screen?

I have looked into it, and /usr/share/enigma2/skin_subtitles.xml does have a setting for Subtitle_Italic. Is gst-plugin-subparse handling it incorrectly?

Does anyone else experience the same problem?

ficaz 26 Jan 2011

I have the same problem too.

littlesat 26 Jan 2011

Currently subtitles cannot be rendered in italic, because this has not (yet) been coded correctly by DMM... We did make a lot of changes to the teletext subtitles.

What should be done is that in eSubtitle.cpp the font is switched to a proper italic font; alternatively, an improvement could be to simply remove all the tags.

Do you have an example of those subtitles (srt file)?

littlesat 26 Jan 2011

Do you really see the <i> in the subtitle? When the <i> and </i> tags are at the beginning and at the end of the line, it should work fine. When they are in the middle, it seems the line is cut at the beginning and at the end, and then we see the <i>'s...

See the code in /lib/gui/esubtitle.cpp:
262  face = Subtitle_Regular;
263  ePangoSubtitlePageElement &element = m_pango_page.m_elements[i];
264  std::string text = element.m_pango_line;
265  std::string::size_type loc = text.find("<", 0 );
266  if ( loc != std::string::npos )
267  {
268      switch (char(text.at(1)))
269      {
270      case 'i':
271          face = Subtitle_Italic;
272          break;
273      case 'b':
274          face = Subtitle_Bold;
275          break;
276      }
277      text = text.substr(3, text.length()-7);
278  }

libolibo 26 Jan 2011

This is displayed on screen with the <i> tags visible.
Let me know if you want the entire .srt file to try it yourself.

23
00:01:14,648 --> 00:01:16,248
<i>Move it.</i>
<i>you dare fall.</i>

libolibo 26 Jan 2011

Anyway, I am quite confused. Which code do we actually use to render subtitles?

1) the code in /lib/gui/esubtitle.cpp in the Enigma2 repository

or

2) the code in gst/subparse/gstsubparse.c in the gst-plugins-base repository (gstreamer) (git://anongit.freedesktop.org/gstreamer/gst-plugins-base)

If both are used, remember that gstsubparse escapes <i> into "&lt;i&gt;".
So the code at line 265 in /lib/gui/esubtitle.cpp will never match "<" as the character at position 0, since that character is "&".
I am referring to this line:
std::string::size_type loc = text.find("<", 0 );

This is stated at line 719 of gst/subparse/gstsubparse.c, in the comment above the function subrip_unescape_formatting:
/* we want to escape text in general, but retain basic markup like
* <i></i>, <u></u>, and <b></b>. The easiest and safest way is to
* just unescape a white list of allowed markups again after
* escaping everything (the text between these simple markers isn't
* necessarily escaped, so it seems best to do it like this) */
static void
subrip_unescape_formatting (gchar * txt)
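The comment describes an "escape everything, then restore a white list" strategy. A minimal C++ sketch of that idea, assuming only &, < and > are escaped; the function names are hypothetical and this is not the GStreamer implementation. If the restore step is skipped or does not match, the renderer receives "&lt;i&gt;" instead of "<i>".

```cpp
#include <string>

// Hypothetical helper (not the GStreamer code): replace every occurrence of
// `from` with `to` and return the result.
static std::string replace_every(std::string s, const std::string &from, const std::string &to)
{
	std::string::size_type pos = 0;
	while ((pos = s.find(from, pos)) != std::string::npos)
	{
		s.replace(pos, from.length(), to);
		pos += to.length(); // continue searching after the inserted text
	}
	return s;
}

// Step 1: escape the markup characters in general.
std::string escape_markup(const std::string &in)
{
	std::string out;
	for (char c : in)
	{
		switch (c)
		{
		case '&': out += "&amp;"; break;
		case '<': out += "&lt;"; break;
		case '>': out += "&gt;"; break;
		default: out += c; break;
		}
	}
	return out;
}

// Step 2: unescape a white list of simple tags (<i>, <u>, <b> and their
// closing forms) again; everything else stays escaped.
std::string unescape_whitelist(std::string txt)
{
	static const char *const tags[] = { "i", "u", "b" };
	for (const char *tag : tags)
	{
		const std::string t(tag);
		txt = replace_every(txt, "&lt;" + t + "&gt;", "<" + t + ">");
		txt = replace_every(txt, "&lt;/" + t + "&gt;", "</" + t + ">");
	}
	return txt;
}
```

For example, "<i>a & b</i>" becomes "<i>a &amp; b</i>", while a non-whitelisted tag such as "<font>" stays escaped as "&lt;font&gt;".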

littlesat 26 Jan 2011

You have indeed discovered the bug (introduced by DMM)...

libolibo 27 Jan 2011

Then I deserve at least one golden star instead of the blue one I have now :-)

pieterg 27 Jan 2011

You might be onto the problem, but your conclusion is not entirely correct:

std::string::size_type loc = text.find("<", 0 );

finds the location of the first occurrence of '<' anywhere in the text. It doesn't just check the first character of the string.

littlesat 27 Jan 2011

But afterwards it removes the first three characters and the last four (7 - 3 = 4)... this code looks a bit shaky ;-)
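To make the arithmetic concrete, here is a small sketch that isolates the trimming call from the excerpt (the wrapper name trim_like_esubtitle is hypothetical). The cut only lines up when a single 3-character opening tag and a 4-character closing tag wrap the entire line; otherwise tags leak into the displayed text.

```cpp
#include <string>

// Isolates the trimming step from the quoted esubtitle.cpp excerpt:
// substr(3, length - 7) keeps length - 7 characters starting at index 3,
// so it drops 3 characters in front and 7 - 3 = 4 at the end.
// Caveat: for lines shorter than 7 characters, length() - 7 wraps around
// (size_type is unsigned), so the call returns everything from index 3 on,
// and it throws std::out_of_range if the line is shorter than 3 characters.
std::string trim_like_esubtitle(const std::string &text)
{
	return text.substr(3, text.length() - 7);
}
```

For example, "<b>So</b> <i>yes</i>" comes out as "So</b> <i>yes", with the inner tags still visible.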

libolibo 27 Jan 2011

@pieterg
loc is used on the very next line in an if statement, so it's actually not just a find:
std::string::size_type loc = text.find("<", 0 );
if ( loc != std::string::npos )
{
	 switch (char(text.at(1)))
	 {
	 case 'i':
		   face = Subtitle_Italic;

So if the string starts with an &, the switch is never triggered and the string is never parsed for italic or bold.
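A self-contained reproduction of that detection step shows both points in this exchange: find() searches the whole line, yet the switch looks at index 1 regardless of where the '<' was found, and an entity-escaped line contains no '<' at all. The Face enum is a hypothetical stand-in for the Subtitle_* faces.

```cpp
#include <string>

// Hypothetical stand-ins for Subtitle_Regular / Subtitle_Italic / Subtitle_Bold.
enum Face { Regular, Italic, Bold };

// Reproduction of the detection step in the quoted esubtitle.cpp excerpt.
// find() reports the first '<' anywhere in the line, but the switch then
// inspects text.at(1), not the character after `loc`.
Face detect_face(const std::string &text)
{
	Face face = Regular;
	std::string::size_type loc = text.find("<", 0);
	if (loc != std::string::npos)
	{
		switch (text.at(1))
		{
		case 'i':
			face = Italic;
			break;
		case 'b':
			face = Bold;
			break;
		}
	}
	return face;
}
```

With "&lt;i&gt;Move it.&lt;/i&gt;" no '<' is found, so the face stays Regular. With "So <i>this</i>" the '<' is found at index 3, but text.at(1) is 'o', so the face also stays Regular.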

pieterg 27 Jan 2011

OK, your conclusion seemed to be about the find; I hadn't looked at the rest of the code yet ;)

littlesat 27 Jan 2011

So in short, you mean the < has already been replaced by an &-code earlier on... Further down in the code some &-codes are converted back; probably we should move those conversions above this code, and in addition add conversions so that < and > also come back in their proper form.

Also, cutting characters from the beginning and the end is strange... afterwards we could instead remove <i>, </i>, <b> and </b> from the complete string.

But even then, the whole line is still either italic, bold or normal. We cannot (yet) style just part of a line.
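Removing the four tags wherever they occur, instead of cutting a fixed 3 + 4 characters, could look roughly like this. It is a sketch assuming only <i>, </i>, <b> and </b> need handling; erase_all, strip_simple_tags and the Face enum are hypothetical names, and the whole line still gets a single face.

```cpp
#include <string>

// Hypothetical stand-ins for the Subtitle_* faces.
enum Face { Regular, Italic, Bold };

// Removes every occurrence of `tag` from `text`; returns true if any was found.
// Searching resumes at the erase position, so text before it is not rescanned.
static bool erase_all(std::string &text, const std::string &tag)
{
	bool found = false;
	std::string::size_type pos = 0;
	while ((pos = text.find(tag, pos)) != std::string::npos)
	{
		text.erase(pos, tag.length());
		found = true;
	}
	return found;
}

// Strips <i>, </i>, <b> and </b> anywhere in the line and picks one face for
// the whole line. Italic wins if both kinds are present (an arbitrary choice).
Face strip_simple_tags(std::string &text)
{
	bool bold = erase_all(text, "<b>");
	bold = erase_all(text, "</b>") || bold;
	bool italic = erase_all(text, "<i>");
	italic = erase_all(text, "</i>") || italic;
	if (italic)
		return Italic;
	if (bold)
		return Bold;
	return Regular;
}
```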

libolibo 27 Jan 2011

We could use some regexp matching for &lt;i&gt; in addition to the find for < at position 0.
That way we can handle both escaped and unescaped strings.
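In C++, such a match could be sketched with std::regex: a single pattern accepts both the raw and the entity-escaped opening tag at the start of the line. The function name and Face enum are hypothetical, and only <i> and <b> at position 0 are considered, mirroring the existing check.

```cpp
#include <regex>
#include <string>

// Hypothetical stand-ins for the Subtitle_* faces.
enum Face { Regular, Italic, Bold };

// Accepts an opening <i> or <b> tag at the start of the line in either the raw
// form ("<i>") or the entity-escaped form ("&lt;i&gt;").
Face detect_face_either(const std::string &text)
{
	// Group 1 captures the tag letter; '^' anchors the match at position 0.
	static const std::regex opening("^(?:<|&lt;)([ib])(?:>|&gt;)");
	std::smatch m;
	if (std::regex_search(text, m, opening))
		return m[1].str() == "i" ? Italic : Bold;
	return Regular;
}
```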

I am quite rusty at C++ and quite spoiled with Ruby.. so I am not sure how to write it myself :-)

littlesat 27 Jan 2011

Just a suggestion....

But somewhere else the < > etc. are changed into &-codes... Why not remove that manipulation there? Then all the replace_alls here could be removed. I have not yet found where the characters are changed into &-codes.

In addition, I would replace 'text = text.substr(3, text.length()-7);' by simply removing <i>, <b>, </i>, </b>, ...

text = replace_all(text, "&apos;", "'");
text = replace_all(text, "&quot;", "\"");
text = replace_all(text, "&lt;", "<");   // the entity includes the ';'
text = replace_all(text, "&gt;", ">");
text = replace_all(text, "&amp;", "&");  // decode &amp; last to avoid double-decoding
std::string::size_type loc = text.find("<", 0 );
if ( loc != std::string::npos )
{
    switch (char(text.at(1)))
    {
    case 'i':
        face = Subtitle_Italic;
        break;
    case 'b':
        face = Subtitle_Bold;
        break;
    }
    text = text.substr(3, text.length()-7);
}

pieterg 27 Jan 2011

I'm quite sure there is already an (eString?) function to remove HTML escapes?
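Whether or not such an eString helper exists, a small single-pass decoder is one way to remove the escapes. This is a sketch, not an existing enigma2 or eString function (decode_entities is a hypothetical name), and it handles only the five predefined XML entities. Because the input is scanned once, a decoded "&" is never examined again, so "&amp;lt;" correctly becomes "&lt;" rather than "<", a pitfall when chaining replacements in the wrong order.

```cpp
#include <cstddef>
#include <string>

// Hypothetical single-pass decoder for the five predefined XML entities.
// Unknown entities and stray '&' characters are copied through unchanged.
std::string decode_entities(const std::string &in)
{
	struct Entity { const char *name; char ch; };
	static const Entity entities[] = {
		{ "&lt;", '<' }, { "&gt;", '>' }, { "&amp;", '&' },
		{ "&quot;", '"' }, { "&apos;", '\'' },
	};
	std::string out;
	out.reserve(in.size());
	std::size_t i = 0;
	while (i < in.size())
	{
		bool matched = false;
		if (in[i] == '&')
		{
			for (const Entity &e : entities)
			{
				const std::string name(e.name);
				if (in.compare(i, name.size(), name) == 0)
				{
					out += e.ch;       // emit the decoded character
					i += name.size();  // skip past the whole entity
					matched = true;
					break;
				}
			}
		}
		if (!matched)
			out += in[i++];
	}
	return out;
}
```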

libolibo 27 Jan 2011

I think littlesat's patch deserves a try!

littlesat 27 Jan 2011

This was just a brainstorm... but I think it is better to get rid of the &-codes completely, as they are not in the original file either. Sometimes a quick fix is a dirty fix.

pieterg 27 Jan 2011

littlesat wrote:
> Just a suggestion....

Yes, that looks like the quickest workaround.
I did not find any eString HTML conversion routines; most likely they only existed in Enigma1's eString, as Enigma2 doesn't do HTML in the C++ part.

littlesat wrote:
> But somewhere else the < > etc. are changed into &-codes... Why not remove that manipulation there? Then all the replace_alls here could be removed. I have not yet found where the characters are changed into &-codes.

No, that's probably a GStreamer element. We'd rather not patch that.

littlesat wrote:
> In addition, I would replace 'text = text.substr(3, text.length()-7);' by simply removing <i>, <b>, </i>, </b>, ...

Yes, the current approach is very error-prone. But a proper Pango markup parser would require a lot more code.
On the other hand, coding a slightly better parser would allow us to introduce \i and \b escape codes, just like we do with colors...
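As an illustration of what a "slightly better parser" might do, the sketch below splits a line into runs and switches face at each <i>, </i>, <b> or </b> tag, so that part of a line could be italic. It assumes a renderer able to draw consecutive runs in different faces, which the thread says is not possible yet; Run, Face and parse_runs are hypothetical names. Recognising \i and \b escape codes instead of tags would follow the same scanning pattern.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for the Subtitle_* faces.
enum Face { Regular, Italic, Bold };

// One stretch of text drawn in a single face.
struct Run
{
	std::string text;
	Face face;
};

// Splits a subtitle line into runs, switching face on <i>, </i>, <b> and </b>.
// Any other '<' is kept as literal text. Nesting is not tracked: a closing
// tag simply returns to Regular.
std::vector<Run> parse_runs(const std::string &line)
{
	static const struct { const char *tag; Face face; } tags[] = {
		{ "<i>", Italic }, { "</i>", Regular },
		{ "<b>", Bold }, { "</b>", Regular },
	};
	std::vector<Run> runs;
	std::string current;
	Face face = Regular;
	std::string::size_type i = 0;
	while (i < line.size())
	{
		bool switched = false;
		if (line[i] == '<')
		{
			for (const auto &t : tags)
			{
				const std::string tag(t.tag);
				if (line.compare(i, tag.size(), tag) == 0)
				{
					if (!current.empty())
						runs.push_back({ current, face }); // close the previous run
					current.clear();
					face = t.face;
					i += tag.size();
					switched = true;
					break;
				}
			}
		}
		if (!switched)
			current += line[i++];
	}
	if (!current.empty())
		runs.push_back({ current, face });
	return runs;
}
```

For "So <i>this</i> one" this yields three runs: "So " (regular), "this" (italic) and " one" (regular).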