Teletext subtitles - missing spaces

teletext subtitles

56 replies to this topic

#1 Stan

  • Senior Member
  • 312 posts

0
Neutral

Posted 24 December 2014 - 13:36

With teletext subtitles, sometimes the spaces between words are missing.

 

Attached File  Sub_TTX_02.jpg   49.36KB   4 downloads

 

The issue has been mentioned by Erik some time ago in another thread.

 

[...]

Sometimes many words in a sub get concatenated without whitespace, sometimes part of the first and last character on a (long) line are cropped, sometimes a line gets wrapped at an illogical place, the next word is then in dark grey ink, and the rest of the line is okay. I've seen this only on BBC. It looks like some control character is interpreted incorrectly.

 

It seems to occur when lines with different colours are displayed...

 



Re: Teletext subtitles - missing spaces #2 littlesat

  • PLi® Core member
  • 56,259 posts

+691
Excellent

Posted 24 December 2014 - 13:50

I do not see different colors there. But an enigma2 log could be helpful here.

 

Here on Dutch TV we do not see this happening....


Edited by littlesat, 24 December 2014 - 13:50.

WaveFrontier 28.2E | 23.5E | 19.2E | 16E | 13E | 10/9E | 7E | 5E | 1W | 4/5W | 15W


Re: Teletext subtitles - missing spaces #3 ims

  • PLi® Core member
  • 13,623 posts

+212
Excellent

Posted 24 December 2014 - 13:52

It works well here too (tried on recorded movies).


He who does nothing spoils nothing!

Re: Teletext subtitles - missing spaces #4 Stan

  • Senior Member
  • 312 posts

0
Neutral

Posted 24 December 2014 - 14:41

Yesterday I had it on a Freesat channel (ITV2). I recorded it.

 

It is strange because on the teletext page itself (888) it looks fine.

 

@littlesat

I could not find the output of the subtitles content in the enigma2 logfile. What are we looking for?



Re: Teletext subtitles - missing spaces #5 littlesat

  • PLi® Core member
  • 56,259 posts

+691
Excellent

Posted 24 December 2014 - 14:43

It should be investigated why these spaces are removed. Weird...




Re: Teletext subtitles - missing spaces #6 Erik Slagter

  • PLi® Core member
  • 46,951 posts

+541
Excellent

Posted 24 December 2014 - 16:13

It's something that only shows on BBC (for me). I have changed some of the colour subtitle code, which helped a lot, but it still appears now and then. I can't explain it. A slight possibility is that two lines are joined (do you have re-formatting enabled?) and no space is inserted where there should be one.


* Wavefrontier T90 with 28E/23E/19E/13E via SCR switches 2 x 2 x 6 user bands
I don't read PM -> if you have something to ask or to report, do it in the forum so others can benefit. I don't take freelance jobs.


Re: Teletext subtitles - missing spaces #7 Stan

  • Senior Member
  • 312 posts

0
Neutral

Posted 24 December 2014 - 17:29

The option "Rewrap teletext subtitles" is set to "no", so I guess re-formatting is disabled.



Re: Teletext subtitles - missing spaces #8 Erik Slagter

  • PLi® Core member
  • 46,951 posts

+541
Excellent

Posted 24 December 2014 - 18:32

Okay good to know, so we can rule that out. Do you see it often? I only notice it very rarely now.




Re: Teletext subtitles - missing spaces #9 Stan

  • Senior Member
  • 312 posts

0
Neutral

Posted 24 December 2014 - 19:59

I had a look at the subtitle teletext page 888 (same scene from first post).

 

Attached File  Page888_02.txt   1000bytes   19 downloads

 

Between the words there is a control character (colour definition).

 

The problem is that this control character is removed without being replaced by a space.

 



Re: Teletext subtitles - missing spaces #10 littlesat

  • PLi® Core member
  • 56,259 posts

+691
Excellent

Posted 25 December 2014 - 00:03

Are you able to post the complete delivered string here?



Re: Teletext subtitles - missing spaces #11 ims

  • PLi® Core member
  • 13,623 posts

+212
Excellent

Posted 25 December 2014 - 00:13

I had a look at the subtitle teletext page 888 (same scene from first post).

 

Attached File  Page888_02.txt

 

Between the words there is a control character (colour definition).

 

The problem is, that this character is being removed without getting replaced by a space.

Is the provider saving characters?



Re: Teletext subtitles - missing spaces #12 Stan

  • Senior Member
  • 312 posts

0
Neutral

Posted 25 December 2014 - 00:23

@littlesat

The string is in the file Page888_02.txt posted before. Just open the file with a hex editor.



Re: Teletext subtitles - missing spaces #13 ims

  • PLi® Core member
  • 13,623 posts

+212
Excellent

Posted 25 December 2014 - 01:03

0x02




Re: Teletext subtitles - missing spaces #14 ims

  • PLi® Core member
  • 13,623 posts

+212
Excellent

Posted 25 December 2014 - 02:18

Weird, it looks as if bits b3-b0 were dropped, but only for the spaces inside a sentence  :D (0x20 = 0010 0000b; drop b0-b3 => 0010 = 0x02)
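
To make that arithmetic concrete, here is a small Python sketch. It is not taken from enigma2; the helper names are made up for illustration, and the colour table reflects the usual Level 1 teletext assignment of codes 0x00-0x07 to text colours. It assumes the parity bit has already been stripped from the byte.

```python
# Level 1 teletext spacing attributes 0x00-0x07 select the text colour.
ALPHA_COLOURS = {
    0x00: "black", 0x01: "red", 0x02: "green", 0x03: "yellow",
    0x04: "blue", 0x05: "magenta", 0x06: "cyan", 0x07: "white",
}

SPACE = 0x20


def drop_low_nibble(byte: int) -> int:
    """Discard bits b3-b0 and keep only the upper nibble (hypothetical helper)."""
    return byte >> 4


def describe(byte: int) -> str:
    """Name a byte as a colour attribute, a space, or a printable character."""
    if byte in ALPHA_COLOURS:
        return "alpha " + ALPHA_COLOURS[byte]
    if byte == SPACE:
        return "space"
    return repr(chr(byte))


print(hex(drop_low_nibble(SPACE)), "->", describe(drop_low_nibble(SPACE)))
```

So 0x02 does happen to equal 0x20 with its low nibble dropped, but it is also a valid colour attribute in its own right, which fits the "colour definition" reading from the earlier post.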



Re: Teletext subtitles - missing spaces #15 Erik Slagter

  • PLi® Core member
  • 46,951 posts

+541
Excellent

Posted 25 December 2014 - 10:22

The problem is, that this character is being removed without getting replaced by a space.

Yes, that is what I expected. On a teletext page the (colour) attribute characters also occupy a cell (the frame buffer is a very strict 40x22 grid). So the BBC uses them as a delimiter, just like a space would be. On the other hand, I was under the impression I handled this case correctly. But if someone can find the bug, please go ahead ;)
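
A minimal Python sketch of that point, assuming a row is available as raw bytes with parity already stripped and that every byte below 0x20 is an attribute code. The function name and the `keep_cell` flag are hypothetical and do not come from the enigma2 source.

```python
ROW_WIDTH = 40  # a teletext row is 40 cells wide


def render_row(cells: bytes, keep_cell: bool) -> str:
    """Turn one row of teletext cells into plain text.

    Bytes below 0x20 are treated as attribute (control) codes. With
    keep_cell=True each of them still takes up one cell, shown as a space;
    with keep_cell=False they are simply dropped.
    """
    out = []
    for b in cells[:ROW_WIDTH]:
        if b < 0x20:
            if keep_cell:
                out.append(" ")
        else:
            out.append(chr(b))
    return "".join(out)


row = b"\x07Hello\x02world"
print(repr(render_row(row, keep_cell=False)))
print(repr(render_row(row, keep_cell=True)))
```

Dropping the attribute glues "Hello" and "world" together, which matches the symptom in the first post; keeping the cell as a space preserves the word boundary.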




Re: Teletext subtitles - missing spaces #16 betacentauri

  • PLi® Core member
  • 7,185 posts

+323
Excellent

Posted 25 December 2014 - 10:35

Mhmm. Here http://sourceforge.n...letext.cpp#l397 in line 628 you check whether color has changed. It has not changed. So you do nothing. Then in line 694 the line is converted into a string. I guess in this step the "color" chars are not replaced by spaces. So I would add an else branch to line 628 and replace the color char with space.
But is it according to the spec? Or does BBC deliver wrong subtitles?
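
The proposed else branch could look roughly like the Python sketch below. This is only an illustration of the idea, not the actual teletext.cpp logic: the `Segment` type, the function name and the `add_else_branch` flag are all invented here, and other control codes are simply skipped.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    colour: int  # colour attribute code (0x00-0x07) in effect for this text
    text: str


def split_row(cells: bytes, add_else_branch: bool,
              default_colour: int = 0x07) -> List[Segment]:
    """Split a row into runs of text that share one colour."""
    segments: List[Segment] = []
    colour = default_colour
    text = ""
    for b in cells:
        if b <= 0x07:                      # alpha colour attribute
            if b != colour:
                # Colour changed: close the current run and start a new one.
                if text:
                    segments.append(Segment(colour, text))
                colour, text = b, ""
            elif add_else_branch:
                # Proposed fix: same colour, but the code still occupied a cell.
                text += " "
        elif b < 0x20:
            continue                       # other control codes ignored in this sketch
        else:
            text += chr(b)
    if text:
        segments.append(Segment(colour, text))
    return segments


row = b"\x02one\x02two"
print(split_row(row, add_else_branch=False))
print(split_row(row, add_else_branch=True))
```

With the flag off, the repeated colour code vanishes and the words are joined; with it on, the same row yields "one two" in a single green segment.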

Edited by betacentauri, 25 December 2014 - 10:36.

Xtrend ET-9200, ET-8000, ET-10000, OpenPliPC on Ubuntu 12.04

Re: Teletext subtitles - missing spaces #17 Erik Slagter

  • PLi® Core member
  • 46,951 posts

+541
Excellent

Posted 25 December 2014 - 10:46

I wouldn't ever accuse the BBC of being non-standard-adhering. They're usually very compliant.

 

Now that I think of it, I think I "solved" the issue in "rewrapping" mode (which I use), but the issue may still be there in "non-rewrapping" mode. One of the problems is that you can't blindly replace attribute characters by spaces, because often more than one is sent in sequence. That would mean a string of spaces between two words, which looks ugly. The rewrap mode would drop the multi-space sequence, but non-rewrap keeps the spaces. So I guess the solution is twofold:

 

- replace every attribute character by a space

- compact multi-space sequences to one
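
The two steps above can be sketched in a few lines of Python. This is an assumption-laden illustration rather than the enigma2 implementation: the function name is made up, every byte below 0x20 is treated as an attribute character, and the final trimming of leading and trailing spaces is an extra convenience not listed in the two steps.

```python
import re


def normalise_line(cells: bytes) -> str:
    """Apply the two-step clean-up to one row of teletext bytes."""
    # Step 1: replace every attribute (control) character by a space.
    text = "".join(" " if b < 0x20 else chr(b) for b in cells)
    # Step 2: compact every multi-space sequence to a single space.
    text = re.sub(r" {2,}", " ", text)
    # Extra (not one of the two steps): trim the ends of the line.
    return text.strip(" ")


print(repr(normalise_line(b"\x07\x0dHello\x02\x02world  again ")))
```

Doing step 2 after step 1 matters: a sequence of two attribute characters first becomes two spaces and is only then reduced to one.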

 

Still, I think it's error-prone to try to keep the original formatting at all while rendering it in a proportional font and then centering it. I think teletext should either be overlaid exactly as-is (which is quite ugly but works), or all the visible text should be extracted from the page, lose all of its layout information and be rendered just like any other text (which is what the re-wrap code does).


Edited by Erik Slagter, 25 December 2014 - 10:46.



Re: Teletext subtitles - missing spaces #18 Stan

  • Senior Member
  • 312 posts

0
Neutral

Posted 25 December 2014 - 14:11

For a quick-and-dirty fix I'd go with betacentauri's proposal.

 

It would be nicer and more reliable, though, to stick to the teletext standard and replace every control character with a space.

In standard mode (no re-wrap) I would expect the original formatting (no matter how many spaces between words), discarding only leading and trailing spaces.
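
As a rough Python illustration of that expectation for the non-rewrap mode (the function name is hypothetical, and bytes below 0x20 are assumed to be control characters):

```python
def keep_layout(cells: bytes) -> str:
    """Replace control characters by spaces, then trim only the line ends."""
    text = "".join(" " if b < 0x20 else chr(b) for b in cells)
    # Inner runs of spaces are left untouched on purpose.
    return text.strip(" ")


print(repr(keep_layout(b"\x07Hello\x02\x02world ")))
```

Here two control characters in a row leave two spaces between the words, which is exactly the trade-off under discussion.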



Re: Teletext subtitles - missing spaces #19 Erik Slagter

  • PLi® Core member
  • 46,951 posts

+541
Excellent

Posted 26 December 2014 - 10:50

But then    you would  get sentences     like this, which I think is ugly. Not only do the leading and trailing whitespace need to be removed, but also the multiple spaces between words. Either that, or stick 100% to the original layout, on a 40x22 grid.

 

I will happily apply a patch that will fix this in a nice way.




Re: Teletext subtitles - missing spaces #20 Stan

  • Senior Member
  • 312 posts

0
Neutral

Posted 26 December 2014 - 11:59

There may be situations when multiple whitespaces are in there intentionally.

Besides, sequences of more than one control character between words are not common in subtitles.

 

One could always switch to "rewrap" mode, where multiple whitespaces are eliminated.

 

But that's just my opinion, I'm quite confident to let you decide... :)


Edited by Stan, 26 December 2014 - 12:03.


