


Teletext subtitles - missing spaces


There are 56 replies in this topic

#1 Stan

  • Senior Member
  • 346 posts


Posted on 24 December 2014 - 13:36

With teletext subtitles, sometimes the spaces between words are missing.

 

Attachment: Sub_TTX_02.jpg (49.36K, downloaded 4 times)

 

The issue has been mentioned by Erik some time ago in another thread.

 

[...]

Sometimes many words in a sub get concatenated without whitespace, sometimes part of the first and last characters on a (long) line is cropped, sometimes a line gets wrapped at an illogical place, the next word is then in dark grey ink and the rest of the line is okay. I've seen this only on BBC. Looks like some control character is interpreted incorrectly.

 

It seems to occur when lines with different colours are displayed...

 



Re: Teletext subtitles - missing spaces #2 littlesat

  • PLi® Core member
  • 56432 posts


Posted on 24 December 2014 - 13:50

I do not see different colours there... but an enigma2 log could be helpful here.

Here on Dutch TV we do not see this happening.


Edited by littlesat, 24 December 2014 - 13:50

WaveFrontier 28.2E | 23.5E | 19.2E | 16E | 13E | 10/9E | 7E | 5E | 1W | 4/5W | 15W


Re: Teletext subtitles - missing spaces #3 ims

  • PLi® Core member
  • 13626 posts


Posted on 24 December 2014 - 13:52

Here it works well too (tried on recorded movies).


He who does nothing breaks nothing!

Re: Teletext subtitles - missing spaces #4 Stan

  • Senior Member
  • 346 posts


Posted on 24 December 2014 - 14:41

Yesterday I had it on a Freesat channel (itv2). I did record it.

 

It is strange because on the teletext page itself (888) it looks fine.

 

@littlesat

I could not find the subtitle content in the enigma2 log file. What are we looking for?



Re: Teletext subtitles - missing spaces #5 littlesat

  • PLi® Core member
  • 56432 posts


Posted on 24 December 2014 - 14:43

It should be investigated why these spaces are removed... weird...




Re: Teletext subtitles - missing spaces #6 Erik Slagter

  • PLi® Core member
  • 46960 posts


Posted on 24 December 2014 - 16:13

It's something that only shows up on BBC (for me). I have changed some of the colour subtitle code, which helped a lot, but it still appears now and then. I can't explain it. A slight possibility is that two lines are joined (do you have re-formatting enabled?) and no space is inserted where one should be.


* Wavefrontier T90 with 28E/23E/19E/13E via SCR switches 2 x 2 x 6 user bands
I don't read PM -> if you have something to ask or to report, do it in the forum so others can benefit. I don't take freelance jobs.


Re: Teletext subtitles - missing spaces #7 Stan

  • Senior Member
  • 346 posts


Posted on 24 December 2014 - 17:29

The option "Rewrap teletext subtitles" is set to "no", so I guess re-formatting is disabled.



Re: Teletext subtitles - missing spaces #8 Erik Slagter

  • PLi® Core member
  • 46960 posts


Posted on 24 December 2014 - 18:32

Okay good to know, so we can rule that out. Do you see it often? I only notice it very rarely now.




Re: Teletext subtitles - missing spaces #9 Stan

  • Senior Member
  • 346 posts


Posted on 24 December 2014 - 19:59

I had a look at the subtitle teletext page 888 (same scene from first post).

 

Attachment: Page888_02.txt (1000 bytes, downloaded 19 times)

 

Between the words there is a control character (colour definition).

 

The problem is that this character is removed without being replaced by a space.
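A minimal Python sketch of the difference, assuming the row arrives as raw bytes like those in Page888_02.txt. The function names are hypothetical and this is not the enigma2 code (which is C++). In the Level 1 teletext character set, codes 0x00-0x1F are spacing attributes (0x02, for example, is Alpha Green); each one occupies a character cell and is displayed as a space. Mapping 0x20-0x7F straight to ASCII ignores the national G0 character variants, which is good enough for this illustration.

```python
def decode_row_dropping(raw: bytes) -> str:
    """Buggy behaviour: attribute bytes are silently discarded."""
    return "".join(chr(b & 0x7F) for b in raw if (b & 0x7F) >= 0x20)


def decode_row(raw: bytes) -> str:
    """Expected behaviour: every spacing attribute is rendered as a space."""
    cells = []
    for b in raw:
        code = b & 0x7F  # clear the odd-parity bit (no-op if already stripped)
        cells.append(" " if code < 0x20 else chr(code))
    return "".join(cells)


row = b"HOW\x02ARE\x02YOU"  # words separated by 0x02 instead of 0x20
print(decode_row_dropping(row))  # HOWAREYOU
print(decode_row(row))           # HOW ARE YOU
```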

 



Re: Teletext subtitles - missing spaces #10 littlesat

  • PLi® Core member
  • 56432 posts


Posted on 25 December 2014 - 00:03

Are you able to post the complete delivered string here?



Re: Teletext subtitles - missing spaces #11 ims

  • PLi® Core member
  • 13626 posts


Posted on 25 December 2014 - 00:13

Quoting Stan:

"I had a look at the subtitle teletext page 888 (same scene from first post).

[Attachment: Page888_02.txt]

Between the words there is a control character (colour definition).

The problem is that this character is being removed without getting replaced by a space."

Is the provider saving characters?



Re: Teletext subtitles - missing spaces #12 Stan

  • Senior Member
  • 346 posts


Posted on 25 December 2014 - 00:23

@littlesat

The string is in the file Page888_02.txt posted before. Just open the file with a hex editor.



Re: Teletext subtitles - missing spaces #13 ims

  • PLi® Core member
  • 13626 posts


Posted on 25 December 2014 - 01:03

0x02

Attached files



Re: Teletext subtitles - missing spaces #14 ims

  • PLi® Core member
  • 13626 posts


Posted on 25 December 2014 - 02:18

Weird, it looks as if bits b3-b0 were dropped, but only for the spaces within a sentence :D (0x20 = 0010 0000b; drop b3-b0 => 0010b = 0x02)
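The bit pattern can be checked with a few lines of Python. Keeping only the high nibble of 0x20 does give 0x02, although 0x02 is also the Alpha Green spacing attribute in the Level 1 teletext set, so the match may be a coincidence rather than a transmission error.

```python
space = 0x20                       # 0010 0000b
high_nibble = (space & 0xF0) >> 4  # drop b3-b0, keep b7-b4 as a 4-bit value
print(f"{space:#04x} = {space:08b}b -> {high_nibble:04b}b = {high_nibble:#04x}")
# prints: 0x20 = 00100000b -> 0010b = 0x02
```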



Re: Teletext subtitles - missing spaces #15 Erik Slagter

  • PLi® Core member
  • 46960 posts


Posted on 25 December 2014 - 10:22

Quoting Stan: "The problem is that this character is being removed without getting replaced by a space."

Yes, that is what I expected. On a teletext page the (colour) attribute characters also occupy a character cell (the frame buffer is a strict 40x22 grid), so the BBC uses them as delimiters, just as a space would be used. On the other hand, I was under the impression that I handled this case correctly. But if someone can find the bug, please go ahead ;)
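To illustrate the point about the fixed grid, here is a small sketch (a hypothetical helper, in Python rather than the C++ used in enigma2) that maps one row onto its 40 character cells. Whether the words are separated by 0x20 or by a colour attribute, they land in the same columns, which is why a broadcaster can use either as a word delimiter.

```python
ROW_WIDTH = 40  # character cells per teletext row


def row_cells(raw: bytes) -> list[str]:
    """Lay one row out on the fixed grid; a spacing attribute fills its cell with a space."""
    cells = []
    for b in raw[:ROW_WIDTH]:
        code = b & 0x7F  # clear the parity bit
        cells.append(" " if code < 0x20 else chr(code))
    cells.extend(" " * (ROW_WIDTH - len(cells)))  # pad the rest of the row
    return cells


with_space = row_cells(b"HOW ARE YOU")
with_attribute = row_cells(b"HOW\x02ARE\x02YOU")
print(with_space == with_attribute)          # True: same characters, same columns
print("".join(with_attribute).index("ARE"))  # 4
```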




Re: Teletext subtitles - missing spaces #16 betacentauri

  • PLi® Core member
  • 7185 posts


Posted on 25 December 2014 - 10:35

Hmm. Here http://sourceforge.n...letext.cpp#l397, in line 628, you check whether the colour has changed. It has not changed, so you do nothing. Then in line 694 the line is converted into a string. I guess that in this step the "colour" characters are not replaced by spaces. So I would add an else branch at line 628 that replaces the colour character with a space.
But is that according to the spec? Or does the BBC deliver wrong subtitles?
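A hypothetical Python model of the loop described above (not the actual teletext.cpp code, whose structure is only known here through the line references). It splits a row into (colour, text) segments. The assumption is that the attribute cell becomes a space in both branches: Alpha colour codes are "set-after" attributes, so the cell holding the code is still drawn, as a space, in the old colour, and only the following cells take the new colour. The unchanged-colour branch is the case that loses the space in the bug report.

```python
ALPHA_COLOURS = {0x00: "black", 0x01: "red", 0x02: "green", 0x03: "yellow",
                 0x04: "blue", 0x05: "magenta", 0x06: "cyan", 0x07: "white"}


def row_segments(raw: bytes, start_colour: str = "white") -> list[tuple[str, str]]:
    """Split one row into (colour, text) runs, keeping attribute cells as spaces."""
    segments = [[start_colour, ""]]  # each row starts in white
    for b in raw:
        code = b & 0x7F
        if code >= 0x20:
            segments[-1][1] += chr(code)  # ordinary displayable character
            continue
        # Spacing attribute: its own cell is shown as a space in the current colour.
        segments[-1][1] += " "
        colour = ALPHA_COLOURS.get(code)
        if colour is not None and colour != segments[-1][0]:
            segments.append([colour, ""])  # colour changed: following cells use it
        # else: same colour (or a non-colour attribute); the space above is enough
    # Whitespace-only runs carry no visible text, so drop them.
    return [(colour, text) for colour, text in segments if text.strip()]


print(row_segments(b"\x03HOW\x03ARE\x03YOU"))  # [('yellow', 'HOW ARE YOU')]
print(row_segments(b"\x07HELLO\x03THERE"))     # [('white', ' HELLO '), ('yellow', 'THERE')]
```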

Edited by betacentauri, 25 December 2014 - 10:36

Xtrend ET-9200, ET-8000, ET-10000, OpenPliPC on Ubuntu 12.04

Re: Teletext subtitles - missing spaces #17 Erik Slagter

  • PLi® Core member
  • 46960 posts


Posted on 25 December 2014 - 10:46

I wouldn't ever accuse the BBC of being non-standard-adhering. They're usually very compliant.

 

Now that I think of it, I believe I "solved" the issue in "rewrapping" mode (which I use), but it may still be present in "non-rewrapping" mode. One of the problems is that you can't blindly replace attribute characters by spaces, because often more than one is sent in sequence. That would mean a string of spaces between two words, which looks ugly. Rewrap mode drops the multi-space sequence, but non-rewrap mode keeps the spaces. So I guess the solution is twofold:

 

- replace every attribute character by a space

- compact multi-space sequences to one
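The two steps listed above can be sketched as follows (hypothetical function name, Python instead of enigma2's C++). Trimming the ends is added as well, since a subtitle row often begins with attribute bytes that would otherwise turn into leading spaces.

```python
import re


def clean_row(raw: bytes) -> str:
    # Step 1: replace every attribute character (0x00-0x1F) by a space.
    text = "".join(" " if (b & 0x7F) < 0x20 else chr(b & 0x7F) for b in raw)
    # Step 2: compact multi-space sequences to one, then trim the ends.
    return re.sub(r" {2,}", " ", text).strip()


print(repr(clean_row(b"\x03\x0bHOW\x02\x02ARE YOU  ")))  # 'HOW ARE YOU'
```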

 

Still, I think it's error-prone to try to keep the original formatting at all, render it in a proportional font and then centre it. I think teletext should either be overlaid exactly as-is (which is quite ugly but works), or all the visible text should be extracted from the page, losing all of the layout information, and rendered just like any other text (which is what the re-wrap code does).
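The second option, extracting the visible text and dropping the layout, could look roughly like this. It is a sketch with a hypothetical function name; the default wrap width of 30 is an arbitrary assumption, not an enigma2 setting. Splitting each row on whitespace and re-joining with single spaces both removes multi-space runs and guarantees a space where two rows meet.

```python
import textwrap


def rewrap(rows: list[bytes], width: int = 30) -> list[str]:
    """Extract the visible words from all rows and reflow them to a new width."""
    words = []
    for raw in rows:
        text = "".join(" " if (b & 0x7F) < 0x20 else chr(b & 0x7F) for b in raw)
        words.extend(text.split())  # split() discards every run of whitespace
    return textwrap.wrap(" ".join(words), width=width)


rows = [b"\x03HOW\x02ARE", b"\x03YOU TODAY?"]
print(rewrap(rows))           # ['HOW ARE YOU TODAY?']
print(rewrap(rows, width=8))  # ['HOW ARE', 'YOU', 'TODAY?']
```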


Edited by Erik Slagter, 25 December 2014 - 10:46



Re: Teletext subtitles - missing spaces #18 Stan

  • Senior Member
  • 346 posts


Posted on 25 December 2014 - 14:11

For a quick'n dirty fix I'd go with betacentauri's proposal.

 

Although it would be nicer and more reliable to stick to the teletext standard and replace every control character with a space.

In standard mode (no re-wrap) I would expect the original formatting (no matter how many spaces between words), discarding only leading and trailing spaces.
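This alternative for non-rewrap mode differs from compacting multi-space runs in that the internal spacing is left alone and only the ends are trimmed. Again a hypothetical Python sketch, not enigma2 code:

```python
def keep_layout_row(raw: bytes) -> str:
    # Every attribute character becomes a space, so word gaps keep their width.
    text = "".join(" " if (b & 0x7F) < 0x20 else chr(b & 0x7F) for b in raw)
    # Only the leading and trailing spaces are discarded.
    return text.strip(" ")


print(repr(keep_layout_row(b"\x03\x0bHOW\x02\x02ARE YOU  ")))  # 'HOW  ARE YOU'
```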



Re: Teletext subtitles - missing spaces #19 Erik Slagter

  • PLi® Core member
  • 46960 posts


Posted on 26 December 2014 - 10:50

But then    you would  get sentences     like this, which I think is ugly. Not only the leading and trailing whitespace needs to be removed, but also the runs of whitespace between words. Either that, or stick 100% to the original layout, on a 40x22 grid.

 

I will happily apply a patch that will fix this in a nice way.




Re: Teletext subtitles - missing spaces #20 Stan

  • Senior Member
  • 346 posts


Posted on 26 December 2014 - 11:59

There may be situations when multiple whitespaces are in there intentionally.

Besides, sequences of more than one control character between words are not common in subtitles.

 

One could always switch to "rewrap" mode, where multiple whitespaces are eliminated.

 

But that's just my opinion, I'm quite confident to let you decide... :)


Edited by Stan, 26 December 2014 - 12:03


