Avidemux

Extracting DVD Subtitles

Software name : Avidemux
Software version : 2.4

If you want to extract subtitle files from a DVD, it helps to understand a little about how they work. Subtitles on DVDs are contained in VOB files along with the main video and audio streams. We call them all streams here to distinguish them from self-contained files: a single file can include several streams.

The subtitles you see on a DVD are streams of image files which appear one after the other. Each stream holds a different language. When we extract these subtitle streams, the handiest format to save them in is a text file that records the timecode at which each piece of text appears. A subtitle file in text rather than image format is easier to edit and translate. You can also easily send that file over the internet or put it on a website for others to download.

In order to create a text-based subtitle file we first need to extract the image files from the DVD to two files:

  1. an *.idx file, which holds the timecodes of the image subtitles (this is called a VobSub file)
  2. a *.sub file, which contains the image information.

We can then convert those files into a single text-based subtitle file. There are many different formats, but Avidemux uses a widely compatible one with the '.srt' extension.

Note: Screenshots in the following explanation are a mix of Ubuntu (Linux) and Windows. Avidemux works well on both, and the interface looks the same except for a few color differences.

Extracting to an idx / VobSub file

From the 'Tools' menu select 'VOB' and then 'VobSub':

avi_demux_xdvd_1

Then you should see the following screen asking you to Browse for three things.

  1. VOB file(s)
  2. IFO file
  3. VobSub file

avi_demux_xdvd_2

Finding the VOB Files

When you click on the first 'Browse' button in the above image you are asked to browse for the VOB files:

browsevob

However, sometimes it's not clear where they are. The files we want are in a folder called VIDEO_TS on the DVD (if you are working directly from a DVD).

Normally a short film has only one VOB file with video data in it. Longer films normally have more than one, because VOB files have a maximum file size.

Let's have a look at a complicated DVD structure. Some small entries in the structure are system files and menu files; we should ignore these. The video, audio and subtitle streams we need are in the big files, which have names like VTS_02_1.VOB, VTS_02_2.VOB, VTS_02_3.VOB and VTS_02_4.VOB. If you click 'Browse' next to 'VOB Files' and browse to the appropriate directory ('VIDEO_TS'), you should see something like this:

avi_demux_xdvd_3

For this task we need to select the first big VOB, which in this case is VTS_02_1.VOB. The ones following it will be selected automatically. When you have selected the right one, click on 'open':

open
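If you would rather find the right file with a script than by eye, the rule above (ignore the small menu files, take the first file of the biggest group) can be sketched in Python. This is only a sketch: the function names are my own, and it assumes the usual DVD naming where VTS_02_0.VOB holds the menu for title set 02 and VTS_02_1.VOB onwards hold the film.

```python
import re
from collections import defaultdict
from pathlib import Path

# Matches title-set files such as VTS_02_1.VOB.
# Assumption: part 0 (VTS_02_0.VOB) is the menu for that title set.
VOB_PATTERN = re.compile(r"^VTS_(\d{2})_(\d)\.VOB$", re.IGNORECASE)


def first_main_vob(sizes):
    """Return the first VOB of the largest title set, or None.

    `sizes` maps file names to their sizes in bytes.
    """
    totals = defaultdict(int)   # title set -> combined size of its parts
    parts = defaultdict(list)   # title set -> [(part number, file name)]
    for name, size in sizes.items():
        match = VOB_PATTERN.match(name)
        if not match or match.group(2) == "0":
            continue  # skip menu VOBs and anything that is not a title VOB
        title_set = match.group(1)
        totals[title_set] += size
        parts[title_set].append((int(match.group(2)), name))
    if not totals:
        return None
    biggest = max(totals, key=totals.get)
    return min(parts[biggest])[1]


def scan_video_ts(folder):
    """Collect file sizes from a VIDEO_TS folder (hypothetical helper)."""
    return {p.name: p.stat().st_size for p in Path(folder).iterdir() if p.is_file()}
```

You could call it as first_main_vob(scan_video_ts("VIDEO_TS")), replacing "VIDEO_TS" with the path to the folder on your own DVD.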

Locating the IFO file

If you click on the second button :

button2

you will be asked to look for the IFO file. The IFO file records which language each subtitle stream is in, so we need to browse to find this file. If there is more than one IFO file on the DVD, we need to find the one whose name begins the same way as the large VOB files. In this case it is VTS_02_0.IFO.

When you have found it click on 'open' :

open
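The naming rule for the IFO file is simple enough to write down as a small function. This is a sketch, assuming the file names follow the VTS_02_1.VOB pattern shown above; the function name matching_ifo is my own.

```python
import re


def matching_ifo(vob_name):
    """Return the IFO name that shares its beginning with a title VOB."""
    match = re.match(r"^(VTS_\d{2})_\d\.VOB$", vob_name, re.IGNORECASE)
    if match is None:
        raise ValueError(f"not a title-set VOB name: {vob_name!r}")
    # The IFO keeps the 'VTS_nn' prefix and always ends in '_0.IFO'.
    return f"{match.group(1)}_0.IFO"
```

For the DVD in this example, matching_ifo("VTS_02_1.VOB") gives "VTS_02_0.IFO", the file we selected above.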

Select where to save the VobSub files

The third button :

button3

will ask you to browse for a place to save the VobSub file. When you have found the right directory, type a name for the file in the box next to 'Name:' and make sure it ends with '.idx'. Below is an example (you can use any name; 'subs' is just my example):

subsname

When you have done this, and if the other two boxes are complete, press 'Save':

save 

Saving your files

When you have found or selected all the files, click 'OK' to close the small window with the small buttons:

ok

and you'll get a window telling you how long the process will take.

avi_demux_xdvd_5

When this process is complete you will have created a new .idx file and a new .sub file. These will be saved in the directory you chose for the .idx file. In my case I saved them to the desktop:

idxsub3  
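If you want to check from a script that both files were written, a sketch like the following would do. It assumes, as described above, that the .sub file sits next to the .idx file and has the same base name; the function names are my own.

```python
from pathlib import Path


def vobsub_pair(name):
    """Return the (.idx, .sub) paths for the name typed in the save box."""
    idx = Path(name)
    if idx.suffix.lower() != ".idx":
        # Add the extension if it was left off, as the save step requires.
        idx = idx.with_name(idx.name + ".idx")
    return idx, idx.with_suffix(".sub")


def pair_is_complete(name):
    """True when both the .idx and the .sub file exist on disk."""
    idx, sub = vobsub_pair(name)
    return idx.is_file() and sub.is_file()
```

With my example name, vobsub_pair("subs") gives the paths subs.idx and subs.sub.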

Making the '.srt' File

Now we want to convert the .idx file and the .sub file into a single '.srt' file. Click on 'Tools' in the top menu and then 'OCR (VobSub -> Srt)':

avi_demux_xdvd_6

You should see a window titled 'MiniOCR'. 

avi_demux_xdvd_7

Click on the 'Open' button under 'VobSub'. You will then see a window called 'VobSub Settings'.

avi_demux_xdvd_8

Click on 'Select .idx', then browse for and select the .idx file you created in the 'Extracting to an idx / VobSub file' section.

avi_demux_xdvd_9

Click on 'Open' when you have selected the idx file. You should return to the 'VobSub Settings' window :

miniocr

If the DVD you are using has more than one subtitle language, the languages should be listed in the 'Select Language' drop-down box. Select the language you want to create a subtitle file for.

avi_demux_xdvd_10
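The .idx file is plain text, so you can also peek at the languages it contains without opening Avidemux. The sketch below assumes the common VobSub layout in which each subtitle stream is introduced by a line such as 'id: en, index: 0'. I have not checked this against every file Avidemux writes, so treat the line format as an assumption; the function name is my own.

```python
import re

# Assumed VobSub index line, for example: "id: en, index: 0"
ID_LINE = re.compile(r"^id:\s*([A-Za-z-]+),\s*index:\s*(\d+)")


def list_languages(idx_text):
    """Return [(stream index, language code), ...] found in .idx text."""
    languages = []
    for line in idx_text.splitlines():
        match = ID_LINE.match(line.strip())
        if match:
            languages.append((int(match.group(2)), match.group(1)))
    return languages
```

You would pass it the contents of the file, for example list_languages(open("subs.idx").read()), using the name you chose when saving.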

When you have the right language selected click 'OK', and you should return to the 'MiniOCR' window. Now you need to select a place on your computer to save the target *.srt file to. Click on the 'Save' button in the 'Output srt' section :

ocrsave

You will see a window asking you to choose a folder to save the srt file in.

avi_demux_xdvd_11

Browse until you find the right place, then type a name for the file in the box at the top:

name

Make sure the name ends in '.srt' and then click 'Save':

save_1

Now that you have set your input and output files, you can start the process of converting the image files into a text file. This process is called OCR. Click 'Start OCR'.

start 

You should see a window like this: 

avi_demux_xdvd_13

The OCR (Optical Character Recognition) process needs you to tell it what the characters (letters, numbers and symbols) in the subtitles are. It will display a character from the image subtitle, and you then have to tell the application what the corresponding text character is. Avidemux will show you a phrase and one character from that phrase, like this:

subs 

Now you must type the right character in the empty text field.

subs2

You do this because it is more accurate for you to specify exactly what the characters are than for the application to guess.

Where it says 'Current Glyph Text:' and shows an image of a character, type that character in the box below and then click 'OK'. Capital and lower-case letters are treated as different characters, so make sure you enter the right case. This process is also very unforgiving at the moment: there is no undo option, so don't get it wrong!

Sometimes two characters will be selected together. You should enter both characters and press Enter. This may seem to take a long time at first, but once you have entered all the characters and numbers the program should fly through the subtitles. You should be able to process a 90 minute film in 5-10 minutes.
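To see why the process speeds up, it can help to picture the program keeping a table of the character images it has already been told about. The following is only an illustration of that idea, not Avidemux's actual code: the function, its arguments and the way a glyph is represented (any hashable value standing in for the image) are all my own assumptions.

```python
def recognise(glyphs, known, ask):
    """Turn a sequence of glyph images into text.

    glyphs: the character images, in reading order
    known:  dict of image -> text, filled in as answers arrive
    ask:    function called only for images not seen before
    """
    text = []
    for glyph in glyphs:
        if glyph not in known:
            # First time this shape appears: the user has to type it.
            known[glyph] = ask(glyph)
        text.append(known[glyph])
    return "".join(text)
```

Each distinct shape is asked about once. After that, every later subtitle that uses only known shapes is converted without stopping, which is why the first few minutes are slow and the rest is fast.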

When you are finished, the '.srt' file you saved will have the right timecode and subtitle information in it. You can open it with a text editor and it should look something like this:

1
00:00:10,991 --> 00:00:13,991
 Man:
 Mick Jagger

2
00:00:18,565 --> 00:00:21,565
 - Mick Jagger
 - Thank you

3
00:00:32,479 --> 00:00:35,479
 - Man: Mick Jagger.
 - ( police radio squelch )

4
00:01:04,778 --> 00:01:06,011
 Man:
 one minute! one minute!
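
Because the result is plain text, it is easy to read back with a script, for example to check the timing or to prepare a translation. The sketch below reads timecodes of the form shown above (hours:minutes:seconds,milliseconds) and collects the text that follows each one. The function names are my own, and it assumes each subtitle has a counter line directly before its timecode line, as in the example.

```python
import re

TIMECODE = re.compile(
    r"^(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})$"
)


def to_ms(hours, minutes, seconds, millis):
    """Convert the four parts of a timecode to milliseconds."""
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)


def parse_srt(text):
    """Return a list of (start_ms, end_ms, [text lines]) subtitles."""
    lines = [line.strip() for line in text.splitlines()]
    stamps = [n for n, line in enumerate(lines) if TIMECODE.match(line)]
    cues = []
    for k, n in enumerate(stamps):
        # The text runs up to the counter line before the next timecode.
        end = stamps[k + 1] - 1 if k + 1 < len(stamps) else len(lines)
        parts = TIMECODE.match(lines[n]).groups()
        body = [line for line in lines[n + 1:end] if line]
        cues.append((to_ms(*parts[:4]), to_ms(*parts[4:]), body))
    return cues
```

Blank lines between subtitles are skipped, so the same function works whether or not your file has them.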