Help needed to convert bilingual files in subtitle format


 
# 1  
Old 08-05-2016

I have a large number of files in the standard subtitle format, with the added twist that the files are bilingual: each subtitle appears in English and then in a second language, in this case Hindi. Each block of text is preceded by two time-code lines. A small sample is given below:
Code:
00 04 07 08
00 04 11 00
I mean very high fever...
He even vomited.
00 04 07 08
00 04 11 00
मैं कहता हूँ कि बहुत ही तेज   बुख़ार....
उन्हों ने उलटी भी कर दी थी  
00 04 11 08
00 04 14 00

Blood tests have also been done.
00 04 11 08
00 04 14 00

ख़ून की जाँच भी की है

What I need is the following:
Code:
I mean very high fever... He even vomited.=मैं कहता हूँ कि बहुत ही तेज   बुख़ार....उन्हों ने उलटी भी कर दी थी

That is, the time codes are deleted, the English text is joined onto one single line, and the corresponding Hindi text follows on the same line, with an equals sign (=) as the delimiter.
I have written a macro to do the job, but since the data is huge, a Perl or awk script would run much faster.
Many thanks.
A small sample for testing is provided below:
Code:
00 04 20 05
00 04 22 08

Can we see uncle now?
00 04 20 05
00 04 22 08

क्या हम मामाजी को अभी देख सकते हैं?
00 04 22 16
00 04 25 00
When it's impossible to see him
in day time, now at midnight...
00 04 22 16
00 04 25 00
उनको दिन में मिलना भी मुश्किल है,
तो रात के समय में.....
00 04 25 10
00 04 27 12
His accountant constantly
keeps guarding his room.
00 04 25 10
00 04 27 12
उन का मुनीम सतत 
उन के कमरे पर नजर रख रहा है  
00 04 27 20
00 04 30 08
When I got your telegram, I
thought he was serious.
00 04 27 20
00 04 30 08
जब मुझे तुम्हारा तार मिला,
मैंने सोचा कि उन की हालत गंभीर है  
00 04 30 12
00 04 34 00
Why trouble us with ordinary
cold and fever?
00 04 30 12
00 04 34 00
मामूली सर्दी-  बुखार के लिए
हमें क्यों परेशान किया?
00 04 34 08
00 04 38 14
These days, one must be
careful even with a cold.
00 04 34 08
00 04 38 14
आज कल, मामूली सर्दी से भी
सावधान रहना चाहिए  
00 04 38 24
00 04 45 08
Don't complain later, if
anything untoward happens.
00 04 38 24
00 04 45 08
कुछ उल्टा-सीधा हो गया तो,
बाद में शिकायत मत करना   
00 04 45 18
00 04 47 08
Hasn't sister's family
arrived as yet?
00 04 45 18
00 04 47 08

अभी तक बहन का परिवार नहीं आया ?

# 2  
Old 08-05-2016
Please show your attempts.
This User Gave Thanks to RudiC For This Post:
# 3  
Old 08-05-2016
The macro was written in a text editor, UltraEdit. It works, but it is very slow and takes too long.
The file has a regular structure, and the logic of the macro is as follows:
Delete the time code, i.e. the first two lines.
Go to the end of the next line and delete the hard return. This reduces the two English lines to one single line.
Delete the time code once again.
Repeat the same action for the Hindi text.
Join the English line and the Hindi line with an equals sign.
This brings the cursor to the time code of the next subtitle.
Save the macro and run it on the file.
Here is the output:
Code:
Screenplay, dialogue. A.K.Lohithadas.=पटकथा, संवाद ए. के. लोहिथादास  
Some vehicle has come. - Is it?=कोई गाड़ी आई है   -अच्छा?
I mean very high fever... He even vomited.=मैं कहता हूँ कि बहुत ही तेज   बुख़ार.... उन्हों ने उलटी भी कर दी थी  
Blood tests have also been done.= ख़ून की जाँच भी की है  
I don't know it's result. - Is he O.K. now?=उस के परिणाम का मुझे पता नहीं है   -क्या अब वह ठीक है ?
Slightly better.= थोड़ा ठीक है

The only hitch was that it ran too slowly under UltraEdit, hence the request.
I am reproducing the macro below for what it is worth, since UltraEdit macros use their own command language:
Code:
InsertMode
ColumnModeOff
HexOff
StartSelect
Key DOWN ARROW
Key DOWN ARROW
EndSelect
Key DEL
Key END
" "
Key DEL
Key DOWN ARROW
Key HOME
StartSelect
Key DOWN ARROW
Key DOWN ARROW
EndSelect
Key DEL
Key END
" "
Key DEL
Key UP ARROW
Key END
"="
Key DEL
Key DOWN ARROW
Key HOME
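For comparison, the steps of the macro could be sketched in awk roughly as below. This is only a sketch under stated assumptions: every time-code line is taken to hold exactly four two-digit fields, English and Hindi blocks are taken to alternate strictly, and the names `in_tc`, `nblock` and `first`, as well as the temporary sample file, are made up for the illustration.

```shell
# Hypothetical sample file in the layout shown in post #1.
sample=$(mktemp)
cat > "$sample" <<'EOF'
00 04 07 08
00 04 11 00
I mean very high fever...
He even vomited.
00 04 07 08
00 04 11 00
मैं कहता हूँ कि बहुत ही तेज बुख़ार....
उन्हों ने उलटी भी कर दी थी
00 04 11 08
00 04 14 00

Blood tests have also been done.
00 04 11 08
00 04 14 00

ख़ून की जाँच भी की है
EOF

out=$(awk '
# Time-code line (assumed shape): four two-digit fields, e.g. "00 04 07 08".
/^[0-9][0-9] [0-9][0-9] [0-9][0-9] [0-9][0-9] *$/ {
    if (!in_tc) {                 # first line of a time-code pair: new block
        in_tc = 1
        nblock++
        # Even blocks are Hindi (preceded by "="), odd blocks start a new record.
        if (nblock > 1) printf "%s", (nblock % 2 ? "\n" : "=")
        first = 1
    }
    next                          # time codes are never printed
}
{
    in_tc = 0
    gsub(/^[ \t]+|[ \t]+$/, "")   # trim leading and trailing blanks
    if ($0 == "") next            # drop empty placeholder lines
    printf "%s%s", (first ? "" : " "), $0
    first = 0
}
END { if (nblock) print "" }      # terminate the last record
' "$sample")
printf '%s\n' "$out"
```

Because blocks are delimited by the time codes rather than by a fixed line count, this variant should also cope with blocks of one or three text lines, at the cost of being longer than a one-liner.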

# 4  
Old 08-05-2016
How about
Code:
awk '/[^0-9 ]|^$/{printf "%s%s", $0, (++CNT%2)?" ":CNT%4?"=":RS}' file
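Spelled out over several lines, the one-liner above works as sketched here. The logic is unchanged; only the variable names (`cnt`, `sep`) and the temporary sample file are illustrative additions.

```shell
# Hypothetical sample file: two subtitles in the layout from post #1.
sample=$(mktemp)
cat > "$sample" <<'EOF'
00 04 07 08
00 04 11 00
I mean very high fever...
He even vomited.
00 04 07 08
00 04 11 00
मैं कहता हूँ कि बहुत ही तेज बुख़ार....
उन्हों ने उलटी भी कर दी थी
00 04 11 08
00 04 14 00

Blood tests have also been done.
00 04 11 08
00 04 14 00

ख़ून की जाँच भी की है
EOF

out=$(awk '
# Time-code lines hold nothing but digits and spaces, so they fail this test.
# Lines with any other character, and empty lines, are processed.
/[^0-9 ]|^$/ {
    cnt++
    if (cnt % 2)      sep = " "    # 1st line of a block: join with a space
    else if (cnt % 4) sep = "="    # 2nd line of the English block
    else              sep = "\n"   # 2nd line of the Hindi block: end record
    printf "%s%s", $0, sep
}' "$sample")
printf '%s\n' "$out"
```

The counter assumes that every block contributes exactly two lines, which is why the empty placeholder lines are counted too. A side effect is that a block with an empty first line yields a leading space, as in ` Blood tests have also been done.=`.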

# 5  
Old 08-05-2016
Many thanks. I will try it and get back to you in case of a glitch. I will also try to understand the logic of the awk script.

---------- Post updated at 06:07 AM ---------- Previous update was at 06:04 AM ----------

It worked beautifully and ripped through over 100,000 lines in around 10 seconds. Thanks a lot.
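Since there is a large number of such files, one way to apply the one-liner from post #4 to all of them is a plain shell loop. This is only a sketch: the `.txt` extension, the `.out` output names and the temporary working directory are assumptions made for the demonstration.

```shell
# Hypothetical layout: one subtitle file per *.txt in a working directory.
workdir=$(mktemp -d)
printf '%s\n' '00 04 20 05' '00 04 22 08' '' 'Can we see uncle now?' \
              '00 04 20 05' '00 04 22 08' '' 'क्या हम मामाजी को अभी देख सकते हैं?' \
    > "$workdir/a.txt"
cp "$workdir/a.txt" "$workdir/b.txt"

for f in "$workdir"/*.txt; do
    [ -e "$f" ] || continue      # nothing matched: the pattern stayed literal
    # One awk process per file, so the counter CNT starts from zero each time.
    awk '/[^0-9 ]|^$/{printf "%s%s", $0, (++CNT%2)?" ":CNT%4?"=":RS}' "$f" \
        > "${f%.txt}.out"
done
ls "$workdir"
```

Running a separate awk per file means that a file ending in an incomplete block cannot throw off the alignment of the next one.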
# 6  
Old 08-05-2016
Or try
Code:
awk '(NR-1)%4>1 {printf "%s%s", $0, NR%2?" ":NR%8?"=":RS} ' file
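This second one-liner selects lines purely by position. An expanded sketch follows; the variable names `pos` and `sep`, the temporary sample file and the digits-only subtitle `1942` are illustrative additions, not part of the original data.

```shell
# Hypothetical sample: the second subtitle consists of digits only.
sample=$(mktemp)
cat > "$sample" <<'EOF'
00 04 07 08
00 04 11 00
I mean very high fever...
He even vomited.
00 04 07 08
00 04 11 00
मैं कहता हूँ कि बहुत ही तेज बुख़ार....
उन्हों ने उलटी भी कर दी थी
00 04 11 08
00 04 14 00

1942
00 04 11 08
00 04 14 00

१९४२
EOF

out=$(awk '
{ pos = (NR - 1) % 4 }             # 0 and 1: time codes; 2 and 3: text
pos < 2 { next }                   # skip the two time-code lines
{
    if (NR % 2)      sep = " "     # 3rd line of a group has an odd NR
    else if (NR % 8) sep = "="     # 4th line: end of the English block
    else             sep = "\n"    # 8th line: end of the Hindi block
    printf "%s%s", $0, sep
}' "$sample")
printf '%s\n' "$out"
```

Because nothing depends on the content of a line, a subtitle such as `1942` is kept here, whereas the pattern `/[^0-9 ]|^$/` from post #4 would treat it as a time code and skip it. The price is that every block must be exactly four lines long; a single missing or extra line shifts everything after it.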

# 7  
Old 08-06-2016
I am out and replying from my phone. I will test it and get back to you. Many thanks for both solutions.

---------- Post updated 08-06-16 at 06:08 AM ---------- Previous update was 08-05-16 at 08:17 AM ----------

Sorry for the late reply. Due to heavy rains, my fixed-line broadband was down.
I have tested the alternative solution, and it works just as well.