Replacing lines between two files with awk


 
# 1  
Old 04-18-2009

Hello Masters,

I have two subtitle files in different languages, like the ones below.

First file :
Code:
1
00:00:41,136 --> 00:00:43,900
[<i># Underdog theme</i>]

2
00:00:55,383 --> 00:00:58,477
<i>[man] Ladies and gentlemen,</i>
<i>this is Simon Barsinister,</i>

3
00:00:58,553 --> 00:01:00,521
<i>the wickedest man in the world.</i>

4
00:01:00,588 --> 00:01:02,021
<i>He was evil and crazy.</i>

5
00:01:02,090 --> 00:01:06,026
<i>Simon and his wacky henchman, Cad,</i>
<i>schemed to rule the universe.</i>

6
00:01:06,094 --> 00:01:08,289
<i>But each time they were foiled by me,</i>

Second file :

Code:
1
00:00:35,060 --> 00:00:37,708
*** UNDERDOG ***

2
00:00:48,714 --> 00:00:51,668
Dame in gospodje,
to je Simon Barsinister,

3
00:00:51,745 --> 00:00:53,625    
najzlobnejši človek
na svetu.

4
00:00:53,701 --> 00:00:55,084
Bil je zloben in blazen.

5
00:00:55,160 --> 00:00:58,918
Simon in njegov sluga Cad
sta spletkarila proti univerzi.

6
00:00:58,994 --> 00:01:01,106
Ampak vedno sem jima
načrte prekrižal jaz,

I want to overwrite the lines that contain times in the first file with the corresponding lines from the second file, so that the final subtitle file looks like this:

Code:
1
00:00:35,060 --> 00:00:37,708
[<i># Underdog theme</i>]

2
00:00:48,714 --> 00:00:51,668
<i>[man] Ladies and gentlemen,</i>
<i>this is Simon Barsinister,</i>

3
00:00:51,745 --> 00:00:53,625    
<i>the wickedest man in the world.</i>

4
00:00:53,701 --> 00:00:55,084
<i>He was evil and crazy.</i>

5
00:00:55,160 --> 00:00:58,918
<i>Simon and his wacky henchman, Cad,</i>
<i>schemed to rule the universe.</i>

6
00:00:58,994 --> 00:01:01,106
<i>But each time they were foiled by me,</i>

How can I do this with awk/gawk?

TIA.
# 2  
Old 04-18-2009
Perl alternative
Code:
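my (%f2, $d, $e);
# Pass 1: remember the time lines of file2 in order of appearance.
open(F2,"<","file2") or die "Cannot open file2: $!\n";
while ( <F2> ){chomp; $f2{++$d}=$_ if /-->/;}
close(F2);
# Pass 2: print file1, swapping its n-th time line for the n-th one from file2.
open(F1,"<","file1") or die "Cannot open file1: $!\n";
while ( <F1> ){ chomp;  print /-->/ ? $f2{++$e}."\n" : $_."\n"; }
close(F1);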

# 3  
Old 04-18-2009
Another way with awk:

Code:
awk 'FNR==NR{ if( /^[0-9][0-9]:/ ) a[++c]=$0; next }
     /[0-9]:/ && /-->/{print a[++n]; next}1' file2 file1

# 4  
Old 04-18-2009
Multiline records

The code proposed by Rubin should work almost always, but it fails when a dialog line happens to look like a time line.
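To make that failure case concrete, here is a minimal sketch that runs the one-liner from post #3 on two tiny made-up files. The temporary directory, the file contents and the dialog text are assumptions for illustration only, not the OP's data. In the timing file, the dialog line "10:30 is the time." starts with two digits and a colon, so it is stored as if it were a time line and shifts every later replacement by one.

```shell
#!/bin/sh
# Hypothetical sample data, not the OP's files.
tmp=$(mktemp -d)

# Timing source (plays the role of file2); note the dialog line "10:30 ...".
cat > "$tmp/file2" <<'EOF'
1
00:00:35,060 --> 00:00:37,708
10:30 is the time.

2
00:00:48,714 --> 00:00:51,668
Second line.
EOF

# Text source (plays the role of file1).
cat > "$tmp/file1" <<'EOF'
1
00:00:41,136 --> 00:00:43,900
Hello

2
00:00:55,383 --> 00:00:58,477
World
EOF

# The one-liner from post #3, unchanged.
out=$(awk 'FNR==NR{ if( /^[0-9][0-9]:/ ) a[++c]=$0; next }
     /[0-9]:/ && /-->/{print a[++n]; next}1' "$tmp/file2" "$tmp/file1")

printf '%s\n' "$out"
rm -rf "$tmp"
```

In the printed result, record 2 carries the dialog line from the timing file in the place where its time line should be.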

I think the right way to handle the problem is to recognize that the input files are in fact composed of multiline records, where the record separator is one or more blank lines and the field separator is the newline.
In such a structure the time line is always the second field.

To make GNU awk deal with multiline records, some built-in variables must be set properly, as explained in the "Multiple Line" section of The GNU Awk User's Guide.
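As a quick illustration of how awk splits such input, the sketch below sets RS to the empty string and FS to a newline, then prints the number of fields and the second field of each record. The file name sample.srt and the temporary directory are assumptions; the two records are taken from the OP's first file, with an extra blank line between them to show that several blank lines still count as one separator.

```shell
#!/bin/sh
tmp=$(mktemp -d)

# Two records separated by two blank lines (sample.srt is a made-up name).
cat > "$tmp/sample.srt" <<'EOF'
1
00:00:41,136 --> 00:00:43,900
[<i># Underdog theme</i>]


2
00:00:55,383 --> 00:00:58,477
<i>[man] Ladies and gentlemen,</i>
<i>this is Simon Barsinister,</i>
EOF

# RS = "" turns on paragraph mode; FS = "\n" makes every line one field.
fields=$(awk 'BEGIN { RS = ""; FS = "\n" }
              { print NR ": NF=" NF ", $2=" $2 }' "$tmp/sample.srt")

printf '%s\n' "$fields"
rm -rf "$tmp"
```

Each record reports its own line count in NF, and field 2 is the time line in both cases.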

This is my proposal for a program, p.awk:
Code:
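BEGIN { RS = "" ; FS = "\n" ; ORS = "\n\n" ; OFS = "\n" }
FNR == 1 { fn++ } # track which input file (command-line argument) we are in
fn == 1 {     # first argument: the file that supplies the times
  a[FNR] = $2 # save its time line (field 2), indexed by record number
  }
fn == 2 {     # second argument: the file whose text is kept
  $2 = a[FNR] # overwrite its time line with the saved one
  print $0
  }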

The command line is as follows:
Code:
awk -f p.awk second.txt first.txt > result.txt

# 5  
Old 04-18-2009
Quote:
Originally Posted by colemar
The code proposed by Rubin should work almost always, but it fails when a dialog line happens to look like a time line.
...
Good point, it might happen. Based on the OP's actual files, though, one file seems to hold sentences in one language and the other their respective English translations, so I think the chances of a dialog line duplicating a time line are small.

Quote:
In such a structure the time line is always the second field.
What happens if there are other records in between the first field and the time line (so it is not always a fixed field)? Maybe this is not the case, but what if the records are not multiline?
The code can also be modified again to fit a particular situation; anyway, I think the OP has a few options to choose from.
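For the case where the time line is not in a fixed field, one possible adaptation is sketched below. It keeps the multiline-record setup but searches each record for the first field that matches a strict time pattern, instead of assuming field 2. The helper name timefield, the "NOTE extra line" record and the file names are all hypothetical, and the pattern assumes the hh:mm:ss,mmm --> hh:mm:ss,mmm layout seen in the OP's samples.

```shell
#!/bin/sh
tmp=$(mktemp -d)

# Timing source with a made-up extra line before the time line in record 1.
cat > "$tmp/second.txt" <<'EOF'
1
NOTE extra line
00:00:35,060 --> 00:00:37,708
text A

2
00:00:48,714 --> 00:00:51,668
text B
EOF

# Text source whose dialog lines are kept.
cat > "$tmp/first.txt" <<'EOF'
1
00:00:41,136 --> 00:00:43,900
Hello

2
00:00:55,383 --> 00:00:58,477
World
EOF

merged=$(awk '
BEGIN {
    RS = ""; FS = "\n"; ORS = "\n\n"; OFS = "\n"
    ts = "[0-9][0-9]:[0-9][0-9]:[0-9][0-9],[0-9][0-9][0-9]"
    re = "^" ts " --> " ts " *$"      # a whole line that is a time range
}
# Return the index of the first field that looks like a time line, or 0.
function timefield(   i) {
    for (i = 1; i <= NF; i++)
        if ($i ~ re) return i
    return 0
}
FNR == NR { k = timefield(); if (k) t[FNR] = $k; next }    # first file: save
{ k = timefield(); if (k && (FNR in t)) $k = t[FNR]; print }  # second: replace
' "$tmp/second.txt" "$tmp/first.txt")

printf '%s\n' "$merged"
rm -rf "$tmp"
```

Records are still paired by position, so this only helps when both files have the same number of records in the same order.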

Last edited by rubin; 04-18-2009 at 08:51 PM.. Reason: record -> field
# 6  
Old 04-19-2009
Quote:
Originally Posted by rubin
What happens if there are other records in between the first field and the time line (so it is not always a fixed field)? Maybe this is not the case, but what if the records are not multiline?
I can't understand your argument... but I believe the input files as given are some standard subtitle format whose name I can't remember.
So the input can be safely characterized by stating that:
  • there are groups of three or more lines, and the groups are delimited by at least one blank line
  • the second line of any group always represents the time of the dialog
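Those two assumptions can be checked before merging. The following is only a sketch: it reads a file in paragraph mode and reports any record that has fewer than three lines or whose second line does not look like a time range. The file name check.srt, the deliberately broken second record and the message wording are made up for the example.

```shell
#!/bin/sh
tmp=$(mktemp -d)

# Record 1 is well formed; record 2 is deliberately broken (no time line).
cat > "$tmp/check.srt" <<'EOF'
1
00:00:41,136 --> 00:00:43,900
Hello

2
World
EOF

status=0
report=$(awk '
BEGIN {
    RS = ""; FS = "\n"
    ts = "[0-9][0-9]:[0-9][0-9]:[0-9][0-9],[0-9][0-9][0-9]"
    re = "^" ts " --> " ts " *$"
}
NF < 3   { printf "record %d: only %d line(s)\n", NR, NF; bad = 1 }
$2 !~ re { printf "record %d: line 2 is not a time line: %s\n", NR, $2; bad = 1 }
END      { exit bad + 0 }    # non-zero exit status if anything was reported
' "$tmp/check.srt") || status=$?

printf '%s\n' "$report"
rm -rf "$tmp"
```

A clean file produces no output and leaves status at 0, so the check can sit in front of the merge command in a script.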
# 7  
Old 04-19-2009
Quote:
Originally Posted by colemar
I can't understand your argument...
If my argument wasn't understood, I'd better wait for the OP's response and let him state whether the time lines are duplicated somewhere in the records, and then modify the code accordingly.