Extract two file contents and consolidate in the third file


 
# 1  
01-29-2010

Hi all

I have two files, first_file.txt and second_file.txt. The first file contains one sentence per line:

Code:
Established in 1905 , Las Vegas officially became a city in 1911 . 
With the growth that followed , at the close of the century Las Vegas was the most populous American city founded in the 20th century ( a distinction held by Chicago in the 19th century ) . 
On the other hand, Las Vegas also has the highest number of churches per capita of any major U.S. city .
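
Judging from the expected output further down, the characters , ( ) and a standalone . count as separate tokens, "hand," in the third sentence has to be split, and "U.S." must stay one token. Below is a minimal Perl sketch of such a tokenizer. It assumes those are the only punctuation cases in the data, and tokenize is a hypothetical name. It numbers the tokens of the third sentence from 1:

```perl
use strict;
use warnings;

# Hypothetical tokenizer: split on whitespace and on , ( ) while keeping
# , ( ) as tokens. A "." is only a separate token when it already stands
# alone (as at the end of each sample sentence), so "U.S." stays whole.
sub tokenize {
    my ($sentence) = @_;
    # A whitespace separator leaves an undef capture and empty fields; drop them.
    return grep { defined $_ && $_ ne '' } split /\s+|([,()])/, $sentence;
}

my @tokens = tokenize('On the other hand, Las Vegas also has the highest number of churches per capita of any major U.S. city .');

# Print each token prefixed with its 1-based position.
my @numbered;
for my $i (0 .. $#tokens) {
    push @numbered, ($i + 1) . ":$tokens[$i]";
}
print "@numbered\n";
```

In this output hand is 4, U.S. is 20 and city is 21. Those are the same numbers that follow these words in the third block of the second file (hand -4, U.S. -20, city -21).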

The second file contains the details for each sentence, one entry per line. The blocks for consecutive sentences are separated by a blank line, as shown below:

Code:
ccomp (city -10, Established -1)
prep_in (Established -1, 1905 -3)
nn (Vegas -6, Las -5)
nsubj (city -10, Vegas -6)
advmod (city -10, officially -7)
cop (city -10, became -8)
det (city -10, a -9)
prep_in (city -10, 1911 -12)

det (growth -3, the -2)
prep_with (city -20, growth -3)
rel (followed -5, that -4)
rcmod (growth -3, followed -5)
det (close -9, the -8)
prep_at (followed -5, close -9)
det (century -12, the -11)
prep_of (close -9, century -12)
nn (Vegas -14, Las -13)
nsubj (city -20, Vegas -14)
cop (city -20, was -15)
det (city -20, the -16)
advmod (populous -18, most -17)
amod (city -20, populous -18)
amod (city -20, American -19)
partmod (city -20, founded -21)
det (century -25, the -23)
amod (century -25, 20th -24)
prep_in (founded -21, century -25)
det (distinction -28, a -27)
dep (century -25, distinction -28)
partmod (distinction -28, held -29)
agent (held -29, Chicago -31)
det (century -35, the -33)
amod (century -35, 19th -34)
prep_in (held -29, century -35)

det (hand -4, the -2)
amod (hand -4, other -3)
prep_on (has -9, hand -4)
nn (Vegas -7, Las -6)
nsubj (has -9, Vegas -7)
advmod (has -9, also -8)
det (number -12, the -10)
amod (number -12, highest -11)
dobj (has -9, number -12)
prep_of (number -12, churches -14)
prep_per (churches -14, capita -16)
det (city -21, any -18)
amod (city -21, major -19)
nn (city -21, U.S. -20)
prep_of (capita -16, city -21)

Each line of the second file has five space-separated columns. Taking the first line as an example:

Code:
ccomp[column1] (city[column2] -10,[column3] Established[column4] -1)[column5]
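
Every line has the fixed shape relation (governor -index, dependent -index), so a single regular expression can pull out the five columns with the brackets, commas and minus signs stripped. Below is a minimal sketch. It assumes every non-blank line follows that shape exactly, and parse_dep is a hypothetical name:

```perl
use strict;
use warnings;

# Split one line of second_file.txt, e.g. "ccomp (city -10, Established -1)",
# into (column1, column2, column3, column4, column5) =
# (relation, governor, governor index, dependent, dependent index).
# Returns an empty list if the line does not have that shape (e.g. a blank line).
sub parse_dep {
    my ($line) = @_;
    my @f = $line =~ /^\s*(\S+)\s*\(\s*(\S+)\s+-(\d+),\s*(\S+)\s+-(\d+)\s*\)\s*$/;
    return @f;
}

my @cols = parse_dep("ccomp (city -10, Established -1)");
print join('|', @cols), "\n";    # ccomp|city|10|Established|1
```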

For each sentence in the first file, I want to look up every word in column4 of the corresponding block in the second file (the first sentence against the first block, the second sentence against the second block, and so on). If the word matches column4, take the entry in column1 and write the pair to a third file, one line per sentence, in this form:

column4|column1 column4|column1 column4|column1

If a word from the first file has no match in column4 of the second file, write it as:

WORD|empty

[Note: WORD is the word that was not found in column4 of the second file. Punctuation tokens such as , . ( ) are written with nothing after the bar, e.g. ,|]


The expected output for the example above is:

Code:
Established|ccomp in|empty 1905|prep_in ,| Las|nn Vegas|nsubj officially|advmod became|cop a|det city|empty in|empty 1911|prep_in .|
With|empty the|det growth|prep_with that|rel followed|rcmod ,| at|empty the|det close|prep_at of|empty the|det century|prep_of Las|nn Vegas|nsubj was|cop the|det most|advmod populous|amod American|amod city|empty founded|partmod in|empty the|det 20th|amod century|prep_in (| a|det distinction|dep held|partmod by|empty Chicago|agent in|empty the|det 19th|amod century|prep_in )| .|
On|empty the|det other|amod hand|prep_on ,| Las|nn Vegas|nsubj also|advmod has|empty the|det highest|amod number|dobj of|empty churches|prep_of per|empty capita|prep_per of|empty any|det major|amod U.S.|nn city|prep_of .|
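
Read literally, the rule "look the word up in column4 and take column1" would give the second and third "century" of the second sentence the relation prep_of. That is because the first column4 "century" in that block belongs to prep_of (close -9, century -12), yet the expected output has prep_in for both. In the sample, the number after each column4 word is that word's 1-based position in the sentence, with punctuation counted. Keying on position therefore reproduces the expected output. Below is a minimal Perl sketch of the formatting step under that assumption. The names format_sentence and %rel_at are hypothetical, and the usage checks it against the first sentence only:

```perl
use strict;
use warnings;

# Format one sentence. $tokens: array ref of tokens in sentence order.
# $rel_at: hash ref mapping a 1-based token position to [column4 word, column1],
# built from the sentence's block in the second file.
sub format_sentence {
    my ($tokens, $rel_at) = @_;
    my @out;
    for my $pos (1 .. @$tokens) {
        my $word = $tokens->[$pos - 1];
        my $hit  = $rel_at->{$pos};
        if ($hit && $hit->[0] eq $word) {
            push @out, "$word|$hit->[1]";    # matched: word|column1
        } elsif ($word =~ /^\w/) {
            push @out, "$word|empty";        # word with no match
        } else {
            push @out, "$word|";             # punctuation: nothing after the bar
        }
    }
    return join ' ', @out;
}

my @tokens = split ' ', 'Established in 1905 , Las Vegas officially became a city in 1911 .';
my %rel_at = (
    1  => ['Established', 'ccomp'],
    3  => ['1905',        'prep_in'],
    5  => ['Las',         'nn'],
    6  => ['Vegas',       'nsubj'],
    7  => ['officially',  'advmod'],
    8  => ['became',      'cop'],
    9  => ['a',           'det'],
    12 => ['1911',        'prep_in'],
);
print format_sentence(\@tokens, \%rel_at), "\n";
```

Checking the stored word as well as the position guards against the two files drifting out of step.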

I would like to write a Perl script for this. I have attached the sample files: first_file.txt and second_file.txt are the input files, and my_final_output.txt is the expected output file.

I need urgent help.
Thanks in advance.

Last edited by my_Perl; 01-29-2010 at 07:02 AM.
# 2  
01-29-2010
What have you tried? This does seem like homework.
# 3  
01-29-2010
My difficulty is processing first_file.txt sentence by sentence together with the corresponding block of details in second_file.txt.
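
For the sentence-by-sentence pairing, Perl's paragraph mode fits the layout of the second file. With $/ set to the empty string, each read returns one blank-line-separated block, so the Nth line of first_file.txt can be paired with the Nth block of second_file.txt. Below is a minimal sketch under that assumption. The name pair_sentences is hypothetical, and the sketch only pairs and counts, leaving the word matching out:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pair each line of the sentence file with the matching blank-line-separated
# block of the dependency file. Returns a list of [sentence, [dependency lines]].
sub pair_sentences {
    my ($sent_fh, $dep_fh) = @_;
    my @pairs;
    while (my $sentence = <$sent_fh>) {
        chomp $sentence;
        # Paragraph mode: with $/ set to '' one read returns a whole block,
        # including the blank line(s) that end it.
        my $block = do { local $/ = ''; <$dep_fh> };
        $block = '' unless defined $block;    # more sentences than blocks
        my @deps = grep { /\S/ } split /\n/, $block;
        push @pairs, [$sentence, \@deps];
    }
    return @pairs;
}

# Hypothetical usage: perl pair.pl first_file.txt second_file.txt
if (@ARGV == 2) {
    open my $sent_fh, '<', $ARGV[0] or die "$ARGV[0]: $!";
    open my $dep_fh,  '<', $ARGV[1] or die "$ARGV[1]: $!";
    for my $pair (pair_sentences($sent_fh, $dep_fh)) {
        my ($sentence, $deps) = @$pair;
        printf "%d relations: %s\n", scalar @$deps, $sentence;
    }
}
```

Here pair.pl is a made-up script name. If the data file is given to perl as the program instead (for example perl first_file.txt), Perl tries to compile the sentences as code. That would explain the "syntax error at first_file.txt line 1" messages in post #7.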
# 4  
01-29-2010
I don't understand the criteria you use to output 'With|' and 'at|'.
# 5  
01-29-2010
Thanks again. You are right; I have edited the post and corrected it.
# 6  
01-29-2010
Try,

Code:
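#!/usr/bin/perl
use strict;
use warnings;

# Run as: perl this_script.pl > my_final_output.txt
open my $fh1, '<', 'first_file.txt'  or die "first_file.txt: $!";
open my $fh2, '<', 'second_file.txt' or die "second_file.txt: $!";

while (my $sentence = <$fh1>) {
	# Split on whitespace and on , ( ) but keep , ( ) as tokens;
	# a lone "." stays a token and "U.S." is not split.
	my @words = grep { defined $_ && $_ ne '' } split /\s+|([,()])/, $sentence;

	# Read only this sentence's block (up to the next blank line) and key
	# each relation by the number after the column4 word, which in the
	# sample data is the word's position in the sentence.
	my %rel_at;
	while (my $line = <$fh2>) {
		last if $line !~ /\S/;
		my ($col_1, $col_4, $col_5) = (split ' ', $line)[0, 3, 4];
		my ($idx) = $col_5 =~ /(\d+)/;
		$rel_at{$idx} //= [$col_4, $col_1];    # keep the first entry
	}

	my @out;
	for my $pos (1 .. @words) {
		my $word = $words[$pos - 1];
		my $hit  = $rel_at{$pos};
		if ($hit && $hit->[0] eq $word) {
			push @out, "$word|$hit->[1]";
		} elsif ($word =~ /^\w/) {
			push @out, "$word|empty";
		} else {
			push @out, "$word|";    # punctuation
		}
	}
	print join(' ', @out), "\n";
}

close $fh1;
close $fh2;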

I must mention that I got some help on that split regex.
# 7  
01-29-2010
I got the following error.

Code:
Number found where operator expected at first_file.txt line 1, near "in 1911."
        (Do you need to predeclare in?)
Semicolon seems to be missing at first_file.txt line 1.
Bareword found where operator expected at first_file.txt line 2, near "20th"
        (Missing operator before th?)
Number found where operator expected at first_file.txt line 2, near "the 19"
        (Do you need to predeclare the?)
Bareword found where operator expected at first_file.txt line 2, near "19th"
        (Missing operator before th?)
syntax error at first_file.txt line 1, near "in 1911."
Execution of first_file.txt aborted due to compilation errors.


Last edited by my_Perl; 01-29-2010 at 06:49 AM.