Extract two file contents and consolidate in the third file

01-29-2010

Hi all

I have two files, first_file.txt and second_file.txt. The first file contains sentences, one per line:

Code:

Established in 1905 , Las Vegas officially became a city in 1911 . 
With the growth that followed , at the close of the century Las Vegas was the most populous American city founded in the 20th century ( a distinction held by Chicago in the 19th century ) . 
On the other hand, Las Vegas also has the highest number of churches per capita of any major U.S. city .

The second file contains the details of each sentence, one entry per line. A blank line separates the details of one sentence from those of the next:

Code:

ccomp (city -10, Established -1)
prep_in (Established -1, 1905 -3)
nn (Vegas -6, Las -5)
nsubj (city -10, Vegas -6)
advmod (city -10, officially -7)
cop (city -10, became -8)
det (city -10, a -9)
prep_in (city -10, 1911 -12)

det (growth -3, the -2)
prep_with (city -20, growth -3)
rel (followed -5, that -4)
rcmod (growth -3, followed -5)
det (close -9, the -8)
prep_at (followed -5, close -9)
det (century -12, the -11)
prep_of (close -9, century -12)
nn (Vegas -14, Las -13)
nsubj (city -20, Vegas -14)
cop (city -20, was -15)
det (city -20, the -16)
advmod (populous -18, most -17)
amod (city -20, populous -18)
amod (city -20, American -19)
partmod (city -20, founded -21)
det (century -25, the -23)
amod (century -25, 20th -24)
prep_in (founded -21, century -25)
det (distinction -28, a -27)
dep (century -25, distinction -28)
partmod (distinction -28, held -29)
agent (held -29, Chicago -31)
det (century -35, the -33)
amod (century -35, 19th -34)
prep_in (held -29, century -35)

det (hand -4, the -2)
amod (hand -4, other -3)
prep_on (has -9, hand -4)
nn (Vegas -7, Las -6)
nsubj (has -9, Vegas -7)
advmod (has -9, also -8)
det (number -12, the -10)
amod (number -12, highest -11)
dobj (has -9, number -12)
prep_of (number -12, churches -14)
prep_per (churches -14, capita -16)
det (city -21, any -18)
amod (city -21, major -19)
nn (city -21, U.S. -20)
prep_of (capita -16, city -21)

Each line in the second file has five columns separated by a single space. For example, the first line breaks down as:

Code:

ccomp[column1] (city[column2] -10,[column3] Established[column4] -1)[column5]
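
The column layout above can be picked apart with a single regular expression. This is only a minimal sketch: the helper name parse_detail and the variable names are my own labels, not something from the attached files, and it assumes every detail line has exactly the shape shown.

```perl
use strict;
use warnings;

# Hypothetical helper: split one detail line into its five columns.
# Returns (column1, word of column2, number of column3,
#          word of column4, number of column5),
# or an empty list if the line does not have the expected shape.
sub parse_detail {
    my ($line) = @_;
    return $line =~ /^(\S+)\s+\((\S+)\s+-(\d+),\s+(\S+)\s+-(\d+)\)\s*$/;
}

my ($col1, $col2, $col3, $col4, $col5) =
    parse_detail('ccomp (city -10, Established -1)');
print "$col4|$col1 (number $col5)\n";   # Established|ccomp (number 1)
```

The parentheses, the comma and the leading "-" are stripped, so column4 can be compared directly with a word from the sentence.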

My problem: for each sentence in the first file, look up each word of the sentence in column4 of the corresponding block of the second file (i.e., the first sentence of the first file against the first block of details in the second file, and so on). If the word matches column4, take the entry from column1 and write the output for each word to a third file in this form:

column4|column1 column4|column1 column4|column1

If a word of the sentence is not found in column4 of the second file, write the output as:

WORD|empty

[Please note: WORD is the word that was not found in column4 of the second file.]

The expected output for the example above is:

Code:

Established|ccomp in|empty 1905|prep_in ,| Las|nn Vegas|nsubj officially|advmod became|cop a|det city|empty in|empty 1911|prep_in .|
With|empty the|det growth|prep_with that|rel followed|rcmod ,| at|empty the|det close|prep_at of|empty the|det century|prep_of Las|nn Vegas|nsubj was|cop the|det most|advmod populous|amod American|amod city|empty founded|partmod in|empty the|det 20th|amod century|prep_in (| a|det distinction|dep held|partmod by|empty Chicago|agent in|empty the|det 19th|amod century|prep_in )| .|
On|empty the|det other|amod hand|prep_on ,| Las|nn Vegas|nsubj also|advmod has|empty the|det highest|amod number|dobj of|empty churches|prep_of per|empty capita|prep_per of|empty any|det major|amod U.S.|nn city|prep_of .|

I would like to write a Perl script for this problem. I have attached the sample files [first_file.txt and second_file.txt are the input files and my_final_output.txt is the expected output file].

I need urgent help.
Thanks in advance.

second_file.txt (1.4 KB)

my_final_output.txt (739 Bytes)

first_file.txt (366 Bytes)

Last edited by my_Perl; 01-29-2010 at 07:02 AM..

my_Perl

01-29-2010

What have you tried? This does seem like homework.

dinjo_jo

01-29-2010

My difficulty is handling first_file.txt sentence by sentence together with the corresponding details in second_file.txt.

my_Perl

01-29-2010

I don't understand the criteria you use to output 'With|' and 'at|'.

agn

01-29-2010

Thanks agn. You are right. I have done the editing and made the correction.

my_Perl

01-29-2010

Try,

Code:

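#!/usr/bin/perl
use strict;
use warnings;

# File names as given in the question.
open my $fh1, '<', 'first_file.txt'  or die "first_file.txt: $!";
open my $fh2, '<', 'second_file.txt' or die "second_file.txt: $!";

while (my $sentence = <$fh1>) {
	# Read the details for this sentence: every line up to the next blank line.
	my @details;
	while (my $line = <$fh2>) {
		last unless $line =~ /\S/;
		push @details, [ (split ' ', $line)[0, 3] ];	# [column1, column4]
	}

	# Split on whitespace and on , ( ), keeping the punctuation as words.
	# '.' is not a separator, so "U.S." stays whole.
	my @words = grep { defined($_) && length($_) } split /\s+|([,()])/, $sentence;

	for my $word (@words) {
		# First detail line whose column4 equals the word, if any.
		# A repeated word therefore always gets the column1 of its first match.
		my ($match) = grep { $_->[1] eq $word } @details;
		if ($match) {
			print "$word|$match->[0] ";
		} elsif ($word =~ /^\w/) {
			print "$word|empty ";
		} else {
			print "$word| ";
		}
	}
	print "\n";
}

close $fh1;
close $fh2;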

I must mention that I got some help on that split regex.
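
Matching on the word alone cannot reproduce the expected output in the first post: in the second sentence "century" occurs three times, and the expected output gives it prep_of once and prep_in twice. In the sample data the number in column5 appears to be the position of the word in the sentence (counting punctuation), so one way around this is to match on position as well. A minimal sketch under that assumption; tokenize and annotate are names I made up, and the demo data is the first sentence from the question:

```perl
use strict;
use warnings;

# Split on whitespace and treat , ( ) as separate tokens.
# "." is only a token when it stands alone, so "U.S." stays in one piece.
sub tokenize {
    my ($sentence) = @_;
    return grep { defined($_) && length($_) } split /\s+|([,()])/, $sentence;
}

# Build the "word|column1" line for one sentence and its detail lines.
sub annotate {
    my ($sentence, @detail_lines) = @_;

    # Map position (column5) -> [word from column4, relation from column1].
    my %by_index;
    for my $line (@detail_lines) {
        my ($rel, $word, $idx) =
            $line =~ /^(\S+)\s+\(\S+\s+-\d+,\s+(\S+)\s+-(\d+)\)/
            or next;
        $by_index{$idx} = [$word, $rel] unless exists $by_index{$idx};
    }

    my @out;
    my $pos = 0;
    for my $tok (tokenize($sentence)) {
        $pos++;
        my $hit = $by_index{$pos};
        if ($hit && $hit->[0] eq $tok) {
            push @out, "$tok|$hit->[1]";    # word found in column4
        }
        elsif ($tok =~ /^\w/) {
            push @out, "$tok|empty";        # ordinary word, no match
        }
        else {
            push @out, "$tok|";             # punctuation
        }
    }
    return join ' ', @out;
}

my $sentence = 'Established in 1905 , Las Vegas officially became a city in 1911 . ';
my @details = split /\n/, <<'END';
ccomp (city -10, Established -1)
prep_in (Established -1, 1905 -3)
nn (Vegas -6, Las -5)
nsubj (city -10, Vegas -6)
advmod (city -10, officially -7)
cop (city -10, became -8)
det (city -10, a -9)
prep_in (city -10, 1911 -12)
END

print annotate($sentence, @details), "\n";
# Established|ccomp in|empty 1905|prep_in ,| Las|nn Vegas|nsubj officially|advmod became|cop a|det city|empty in|empty 1911|prep_in .|
```

For the real files, the same annotate call could be made once per sentence, with the detail lines read from second_file.txt up to the next blank line and the result printed to the output file.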

agn

01-29-2010

I got the following error.

Code:

Number found where operator expected at first_file.txt line 1, near "in 1911."
        (Do you need to predeclare in?)
Semicolon seems to be missing at first_file.txt line 1.
Bareword found where operator expected at first_file.txt line 2, near "20th"
        (Missing operator before th?)
Number found where operator expected at first_file.txt line 2, near "the 19"
        (Do you need to predeclare the?)
Bareword found where operator expected at first_file.txt line 2, near "19th"
        (Missing operator before th?)
syntax error at first_file.txt line 1, near "in 1911."
Execution of first_file.txt aborted due to compilation errors.
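
Every message above ends in "at first_file.txt line ...", which suggests that perl was handed first_file.txt itself as the program to compile (for example `perl first_file.txt`) rather than the script. A minimal sketch of one way to make that mix-up harder, assuming the script is saved under a hypothetical name such as consolidate.pl and takes the two input files and the output file as arguments; parse_args is likewise a made-up helper:

```perl
use strict;
use warnings;

# Hypothetical helper: check the command-line arguments and return
# (sentence_file, detail_file, output_file), or die with a usage message.
sub parse_args {
    my @args = @_;
    die "usage: perl consolidate.pl first_file.txt second_file.txt output.txt\n"
        unless @args == 3;
    my ($sentences, $details, $output) = @args;
    for my $file ($sentences, $details) {
        die "cannot read $file\n" unless -r $file;
    }
    return ($sentences, $details, $output);
}

# Only runs when arguments are supplied, e.g.
#   perl consolidate.pl first_file.txt second_file.txt my_final_output.txt
if (@ARGV) {
    my ($sentences, $details, $output) = parse_args(@ARGV);
    print "reading $sentences and $details, writing $output\n";
}
```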

Last edited by my_Perl; 01-29-2010 at 06:49 AM..

my_Perl
