The UNIX and Linux Forums  

  #1 (permalink)  
Old 01-27-2007
Registered User
 

Join Date: Dec 2006
Posts: 58
compare two files

I have file1 and file2:

file1:

11 xxx kksd ...
22 kkk kdsglg...
33 sss kdfjdksa...
44 kdsf dskjfkas ...
hh kdkf kdkkd..
jg dkf dfkdk ...
...

file2:

jg
22
hh
...

I need to check each line of file1. If field one is in file2, I keep the line; if not, the whole line is discarded. The result file will be:

jg dkf dfkdk ...
22 kkk kdsglg...
hh kdkf kdkkd..
...

Please tell me how I can do this. Thanks!
  #2 (permalink)  
Old 01-27-2007
aigles's Avatar
Registered User
 

Join Date: Apr 2004
Location: Bordeaux, France
Posts: 1,073
A possible solution:
Code:
awk 'NR==FNR { keys[$1]++ ; next } $1 in keys' file2 file1

Jean-Pierre.
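
For readers new to the two-file idiom, here is a minimal sketch of how the one-liner above behaves. The sample data is only modelled on the question (the elided "..." parts are left out), and the temporary directory is an assumption of this sketch, not part of the original answer.

```shell
# Illustrative sample data, modelled on the question.
dir=$(mktemp -d)
printf '%s\n' '11 xxx kksd' '22 kkk kdsglg' '33 sss kdfjdksa' \
              'hh kdkf kdkkd' 'jg dkf dfkdk' > "$dir/file1"
printf '%s\n' 'jg' '22' 'hh' > "$dir/file2"

# While awk reads the first operand (file2), NR equals FNR, so every key is stored.
# For the second operand (file1), a line is printed only if its first field was stored.
result=$(awk 'NR==FNR { keys[$1]; next } $1 in keys' "$dir/file2" "$dir/file1")
printf '%s\n' "$result"
rm -rf "$dir"
```

Note that the kept lines come out in file1 order (22, hh, jg here), not in the order the keys appear in file2.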
  #3 (permalink)  
Old 01-27-2007
Registered User
 

Join Date: Dec 2006
Posts: 58
Quote:
Originally Posted by aigles
A possible solution:
Code:
awk 'NR==FNR { keys[$1]++ ; next } $1 in keys' file2 file1

Jean-Pierre.
It works for the question above. However, my real problem is more complicated: file1 is actually an XML file like this:

...
<object
type="user"
id="000039BF228B"
encryptedPassword=""
maxConnections=""
>
<checkListAttributes>
</checkListAttributes>
</object>

...
<object
type="user"
id="0000E2801BFD"
encryptedPassword=""
>
<checkListAttributes>
</checkListAttributes>
</object>
...

and file2 is a list of ids, such as:

...
000039BF228B
0000E2801BFD
...

I want to delete all the blocks whose id is not in file2, and keep those whose id is in file2. I think we can change RS (the record separator) to </object>, but I do not know how to do the whole job. Would you help again?
  #4 (permalink)  
Old 01-27-2007
radoulov's Avatar
addict
 

Join Date: Jan 2007
Location: Milan, Italy/Varna, Bulgaria
Posts: 1,333
With GNU awk!

Code:
patt="$(printf "id=\"%s\"|" $(<file2))"
awk '$0 ~ patt{print $0RS}' RS="</object>" patt="${patt%|}" file1
  #5 (permalink)  
Old 01-27-2007
Registered User
 

Join Date: Dec 2006
Posts: 58
Quote:
Originally Posted by radoulov
With GNU awk!

Code:
patt="$(printf "id=\"%s\"|" $(<file2))"
awk '$0 ~ patt{print $0RS}' RS="</object>" patt="${patt%|}" file1
How can I enter these two lines as a command? Can you make it clear?
  #6 (permalink)  
Old 01-27-2007
radoulov's Avatar
addict
 

Join Date: Jan 2007
Location: Milan, Italy/Varna, Bulgaria
Posts: 1,333
Quote:
Originally Posted by fredao
How can I enter these two lines as a command? Can you make it clear?
I'm not sure that I understand the question, but:

Code:
$ cat file1
<object
type="user"
id="000039BF228B"
encryptedPassword=""
maxConnections=""
>
<checkListAttributes>
</checkListAttributes>
</object>
<object
type="user"
id="0000E2801BFD_NOO"
encryptedPassword=""
>
<checkListAttributes>
</checkListAttributes>
</object>
<object
type="user"
id="0000E2801BFD"
encryptedPassword=""
>
<checkListAttributes>
</checkListAttributes>
</object>

$ cat file2
000039BF228B
0000E2801BFD

$ patt="$(printf "id=\"%s\"|" $(<file2))"
$ awk '$0 ~ patt{print $0RS}' RS="</object>" patt="${patt%|}" file1
<object
type="user"
id="000039BF228B"
encryptedPassword=""
maxConnections=""
>
<checkListAttributes>
</checkListAttributes>
</object>

<object
type="user"
id="0000E2801BFD"
encryptedPassword=""
>
<checkListAttributes>
</checkListAttributes>
</object>
  #7 (permalink)  
Old 01-27-2007
Registered User
 

Join Date: Dec 2006
Posts: 58
I think there is a misunderstanding: I only want to keep the blocks whose id has a match in the second file. If I have a block such as:

<object
type="user"
id="999999999999"
encryptedPassword=""
>
<checkListAttributes>
</checkListAttributes>
</object>

and 999999999999 is not in the second file, the whole block should be discarded. But after I run your code, it is still there. Any idea?
  #8 (permalink)  
Old 01-27-2007
radoulov's Avatar
addict
 

Join Date: Jan 2007
Location: Milan, Italy/Varna, Bulgaria
Posts: 1,333
I said GNU awk;
are you using GNU awk?
It's hard to troubleshoot unless I can see the entire content of file1 and file2.
Could you also post the output of these commands:

Code:
patt="$(printf "id=\"%s\"|" $(<file2))" ; echo "${patt%|}"

Code:
$ awk --version| head -2
GNU Awk 3.1.5
Copyright (C) 1989, 1991-2005 Free Software Foundation.
$ cat file1
<object
type="user"
id="0000E2801BFD"
encryptedPassword=""
>
<checkListAttributes>
</checkListAttributes>
</object>
<object
type="user"
id="999999999999"
encryptedPassword=""
>
<checkListAttributes>
</checkListAttributes>
</object>

$ cat file2
000039BF228B
0000E2801BFD

$ awk '$0 ~ patt{print $0RS}' RS="</object>" patt="${patt%|}" file1
<object
type="user"
id="0000E2801BFD"
encryptedPassword=""
>
<checkListAttributes>
</checkListAttributes>
</object>
  #9 (permalink)  
Old 01-27-2007
Registered User
 

Join Date: Dec 2006
Posts: 58
Sorry, I entered the command incorrectly. It works, though it consumes a lot of computational power. Thanks!
Would you explain your code?
  #10 (permalink)  
Old 01-27-2007
radoulov's Avatar
addict
 

Join Date: Jan 2007
Location: Milan, Italy/Varna, Bulgaria
Posts: 1,333
Quote:
Originally Posted by fredao
[...]
It works, though it consumes a lot of computational power. Thanks!
Would you explain your code?
Yep,
it's "ugly" and "buggy" code (think what happens if your file2 is big).
I'm not able to write good code in 2 minutes.
The first command builds your pattern list, joining the ids with "or" ("|").
The second tests every record in file1 (records are delimited by RS="</object>") against it.
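
To make the first step concrete, here is a small sketch that builds the pattern from a two-line file2 and prints it. It uses `$(cat ...)` instead of the bash/ksh-only `$(<file2)` so it also runs under a plain POSIX shell; the temporary directory is an assumption of this sketch.

```shell
dir=$(mktemp -d)
printf '%s\n' '000039BF228B' '0000E2801BFD' > "$dir/file2"

# printf reuses its format for every argument, so each id becomes id="...",
# followed by "|". The unquoted $(cat ...) is split into one argument per id.
patt=$(printf 'id="%s"|' $(cat "$dir/file2"))

# Strip the trailing "|" so the alternation does not end with an empty branch.
patt=${patt%|}
printf '%s\n' "$patt"
rm -rf "$dir"
```

This prints `id="000039BF228B"|id="0000E2801BFD"`. The regular expression grows by one alternative per line of file2, which is presumably why the approach gets slow (and may eventually hit the command-line length limit) when file2 is big.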
  #11 (permalink)  
Old 01-27-2007
Registered User
 

Join Date: Dec 2006
Posts: 58
comment

Quote:
Originally Posted by fredao
Sorry, I entered the command incorrectly. It works, though it consumes a lot of computational power. Thanks!
Would you explain your code?
nawk 'NR==FNR { keys[$1]++;next }; RS="</object>"; $3 in keys' file2 file1
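
The one-liner above does not do what is intended: `RS="</object>"` written as a pattern is an assignment that is always true (so every record gets printed), and `$3` would still carry the `id="..."` wrapper. A possible way to combine the hash lookup from the first answer with the `</object>` record separator is sketched below. It assumes an awk that accepts a multi-character RS (GNU awk does), and that each block contains exactly one `id="..."` attribute; the sample files are illustrative.

```shell
dir=$(mktemp -d)
cat > "$dir/file1" <<'EOF'
<object
type="user"
id="0000E2801BFD"
encryptedPassword=""
>
<checkListAttributes>
</checkListAttributes>
</object>
<object
type="user"
id="999999999999"
encryptedPassword=""
>
<checkListAttributes>
</checkListAttributes>
</object>
EOF
printf '%s\n' '000039BF228B' '0000E2801BFD' > "$dir/file2"

# file2 is read with the default RS (one id per line); the RS=... operand
# switches the record separator just before file1 is opened.
result=$(awk '
  NR == FNR { keys[$1]; next }                  # first file: remember every id
  match($0, /id="[^"]*"/) {                     # second file: one block per record
    id = substr($0, RSTART + 4, RLENGTH - 5)    # text between the quotes
    if (id in keys) { sub(/^\n/, ""); print $0 RS }
  }
' "$dir/file2" RS='</object>' "$dir/file1")
printf '%s\n' "$result"
rm -rf "$dir"
```

Each block costs one array lookup instead of a match against a long alternation, so the size of file2 matters much less.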
  #12 (permalink)  
Old 01-27-2007
Registered User
 

Join Date: Dec 2006
Posts: 58
Quote:
Originally Posted by radoulov
Yep,
it's "ugly" and "buggy" code (think what happens if your file2 is big).
I'm not able to write good code in 2 minutes.
The first command builds your pattern list, joining the ids with "or" ("|").
The second tests every record in file1 (records are delimited by RS="</object>") against it.
Is patt a shell variable or a gawk variable? The first command seems to have nothing to do with awk. Is there any link for this syntax?
  #13 (permalink)  
Old 01-28-2007
radoulov's Avatar
addict
 

Join Date: Jan 2007
Location: Milan, Italy/Varna, Bulgaria
Posts: 1,333
Quote:
Originally Posted by fredao
Is patt a shell variable or a gawk variable?
[...]
patt is a shell variable here:

Code:
patt="$(printf "id=\"%s\"|" $(<file2))"
... and becomes an awk variable here:

Code:
awk '$0 ~ patt{print $0RS}' RS="</object>" patt="${patt%|}" file1
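
A stripped-down sketch of the same hand-over, using the equivalent `-v` option instead of a trailing `patt=...` operand. The two input lines are made up for illustration.

```shell
# Shell variable: awk knows nothing about it yet.
patt='id="0000E2801BFD"'

# The awk program is in single quotes, so the shell does not expand anything
# inside it; "patt" there is an awk variable. -v copies the shell value into
# that awk variable before the program starts.
result=$(printf '%s\n' 'id="0000E2801BFD"' 'id="999999999999"' |
         awk -v patt="$patt" '$0 ~ patt')
printf '%s\n' "$result"
```

Only the first input line is printed, because only it matches the value that was passed in.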
  #14 (permalink)  
Old 01-28-2007
radoulov's Avatar
addict
 

Join Date: Jan 2007
Location: Milan, Italy/Varna, Bulgaria
Posts: 1,333
And, of course, (given your input format) with GNU grep:

Code:
grep -B2 -A5 -f file2 file1

Last edited by radoulov; 01-28-2007 at 07:07 AM.
  #15 (permalink)  
Old 01-28-2007
Registered User
 

Join Date: Dec 2006
Posts: 58
Quote:
Originally Posted by radoulov
And, of course, (given your input format) with GNU grep:

Code:
grep -B2 -A5 -f file2 file1
What does this mean?
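
For reference, a sketch of what the options do: `-f file2` reads the patterns from file2 (one per line), `-B2` prints the 2 lines before each matching line, and `-A5` prints the 5 lines after it. With the id on the third line of an eight-line block, that happens to cover exactly one block. The sample files below are illustrative.

```shell
dir=$(mktemp -d)
cat > "$dir/file1" <<'EOF'
<object
type="user"
id="0000E2801BFD"
encryptedPassword=""
>
<checkListAttributes>
</checkListAttributes>
</object>
<object
type="user"
id="999999999999"
encryptedPassword=""
>
<checkListAttributes>
</checkListAttributes>
</object>
EOF
printf '%s\n' '000039BF228B' '0000E2801BFD' > "$dir/file2"

# Match any line containing an id from file2, plus 2 lines of leading
# and 5 lines of trailing context.
result=$(grep -B2 -A5 -f "$dir/file2" "$dir/file1")
printf '%s\n' "$result"
rm -rf "$dir"
```

This only works while every block has the same shape. A block with an extra attribute line (such as maxConnections="") would lose its closing </object>, and because the patterns match anywhere in a line, an id like 0000E2801BFD_NOO would be kept as well.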