Extracting lines from file1 which match a pattern in file2


 
# 1  
Old 07-29-2008
Extracting lines from file1 which match a pattern in file2

Hi guys,
Can you help me solve this problem?
I have two files, file1 and file2, as follows:
===FILE1====
>LOC21
MASSKFCTVLSLALFLVLLTHANSAELFSFNFQTFNAANLILQGNASVSSSGQLRLTEVKSNGEPKVASL
VASFATAFTFNILAPILSNSADGLAFALVPVGSQPKFNGGFLGLFQNVTYDP
>LOC05
MASSKFSTVLSLALFLVLLTHANSAELFSFNFQTFNAANLILQGNASVSSSGQLRLTEVKSNGEPKVASL
GRAFYSAPIQIWDSTTGKVASFATAFTFNILAPILSNSADGLAFALVPVGSQPKFNGGFLGLFQNVTYDP
AKVLITYDSSTKLLVASLVYPSGS
>LOC48
MASLQTQMISFYAIFLSILLTTILFFKVNSTGEITSFSIPKFRPDQPNLIFQGGGYTTKEKLTLTKAVK

====FILE2====
LOC21
LOC48

I want to write each complete record from FILE1 (a record starts with a '>' sign) whose header matches a pattern in FILE2 into a new file, FILE3, which should look like this:
>LOC21
MASSKFCTVLSLALFLVLLTHANSAELFSFNFQTFNAANLILQGNASVSSSGQLRLTEVKSNGEPKVASL
VASFATAFTFNILAPILSNSADGLAFALVPVGSQPKFNGGFLGLFQNVTYDP
>LOC48
MASLQTQMISFYAIFLSILLTTILFFKVNSTGEITSFSIPKFRPDQPNLIFQGGGYTTKEKLTLTKAVK


Your help is highly appreciated.

Thanks

Last edited by smriti_shridhar; 07-29-2008 at 04:30 AM.. Reason: to make it more informative
# 2  
Old 07-29-2008
Try this awk program:
Code:
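awk '
# First file (FILE2): store each ID with ">" prepended, then go to the next line
NR==FNR { keys[">" $1]++ ; next }
# Header line in FILE1: select the record if its first field is a stored ID
/^>/    { selected = ($1 in keys) }
# Print every line that belongs to a selected record
selected
' FILE2 FILE1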

Jean-Pierre.
# 3  
Old 07-29-2008
I could get aigles' code to work on my Cygwin.

Anyway, this is my version:
Code:
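#! /bin/sh

if [ $# -ne 2 ]; then
        echo "Usage: $0 <file1> <file2>"
        exit 1
fi

# Quote the file names so that paths containing spaces still work.
awk -v f2="$2" '
BEGIN {
        ok=0
        count=1
        # Load the IDs from file2, storing each with ">" prepended.
        # Testing for > 0 avoids an endless loop if file2 cannot be read.
        while ( (getline < f2) > 0 ) {
                file2[count]=sprintf(">%s",$0)
                ++count
        }
}
/^>LOC/ {
        # Header line: print it and switch printing on if it equals a stored ID.
        for (i=1;i<count;++i) {
                if ($0 == file2[i]) {
                        print $0
                        ok=1
                        next
                }
        }
        ok=0
}
ok==1 {
        print
}' "$1"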

Run it:
Code:
$ ./file.sh file1 file2
>LOC21
MASSKFCTVLSLALFLVLLTHANSAELFSFNFQTFNAANLILQGNASVSSSGQLRLTEVKSNGEPKVASL
VASFATAFTFNILAPILSNSADGLAFALVPVGSQPKFNGGFLGLFQNVTYDP
>LOC48
MASLQTQMISFYAIFLSILLTTILFFKVNSTGEITSFSIPKFRPDQPNLIFQGGGYTTKEKLTLTKAVK


Last edited by chihung; 07-29-2008 at 07:03 AM..
# 4  
Old 07-30-2008
thanks!

Thanks, Jean-Pierre.

I tried to run your code but it didn't produce any output or error.

smriti.
# 5  
Old 07-30-2008
Thanks, chihung, but...

Thanks for your help. The code runs perfectly, but I have one more problem.

Actually, the line beginning with '>' contains other words as well, and I have different files in which LOC can be something else, like ABC or GNL, although within each file the first three letters after '>' are the same. I solved the prefix issue by replacing the line
/^>LOC/ {
with
/^>/ {

My file looks like this:
>LOC21 ths is a seq of protein bla-bla-bla
MASSKFCTVLSLALFLVLLTHANSAELFSFNFQTFNAANLILQGNASVSSSGQLRLTEVKSNGEPKVASL
VASFATAFTFNILAPILSNSADGLAFALVPVGSQPKFNGGFLGLFQNVTYDP

So when I tried it on my actual file, it didn't work. As far as I understand, the extra words separated by spaces in the header line (the one beginning with '>') are causing the trouble.

I will be thankful if you can help me sort this out.

cheers!
smriti
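
A minimal sketch of one way to make the approach from post #3 tolerate descriptive header lines, assuming the ID is always the first whitespace-separated word after '>'. Instead of comparing the whole header ($0), it compares only the first field ($1) against the loaded IDs. The file names seqs.txt and ids.txt, the variable names and the shortened sample data are made up for illustration; this is not the script posted above, only the same idea with the IDs kept as keys of an awk array.

```shell
#!/bin/sh
# Illustration only: seqs.txt, ids.txt and the short sequences are invented.
workdir=$(mktemp -d)

cat > "$workdir/seqs.txt" <<'EOF'
>LOC21 this is a protein sequence
MASSKF
VASFAT
>ABC05 another description
GRAFYS
>GNL48 yet another one
MASLQT
EOF

cat > "$workdir/ids.txt" <<'EOF'
LOC21
GNL48
EOF

selected=$(awk -v idfile="$workdir/ids.txt" '
BEGIN {
        # Load the wanted IDs, storing each one with ">" prepended.
        while ((getline line < idfile) > 0)
                wanted[">" line]
        close(idfile)
}
/^>/ { ok = ($1 in wanted) }   # header: compare only the first field
ok                             # print lines while the current record is wanted
' "$workdir/seqs.txt")

printf '%s\n' "$selected"
rm -rf "$workdir"
```

With this sample data the LOC21 and GNL48 records are printed with their full header lines, and the ABC05 record is skipped. Because the pattern is /^>/ rather than /^>LOC/, the prefix of the ID does not matter.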
# 6  
Old 07-30-2008
Quote:
Originally Posted by smriti_shridhar
Thanks, Jean-Pierre.

I tried to run your code but it didn't produce any output or error.

smriti.
The script works fine on my box with your example data files.
Code:
> cat -n smriti1.dat
     1  >LOC21
     2  MASSKFCTVLSLALFLVLLTHANSAELFSFNFQTFNAANLILQGNASVSSSGQLRLTEVKSNGEPKVASL
     3  VASFATAFTFNILAPILSNSADGLAFALVPVGSQPKFNGGFLGLFQNVTYDP
     4  >LOC05
     5  MASSKFSTVLSLALFLVLLTHANSAELFSFNFQTFNAANLILQGNASVSSSGQLRLTEVKSNGEPKVASL
     6  GRAFYSAPIQIWDSTTGKVASFATAFTFNILAPILSNSADGLAFALVPVGSQPKFNGGFLGLFQNVTYDP
     7  AKVLITYDSSTKLLVASLVYPSGS
     8  >LOC48
     9  MASLQTQMISFYAIFLSILLTTILFFKVNSTGEITSFSIPKFRPDQPNLIFQGGGYTTKEKLTLTKAVK
> cat -n smriti2.dat
     1  LOC21
     2  LOC48
> cat -n smriti.sh
     1  awk '
     2  NR==FNR { keys[">" $1]++ ; next }
     3  /^>/    { selected = ($1 in keys) }
     4  selected
     5  ' smriti2.dat smriti1.dat
     6
> smriti.sh
>LOC21
MASSKFCTVLSLALFLVLLTHANSAELFSFNFQTFNAANLILQGNASVSSSGQLRLTEVKSNGEPKVASL
VASFATAFTFNILAPILSNSADGLAFALVPVGSQPKFNGGFLGLFQNVTYDP
>LOC48
MASLQTQMISFYAIFLSILLTTILFFKVNSTGEITSFSIPKFRPDQPNLIFQGGGYTTKEKLTLTKAVK
>

Beware: the order of the input files is FILE2 FILE1.
If you specify FILE1 FILE2, the script doesn't produce any result.


Jean-Pierre.
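
A small sketch of why the file order matters, using invented file names (ids.txt, seqs.txt) and shortened data. NR counts lines across all input files while FNR restarts at 1 for each file, so NR==FNR is true only while awk reads the first file named on the command line. Whichever file comes first is therefore the one treated as the ID list.

```shell
#!/bin/sh
# Illustration only: ids.txt, seqs.txt and their contents are invented.
workdir=$(mktemp -d)
printf 'LOC21\nLOC48\n' > "$workdir/ids.txt"
printf '>LOC21\nMASS\n>LOC05\nGRAF\n' > "$workdir/seqs.txt"

# Show FILENAME, NR and FNR for every line, and whether NR==FNR holds.
trace=$(cd "$workdir" && awk '{ print FILENAME, NR, FNR, (NR == FNR ? "first" : "later") }' ids.txt seqs.txt)
printf '%s\n' "$trace"

# Same selection program as in the thread, but with the files swapped:
# the sequence file is now read as the ID list, so no header ever matches.
swapped=$(cd "$workdir" && awk '
NR==FNR { keys[">" $1]++ ; next }
/^>/    { selected = ($1 in keys) }
selected
' seqs.txt ids.txt)
printf 'swapped order printed: [%s]\n' "$swapped"

rm -rf "$workdir"
```

The trace marks the two ids.txt lines as "first" and the four seqs.txt lines as "later", and the swapped run prints nothing between the brackets.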
# 7  
Old 07-30-2008
Thanks a lot!

Hi aigles,

It's working now, although I had put the files in the correct order before as well.

Actually, I had tried to run it as a single line on the command line. I think that shouldn't make any difference.
But anyway, it's working fine now, and it solved my other problem too, as it works even if my header line (the one beginning with '>') contains more words.

Thanks, Jean
smriti
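
For reference, a sketch of how the program from post #2 might be written on a single line and redirected into FILE3, as asked in the first post. The temporary directory, the shortened sequences and the header descriptions are invented for the example; the awk program itself is the one from the thread with the line breaks removed. It makes no claim about why the earlier single-line attempt produced no output.

```shell
#!/bin/sh
# Illustration only: the data below is shortened and the descriptions are invented.
workdir=$(mktemp -d)
printf '>LOC21 first protein\nMASSKF\nVASFAT\n>LOC05 second protein\nGRAFYS\n>LOC48 third protein\nMASLQT\n' > "$workdir/FILE1"
printf 'LOC21\nLOC48\n' > "$workdir/FILE2"

# Statements inside one action are separated by ";"; FILE2 must come first.
awk 'NR==FNR { keys[">" $1]++; next } /^>/ { selected = ($1 in keys) } selected' "$workdir/FILE2" "$workdir/FILE1" > "$workdir/FILE3"

result=$(cat "$workdir/FILE3")
printf '%s\n' "$result"
rm -rf "$workdir"
```

With this data FILE3 contains the LOC21 and LOC48 records, including the extra words on their header lines.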