Extracting lines from file1 that match a pattern in file2


 
# 8  
Old 07-30-2008
one more thing aigles...

I tried the same script on a slightly modified file, where the header line is
>LOC_Os01g57570.1|12001.m11908|protein minor allergen Alt a 7, putative, [expressed]
MAVKVYVVYYSMYGHVAKLAEEIKKGASSIEGVEAKIWQVPETLHEEVLGKMGAPPKPDV
PTITPQELTEADGILFGFP

in place of

>LOC575
MAVKVYVVYYSMYGHVAKLAEEIKKGASSIEGVEAKIWQVPETLHEEVLGKMGAPPKPDV
PTITPQELTEADGILFGFP

The script is not working on this. Can you please help me out?

Thanks
Smriti
# 9  
Old 07-30-2008
What is the contents of FILE2 ?
# 10  
Old 07-30-2008
file detail

Hi aigles,

here are the file details -

===FILE1===
>LOC_Os01g57570.1|12001.m11908|protein minor allergen Alt a 7, [expressed]
MAVKVYVVYYSMYGHVAKLAEEIKKGASSIEGVEAKIWQVPETLHEEVLGKMGAPPKPDV
PTITPQELTEADGILFGFP
>LOC_Os01g57640.1|12001.m11908|protein lectin 7, (putative), expressed
MAVKVYVVYYSMYGHVAKLAEEIKKGASSIEGVEAKIWQVPETLHEEVLGKMGAPPKPDV
PTITPQELTEADGILFGFPTRFGMMAAQMKAFFDATGGLWSEQSLAGKPAGIFFS
>LOC_Os01g57000.2|12001.m43222|protein minor allergen Alt a 7
MAVKVYVVYYSMYGHVAKLAEEIKKGASSIEGVEAKIWQVPETLHEEVLGKMGAPPKPDV
PTITPQELTEADGILFGFPTRFGMMAAQMKAFFDATGGLWSEQSL

====FILE2====
LOC_Os01g57570
LOC_Os01g57000

The LOC prefix can be any three letters, such as ABC or GNL, but it will be the same in every header (a line starting with '>').

Thanks
smriti
# 11  
Old 07-30-2008
If the first '.' in the header records of FILE1 can act as a field separator:
Code:
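awk -F. '
NR==FNR { keys[">" $1]++ ; next }     # FILE2: store each ID as ">ID"
/^>/    { selected = ($1 in keys) }   # header: text before the first "." is the key
selected                              # print lines while the current record is selected
' FILE2 FILE1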

If that is not the case, but all values in FILE2 have the same length:
Code:
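awk '
NR==1   { key_length = length($1)+1 }   # ID length plus one for the leading ">"
NR==FNR { keys[">" $1]++ ; next }       # FILE2: store each ID as ">ID"
/^>/    { selected = (substr($0, 1, key_length) in keys) }   # compare the header prefix
selected                                # print lines while the current record is selected
' FILE2 FILE1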

Otherwise, test each key as a prefix of the header:
Code:
awk '
NR==FNR { keys[">" $1] = length($1)+1 ; next }   # FILE2: map ">ID" to its length
/^>/    {
   selected = 0;
   for (k in keys) {
      # select the record if the header starts with this key
      if (substr($0,1,keys[k]) == k) {
         selected = 1;
         break;
      }
   }
}
selected
' FILE2 FILE1
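
Editor's note, not part of the original reply: the prefix test above would also select a header whose ID merely begins with a key (a hypothetical key LOC_Os01g5757 would match >LOC_Os01g57570.1). A minimal sketch of one way around that, assuming the ID always ends at the first '.', '|' or space in the header, is to cut the ID out and look it up exactly. The temporary directory and the `result` variable are only there to make the example self-contained; the sample data is taken from the post above.

```shell
# Sample data copied from the thread, written to a scratch directory.
tmp=$(mktemp -d)
cat > "$tmp/FILE2" <<'EOF'
LOC_Os01g57570
LOC_Os01g57000
EOF
cat > "$tmp/FILE1" <<'EOF'
>LOC_Os01g57570.1|12001.m11908|protein minor allergen Alt a 7, [expressed]
MAVKVYVVYYSMYGHVAKLAEEIKKGASSIEGVEAKIWQVPETLHEEVLGKMGAPPKPDV
PTITPQELTEADGILFGFP
>LOC_Os01g57640.1|12001.m11908|protein lectin 7, (putative), expressed
MAVKVYVVYYSMYGHVAKLAEEIKKGASSIEGVEAKIWQVPETLHEEVLGKMGAPPKPDV
PTITPQELTEADGILFGFPTRFGMMAAQMKAFFDATGGLWSEQSLAGKPAGIFFS
>LOC_Os01g57000.2|12001.m43222|protein minor allergen Alt a 7
MAVKVYVVYYSMYGHVAKLAEEIKKGASSIEGVEAKIWQVPETLHEEVLGKMGAPPKPDV
PTITPQELTEADGILFGFPTRFGMMAAQMKAFFDATGGLWSEQSL
EOF

result=$(awk '
FNR == NR { keys[$1]; next }     # FILE2: remember each ID as an array index
/^>/ {
   id = substr($0, 2)            # drop the leading ">"
   sub(/[.| ].*/, "", id)        # assumption: the ID ends at the first ".", "|" or space
   selected = (id in keys)       # exact lookup instead of a prefix comparison
}
selected                         # print header and sequence lines of selected records
' "$tmp/FILE2" "$tmp/FILE1")
printf '%s\n' "$result"
rm -r "$tmp"
```

With the thread's data this should print the two records for LOC_Os01g57570 and LOC_Os01g57000 and skip LOC_Os01g57640. If an ID can itself contain '.' or '|', the delimiter assumption fails and the prefix version above is the safer choice.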

Jean-Pierre.
# 12  
Old 07-31-2008
Thanks Jean

The code is running perfectly. Thanks a lot.

smriti