Search for a pattern, extract value(s) from the next line, extract lines having those extracted value(s)
I have hundreds of files to process. In each file I need to:
1. look for a pattern, then
2. extract value(s) from the next line, and then
3. search for the value(s) selected in point (2) in the same file at a specific position.
Code:
HEADER ELECTRON TRANSPORT 18-MAR-98 1A7V
TITLE CYTOCHROME C' FROM RHODOPSEUDOMONAS PALUSTRIS
COMPND MOL_ID: 1;
COMPND 2 MOLECULE: CYTOCHROME C';
COMPND 3 CHAIN: A, B
SOURCE MOL_ID: 1;
SOURCE 2 ORGANISM_SCIENTIFIC: RHODOPSEUDOMONAS PALUSTRIS;
SOURCE 3 ORGANISM_TAXID: 1076
KEYWDS ELECTRON TRANSPORT
EXPDTA X-RAY DIFFRACTION
AUTHOR N.SHIBATA,S.IBA,S.MISAKI,T.E.MEYER,R.G.BARTSCH,
AUTHOR 2 M.A.CUSANOVICH,Y.HIGUCHI,N.YASUOKA
REVDAT 2 24-FEB-09 1A7V 1 VERSN
REVDAT 1 17-JUN-98 1A7V 0
...........................................................................
Many lines in between
..........................................................................
ATOM 1 N GLN A 1 45.346 45.040 5.004 1.00 90.15 N
ATOM 2 CA GLN A 1 45.068 43.614 4.669 1.00 89.25 C
ATOM 3 C GLN A 1 45.626 42.698 5.751 1.00 89.26 C
ATOM 4 O GLN A 1 46.326 43.158 6.652 1.00 89.60 O
ATOM 5 CB GLN A 1 45.662 43.254 3.302 0.20 89.81 C
ATOM 6 CG GLN A 1 45.062 44.027 2.134 0.20 89.99 C
ATOM 7 CD GLN A 1 43.546 43.995 2.137 0.20 89.88 C
ATOM 8 OE1 GLN A 1 42.909 44.738 2.883 0.20 89.97 O
.........................................................................................................................
ATOM 920 OG SER A 125 44.804 18.922 -1.607 1.00 91.77 O
ATOM 921 OXT SER A 125 43.350 14.761 -1.403 1.00 94.70 O
TER 922 SER A 125
ATOM 923 N GLN B 1 11.868 35.655 8.087 1.00 91.68 N
ATOM 924 CA GLN B 1 13.224 35.969 8.625 1.00 90.25 C
ATOM 925 C GLN B 1 13.335 37.449 8.982 1.00 89.59 C
ATOM 926 O GLN B 1 12.346 38.180 8.909 1.00 89.38 O
ATOM 927 CB GLN B 1 14.309 35.585 7.611 0.20 91.63 C
ATOM 928 CG GLN B 1 15.059 34.291 7.944 0.20 89.78 C
..........................................................................................................................
..........................................................................................................................
In this example, I need to:
1. look for CYTOCHROME C
2. extract A and B from the line immediately after it
3. print all lines that have A or B in field number 5.
So the output should be:
Code:
ATOM 1 N GLN A 1 45.346 45.040 5.004 1.00 90.15 N
ATOM 2 CA GLN A 1 45.068 43.614 4.669 1.00 89.25 C
ATOM 3 C GLN A 1 45.626 42.698 5.751 1.00 89.26 C
ATOM 4 O GLN A 1 46.326 43.158 6.652 1.00 89.60 O
ATOM 5 CB GLN A 1 45.662 43.254 3.302 0.20 89.81 C
ATOM 6 CG GLN A 1 45.062 44.027 2.134 0.20 89.99 C
ATOM 7 CD GLN A 1 43.546 43.995 2.137 0.20 89.88 C
ATOM 8 OE1 GLN A 1 42.909 44.738 2.883 0.20 89.97 O
.........................................................................................................................
ATOM 920 OG SER A 125 44.804 18.922 -1.607 1.00 91.77 O
ATOM 921 OXT SER A 125 43.350 14.761 -1.403 1.00 94.70 O
ATOM 923 N GLN B 1 11.868 35.655 8.087 1.00 91.68 N
ATOM 924 CA GLN B 1 13.224 35.969 8.625 1.00 90.25 C
ATOM 925 C GLN B 1 13.335 37.449 8.982 1.00 89.59 C
ATOM 926 O GLN B 1 12.346 38.180 8.909 1.00 89.38 O
ATOM 927 CB GLN B 1 14.309 35.585 7.611 0.20 91.63 C
ATOM 928 CG GLN B 1 15.059 34.291 7.944 0.20 89.78 C
.............................................................................................................................
.............................................................................................................................
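For the simple case shown above, the three steps could be sketched roughly like this in Python. This is only an illustration under assumptions, not a definitive solution: the function name `filter_by_chain` is made up, fields are taken to be whitespace-separated as in the sample, and output is restricted to `ATOM` records because the expected output shows only those (without that restriction the `CHAIN: A, B` line itself would match, since its fifth field is `B`).

```python
def filter_by_chain(lines, molecule="CYTOCHROME C"):
    """Return ATOM lines whose 5th whitespace-separated field is a chain ID
    listed on the line right after the first matching molecule line."""
    chains = set()
    # Steps 1 and 2: find the pattern, then read chain IDs from the next line.
    for current, following in zip(lines, lines[1:]):
        if molecule in current and "CHAIN:" in following:
            ids = following.split("CHAIN:", 1)[1]
            chains = {c.strip(" ;\t\r\n") for c in ids.split(",")}
            chains.discard("")
            break
    # Step 3: keep ATOM records whose field 5 is one of those chain IDs.
    selected = []
    for line in lines:
        fields = line.split()
        if len(fields) >= 5 and fields[0] == "ATOM" and fields[4] in chains:
            selected.append(line)
    return selected


# Small sample based on the file in the post; the chain C line is invented
# here only to show that non-listed chains are dropped.
sample = [
    "TITLE CYTOCHROME C' FROM RHODOPSEUDOMONAS PALUSTRIS",
    "COMPND MOL_ID: 1;",
    "COMPND 2 MOLECULE: CYTOCHROME C';",
    "COMPND 3 CHAIN: A, B",
    "ATOM 1 N GLN A 1 45.346 45.040 5.004 1.00 90.15 N",
    "TER 922 SER A 125",
    "ATOM 923 N GLN B 1 11.868 35.655 8.087 1.00 91.68 N",
    "ATOM 999 N GLN C 1 11.868 35.655 8.087 1.00 91.68 N",
]
result = filter_by_chain(sample)
for line in result:
    print(line)
```

One caveat: in real PDB files the records are fixed-width (the chain ID sits in column 22 of an ATOM record), so splitting on whitespace can misfire where neighbouring columns run together. The sketch follows the whitespace-separated layout used in the post.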
Now the problem is that the pattern and the line after it can appear in several forms, for example:
Code:
COMPND 2 MOLECULE: CYTOCHROME C';
COMPND 3 CHAIN: A;
OR
COMPND 2 MOLECULE: CYTOCHROME C';
COMPND 3 CHAIN: A, B
OR
COMPND 2 MOLECULE: CYTOCHROME C';
COMPND 3 CHAIN: A, B , C, D;
OR
COMPND 2 MOLECULE: CYTOCHROME C;
COMPND 3 CHAIN: A;
COMPND 4 SYNONYM: SOXA;
COMPND 5 MOL_ID: 2;
COMPND 6 MOLECULE: CYTOCHROME C;
COMPND 7 CHAIN: B;
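The variants above differ only in how the chain IDs are separated and in how many matching molecule entries there are, so one possible approach is to split the `CHAIN:` value on commas, semicolons and whitespace, and to collect IDs from every matching entry. The Python sketch below is an assumption-laden illustration: `chains_for_molecule` is a hypothetical helper name, and the match is a plain substring test, so `CYTOCHROME C` would also match longer names that merely start with it.

```python
import re


def chains_for_molecule(compnd_lines, molecule="CYTOCHROME C"):
    """Collect chain IDs from each CHAIN line that directly follows a
    MOLECULE line containing `molecule` (plain substring match)."""
    chains = []
    for current, following in zip(compnd_lines, compnd_lines[1:]):
        if "MOLECULE:" not in current or molecule not in current:
            continue
        match = re.search(r"CHAIN:\s*(.*)", following)
        if match is None:
            continue
        # "A;", "A, B" and "A, B , C, D;" all reduce to a list of bare IDs.
        for chain_id in re.split(r"[,;\s]+", match.group(1)):
            if chain_id and chain_id not in chains:
                chains.append(chain_id)
    return chains


# The four variants from the post.
variant1 = ["COMPND 2 MOLECULE: CYTOCHROME C';", "COMPND 3 CHAIN: A;"]
variant2 = ["COMPND 2 MOLECULE: CYTOCHROME C';", "COMPND 3 CHAIN: A, B"]
variant3 = ["COMPND 2 MOLECULE: CYTOCHROME C';", "COMPND 3 CHAIN: A, B , C, D;"]
variant4 = [
    "COMPND 2 MOLECULE: CYTOCHROME C;",
    "COMPND 3 CHAIN: A;",
    "COMPND 4 SYNONYM: SOXA;",
    "COMPND 5 MOL_ID: 2;",
    "COMPND 6 MOLECULE: CYTOCHROME C;",
    "COMPND 7 CHAIN: B;",
]
for variant in (variant1, variant2, variant3, variant4):
    print(chains_for_molecule(variant))
```

The returned list can then be used as the set of values to look for in field 5 of the ATOM lines of the same file.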
Sorry for sounding complicated. Any help is highly appreciated. I respect your time.