I'm not sure about everything you want to do, but I think this does most of it:
sed - print the line after it finds the line with the matching regex.
awk - print only the text after the colon; this is easy to change if needed.
sed - remove spaces & commas so now it'll just read: A, or AB, or AC, etc.
Here's the list of lines where the 5th field matches your AB:
Adding a grep above for ^[\ \t]*ATOM will give us just the atom lines, so now we just combine it all:
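A minimal sketch of that combined pipeline, assuming the chain list sits on the line right after the CYTOCHROME C line and the chain ID is the 5th whitespace-separated field of each ATOM line. The helper name `atoms_via_pipeline` and the sample data are made up for illustration, and stripping ";" as well as spaces and commas is an added assumption.

```shell
#!/bin/sh
# atoms_via_pipeline FILE  (hypothetical helper name, not from the thread)
atoms_via_pipeline() {
    # sed: print the line after the one matching the regex.
    # awk: keep only the text after the colon.
    # sed: strip spaces, commas and (assumption) semicolons, leaving e.g. "AB".
    chains=$(sed -n '/CYTOCHROME C/{n;p;}' "$1" | awk -F: '{print $2}' | sed 's/[ ,;]//g')

    # grep: keep only the ATOM lines.
    # awk: keep the lines whose 5th field is one of the collected chain letters.
    grep '^[[:space:]]*ATOM' "$1" | awk -v rgx="^[$chains]\$" '$5 ~ rgx'
}

# Made-up sample in PDB style; real files have more columns and many more lines.
sample=$(mktemp)
cat > "$sample" <<'EOF'
COMPND   2 MOLECULE: CYTOCHROME C;
COMPND   3 CHAIN: A, B;
ATOM      1  N   GLY A   1
ATOM      2  N   GLY C   1
HETATM    3  O   HOH B   2
ATOM      4  CA  GLY B   1
EOF

atoms_via_pipeline "$sample"
rm -f "$sample"
```

If the CYTOCHROME C line is missing, `chains` ends up empty and the bracket expression is invalid, so this sketch only suits files that really contain the pattern.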
Edit: There's probably a cleaner way to do this using just awk, but I never do things that way, so I'm not sure of the exact changes you'd need to make.
Last edited by Vryali; 07-18-2012 at 02:13 PM.
Reason: Cleaned a bit.
Whenever you have sed | awk | grep | kitchen | sink, it can probably be done all in one awk. It's a lot more than a glorified 'cut'.
1) Search for a line containing CYTOCHROME C where there are two fields (as delimited by : )
2) Get the next line, clean it up with gsub (strip out " ", ";" and ","), and turn the second field into a regex like [AB]
3) Set field separator to space.
4) For every line thereafter, if the line contains ATOM and the fifth field matches the regex, print the line.
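The four steps above can be sketched in a single awk program like this. It is a sketch under assumptions, not necessarily the script originally posted: the helper name `atoms_for_molecule`, the variable names `pat`, `chains` and `rgx`, and the sample data are all made up for illustration, and ATOM is anchored to the start of the line rather than matched anywhere in it.

```shell
#!/bin/sh
# atoms_for_molecule PATTERN FILE  (hypothetical helper name)
atoms_for_molecule() {
    awk -F: -v pat="$1" '
        # Step 1: first line that matches the pattern and has two ":"-separated fields.
        rgx == "" && $0 ~ pat && NF == 2 {
            # Step 2: read the next line, strip " ", ";" and "," from its
            # second field, and wrap what is left in a bracket expression.
            if ((getline) > 0) {
                chains = $2
                gsub(/[ ;,]/, "", chains)
                if (chains != "") {
                    rgx = "^[" chains "]$"
                    FS = " "    # Step 3: split later records on whitespace.
                }
            }
            next
        }
        # Step 4: once the regex exists, print ATOM lines whose 5th field matches it.
        rgx != "" && /^[ \t]*ATOM/ && $5 ~ rgx
    ' "$2"
}

# Made-up sample in PDB style.
sample=$(mktemp)
cat > "$sample" <<'EOF'
COMPND   2 MOLECULE: CYTOCHROME C;
COMPND   3 CHAIN: A, B;
ATOM      1  N   GLY A   1
ATOM      2  N   GLY C   1
ATOM      3  CA  GLY B   1
EOF

atoms_for_molecule 'CYTOCHROME C' "$sample"
rm -f "$sample"
```

Passing the search text in with `-v pat=...` keeps the awk program itself unchanged when a different molecule name is needed; note that `pat` is treated as a regex, so special characters in it would need escaping.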
Thank you very much Corona688, you made my day.
This script works fine when I put it in a shell script like this:
But I don't know how to run it directly on the command line, or by saving it in an AWK script like temp.awk, although I use AWK a little. Once again, thank you very much for the help.
---------- Post updated at 11:24 AM ---------- Previous update was at 11:21 AM ----------
Thanks Vryali for your reply.
---------- Post updated at 07:57 PM ---------- Previous update was at 11:24 AM ----------
When I run the script, some files give errors. Here are the first few lines of 2 such files and their respective errors:
Is it something to do with the gsub function or the expression that follows it? Thanks & Regards
I can't tell. CYTOCHROME C isn't anywhere in that file, so I have no idea what it's supposed to match. It must be picking up regex-like characters from the string it's trying to catch, which foul up the RGX variable when it's created.
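One way to guard against that, sketched under assumptions: build the regex only from the letters and digits left in the second field, and produce nothing when that field is empty or missing. The helper name `chain_regex` is hypothetical, and whether this matches the actual cause of the errors is not known from the thread.

```shell
#!/bin/sh
# chain_regex: read candidate chain lines on stdin, print a bracket-expression
# regex for each, or nothing for lines that yield no usable characters.
# (Hypothetical helper; the name and behaviour are assumptions.)
chain_regex() {
    awk -F: '{
        chains = $2
        # Drop everything except letters and digits, so characters such as
        # "(", "*" or "]" can never reach the regex.
        gsub(/[^A-Za-z0-9]/, "", chains)
        if (chains != "") print "^[" chains "]$"
    }'
}

printf '%s\n' 'COMPND   3 CHAIN: A, B;' | chain_regex
```

A line with no colon leaves the second field empty, so nothing is printed and the caller can skip filtering instead of handing awk an empty bracket expression.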
You can put the script in a file easily enough like this:
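A minimal sketch of that, assuming the file is called temp.awk as in the question and that the search text is passed in as a variable named `pat` (an assumption, as are the sample data and the awk program body, which follows the four steps described earlier in the thread):

```shell
#!/bin/sh
workdir=$(mktemp -d)

# The awk program goes into its own file; BEGIN replaces the -F: option.
cat > "$workdir/temp.awk" <<'EOF'
BEGIN { FS = ":" }
rgx == "" && $0 ~ pat && NF == 2 {
    if ((getline) > 0) {
        chains = $2
        gsub(/[ ;,]/, "", chains)
        if (chains != "") { rgx = "^[" chains "]$"; FS = " " }
    }
    next
}
rgx != "" && /^[ \t]*ATOM/ && $5 ~ rgx
EOF

# Made-up sample in PDB style.
cat > "$workdir/sample.pdb" <<'EOF'
COMPND   2 MOLECULE: LYSOZYME;
COMPND   3 CHAIN: A;
ATOM      1  N   LYS A   1
ATOM      2  N   LYS B   1
EOF

# Run the saved program with -f; -v sets the search pattern from the command line.
out=$(awk -v pat='LYSOZYME' -f "$workdir/temp.awk" "$workdir/sample.pdb")
printf '%s\n' "$out"

rm -rf "$workdir"
```

On many systems the file can also start with a line such as `#!/usr/bin/awk -f` and be made executable with chmod +x, though the path to awk is system-dependent.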
I'm probably out for the rest of the day unfortunately. I'll check this evening if I can, for further details on your difficulty.
Thanks Corona688 for the reply. Files containing CYTOCHROME C work fine with your script. The examples above come from the files that throw errors. The only difference is that I am searching for the pattern "LYSOZYME" instead of CYTOCHROME C. The script run on a file (with extension .pdb) goes like this:
My limited understanding tells me that your speculation is right: something is messing up the RGX variable. I think the culprit is the ":" symbol one line above the line mentioned in the error. In file 132L.pdb, for example, the error mentions line 4, and I can see a ":" in line 3 with the matching pattern on its left-hand side. Just to be clear, I don't want to extract line 3 or 4 here.
Thanks and Regards,
Ashwani