how to get one particular section (using awk)?

# 1  
10-28-2010

Hey,
I have a problem: how do I get one section of a file?
I'm new to shell scripting, but from reading some tutorials, I think I can use awk to do this.

My input file:
Code:
>ref|ZP_04937576.1|
ECRINAEDPKTFMPSPGKVKHFHAPGGNGVRVDSHLYSGYSVPPNYDSLVGKVITYGAD
DEALARMRNALDELIVDGIKTNTELHKDLVRDAAFCKGGVNIHYLE
>ref|NP_253535.1|
ECRINAEDPKTFMPSPGKVKHFHAPGGNGVRVDSHLYSGYSVPPNYDSLVGKVITYGAD
DEALARMRNALDELIVDGIKTNTELHKDLVRDAAFCKGGVNIHYLE
>ref|YP_002442811.1|
ECRINAEDPKTFMPSPGKVKHFHAPGGNGVRVDSHLYSGYSVPPNYDSLVGKVITYGAD
DEALARMRNALDELIVDGIKTNTELHKDLVRDAAFCKGGVNIHYLE
>pdb|2VQD|A
ECRINAEDPKTFMPSPGKVKHFHAPGGNGVRVDSHLYSGYSVPPNYDSLVGKVITYGA
DEALARMRNALDELIVDGIKTNTELHKDLVRDAAFCKGGVNIHYLE

-----------------
Say I have the name "NP_253535.1"; how do I return that whole section?
Note that each section begins with the symbol ">".

Expected output:
--------
Code:
>ref|NP_253535.1|
ECRINAEDPKTFMPSPGKVKHFHAPGGNGVRVDSHLYSGYSVPPNYDSLVGKVITYGAD
DEALARMRNALDELIVDGIKTNTELHKDLVRDAAFCKGGVNIHYLE

Thanks a lot!!
Moderator's comment: code tags, please!

Last edited by vgersh99; 10-28-2010 at 04:10 PM. Reason: code tags, please!
# 2  
10-28-2010
Code:
awk -vRS=">" '/NP_253535\.1/' file
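" '/NP_253535\.1/' file">
A note on this one-liner: because ">" is consumed as the record separator, the printed record lacks its leading ">", and because each record already ends in a newline, print adds a blank line after it. The sketch below is a variant under stated assumptions (any POSIX awk, three records from the question saved to a temporary file, and a hypothetical function name `get_section`). It restores the ">", prints with printf so no extra blank line appears, and compares the ID exactly against the second |-separated field of the header, so the '.' in the ID is not treated as a wildcard.

```shell
# Sample input: three records copied from the question.
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
cat > "$tmp" <<'EOF'
>ref|ZP_04937576.1|
ECRINAEDPKTFMPSPGKVKHFHAPGGNGVRVDSHLYSGYSVPPNYDSLVGKVITYGAD
DEALARMRNALDELIVDGIKTNTELHKDLVRDAAFCKGGVNIHYLE
>ref|NP_253535.1|
ECRINAEDPKTFMPSPGKVKHFHAPGGNGVRVDSHLYSGYSVPPNYDSLVGKVITYGAD
DEALARMRNALDELIVDGIKTNTELHKDLVRDAAFCKGGVNIHYLE
>pdb|2VQD|A
ECRINAEDPKTFMPSPGKVKHFHAPGGNGVRVDSHLYSGYSVPPNYDSLVGKVITYGA
DEALARMRNALDELIVDGIKTNTELHKDLVRDAAFCKGGVNIHYLE
EOF

# get_section ID FILE (hypothetical name): print the section whose
# header has ID as its second |-separated field.
get_section() {
    awk -v id="$1" -v RS='>' '
        # Record 1 is the (empty) text before the first ">": skip it.
        NR == 1 { next }
        {
            # The header is the first line of the record.
            header = substr($0, 1, index($0, "\n") - 1)
            split(header, f, "|")
            # Exact string comparison: "." in the ID is not a wildcard.
            # printf restores the ">" and adds no extra newline.
            if (f[2] == id) printf ">%s", $0
        }
    ' "$2"
}

get_section 'NP_253535.1' "$tmp"
```

Run against the sample, this prints the NP_253535.1 section in the same form as the expected output in the question.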

This User Gave Thanks to bartus11 For This Post:
# 3  
10-28-2010
I think sed is a more regular and friendly tool that is usually faster and easier to use. Awk is stronger with delimited, many-field records. I am not much of an awk user, and I don't think either tool is great with multi-line blocks, but sed is pretty good, as you can pile up lines in the buffer.

Code:
sed '
  :loop
  /^>ref|'"$input"'| *$/!d
  :loop2
  $b
  N
  P
  s/.*\n//
  /^>...|.*| *$/b loop
  b loop2
 ' "$in_file"

Narrative: sed reads $in_file and examines every line.
  1. Set a branch label called 'loop'.
  2. If this line is not a header containing the desired $input, delete it (which ends this cycle and restarts the script on the next line).
  3. Set a second branch label, 'loop2'.
  4. If this is the last line, branch to the end of the script (print and exit).
  5. Append a newline and the next line to the buffer.
  6. Print the first line in the buffer.
  7. Delete the first line and its newline.
  8. If what remains is a header, branch all the way back to :loop to test it.
  9. Otherwise, branch to loop2 to continue printing the block.
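To try the looper script on sample data, it can be wrapped in a shell function. This is a sketch assuming a POSIX sed; the function name `sed_section` is hypothetical, and the header test in step 8 is loosened to /^>/ (an assumption that every line starting with ">" is a header). That change matters because a header such as >pdb|2VQD|A does not end in "|", so /^>...|.*| *$/ would not recognize it and it would be printed as part of the preceding block.

```shell
# Sample input: three records copied from the question.
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
cat > "$tmp" <<'EOF'
>ref|ZP_04937576.1|
ECRINAEDPKTFMPSPGKVKHFHAPGGNGVRVDSHLYSGYSVPPNYDSLVGKVITYGAD
DEALARMRNALDELIVDGIKTNTELHKDLVRDAAFCKGGVNIHYLE
>ref|NP_253535.1|
ECRINAEDPKTFMPSPGKVKHFHAPGGNGVRVDSHLYSGYSVPPNYDSLVGKVITYGAD
DEALARMRNALDELIVDGIKTNTELHKDLVRDAAFCKGGVNIHYLE
>pdb|2VQD|A
ECRINAEDPKTFMPSPGKVKHFHAPGGNGVRVDSHLYSGYSVPPNYDSLVGKVITYGA
DEALARMRNALDELIVDGIKTNTELHKDLVRDAAFCKGGVNIHYLE
EOF

# sed_section ID FILE (hypothetical name): the looper script from the
# post, with the ID quoted for the shell and the step-8 header test
# loosened to /^>/ (assumption: every line starting with ">" is a header).
sed_section() {
    sed '
      :loop
      /^>ref|'"$1"'| *$/!d
      :loop2
      $b
      N
      P
      s/.*\n//
      /^>/b loop
      b loop2
    ' "$2"
}

sed_section 'NP_253535.1' "$tmp"
```

With the original step-8 test, asking for NP_253535.1 on this sample would also print the pdb record; with /^>/ the block stops at the next header.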
You can see the simple commands in sed allow easy reuse. I call this sort of sed a looper script. Some sed scripts just work on one line, so I call them filter-transform scripts. If you need both at once, you can pipe one sed to the next. Sed is fast and has no size limits.

I see that bartus11, an awk guy, has given us a one-line awk script. I could write a one-liner too, I guess, but I like to keep scripts easy to read and maintain. The '.' in your $input is the sed metacharacter for any character, but that does not seem a very big weakness. I put more definition on the header line, so you can have <, > or | in the text without side effects. Is every '^>[^|]\{3\}|[^|]\{1,99\}| *$' line a header? I may have used too much definition with /^>ref|, so I will revise: poof! Does every header always start with >...| ?
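On the '.' point: it is harmless for IDs like NP_253535.1, but if you want the ID matched literally, one option is to backslash-escape the BRE metacharacters (plus the / delimiter) before splicing $input into the sed script. This is a sketch assuming a POSIX sed; the helper name `bre_escape` is hypothetical.

```shell
# bre_escape STRING (hypothetical helper): print STRING with the POSIX BRE
# metacharacters . [ ] \ * ^ $ and the / delimiter backslash-escaped.
# "#" delimits this s command so that "/" can sit inside the brackets.
bre_escape() {
    printf '%s\n' "$1" | sed 's#[][\.*^$/]#\\&#g'
}

input='NP_253535.1'
escaped=$(bre_escape "$input")
printf '%s\n' "$escaped"    # prints NP_253535\.1

# Splice the escaped ID into a header match: the look-alike header
# ">ref|NP_253535X1|" is no longer selected.
printf '%s\n' '>ref|NP_253535X1|' '>ref|NP_253535.1|' |
    sed -n '/^>ref|'"$escaped"'| *$/p'    # prints >ref|NP_253535.1|
```

The "|" is deliberately left unescaped: in a POSIX BRE it is already literal, while GNU sed would read "\|" as alternation.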

Last edited by DGPickett; 10-28-2010 at 04:23 PM.
# 4  
10-28-2010
Quote:
Originally Posted by bartus11
Code:
awk -vRS=">" '/NP_253535\.1/' file

I like that one.
A slight variation on a theme:
Code:
nawk -v str='NP_253535.1' '$0~str{print RS $0}' RS='>' myFile
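Both one-liners test the whole record against a regex (in the nawk version, str is used as a dynamic regex), so the '.' in the ID matches any character and a look-alike ID could also be selected. A different technique, sketched here assuming any POSIX awk and using the hypothetical function name `flag_section`, avoids both that and the RS question: read line by line with "|" as the field separator, set a flag at each header line whose second field equals the ID exactly, and print lines while the flag is set.

```shell
# Sample input: three records copied from the question.
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
cat > "$tmp" <<'EOF'
>ref|ZP_04937576.1|
ECRINAEDPKTFMPSPGKVKHFHAPGGNGVRVDSHLYSGYSVPPNYDSLVGKVITYGAD
DEALARMRNALDELIVDGIKTNTELHKDLVRDAAFCKGGVNIHYLE
>ref|NP_253535.1|
ECRINAEDPKTFMPSPGKVKHFHAPGGNGVRVDSHLYSGYSVPPNYDSLVGKVITYGAD
DEALARMRNALDELIVDGIKTNTELHKDLVRDAAFCKGGVNIHYLE
>pdb|2VQD|A
ECRINAEDPKTFMPSPGKVKHFHAPGGNGVRVDSHLYSGYSVPPNYDSLVGKVITYGA
DEALARMRNALDELIVDGIKTNTELHKDLVRDAAFCKGGVNIHYLE
EOF

# flag_section ID FILE (hypothetical name).
flag_section() {
    awk -v id="$1" -F'|' '
        # At each header line, turn the flag on only if the second
        # |-separated field is exactly the wanted ID.
        /^>/ { keep = ($2 == id) }
        # A pattern with no action prints the line while the flag is set.
        keep
    ' "$2"
}

flag_section 'NP_253535.1' "$tmp"
```

Only lines that start with ">" are treated as headers, so a ">" elsewhere in a line does not split a section.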

# 5  
10-28-2010
Can RS= take a regex?
# 6  
Old 10-28-2010
Quote:
Originally Posted by DGPickett
Can RS= take a regex?
The GNU awk on my test machine accepts at least some regexes in RS, for example a bracket expression "[...]":
Code:
[bart@linux ~]$ awk -vRS="[>|]" 'NR==2' a
ref
[bart@linux ~]$ awk -vRS="[>|]" 'NR==3' a
ZP_04937576.1
[bart@linux ~]$ awk -vRS="[>|]" 'NR==4' a

ECRINAEDPKTFMPSPGKVKHFHAPGGNGVRVDSHLYSGYSVPPNYDSLVGKVITYGAD
DEALARMRNALDELIVDGIKTNTELHKDLVRDAAFCKGGVNIHYLE

After testing some more, I think any regex can go there.
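A caveat and one application. GNU awk (and some other implementations, such as mawk) treat a multi-character RS as a regex, but POSIX leaves the behaviour of a multi-character RS unspecified, so this is not portable to every awk. One use that addresses the "> in the text" concern is to split only on a ">" that starts a line, by setting RS to a newline followed by ">". The separator text is not part of $0, so the sketch puts the ">" back when printing; in GNU awk the text that actually matched RS is also available in the RT variable. The function name `rs_section` is hypothetical.

```shell
# Sample input: three records copied from the question.
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
cat > "$tmp" <<'EOF'
>ref|ZP_04937576.1|
ECRINAEDPKTFMPSPGKVKHFHAPGGNGVRVDSHLYSGYSVPPNYDSLVGKVITYGAD
DEALARMRNALDELIVDGIKTNTELHKDLVRDAAFCKGGVNIHYLE
>ref|NP_253535.1|
ECRINAEDPKTFMPSPGKVKHFHAPGGNGVRVDSHLYSGYSVPPNYDSLVGKVITYGAD
DEALARMRNALDELIVDGIKTNTELHKDLVRDAAFCKGGVNIHYLE
>pdb|2VQD|A
ECRINAEDPKTFMPSPGKVKHFHAPGGNGVRVDSHLYSGYSVPPNYDSLVGKVITYGA
DEALARMRNALDELIVDGIKTNTELHKDLVRDAAFCKGGVNIHYLE
EOF

# rs_section ID FILE (hypothetical name). Assumes an awk that treats a
# multi-character RS as a regex (GNU awk, mawk); POSIX leaves it unspecified.
rs_section() {
    awk -v id="$1" -v RS='\n>' '
        {
            # The file begins with ">" but no newline before it, so the
            # first record still carries that ">": strip it.
            if (NR == 1) sub(/^>/, "")
            # The last record keeps the final newline of the file: strip it.
            sub(/\n$/, "")
            # The header is the first line of the record.
            nl = index($0, "\n")
            header = nl ? substr($0, 1, nl - 1) : $0
            split(header, f, "|")
            # RS is not part of $0, so put the ">" back when printing.
            if (f[2] == id) print ">" $0
        }
    ' "$2"
}

rs_section 'NP_253535.1' "$tmp"
```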
# 7  
10-28-2010
Great, you can make the RS proof against a > somewhere in the text! Does it take *, too? Do you lose the RS text if you expand it?