Pattern match with awk/sed - help


 
# 1  
02-23-2015

I need to extract the text inside the square brackets that was marked in red (the brackets directly followed by a number in parentheses), and not the bracketed text marked in green. My current code extracts both, which I don't want.

Input file
Code:
ref|XP_002371341.1| oxoacyl-ACP reductase, putative [Toxoplasma gondii ME49] gb|EPT24759.1| 3-ketoacyl-(acyl-carrier-protein) reductase [Toxoplasma gondii ME49] gb|ESS34081.1| 3-ketoacyl-(acyl-carrier-protein) reductase [Toxoplasma gondii VEG](376)	-	243	134	61.4617940199336	1	230	2e-71	80.7308970099668
gb|EPR63881.1| 3-ketoacyl-(acyl-carrier-protein) reductase [Toxoplasma gondii GT1](376)	-	243	134	61.4617940199336	1	230	2e-71	80.7308970099668
ref|XP_003885852.1| 3-ketoacyl-(Acyl-carrier-protein) reductase, related [Neospora caninum Liverpool] emb|CBZ55826.1| 3-ketoacyl-(Acyl-carrier-protein) reductase, related [Neospora caninum Liverpool](376)	-	242	137	61.7940199335548	1	229	8e-71	80.3986710963455
emb|CDJ42835.1| oxoacyl-ACP reductase, putative [Eimeria tenella](347)	-	240	141	61.7940199335548	1	211	3e-64	79.734219269103
emb|CDJ64722.1| oxoacyl-ACP reductase, putative [Eimeria necatrix](347)

My current code
Code:
while IFS= read -r line
do
printf '%s\n' "$line" | awk 'NR>1{print $1}' RS='[' FS=']' >> "$OUTPUTFILE"
done < "$list"
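
To see why this prints both kinds of name, here is a small sketch that dumps the records awk builds from the first input line with RS='[' and FS=']'. The printf format is only for illustration; the logic is the same as in the loop above.

```shell
#!/bin/sh
# First line of the sample input from the question.
line='ref|XP_002371341.1| oxoacyl-ACP reductase, putative [Toxoplasma gondii ME49] gb|EPT24759.1| 3-ketoacyl-(acyl-carrier-protein) reductase [Toxoplasma gondii ME49] gb|ESS34081.1| 3-ketoacyl-(acyl-carrier-protein) reductase [Toxoplasma gondii VEG](376)	-	243	134	61.4617940199336	1	230	2e-71	80.7308970099668'

# RS='[' starts a new record at every "[" and FS=']' splits each record at "]".
# Record 1 is the text before the first "[", which NR>1 skips.
# In every later record $1 is a bracketed name and $2 is whatever follows
# the "]": only the wanted (red) names have "(" as the first character of $2.
result=$(printf '%s\n' "$line" |
  awk 'NR>1 { printf "record %d: $1=<%s> $2 starts with <%s>\n", NR, $1, substr($2, 1, 1) }' RS='[' FS=']')
printf '%s\n' "$result"
```

Since print $1 never looks at $2, every bracketed name comes out; telling the two kinds apart means testing $2 as well.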

Any help or suggestions, please?

Hint: the only distinguishing feature of the patterns in red is the number in parentheses right after them, like (347), which can be used as a marker.
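
The hint translates directly into a regular expression: "[", a name containing no "]", "]", then digits in parentheses. A minimal per-line sketch, assuming a POSIX awk (match, RSTART, RLENGTH) and a temporary file standing in for the real input; unlike a single substitution, it keeps scanning, so it would print every marked name even if a line held more than one.

```shell
#!/bin/sh
# Write the sample input from the question to a temporary file.
tmp=$(mktemp) || exit 1
trap 'rm -f "$tmp"' EXIT
cat > "$tmp" <<'EOF'
ref|XP_002371341.1| oxoacyl-ACP reductase, putative [Toxoplasma gondii ME49] gb|EPT24759.1| 3-ketoacyl-(acyl-carrier-protein) reductase [Toxoplasma gondii ME49] gb|ESS34081.1| 3-ketoacyl-(acyl-carrier-protein) reductase [Toxoplasma gondii VEG](376)	-	243	134	61.4617940199336	1	230	2e-71	80.7308970099668
gb|EPR63881.1| 3-ketoacyl-(acyl-carrier-protein) reductase [Toxoplasma gondii GT1](376)	-	243	134	61.4617940199336	1	230	2e-71	80.7308970099668
ref|XP_003885852.1| 3-ketoacyl-(Acyl-carrier-protein) reductase, related [Neospora caninum Liverpool] emb|CBZ55826.1| 3-ketoacyl-(Acyl-carrier-protein) reductase, related [Neospora caninum Liverpool](376)	-	242	137	61.7940199335548	1	229	8e-71	80.3986710963455
emb|CDJ42835.1| oxoacyl-ACP reductase, putative [Eimeria tenella](347)	-	240	141	61.7940199335548	1	211	3e-64	79.734219269103
emb|CDJ64722.1| oxoacyl-ACP reductase, putative [Eimeria necatrix](347)
EOF

# For each line, repeatedly find "[name](digits)", print the name, and
# continue scanning the rest of the line after the match.
result=$(awk '{
  rest = $0
  while (match(rest, /\[[^]]*]\([0-9]+\)/)) {
    name = substr(rest, RSTART + 1, RLENGTH - 1)  # drop the leading "["
    sub(/]\([0-9]+\)$/, "", name)                 # drop the trailing "](NNN)"
    print name
    rest = substr(rest, RSTART + RLENGTH)
  }
}' "$tmp")
printf '%s\n' "$result"
```

Names that are followed by anything other than "(" and digits, such as the green ones, never match the expression, so they are skipped.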

Last edited by Scrutinizer; 02-23-2015 at 11:35 PM. Reason: code tags also for data samples; added closing color tag bracket
# 2  
02-23-2015
You do not need the shell loop, since awk already has an implicit loop built in: the main section of the program (between BEGIN and END) runs once for every input record. This:
Code:
awk 'NR>1{print $1}' RS='[' FS=']' "$list" >> "$OUTPUTFILE"

will accomplish the same.
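
One way to check that claim on the sample data is to run the loop and the single awk call side by side and compare their output. A sketch, assuming mktemp is available and a temporary file stands in for $list:

```shell
#!/bin/sh
# Write the sample input from the question to a temporary file.
tmp=$(mktemp) || exit 1
trap 'rm -f "$tmp"' EXIT
cat > "$tmp" <<'EOF'
ref|XP_002371341.1| oxoacyl-ACP reductase, putative [Toxoplasma gondii ME49] gb|EPT24759.1| 3-ketoacyl-(acyl-carrier-protein) reductase [Toxoplasma gondii ME49] gb|ESS34081.1| 3-ketoacyl-(acyl-carrier-protein) reductase [Toxoplasma gondii VEG](376)	-	243	134	61.4617940199336	1	230	2e-71	80.7308970099668
gb|EPR63881.1| 3-ketoacyl-(acyl-carrier-protein) reductase [Toxoplasma gondii GT1](376)	-	243	134	61.4617940199336	1	230	2e-71	80.7308970099668
ref|XP_003885852.1| 3-ketoacyl-(Acyl-carrier-protein) reductase, related [Neospora caninum Liverpool] emb|CBZ55826.1| 3-ketoacyl-(Acyl-carrier-protein) reductase, related [Neospora caninum Liverpool](376)	-	242	137	61.7940199335548	1	229	8e-71	80.3986710963455
emb|CDJ42835.1| oxoacyl-ACP reductase, putative [Eimeria tenella](347)	-	240	141	61.7940199335548	1	211	3e-64	79.734219269103
emb|CDJ64722.1| oxoacyl-ACP reductase, putative [Eimeria necatrix](347)
EOF

# Original approach: one awk process per input line.
loop_out=$(while IFS= read -r line; do
  printf '%s\n' "$line" | awk 'NR>1{print $1}' RS='[' FS=']'
done < "$tmp")

# Single awk call: awk reads the whole file itself, record by record.
# Records now run across line ends (RS is "[" rather than newline), but $1
# still stops at the first "]", so the printed names are the same.
file_out=$(awk 'NR>1{print $1}' RS='[' FS=']' "$tmp")

if [ "$loop_out" = "$file_out" ]; then echo "same output"; else echo "different output"; fi
```

The single call also starts one awk process instead of one per line, which matters on large files.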

That awk command does not print the number in parentheses, which you also marked in red, so it is unclear whether you want that printed or not.

If not, try this modification:
Code:
awk 'NR>1 && $2~/^\(/{print $1}' RS='[' FS=']' "$list" >> "$OUTPUTFILE"

If so, try:
Code:
awk 'NR>1 && $2~/^\(/{sub(/\).*/,")",$2); print $1 $2}' RS='[' FS=']' "$list" >> "$OUTPUTFILE"

Or, if your grep has the -o option, try:
Code:
grep -o '\[[^]]*\]([^)]*)' "$list" >> "$OUTPUTFILE"

But that will include the square brackets.
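
To compare the suggestions in this post on the sample data, here is a sketch. A temporary file stands in for $list, the variable names are only for illustration, and grep -o is a GNU/BSD extension rather than POSIX. The trailing sed, which drops the square brackets from the grep -o output, is an addition and not part of the suggestions themselves.

```shell
#!/bin/sh
# Write the sample input from the question to a temporary file.
tmp=$(mktemp) || exit 1
trap 'rm -f "$tmp"' EXIT
cat > "$tmp" <<'EOF'
ref|XP_002371341.1| oxoacyl-ACP reductase, putative [Toxoplasma gondii ME49] gb|EPT24759.1| 3-ketoacyl-(acyl-carrier-protein) reductase [Toxoplasma gondii ME49] gb|ESS34081.1| 3-ketoacyl-(acyl-carrier-protein) reductase [Toxoplasma gondii VEG](376)	-	243	134	61.4617940199336	1	230	2e-71	80.7308970099668
gb|EPR63881.1| 3-ketoacyl-(acyl-carrier-protein) reductase [Toxoplasma gondii GT1](376)	-	243	134	61.4617940199336	1	230	2e-71	80.7308970099668
ref|XP_003885852.1| 3-ketoacyl-(Acyl-carrier-protein) reductase, related [Neospora caninum Liverpool] emb|CBZ55826.1| 3-ketoacyl-(Acyl-carrier-protein) reductase, related [Neospora caninum Liverpool](376)	-	242	137	61.7940199335548	1	229	8e-71	80.3986710963455
emb|CDJ42835.1| oxoacyl-ACP reductase, putative [Eimeria tenella](347)	-	240	141	61.7940199335548	1	211	3e-64	79.734219269103
emb|CDJ64722.1| oxoacyl-ACP reductase, putative [Eimeria necatrix](347)
EOF

# 1) Names only: keep records whose text after "]" starts with "(".
names=$(awk 'NR>1 && $2~/^\(/{print $1}' RS='[' FS=']' "$tmp")

# 2) Names plus number: $2 holds "(376)" followed by the rest of the line and
#    the start of the next line (RS is "[", not newline), so everything after
#    the first ")" is cut off before printing.
with_num=$(awk 'NR>1 && $2~/^\(/{sub(/\).*/,")",$2); print $1 $2}' RS='[' FS=']' "$tmp")

# 3) grep -o prints "[name](number)"; sed removes the leading "[" and the "]".
grepped=$(grep -o '\[[^]]*]([^)]*)' "$tmp" | sed 's/^\[//; s/](/(/')

printf '%s\n' "$names" "---" "$with_num" "---" "$grepped"
```

All three pick out the same five marked names, and the two variants that keep the number agree line for line.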
# 3  
02-24-2015
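Try this (making use of your footnote hint). It deletes every bracketed name that is directly followed by a one- to three-digit number in parentheses, so the output shows each line with the red parts removed: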
Code:
sed 's/\[[^][]*\]([0-9]\{1,3\})//' file3
ref|XP_002371341.1| oxoacyl-ACP reductase, putative [Toxoplasma gondii ME49] gb|EPT24759.1| 3-ketoacyl-(acyl-carrier-protein) reductase [Toxoplasma gondii ME49] gb|ESS34081.1| 3-ketoacyl-(acyl-carrier-protein) reductase     -    243    134    61.4617940199336    1    230    2e-71    80.7308970099668
gb|EPR63881.1| 3-ketoacyl-(acyl-carrier-protein) reductase     -    243    134    61.4617940199336    1    230    2e-71    80.7308970099668
ref|XP_003885852.1| 3-ketoacyl-(Acyl-carrier-protein) reductase, related [Neospora caninum Liverpool] emb|CBZ55826.1| 3-ketoacyl-(Acyl-carrier-protein) reductase, related     -    242    137    61.7940199335548    1    229    8e-71    80.3986710963455
emb|CDJ42835.1| oxoacyl-ACP reductase, putative     -    240    141    61.7940199335548    1    211    3e-64    79.734219269103
emb|CDJ64722.1| oxoacyl-ACP reductase, putative
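
If you want the names themselves rather than the lines with them removed, the same expression can be turned around with sed -n and a back-reference. A sketch, assuming each line holds at most one marked name (true for this sample): the greedy .* at the front means only the last match on a line is kept.

```shell
#!/bin/sh
# Write the sample input from the question to a temporary file.
tmp=$(mktemp) || exit 1
trap 'rm -f "$tmp"' EXIT
cat > "$tmp" <<'EOF'
ref|XP_002371341.1| oxoacyl-ACP reductase, putative [Toxoplasma gondii ME49] gb|EPT24759.1| 3-ketoacyl-(acyl-carrier-protein) reductase [Toxoplasma gondii ME49] gb|ESS34081.1| 3-ketoacyl-(acyl-carrier-protein) reductase [Toxoplasma gondii VEG](376)	-	243	134	61.4617940199336	1	230	2e-71	80.7308970099668
gb|EPR63881.1| 3-ketoacyl-(acyl-carrier-protein) reductase [Toxoplasma gondii GT1](376)	-	243	134	61.4617940199336	1	230	2e-71	80.7308970099668
ref|XP_003885852.1| 3-ketoacyl-(Acyl-carrier-protein) reductase, related [Neospora caninum Liverpool] emb|CBZ55826.1| 3-ketoacyl-(Acyl-carrier-protein) reductase, related [Neospora caninum Liverpool](376)	-	242	137	61.7940199335548	1	229	8e-71	80.3986710963455
emb|CDJ42835.1| oxoacyl-ACP reductase, putative [Eimeria tenella](347)	-	240	141	61.7940199335548	1	211	3e-64	79.734219269103
emb|CDJ64722.1| oxoacyl-ACP reductase, putative [Eimeria necatrix](347)
EOF

# \( ... \) captures the name between "[" and "]"; the "(digits)" marker must
# follow it, so unmarked names never match. -n together with the p flag
# prints only the lines where the substitution succeeded.
result=$(sed -n 's/.*\[\([^][]*\)]([0-9]\{1,3\}).*/\1/p' "$tmp")
printf '%s\n' "$result"
```

For lines with several marked names, the awk match() loop approach is a better fit, because it keeps scanning after each match.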


10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Get range out using sed or awk, only if given pattern match

Input: START OS:: UNIX Release: xxx Version: xxx END START OS:: LINUX Release: xxx Version: xxx END START OS:: Windows Release: xxx Version: xxx END Here I am trying to get all the information between START and END, only if I could match OS Type. I can get all the data between the... (3 Replies)
Discussion started by: Dharmaraja

2. Shell Programming and Scripting

Sorting content between match pattern and move on with awk and sed

S 0.0 0.0 (reg, inst050) k e f d c S 0.0 0.0 (mux, m030) k g r s x v S 0.0 0.0 (reg, inst020) q s n m (12 Replies)
Discussion started by: ctphua

3. Shell Programming and Scripting

sed : match one pattern then the next consecutive second pattern not working

I've used this snippet of code on a Solaris box thousands of times. But it isn't working on the new Linux box sed -n '/interface LoopBack0/{N;/ ip address /p;}' *.conf it's driving me nuts!! Is there something I'm missing? (7 Replies)
Discussion started by: popeye

4. Shell Programming and Scripting

Awk-sed help : to remove first and last line with pattern match:

awk , sed Experts, I want to remove first and last line after pattern match "vg" : I am trying : # sed '1d;$d' works fine , but where the last line is not having vg entry it is deleting one line of data. - So it should check for the pattern vg if present , then it should delete the line ,... (5 Replies)
Discussion started by: rveri

5. Shell Programming and Scripting

Awk to match a pattern and perform a search after the first pattern

Hello Guyz I have been following this forum for a while and the solutions provided are super useful. I currently have a scenario where i need to search for a pattern and start searching by keeping the first pattern as a baseline ABC DEF LMN EFG HIJ LMN OPQ In the above text i need to... (8 Replies)
Discussion started by: RickCharles

6. UNIX for Dummies Questions & Answers

sed multiline pattern match

How can I write a script that takes a cisco config file and outputs every occurrence of two, or more, pattern matches through the whole config file? For example, out of a config file, i want to print out every line with interface, description and ip address through the whole file, and disregard... (3 Replies)
Discussion started by: knownasthatguy

7. Shell Programming and Scripting

Sed Pattern Match

Hi, I would like to use SED to do the following string replacement: asd1abc to www1cda asd2abc to www2cda ... asd9abc to www9cda I can use 'asd.abc' to find the original string, however I don't know how to generate the target string. Any suggestion? Thanks, ... (2 Replies)
Discussion started by: mail4mz

8. Shell Programming and Scripting

AWK match $1 $2 pattern in file 1 to $1 $2 pattern in file2

Hi, I have 2 files that I have modified to basically match each other, however I want to determine what (if any) line in file 1 does not exist in file 2. I need to match column $1 and $2 as a single string in file1 to $1 and $2 in file2 as these two columns create a match. I'm stuck in an AWK... (9 Replies)
Discussion started by: right_coaster

9. Shell Programming and Scripting

Match a pattern starting with sub-pattern using sed

Hi all, I've been experiencing a difficulty trying to match a number and write it to a new file. My input file is: input.txt It contains the lines: 103P 123587.256971 3.21472112 3.1517423 1.05897234566427 58.2146258 12.35478 25.3612489 What would be the sed command to... (17 Replies)
Discussion started by: Biederman

10. Shell Programming and Scripting

Use to awk to match pattern, and print the pattern

Hi, I know how to use awk to search some expressions like five consecutive numbers, , this is easy. However, how do I make awk print the pattern that is been matched? For example: input: usa,canada99292,japan222,france59664,egypt223 output:99292,59664 (6 Replies)
Discussion started by: grossgermany