Post 302914412 by redse171 on Monday 25th of August 2014 06:44:01 PM
Split certain strings in a line for a specific column.

Hi,

I need help extracting certain strings/words from lines of different lengths. I have 3 columns separated by a tab delimiter, like below:

Code:
Probable arabinan endo-1,5-alpha-L-arabinosidase A	(EC 3.2.1.99) (Endo-1,5-alpha-L-arabinanase A) (ABN A) abnA	Ady3G14620
Probable arabinan endo-1,5-alpha-L-arabinosidase B	(EC 3.2.1.99) (Endo-1,5-alpha-L-arabinanase B) (ABN B) abnB	Ady2G14150
Probable arabinan endo-1,5-alpha-L-arabinosidase C	(EC 3.2.1.99) (Endo-1,5-alpha-L-arabinanase C) (ABN C) abnC	Ady6G00770
Isocitrate lyase (ICL) (Isocitrase) (Isocitratase)	(EC 4.1.3.1) icl1 icl	Ady4G13510
Putative aconitate hydratase, mitochondrial (Aconitase 2)	(EC 4.2.1.-) acoB	Ady1g06810
Putative aconitate hydratase (Aconitase 3)	(EC 4.2.1.-) acoC	Ady8g07140
Aconitate hydratase, mitochondrial (Aconitase)	(EC 4.2.1.3) (Citrate hydro-lyase) (Homocitrate dehydratase)	Ady6g12930
Adenine deaminase (ADE)	(EC 3.5.4.2) (Adenine aminohydrolase) (AAH) aah1	Ady2G09150
Disintegrin and metalloproteinase domain-containing protein B (ADAM B)	(EC 3.4.24.-) ADM-B	Ady4G11150
Probable alpha-galactosidase D	(EC 3.2.1.22) (Melibiase D) aglD	Ady4G03585
Arginine biosynthesis bifunctional protein ArgJ, mitochondrial [Cleaved into: Arginine biosynthesis bifunctional protein ArgJ alpha chain; Arginine biosynthesis bifunctional protein ArgJ beta chain] [Includes: Glutamate N-acetyltransferase (GAT)	(EC 2.3.1.35) (Ornithine acetyltransferase) (OATase) (Ornithine transacetylase); Amino-acid acetyltransferase	Ady5G08120

I want to split $2 so that only the "EC x.x.x.x" part is kept and the rest of the words in $2 are dropped, then print $1, $2 (EC x.x.x.x only) and $3. I also want to remove the brackets around the EC number. The output should look like this:

Code:
Probable arabinan endo-1,5-alpha-L-arabinosidase A	EC 3.2.1.99	Ady3G14620
Probable arabinan endo-1,5-alpha-L-arabinosidase B	EC 3.2.1.99	Ady2G14150
Probable arabinan endo-1,5-alpha-L-arabinosidase C	EC 3.2.1.99	Ady6G00770
Isocitrate lyase (ICL) (Isocitrase) (Isocitratase)	EC 4.1.3.1	Ady4G13510
Putative aconitate hydratase, mitochondrial (Aconitase 2)	EC 4.2.1.-	Ady1g06810
Putative aconitate hydratase (Aconitase 3)	EC 4.2.1.-	Ady8g07140
Aconitate hydratase, mitochondrial (Aconitase)	EC 4.2.1.3	Ady6g12930
Adenine deaminase (ADE)	EC 3.5.4.2	Ady2G09150
Disintegrin and metalloproteinase domain-containing protein B (ADAM B)	EC 3.4.24.-	Ady4G11150
Probable alpha-galactosidase D	EC 3.2.1.22	Ady4G03585
Arginine biosynthesis bifunctional protein ArgJ, mitochondrial [Cleaved into: Arginine biosynthesis bifunctional protein ArgJ alpha chain; Arginine biosynthesis bifunctional protein ArgJ beta chain] [Includes: Glutamate N-acetyltransferase (GAT)	EC 2.3.1.35	Ady5G08120
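A minimal awk sketch of the transformation described above, assuming every line has exactly three tab-separated fields and that $2 holds one "(EC ...)" token made of digits, dots and "-". It sets both FS and OFS to a tab so only $2 is rewritten, uses match() to locate the bracketed EC number, and substr() to drop the surrounding brackets; brackets in $1, such as "(ICL)", are left alone. The function name `extract_ec` is hypothetical.

```shell
# Hypothetical helper: keep $1 and $3, reduce $2 to "EC x.x.x.x" without brackets.
# Assumes tab-separated input with 3 fields per line.
extract_ec() {
    awk 'BEGIN { FS = OFS = "\t" }
    {
        # Locate the first "(EC n.n.n.n)" in field 2 (digits, dots and "-").
        if (match($2, /\(EC [0-9.-]+\)/)) {
            # Trim one character on each side to drop "(" and ")".
            $2 = substr($2, RSTART + 1, RLENGTH - 2)
        }
        # Assigning to $2 rebuilds the line using OFS (a tab).
        print
    }' "$@"
}
```

Usage: `extract_ec inputfile`. Lines whose $2 has no EC token are printed unchanged; if $2 ever contained several EC numbers, only the first one would be kept.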

I tried the code below, but I still could not remove the words that follow "EC x.x.x.x" in $2. Also, the sed script removes all brackets, while I only need to remove the brackets around "EC x.x.x.x". I'm sure it shouldn't be that complicated, but I just couldn't figure it out.

Code:
awk -F. '{print $1"."$2"."$3"."$4,$4}' inputfile | sed 's/(\|)//g'
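Two things go wrong in the attempt above: `-F.` splits each line on every dot rather than on tabs, so the EC number itself is cut into pieces, and `sed 's/(\|)//g'` deletes every bracket on the line, including those in $1. A sed-only sketch that avoids both, assuming tab-separated input, one EC token per line, and a sed that supports `-E` (GNU and BSD sed do), anchors the match on the two tabs around $2 and keeps only the captured "EC ..." text. The function name `extract_ec_sed` is hypothetical.

```shell
# Hypothetical helper: same result as the awk approach, using sed only.
extract_ec_sed() {
    # Build a literal tab; "\t" is not portable inside sed bracket expressions.
    tab=$(printf '\t')
    # Match: tab, text up to "(EC n.n.n.n)", rest of $2, tab.
    # Replace with: tab, the captured "EC n.n.n.n", tab.
    sed -E "s/${tab}[^${tab}]*\((EC [0-9.-]+)\)[^${tab}]*${tab}/${tab}\1${tab}/" "$@"
}
```

Usage: `extract_ec_sed inputfile`. Because `[^<tab>]*` cannot cross a tab, the match stays inside $2, so brackets in $1 and $3 are untouched.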

Any help would be appreciated.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.