Column content match and add suffix

# 1  
05-31-2012

My input

Code:
chr3    galGal3_xenoRefFlat     CDS     4178235 4178264 0.000000        +       0       gene_id "T6J4.19; T6J4_19"; transcript_id "T6J4.19; T6J4_19";
chr3    galGal3_xenoRefFlat     exon    4178235 4178264 0.000000        +       .	gene_id "T6J4.19; T6J4_19"; transcript_id "T6J4.19; T6J4_19";
chr3    galGal3_xenoRefFlat     CDS     4178746 4178826 0.000000        +       0       gene_id "T6J4.19; T6J4_19"; transcript_id "T6J4.19; T6J4_19";
chr3    galGal3_xenoRefFlat     exon    4178746 4178826 0.000000        +       .       gene_id "T6J4.19; T6J4_19"; transcript_id "T6J4.19; T6J4_19";
chr3    galGal3_xenoRefFlat     CDS     4179277 4179338 0.000000        +       0       gene_id "T6J4.19; T6J4_19"; transcript_id "T6J4.19; T6J4_19";
chr3    galGal3_xenoRefFlat     exon    4179277 4179338 0.000000        +       .       gene_id "T6J4.19; T6J4_19"; transcript_id "T6J4.19; T6J4_19";
chr3    galGal3_xenoRefFlat     CDS     4184594 4184751 0.000000        +       0       gene_id "T6J4.19; T6J4_19"; transcript_id "T6J4.19; T6J4_19";
chr3    galGal3_xenoRefFlat     exon    4184594 4184751 0.000000        +       .	gene_id "T6J4.19; T6J4_19"; transcript_id "T6J4.19; T6J4_19";
chr3    galGal3_xenoRefFlat     CDS     4187403 4187538 0.000000        +       1	gene_id "T6J4.19; T6J4_19"; transcript_id "T6J4.19; T6J4_19";
chr3    galGal3_xenoRefFlat     exon    4187403 4187541 0.000000        +       .       gene_id "T6J4.19; T6J4_19"; transcript_id "T6J4.19; T6J4_19";
chr3    galGal3_xenoRefFlat     CDS     4179280 4179336 0.000000        +       0       gene_id "T15C9.2"; transcript_id "T15C9.2";
chr3    galGal3_xenoRefFlat     exon    4179280 4179336 0.000000        +       .	gene_id "T15C9.2"; transcript_id "T15C9.2";
chr3    galGal3_xenoRefFlat     CDS     4180045 4180087 0.000000        +       1	gene_id "AT3G26020"; transcript_id "AT3G26020_dup1";
chr3    galGal3_xenoRefFlat     exon    4180045 4180087 0.000000        +       .       gene_id "AT3G26020"; transcript_id "AT3G26020_dup1";
chr3    galGal3_xenoRefFlat     CDS     4187410 4187538 0.000000        +       0	gene_id "AT3G26020"; transcript_id "AT3G26020_dup1";
chr3    galGal3_xenoRefFlat     exon    4187410 4187541 0.000000        +       .       gene_id "AT3G26020"; transcript_id "AT3G26020_dup1";
chr3    galGal3_xenoRefFlat     CDS     4178746 4178876 0.000000        +       0	gene_id "si687042f02"; transcript_id "si687042f02";
chr3    galGal3_xenoRefFlat     exon    4178746 4178876 0.000000        +       .	gene_id "si687042f02"; transcript_id "si687042f02";

Desired output

Code:
chr3    galGal3_xenoRefFlat     CDS     4178235 4178264 0.000000        +       0       gene_id "T6J4.19_1"; transcript_id "T6J4.19_1";
chr3    galGal3_xenoRefFlat     exon    4178235 4178264 0.000000        +       .	gene_id "T6J4.19_2"; transcript_id "T6J4.19_2";
chr3    galGal3_xenoRefFlat     CDS     4178746 4178826 0.000000        +       0       gene_id "T6J4.19_3"; transcript_id "T6J4.19_3";
chr3    galGal3_xenoRefFlat     exon    4178746 4178826 0.000000        +       .       gene_id "T6J4.19_4"; transcript_id "T6J4.19_4";
chr3    galGal3_xenoRefFlat     CDS     4179277 4179338 0.000000        +       0       gene_id "T6J4.19_5"; transcript_id "T6J4.19_5";
chr3    galGal3_xenoRefFlat     exon    4179277 4179338 0.000000        +       .       gene_id "T6J4.19_6"; transcript_id "T6J4.19_6";
chr3    galGal3_xenoRefFlat     CDS     4184594 4184751 0.000000        +       0       gene_id "T6J4.19_7"; transcript_id "T6J4.19_7";
chr3    galGal3_xenoRefFlat     exon    4184594 4184751 0.000000        +       .	gene_id "T6J4.19_8"; transcript_id "T6J4.19_8";
chr3    galGal3_xenoRefFlat     CDS     4187403 4187538 0.000000        +       1	gene_id "T6J4.19_9"; transcript_id "T6J4.19_9";
chr3    galGal3_xenoRefFlat     exon    4187403 4187541 0.000000        +       .       gene_id "T6J4.19_10"; transcript_id "T6J4.19_10";
chr3    galGal3_xenoRefFlat     CDS     4179280 4179336 0.000000        +       0       gene_id "T15C9.2_1"; transcript_id "T15C9.2_1";
chr3    galGal3_xenoRefFlat     exon    4179280 4179336 0.000000        +       .	gene_id "T15C9.2_2"; transcript_id "T15C9.2_2";
chr3    galGal3_xenoRefFlat     CDS     4180045 4180087 0.000000        +       1	gene_id "AT3G26020_1"; transcript_id "AT3G26020_dup1_1";
chr3    galGal3_xenoRefFlat     exon    4180045 4180087 0.000000        +       .       gene_id "AT3G26020_2"; transcript_id "AT3G26020_dup1_2";
chr3    galGal3_xenoRefFlat     CDS     4187410 4187538 0.000000        +       0	gene_id "AT3G26020_3"; transcript_id "AT3G26020_dup1_3";
chr3    galGal3_xenoRefFlat     exon    4187410 4187541 0.000000        +       .       gene_id "AT3G26020_4"; transcript_id "AT3G26020_dup1_4";
chr3    galGal3_xenoRefFlat     CDS     4178746 4178876 0.000000        +       0	gene_id "si687042f02_1"; transcript_id "si687042f02_1";
chr3    galGal3_xenoRefFlat     exon    4178746 4178876 0.000000        +       .	gene_id "si687042f02_2"; transcript_id "si687042f02_2";

Basically, I want to compare the gene_id and transcript_id values (the space-separated attributes at the end of each row) with those of the other rows. Where they are the same, I would like to add a suffix as a series, i.e. _1, _2, and so on.

Please note that the columns are separated by spaces.

Thanks for your help.
# 2  
05-31-2012
I think you can do this with an awk one-liner. It appends to every field a number counting how many times that value has appeared consecutively in the same column.

Code:
awk '{ for (i = 1; i <= NF; i++) { k[i] = ($i == a[i]) ? k[i] + 1 : 1; printf "%s_%d ", $i, k[i] } print ""; split($0, a, " ") }' input > output


Input:

Code:
A B C
A B C
A D E
B D E

Output:

Code:
A_1 B_1 C_1 
A_2 B_2 C_2 
A_3 D_1 E_1 
B_1 D_2 E_2

This User Gave Thanks to hydrabane For This Post:
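Splitting on whitespace is what breaks the IDs apart here: `"T6J4.19;` and `T6J4_19";` end up as separate fields, and every other column gets numbered too. A minimal sketch (not from the thread) that keeps the same "restart the count when the value changes" idea, but applies it only to the two quoted IDs, is to split each line on the double quote instead. It assumes, from the sample rows, that gene_id is the first quoted value on the line and transcript_id the second, and that only the text before the first `;` inside the quotes should be kept, as the desired output suggests. `suffix_runs` is a hypothetical function name.

```shell
# Sketch only. suffix_runs is a hypothetical helper: stdin -> stdout.
# Assumptions (inferred from the sample, not stated by the poster):
#   * gene_id is the first quoted value, transcript_id the second;
#   * only the text before the first ";" inside the quotes is kept.
# With FS and OFS both set to '"', assigning to $2 or $4 rebuilds the
# line with the same quotes, so tabs, spaces and other columns survive.
suffix_runs() {
    awk -F'"' -v OFS='"' '
    $1 ~ /gene_id $/ && $3 ~ /transcript_id $/ {
        sub(/;.*/, "", $2)              # "T6J4.19; T6J4_19" -> "T6J4.19"
        sub(/;.*/, "", $4)
        key = $2 SUBSEP $4
        n = (key == prev) ? n + 1 : 1   # restart the series when the pair changes
        prev = key
        $2 = $2 "_" n
        $4 = $4 "_" n
    }
    { print }'                          # lines without both IDs pass through
}

# Usage: suffix_runs < input > output
```

Because the count restarts whenever the pair changes, rows of one transcript must be adjacent, as they are in the sample.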
# 3  
06-01-2012
Hi hydrabane,

Thanks for your valuable time.

Your code works, but it adds the suffix to each and every column, like this:

Code:
chr3_1 galGal3_xenoRefFlat_1 CDS_1 4178235_1 4178264_1 0.000000_1 +_1 0_1 gene_id_1 "T6J4.19;_1 T6J4_19";_1 transcript_id_1 "T6J4.19;_1 T6J4_19";_1 
chr3_2 galGal3_xenoRefFlat_2 exon_1 4178235_2 4178264_2 0.000000_2 +_2 ._1 gene_id_2 "T6J4.19;_2 T6J4_19";_2 transcript_id_2 "T6J4.19;_2 T6J4_19";_2 
chr3_3 galGal3_xenoRefFlat_3 CDS_1 4178746_1 4178826_1 0.000000_3 +_3 0_1 gene_id_3 "T6J4.19;_3 T6J4_19";_3 transcript_id_3 "T6J4.19;_3 T6J4_19";_3 
chr3_4 galGal3_xenoRefFlat_4 exon_1 4178746_2 4178826_2 0.000000_4 +_4 ._1 gene_id_4 "T6J4.19;_4 T6J4_19";_4 transcript_id_4 "T6J4.19;_4 T6J4_19";_4 
chr3_5 galGal3_xenoRefFlat_5 CDS_1 4179277_1 4179338_1 0.000000_5 +_5 0_1 gene_id_5 "T6J4.19;_5 T6J4_19";_5 transcript_id_5 "T6J4.19;_5 T6J4_19";_5 
chr3_6 galGal3_xenoRefFlat_6 exon_1 4179277_2 4179338_2 0.000000_6 +_6 ._1 gene_id_6 "T6J4.19;_6 T6J4_19";_6 transcript_id_6 "T6J4.19;_6 T6J4_19";_6 
chr3_7 galGal3_xenoRefFlat_7 CDS_1 4184594_1 4184751_1 0.000000_7 +_7 0_1 gene_id_7 "T6J4.19;_7 T6J4_19";_7 transcript_id_7 "T6J4.19;_7 T6J4_19";_7 
chr3_8 galGal3_xenoRefFlat_8 exon_1 4184594_2 4184751_2 0.000000_8 +_8 ._1 gene_id_8 "T6J4.19;_8 T6J4_19";_8 transcript_id_8 "T6J4.19;_8 T6J4_19";_8 
chr3_9 galGal3_xenoRefFlat_9 CDS_1 4187403_1 4187538_1 0.000000_9 +_9 1_1 gene_id_9 "T6J4.19;_9 T6J4_19";_9 transcript_id_9 "T6J4.19;_9 T6J4_19";_9 
chr3_10 galGal3_xenoRefFlat_10 exon_1 4187403_2 4187541_1 0.000000_10 +_10 ._1 gene_id_10 "T6J4.19;_10 T6J4_19";_10 transcript_id_10 "T6J4.19;_10 T6J4_19";_10


I am afraid you missed my point. I would like to add the suffix only to the quoted values after gene_id and transcript_id.

I hope you have some time to help with my problem. Thanks in advance.

---------- Post updated 06-01-12 at 10:33 AM ---------- Previous update was 05-31-12 at 04:22 PM ----------

Any thoughts, friends?
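One way to touch only the quoted values is sketched below; it is not taken from any reply in the thread. Each value is located by its tag with awk's `match()` and `substr()`, so it does not matter where gene_id and transcript_id sit on the line, and each row is numbered by how many times its (gene_id, transcript_id) pair has appeared so far, so rows of one transcript need not be adjacent. For contiguous groups such as the sample this gives the same numbering as counting consecutive rows. Assumptions: only the text before the first `;` inside the quotes is kept (inferred from the desired output, where `"T6J4.19; T6J4_19"` becomes `"T6J4.19_1"`), and `add_id_suffix` is a hypothetical function name.

```shell
# Sketch only. add_id_suffix is a hypothetical helper: stdin -> stdout.
# Assumption: only the text before the first ";" inside the quotes is kept.
# Everything else on the line, tabs included, is left untouched.
add_id_suffix() {
    awk '
    # Return the quoted value after tag, cut at the first ";" ("" if absent).
    function getval(line, tag,    v) {
        if (!match(line, tag " \"[^\"]*\"")) return ""
        v = substr(line, RSTART + length(tag) + 2, RLENGTH - length(tag) - 3)
        sub(/;.*/, "", v)
        return v
    }
    # Return line with the quoted value after tag replaced by newval.
    function setval(line, tag, newval,    head, tail) {
        if (match(line, tag " \"[^\"]*\"")) {
            head = substr(line, 1, RSTART - 1)
            tail = substr(line, RSTART + RLENGTH)
            line = head tag " \"" newval "\"" tail
        }
        return line
    }
    {
        g = getval($0, "gene_id")
        t = getval($0, "transcript_id")
        if (g == "" || t == "") { print; next }  # no IDs: pass the line through
        n = ++count[g SUBSEP t]                  # occurrence number of this pair
        out = setval($0, "gene_id", g "_" n)
        print setval(out, "transcript_id", t "_" n)
    }'
}

# Usage: add_id_suffix < input > output
```

Building the new line with `substr()` rather than `sub()` avoids `&` or backslashes in an ID being interpreted in the replacement text.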