[Solved] awk solution to add sequential numbers based on a word


 
# 1  
Old 07-04-2013

Hi experts, I've been struggling to format a large genetic dataset. It's complicated to explain, so I'll simply post example input and output:

Code:
$cat input.txt
ID        GENE    pos    start  end
blah1    coolgene   1      3     5
blah2    coolgene   1      4     6
blah3    coolgene   1      4     7
blah4    BADgene    1      4     3245
blah5    BADgene    1      4     234
blah6    neatgene   1      24    45
blah7    neatgene   1      24    45
blah8    neatgene   1      24    45
blah9    neatgene   1      24    45
blah10   neatgene   1      24    45

$cat output.txt
ID        gene    NEWcol  pos    start  end
blah1    coolgene    1      1      3     5
blah2    coolgene    2      1      4     6
blah3    coolgene    3      1      4     7
blah4    BADgene     1      1      4     3245
blah5    BADgene     2      1      4     234
blah6    neatgene    1      1      24    45
blah7    neatgene    2      1      24    45
blah8    neatgene    3      1      24    45
blah9    neatgene    4      1      24    45
blah10   neatgene    5      1      24    45

For each distinct value in the "gene" column, I would like a new column (NEWcol) that numbers the lines for that gene sequentially, restarting at 1 whenever the gene changes. The ID column is a unique identifier.

Any help is appreciated!!
Many thanks.
# 2  
Old 07-04-2013
Code:
awk 'NR==1 { $2=$2"\t"NEWcol" ; print ; next }
{ A[$2]++ ; $2=$2"\t"A[$2] } 1' inputfile

# 3  
Old 07-04-2013
Thanks so much, that appears to work! I think it was just missing a quote before "NEWcol".

Many thanks for your reply.
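For reference, here is a minimal sketch of the command with the missing quote restored, run against a few lines in the same layout as input.txt. The file name sample.txt and the wrapper function add_newcol are made up for this illustration. Because assigning to $2 makes awk rebuild the line, the other fields come out joined by the default output separator, a single space.

```shell
# sample.txt is a hypothetical stand-in for the poster's input.txt
cat > sample.txt <<'EOF'
ID GENE pos start end
blah1 coolgene 1 3 5
blah2 coolgene 1 4 6
blah4 BADgene 1 4 3245
EOF

# add_newcol is a hypothetical wrapper name, used only for this example
add_newcol() {
    awk '
    NR == 1 { $2 = $2 "\t" "NEWcol"; print; next }   # header: append the new column name
    { A[$2]++; $2 = $2 "\t" A[$2] }                  # data: running count per gene
    1                                                # always true, so every line is printed
    ' "$@"
}

add_newcol sample.txt
```

If tab-separated output is wanted throughout, setting the output separator with -v OFS='\t' on the awk command line is one option.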
# 4  
Old 07-08-2013
Quote:

Posted by Corona688
Code:
awk 'NR==1 { $2=$2"\t""NEWcol" ; print ; next }{ A[$2]++ ; $2=$2"\t"A[$2] } 1' inputfile

Thanks a lot, Corona688, for this great command. Could you please also explain the concept behind it? I would be grateful.


Thanks,
R. Singh
# 5  
Old 07-08-2013
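First line: append "NEWcol" to the second column, print, and skip to the next line.

Every other line: increment the count A[$2], where $2 is the second column (the gene name), append that count to the second column, and print. The trailing 1 is an always-true pattern, and its default action is to print the line.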
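To look at the counter on its own, the sketch below prints each gene name next to the value of A[$2] right after the increment, then dumps the final totals in an END block. The inline sample data, the function name count_genes, and the END block are additions for illustration only.

```shell
# count_genes is a hypothetical helper name, used only for this example
count_genes() {
    awk '
    { A[$2]++; print $2, A[$2] }                  # count so far for this gene
    END { for (g in A) print "total", g, A[g] }   # for-in order is unspecified
    '
}

printf '%s\n' 'blah1 coolgene' 'blah2 coolgene' 'blah4 BADgene' 'blah5 BADgene' 'blah6 neatgene' |
    count_genes
```

An awk array element that has never been set evaluates to 0 in a numeric context, so the first A[$2]++ for a new gene leaves it at 1. That is why the numbering restarts for each gene without any explicit reset.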