Appending lines with word frequencies, ordering and indexing a column


 
# 1  
Old 01-01-2011

Dear All,

I have the following input data:
Code:
w1	20	g1
w1	10	g1
w2	12	g1
w2	23	g1
w3	10	g1
w3	17	g1
w3	12.5	g1
w3	21	g1
w4	11	g1
w4	13.2	g1
w4	23	g1
w4	18	g1

First, I want to count how often each word occurs in col1 and to sort col2 in ascending order within each group of identical col1 words. Second, I want to append the frequency and the rank within the group to each line, like this:

Code:
W	Z	U	freq(W)	Z-order

w1	10	g1	2	1
w1	20	g1	2	2
w2	12	g1	2	1
w2	23	g1	2	2
w3	10	g1	4	1
w3	12.5	g1	4	2
w3	17	g1	4	3
w3	21	g1	4	4
w4	11	g1	4	1
w4	13.2	g1	4	2
w4	18	g1	4	3
w4	23	g1	4	4

I am trying to complete the following code but am not making any headway:

Code:
awk 'NR==FNR{words[++nwords]=$1;next}
{for(i=1;i<=NF;i++)freq[$i]++}
END{for(w=1;w<=nwords;w++)
print words[w], freq[words[w]]+0}' infile

I therefore need your help.

Many thanks,

Ghetz
# 2  
Old 01-02-2011
Try this,

Code:
sort -nk2,1 sortfile | awk 'NR==FNR{if(s!=$1){j=0;s=$1}a[$1]=++j;next} {if(p!=$1){i=0;p=$1}if(a[p]){$4=a[p]}$5=++i;print }' input_file input_file

# 3  
Old 01-02-2011
NR==FNR is only useful when there is more than one input file. Try this, assuming column one is in sorted order:
Code:
awk 'function pr(){if(p)for(i=1;i<=n;i++){print A[i],n,i;delete A[i]};p=$1;n=0}p!=$1{pr()}{A[++n]=$0}END{pr()}' OFS='\t' infile
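
To illustrate the remark about NR==FNR: the condition is true only while awk reads the first file operand, so passing the same file twice gives a two-pass script. The following is a minimal sketch, not taken from any reply in this thread; the three-line sample file and the names infile, freq and result are assumptions made for illustration.

```shell
# Hypothetical sample data: three tab-separated lines in a temporary file.
infile=$(mktemp)
printf 'w1\t20\tg1\nw1\t10\tg1\nw2\t12\tg1\n' > "$infile"

# Pass 1 (NR==FNR is true): count each col1 word, print nothing.
# Pass 2 (NR==FNR is false): print the line followed by its word count.
result=$(awk 'NR==FNR { freq[$1]++; next }
              { print $0, freq[$1] }' OFS='\t' "$infile" "$infile")

printf '%s\n' "$result"
rm -f "$infile"
```

With a single file operand NR always equals FNR, so the second block would never run; that is why the original attempt prints nothing useful.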


Last edited by Scrutinizer; 01-03-2011 at 01:08 AM..
# 4  
Old 01-02-2011
Dear pravin27

First, many thanks for your reply.

I tried your code replacing "sortfile" with "input_file":
Code:
sort -nk2,1 sortfile | awk 'NR==FNR{if(s!=$1){j=0;s=$1}a[$1]=++j;next} {if(p!=$1){i=0;p=$1}if(a[p]){$4=a[p]}$5=++i;print }' input_file input_file


but somehow the ascending sort on col2 does not work. It produces the same output as Scrutinizer's code. Is there something I am missing?
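
One likely reason the sort has no effect: when awk is given file operands, it reads those files and never looks at standard input, so the sorted data arriving through the pipe is discarded. The sketch below shows this behaviour on a hypothetical two-line file; the names tmp, ignored and used are illustrative assumptions.

```shell
# Hypothetical unsorted two-line file.
tmp=$(mktemp)
printf 'b\na\n' > "$tmp"

# awk is given a file operand, so the sorted pipe input is never read.
ignored=$(sort "$tmp" | awk '{ print }' "$tmp")

# With "-" as the operand, awk reads standard input, i.e. the sorted data.
used=$(sort "$tmp" | awk '{ print }' -)

printf 'file operand:\n%s\nstdin operand:\n%s\n' "$ignored" "$used"
rm -f "$tmp"
```

Note that a script relying on NR==FNR needs two passes over the data, which standard input cannot provide, so the sorted data has to be written to a file first or processed in a single pass.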

Regards,

Ghetz
# 5  
Old 01-03-2011
Hi Ghetz,

Sorry ....

Try this,

Code:
sort -nk2,1 inputfile -o inputfile; awk 'NR==FNR{if(s!=$1){j=0;s=$1}a[$1]=++j;next} {if(p!=$1){i=0;p=$1}if(a[p]){$4=a[p]}$5=++i;print }' OFS="\t" inputfile inputfile
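
The command above overwrites inputfile with its sorted version, and its key specification -k2,1 names an end field that comes before the start field, so it is not obvious which comparison sort actually applies. A more defensive sketch, not part of the original reply, sorts on col1 and then numerically on col2 with explicit keys, leaves the input untouched, and buffers each group in a single awk pass. It assumes tab-separated fields; the sample data and the names infile, ranked, emit, buf and prev are illustrative.

```shell
# Hypothetical sample data in which lexical and numeric order differ (9, 12.5, 100).
infile=$(mktemp)
printf 'w1\t20\tg1\nw1\t10\tg1\nw3\t9\tg1\nw3\t12.5\tg1\nw3\t100\tg1\n' > "$infile"

ranked=$(
  # Group by col1, then order col2 numerically within each group.
  LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2n "$infile" |
  awk -F '\t' -v OFS='\t' '
    # Print the buffered group: line, group size, rank within the group.
    function emit(   i) {
      for (i = 1; i <= n; i++) print buf[i], n, i
      n = 0
    }
    NR > 1 && $1 != prev { emit() }   # col1 changed: flush the previous group
    { buf[++n] = $0; prev = $1 }      # buffer the current line
    END { emit() }                    # flush the last group
  '
)

printf '%s\n' "$ranked"
rm -f "$infile"
```

Because the whole group is buffered before printing, the frequency is known without reading the file a second time.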


Last edited by pravin27; 01-03-2011 at 01:35 AM..
# 6  
Old 01-03-2011
Many thanks pravin27,

Your code works beautifully.

Regards,

Ghetz