Advanced: Sort, count data in column, append file name


 
# 8  
08-09-2012
Quote:
Originally Posted by raj_saini20
Code:
awk 'BEGIN{i=1}{
        x=$1;
        $1=y;
        if(!match(c[$0],x))
            {
                if(c[$0])
                    {
                        c[$0]=substr(c[$0],1)","substr(x,1)
                    }
                else
                    {
                        c[$0]=x
                    };
            };
        if(a[$0])
            {
                a[$0]++
            }
        else
            {
                a[$0]=1;
                b[i]=$0;
                i++
            }
    }
 END{for(k=1;k<i;k++){print a[b[k]],b[k],c[b[k]]}}'  filename

Output:
Code:
2  aab rrt File1,File3
2  ccd bbt File1,File2
3  ggt iir File2,File3,File1

Sort on column two if you need the output sorted by column two.
This is also very interesting, as it might be easier for me to modify. However, when I run it, it seems to output all columns, counting correctly and adding the "lane#" to each new line. The data in my columns, illustrated here with x, does not just contain x; it might contain any letter or character. Is this the problem with the above awk command? How are the right columns (e.g. 4 and 5) selected and printed in the awk command?
# 9  
08-09-2012
In x I am storing the file name for each row, and y is never set to anything; it is only used to make the value of $1 empty.
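To illustrate the explanation above, here is a minimal sketch (assuming any POSIX awk) of what `x=$1; $1=y` does to one record. Because y is never assigned, it is the empty string, and assigning to a field makes awk rebuild $0 from all fields joined by OFS (a single space by default). So everything except column 1, with a leading space, becomes the grouping key, which is why all the remaining columns take part in the comparison. The function name `strip_first_field` is just for illustration.

```shell
#!/bin/sh
# Show how the quoted script derives its key: save $1, then blank it.
# y is never assigned, so it is the empty string; awk then rebuilds $0
# from all fields joined by OFS (one space), leaving a leading space.
strip_first_field() {
    awk '{ x = $1; $1 = y; printf "file=%s key=[%s]\n", x, $0 }'
}

echo 'File1 aab rrt' | strip_first_field
```

Note that the rebuild also collapses runs of spaces between the remaining fields to single spaces.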
# 10  
08-09-2012
Quote:
Originally Posted by raj_saini20
In x I am storing the file name for each row, and y is never set to anything; it is only used to make the value of $1 empty.
OK, but how is this awk command selecting which columns to include? From the script I cannot see how columns 4 and 5 are used to compare their values. I am sorry for all the questions, but I am trying to learn as much as possible. I might want to include more columns later on, so it would be very useful if I could make small changes to the script myself. I really appreciate your time so far, so if you are busy, please feel free to let it slide.
# 11  
08-09-2012
I am storing the count for each unique key (the rest of the line after column 1) in the associative array a[], and at the end I print the count for each unique occurrence. The unique keys are also stored in the array b[], which is used at the end to print them in the order they first appeared.
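The a[]/b[] pattern described above can be sketched on its own, assuming a POSIX awk. a[] holds the count for each distinct key, and b[] records keys in the order they were first seen, since `for (k in a)` visits keys in an unspecified order. Here the whole line is the key, and the function name `count_in_order` is hypothetical.

```shell
#!/bin/sh
# Count identical lines and print each distinct line once, in first-seen order.
# a[line] holds the count; b[1..n] remembers the order in which keys first
# appeared, because "for (k in a)" would visit them in an unspecified order.
count_in_order() {
    awk '
        !($0 in a) { b[++n] = $0 }   # first time this line is seen
        { a[$0]++ }                  # count every occurrence
        END { for (k = 1; k <= n; k++) print a[b[k]], b[k] }
    '
}

printf '%s\n' 'aab rrt' 'ccd bbt' 'aab rrt' | count_in_order
```

Testing with `in` rather than `if(a[$0])` avoids creating empty array elements just by looking them up.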
# 12  
08-09-2012
Quote:
Originally Posted by raj_saini20
I am storing the count for each unique key (the rest of the line after column 1) in the associative array a[], and at the end I print the count for each unique occurrence. The unique keys are also stored in the array b[], which is used at the end to print them in the order they first appeared.
Ok, I see. That is why it is not working for:
Code:
File1 bb xx xx aab rrt xx
File1 xx xx xx ccd bbt xx
File1 xx xx xx ggt iir xx
File2 xx xx xx ggt iir xx
File2 xx xx xx ccd bbt xx
File3 aa xx xx aab rrt xx
File3 xx xx xx ggt iir xx

as it will print it as:
Code:
1  bb xx xx aab rrt xx File1
2  xx xx xx ccd bbt xx File1,File2
3  xx xx xx ggt iir xx File1,File2,File3
1  aa xx xx aab rrt xx File3

It does not compare only columns 5 and 6 but all columns, so it treats the 1st and 6th lines as different because they differ in column 2, even though I only want to know when lines are identical in columns 5 and 6. From the above dataset I want to get the following, based only on specific columns (e.g. 5 and 6) and independent of what is in the other columns:
Code:
2  aab rrt File1,File3
2  ccd bbt File1,File2
3  ggt iir File1,File2,File3

Is it possible to specify, in the suggested awk command, which columns to compare and display/save?
# 13  
08-10-2012
I have seen the powerful Perl program and would like to learn more.

Also, I would like to add a few more columns to the output file.

From the perl program given earlier:
Code:
perl -lane '$c{"$F[4] $F[5]"}++; $x{"$F[4] $F[5]"} .= "$F[0]," if  $F[5]; END{for(keys %x){$x{$_}=~s/,$//;print "$c{$_} $_ $x{$_}"}}'  file[123]

I decided to have a go (my modifications are the $b{"$F[3]"} statement and the $b{$_} added to the print):
Code:
perl -lane '$b{"$F[3]"};$c{"$F[4] $F[5]"}++; $x{"$F[4] $F[5]"} .= "$F[0]," if $F[5]; END{for(keys %x){$x{$_}=~s/,$//;print "$c{$_} $_ $x{$_} $b{$_}"}}'  file[123] > output.file

But this is not adding column 3 to the end of each line in the output file. What am I missing here? I have very little knowledge but would really like to know how to modify the program so I can add more columns to the output file, or perform other small changes. Again, thanks for the help so far.
# 14  
08-13-2012
Try this:
Code:
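awk '{
        x = $1                      # file name (column 1)
        y = $5 " " $6               # grouping key: columns 5 and 6
        if (!((y, x) in seen)) {    # list each file only once per key
            seen[y, x] = 1
            if (y in c)
                c[y] = c[y] "," x
            else
                c[y] = x
        }
        if (!(y in a))              # first time this key appears:
            b[++i] = y              # remember its position for printing
        a[y]++                      # count occurrences of the key
    }
    END { for (k = 1; k <= i; k++) print a[b[k]], b[k], c[b[k]] }' inputfile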

For the input:
Code:
File1 bb xx xx aab rrt xx
File1 xx xx xx ccd bbt xx
File1 xx xx xx ggt iir xx
File2 xx xx xx ggt iir xx
File2 xx xx xx ccd bbt xx
File3 aa xx xx aab rrt xx
File3 xx xx xx ggt iir xx

the output is:
Code:
2 aab rrt File1,File3
2 ccd bbt File1,File2
3 ggt iir File1,File2,File3
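For adding more columns later, one possible generalization of the answer above (a sketch, not the poster's code) passes the key columns in with `-v`, so only the argument changes. It assumes whitespace-separated input and 1-based column numbers given as a space-separated list; `group_by_cols` and `sample` are hypothetical names. It also tracks file names with a `(key, file)` subscript instead of `match()`, so a name like File1 is not mistaken as already present in a list that contains File10.

```shell
#!/bin/sh
# Hypothetical helper: print the sample data used in this thread.
sample() {
    cat <<'EOF'
File1 bb xx xx aab rrt xx
File1 xx xx xx ccd bbt xx
File1 xx xx xx ggt iir xx
File2 xx xx xx ggt iir xx
File2 xx xx xx ccd bbt xx
File3 aa xx xx aab rrt xx
File3 xx xx xx ggt iir xx
EOF
}

# Hypothetical generalization: group on the columns listed in $1
# (space-separated, 1-based) and list each file name (column 1) once per group.
group_by_cols() {
    awk -v cols="$1" '
        BEGIN { n = split(cols, col, " ") }        # "5 6" -> col[1]=5, col[2]=6
        {
            key = $(col[1])
            for (j = 2; j <= n; j++)               # join the remaining key columns
                key = key " " $(col[j])
            if (!(key in count))                   # first time this key appears:
                order[++m] = key                   # remember its position
            count[key]++
            if (!((key, $1) in seen)) {            # exact match, unlike match()
                seen[key, $1] = 1
                if (key in files)
                    files[key] = files[key] "," $1
                else
                    files[key] = $1
            }
        }
        END {
            for (k = 1; k <= m; k++)
                print count[order[k]], order[k], files[order[k]]
        }'
}

sample | group_by_cols "5 6" | sort -k2,2
```

Grouping on other columns is then just a different argument, e.g. `group_by_cols "4 5 6"`; sorting by the first key column is the `sort -k2,2` at the end of the pipeline, as suggested earlier in the thread.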
