[Solved] Remove Duplicates and Sum in a spreadsheet


 
Thread Tools Search this Thread
Top Forums UNIX for Dummies Questions & Answers [Solved] Remove Duplicates and Sum in a spreadsheet
# 1  
Old 05-30-2012
[Solved] Remove Duplicates and Sum in a spreadsheet

Hi all,

I have a problem hoping someone can help me.
I have this spreadsheet :

PC.333 PC.444 PC.555 PC.666 PC.777



0 0 1 0 0 Bacteria Bacteroidetes Bacteroidia Bacteroidales

8 4 1 0 6 Bacteria Bacteroidetes Bacteroidia Firmicutes


0 0 8 0 0 Bacteria Bacteroidetes Bacteroidia Bacteroidales

.......




These are all separate columns and rows. The first 5 (PC.333,PC.444...) are patients and under them every number represents the number of Bacteria that the patient presents referred to the row.

I need to merge togheter the fields regarding Bacteria but without loosing the count of them for every patients. In other words I need to delete duplicate but sum before for the patients the numbers. So do not miss anything. Is there any possibilities ?! The file is very big and I would like to avoid doing it manually.

Please help Me ! Thank you in advance

Last edited by Giorgio C; 05-30-2012 at 11:55 AM..
# 2  
Old 05-30-2012
Code:
$ cat bacteria.awk

NR==1 { print ; next }

{
        B="";   for(N=6; N<=NF; N++) B=B" "$N;  B=substr(B,2);

        A[B]++;

        for(N=1; N<=5; N++) T[B,N]+=$N;
}

END {
        for(B in A)
        {
                L=""
                for(N=1; N<=5; N++)     L=L" "T[B,N];

                print substr(L, 2), B;
        }
}

$ awk -f bacteria.awk data

PC.333 PC.444 PC.555 PC.666 PC.777
0 0 9 0 0 Bacteria Bacteroidetes Bacteroidia Bacteroidales
8 4 1 0 6 Bacteria Bacteroidetes Bacteroidia Firmicutes

$

This User Gave Thanks to Corona688 For This Post:
# 3  
Old 05-30-2012
Thank you very much Corona !!!
 
Login or Register to Ask a Question

Previous Thread | Next Thread

9 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

awk to Sum columns when other column has duplicates and append one column value to another with Care

Hi Experts, Please bear with me, i need help I am learning AWk and stuck up in one issue. First point : I want to sum up column value for column 7, 9, 11,13 and column15 if rows in column 5 are duplicates.No action to be taken for rows where value in column 5 is unique. Second point : For... (1 Reply)
Discussion started by: as7951
1 Replies

2. Shell Programming and Scripting

Remove duplicates

Hi I have a below file structure. 200,1245,E1,1,E1,,7611068,KWH,30, ,,,,,,,, 200,1245,E1,1,E1,,7611070,KWH,30, ,,,,,,,, 300,20140223,0.001,0.001,0.001,0.001,0.001 300,20140224,0.001,0.001,0.001,0.001,0.001 300,20140225,0.001,0.001,0.001,0.001,0.001 300,20140226,0.001,0.001,0.001,0.001,0.001... (1 Reply)
Discussion started by: tejashavele
1 Replies

3. Shell Programming and Scripting

Remove duplicates

I have a file with the following format: fields seperated by "|" title1|something class|long...content1|keys title2|somhing class|log...content1|kes title1|sothing class|lon...content1|kes title3|shing cls|log...content1|ks I want to remove all duplicates with the same "title field"(the... (3 Replies)
Discussion started by: dtdt
3 Replies

4. Shell Programming and Scripting

Awk: Remove Duplicates

I have the following code for removing duplicate records based on fields in inputfile file & moves the duplicate records in duplicates file(1st Awk) & in 2nd awk i fetch the non duplicate entries in inputfile to tmp file and use move to update the original file. Requirement: Can both the awk... (4 Replies)
Discussion started by: siramitsharma
4 Replies

5. UNIX for Dummies Questions & Answers

[Solved] Sum operation

I have file input w34 AG1 2 184 w35 AG1 6 552 w35 BG1 12 0 w35 CD1 7 0 w36 CG1 4 0 my output should be w34 AG1 2 184 0.991 w35 AG1 6 552 0.991 w35 BG1 12 0 1.000 w35 CD1 7 0... (3 Replies)
Discussion started by: radius
3 Replies

6. Shell Programming and Scripting

[Solved] sum up third and second columns by 0 difference

Hi Friends, I have the following file chr1 1 2 chr1 2 3 chr1 3 4 chr1 4 5 chr1 5 6 chr1 19 20 chr1 20 21 chr1 21 22 I want to compare the third column of record 1 to second column of next record and if the difference is zero, consider its third column and match it to next record... (4 Replies)
Discussion started by: jacobs.smith
4 Replies

7. UNIX for Dummies Questions & Answers

Remove Duplicates and Sum in a spreadsheet

Hi all, I have a problem hoping someone can help me. I have this spreadsheet : Removed embedded code These are all separate columns and rows. The first 5 (PC.333,PC.444...) are patients and under them every number represents the number of Bacteria that the patient... (1 Reply)
Discussion started by: Giorgio C
1 Replies

8. Shell Programming and Scripting

[SOLVED] Awk one line to sum up a function

I need help with debugging an error in my awk script. I have a shell script with variable named U_new_i and want to pass it to awk for use in a summation. The original file have the following content. cat test.txt -2445.7132000000 -2444.9349000000 -2444.3295000000 -2443.1814000000 ... (0 Replies)
Discussion started by: Quantum_Dot
0 Replies

9. Shell Programming and Scripting

Remove duplicates

Hello Experts, I have two files named old and new. Below are my example files. I need to compare and print the records that only exist in my new file. I tried the below awk script, this script works perfectly well if the records have exact match, the issue I have is my old file has got extra... (4 Replies)
Discussion started by: forumthreads
4 Replies
Login or Register to Ask a Question