Shell Programming and Scripting: Post 302908013 by perl_beginner on Thursday 3rd of July 2014 04:50:53 AM
Help with sum the column having same content

Input file
Code:
hsa-miR-1       1
hsa-miR-1       1
hsa-miR-7-5p    1
hsa-miR-9-5p    1
hsa-miR-9-5p    2
hsa-miR-9-5p    1
hsa-miR-10a-5p  1
hsa-miR-10b-5p  1
hsa-miR-34a-5p  1
hsa-miR-34a-5p  1
.
.

Desired output file
Code:
hsa-miR-1       2
hsa-miR-7-5p    1
hsa-miR-9-5p    4
hsa-miR-10a-5p  1
hsa-miR-10b-5p  1
hsa-miR-34a-5p  2
.
.

Does anybody know how to sum the column 2 data when column 1 is exactly the same?
The desired output file should contain each unique column 1 value only once, with the column 2 values summed across all rows that share that column 1 value.

Thanks for any advice.
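
One possible approach, offered as a minimal sketch and not as the thread's accepted answer, is an awk script that keeps a running total per column 1 value. It assumes whitespace-separated columns and numeric values in column 2. The helper name sum_by_key is hypothetical, and the sample rows are copied from the input shown above. Since awk does not guarantee the order of a "for (key in array)" loop, the sketch records the keys in a second array so the output follows the order in which they first appear.

```shell
#!/bin/sh
# Sketch: sum column 2 per distinct column 1 value, keeping first-seen order.
# sum_by_key is a hypothetical helper name, not something from the thread.
sum_by_key() {
    awk '
        NF < 2 { next }                      # skip blank or incomplete lines
        !($1 in sum) { order[++n] = $1 }     # record each key the first time it is seen
        { sum[$1] += $2 }                    # accumulate column 2 for that key
        END {
            for (i = 1; i <= n; i++)
                printf "%s\t%s\n", order[i], sum[order[i]]
        }
    ' "$@"
}

# Sample rows from the post, fed on standard input.
sum_by_key <<'EOF'
hsa-miR-1       1
hsa-miR-1       1
hsa-miR-7-5p    1
hsa-miR-9-5p    1
hsa-miR-9-5p    2
hsa-miR-9-5p    1
hsa-miR-10a-5p  1
hsa-miR-10b-5p  1
hsa-miR-34a-5p  1
hsa-miR-34a-5p  1
EOF
```

To run it on a file instead, pass the file name as an argument, for example sum_by_key input.txt > output.txt (both file names are placeholders). The output is tab-separated; change the printf format if aligned columns are preferred.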
 
