The solution works for the pf_20150127.csv file, but when I ran the same script on pf_20150325.csv it failed with the error below:
Code:
Expected file(s) found, Performing Validations for file: pf_20150325.csv
pf_20150325.csv,20150325
------------------------------------------------------------------------------------
Checking Specific Validations 2 for File: pf_20150325.csv
------------------------------------------------------------------------------------
The sum of either or all columns is not matching with last row sum value of corresponding column. Hence exiting the Job
Errors: col 1: 173000000000 != 172928624441
So I checked the temp files and found that temp_original_20150325.tmp, where I initially cut the trailer record from the original CSV file, is being read as below.
I also opened the CSV file in Excel: the trailer-record value for column D is 1.72929E+11, and when I summed the rows under column D (excluding the header and trailer) in Excel, the result matched the trailer record, also 1.72929E+11. I don't understand why Unix reads the trailer record differently from the original file.
Because the temp_original file and temp_sum file differ, the validation fails. I also don't understand why the original temp file stores the values as XXXXXXXXXX.00. How can we make the code generic, so that whatever value is present in the trailer record, with or without e/E notation, the sum is compared correctly? Please help me out.
With Regards,
TPK
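One generic approach (a sketch only; the file name, column position, and sample values below are placeholders, not the poster's actual layout) is to let awk coerce the trailer field to a number, which handles plain integers, values with a trailing .00, and e/E notation alike, and then compare both sides after rendering them with %.0f:

```shell
# Illustrative sample: header, two data rows, trailer in E notation
cat > sample.csv <<'EOF'
H,name,date,amount
1,a,20150325,100000000000
2,b,20150325,7.2928624441E+10
T,end,20150325,1.72928624441E+11
EOF

# Sum column 4 of the data rows and compare against the trailer value.
# Adding 0 coerces any numeric form (123, 123.00, 1.23E+11) to a number;
# sprintf("%.0f", ...) renders both sides in plain notation before comparing.
result=$(awk -F',' '
  NR == 1 { next }                      # skip the header row
  { if (seen) sum += prev; prev = $4; seen = 1 }
  END {
    trailer = prev + 0                  # last buffered field = trailer value
    if (sprintf("%.0f", sum) == sprintf("%.0f", trailer))
      print "OK"
    else
      printf "Mismatch: %.0f != %.0f\n", sum, trailer
  }' sample.csv)
echo "$result"
```

Buffering the previous line's field (prev) avoids a second pass to strip the trailer before summing.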
Hi All,
I have a file with 3 columns (string string integer):
a b 1
x y 2
p k 5
y y 4
.....
.....
Question:
I want to get the unique values of column 2 in sorted order (on column 2), along with the sum of the 3rd column of the corresponding rows. E.g. the above file should return the... (6 Replies)
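A common awk idiom for this (a sketch using only the rows shown in the post; the trailing "....." lines are omitted) accumulates the 3rd column into an array keyed on column 2, then sorts the output:

```shell
# Sample rows from the post
cat > groups.txt <<'EOF'
a b 1
x y 2
p k 5
y y 4
EOF

# Sum column 3 per distinct column-2 value, then sort on that value
result=$(awk '{ sum[$2] += $3 } END { for (k in sum) print k, sum[k] }' groups.txt | sort -k1,1)
echo "$result"
```

For the sample rows this prints b 1, k 5, y 6, one pair per line.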
I have a file in the following layout:
201008005946873001846130058030701006131840000000000000000000
201008006784994001154259058033001009527844000000000000000000
201008007323067002418095058034801002418095000000000000000000
201008007697126001722141058029101002214158000000000000000000... (2 Replies)
Hi,
My input files is like this
axis1 0 1 10
axis2 0 1 5
axis1 1 2 -4
axis2 2 3 -3
axis1 3 4 5
axis2 3 4 -1
axis1 4 5 -6
axis2 4 5 1
Now, these are my following tasks
1. Print a first column for every two rows that has the same value followed by a string.
2. Match on the... (3 Replies)
Hi,
I have a table to be imported into R as a matrix or data.frame, but I first need to edit it: several lines share the same identifier (1st column), so I want to sum each column (2nd to nth) for each identifier (1st column).
The input is for example, after sorted:
K00001 1 1 4 3... (8 Replies)
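One way to collapse duplicate identifiers (a sketch; the post shows only the first K00001 row, so the duplicate rows below are hypothetical) is to key an awk array on identifier plus column index:

```shell
# Hypothetical sample: two rows share the identifier K00001
cat > matrix.txt <<'EOF'
K00001 1 1 4 3
K00001 2 0 1 0
K00002 5 5 5 5
EOF

# Sum columns 2..NF per identifier, preserving first-seen order
result=$(awk '{
  if (!($1 in seen)) { seen[$1] = ++n; order[n] = $1 }
  for (i = 2; i <= NF; i++) sum[$1, i] += $i
  nf = NF
} END {
  for (j = 1; j <= n; j++) {
    line = order[j]
    for (i = 2; i <= nf; i++) line = line " " sum[order[j], i]
    print line
  }
}' matrix.txt)
echo "$result"
```

This assumes every row has the same number of columns; the output can be fed straight to read.table in R.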
Hi,
I have a log like the one below:
A 2 5
B 4 1
C 6 8
B 0 1
C 1 0
B 2 3
A 0 0
I want to sum columns 2 and 3 for the rows whose first column matches, e.g. A,
so the result is:
A 2 5 (5 Replies)
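This can be done for every key at once (a sketch over the exact rows shown in the post) by keeping one running total per column:

```shell
cat > app.log <<'EOF'
A 2 5
B 4 1
C 6 8
B 0 1
C 1 0
B 2 3
A 0 0
EOF

# Sum columns 2 and 3 per key in column 1
result=$(awk '{ c2[$1] += $2; c3[$1] += $3 } END { for (k in c2) print k, c2[k], c3[k] }' app.log | sort)
echo "$result"
```

For the sample log this yields A 2 5, B 6 5, C 7 8, matching the poster's expected A line.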
This is part of a KT I am going through.
I am writing a bash script on Linux. I have 2 columns: the 1st is the nth hour (00, 01, 02 ... 23) and the 2nd is the file size.
sample data attached.
The desired output is 3 columns: the nth hour, the number of entries in that hour, and... (3 Replies)
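A sketch of the aggregation (the post's data was attached rather than shown, so the sample below is hypothetical, and the truncated third output column is assumed to be the total size for the hour):

```shell
# Hypothetical sample: hour, file size
cat > hours.txt <<'EOF'
00 120
00 340
01 50
23 900
23 100
EOF

# Per hour: hour, number of entries, and (assumed) total size
result=$(awk '{ cnt[$1]++; tot[$1] += $2 } END { for (h in cnt) print h, cnt[h], tot[h] }' hours.txt | sort)
echo "$result"
```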
I have a file that needs to be summed up using the date column.
I/P:
2017/01/01 a 10
2017/01/01 b 20
2017/01/01 c 40
2017/01/01 a 60
2017/01/01 b 50
2017/01/01 c 40
2017/01/01 a 20
2017/01/01 b 30
2017/01/01 c 40
2017/02/01 a 10
2017/02/01 b 20
2017/02/01 c 30
2017/02/01 a 10... (6 Replies)
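Grouping on the date is one array lookup in awk (a sketch; the post's input is truncated after the last 2017/02/01 line, so only the rows shown are used, and summing the 3rd column per date is an assumption about the desired output):

```shell
cat > daily.txt <<'EOF'
2017/01/01 a 10
2017/01/01 b 20
2017/01/01 c 40
2017/01/01 a 60
2017/01/01 b 50
2017/01/01 c 40
2017/01/01 a 20
2017/01/01 b 30
2017/01/01 c 40
2017/02/01 a 10
2017/02/01 b 20
2017/02/01 c 30
2017/02/01 a 10
EOF

# Sum column 3 per date (column 1)
result=$(awk '{ tot[$1] += $3 } END { for (d in tot) print d, tot[d] }' daily.txt | sort)
echo "$result"
```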
Discussion started by: Booo
LEARN ABOUT DEBIAN
fastx_quality_stats
FASTX_QUALITY_STATS(1)          User Commands          FASTX_QUALITY_STATS(1)
NAME
fastx_quality_stats - FASTX Statistics
DESCRIPTION
usage: fastx_quality_stats [-h] [-N] [-i INFILE] [-o OUTFILE]
Part of FASTX Toolkit 0.0.13.2 by A. Gordon (gordon@cshl.edu)
[-h] = This helpful help screen.
[-i INFILE] = FASTQ input file. default is STDIN.
[-o OUTFILE] = TEXT output file. default is STDOUT.
[-N] = New output format (with more information per nucleotide/cycle).
The *OLD* output TEXT file will have the following fields (one row per column):
column = column number (1 to 36 for a 36-cycles read solexa file)
count = number of bases found in this column.
min = Lowest quality score value found in this column.
max = Highest quality score value found in this column.
sum = Sum of quality score values for this column.
mean = Mean quality score value for this column.
Q1 = 1st quartile quality score.
med = Median quality score.
Q3 = 3rd quartile quality score.
IQR = Inter-Quartile range (Q3-Q1).
lW = 'Left-Whisker' value (for boxplotting).
rW = 'Right-Whisker' value (for boxplotting).
A_Count = Count of 'A' nucleotides found in this column.
C_Count = Count of 'C' nucleotides found in this column.
G_Count = Count of 'G' nucleotides found in this column.
T_Count = Count of 'T' nucleotides found in this column.
N_Count = Count of 'N' nucleotides found in this column.
max-count = max. number of bases (in all cycles)
The *NEW* output format:
cycle (previously called 'column') = cycle number
max-count
For each nucleotide in the cycle (ALL/A/C/G/T/N):
count = number of bases found in this column.
min = Lowest quality score value found in this column.
max = Highest quality score value found in this column.
sum = Sum of quality score values for this column.
mean = Mean quality score value for this column.
Q1 = 1st quartile quality score.
med = Median quality score.
Q3 = 3rd quartile quality score.
IQR = Inter-Quartile range (Q3-Q1).
lW = 'Left-Whisker' value (for boxplotting).
rW = 'Right-Whisker' value (for boxplotting).
SEE ALSO
The quality of this automatically generated manpage might be insufficient. It is suggested to visit
http://hannonlab.cshl.edu/fastx_toolkit/commandline.html
to get a better layout as well as an overview about connected FASTX tools.
fastx_quality_stats 0.0.13.2 May 2012 FASTX_QUALITY_STATS(1)