The solution works for the pf_20150127.csv file, but when I ran the same script on pf_20150325.csv it failed with the error below:
Code:
Expected file(s) found, Performing Validations for file: pf_20150325.csv
pf_20150325.csv,20150325
------------------------------------------------------------------------------------
Checking Specific Validations 2 for File: pf_20150325.csv
------------------------------------------------------------------------------------
The sum of either or all columns is not matching with last row sum value of corresponding column. Hence exiting the Job
Errors: col 1: 173000000000 != 172928624441
So I checked the temp files and found that temp_original_20150325.tmp, where I initially cut the trailer record from the original CSV file, is being read as below.
I also opened the CSV file in Excel: the trailer-record value for column D is 1.72929E+11, and when I summed the rows under column D (excluding the header and trailer) in Excel, the result matched the trailer record, also 1.72929E+11. I don't understand why Unix reads the trailer record differently from the original file.
Because the temp_original file and temp_sum file differ, the validation fails. I also don't understand why the original temp file stores the values as XXXXXXXXXX.00. How can we make the code generic, so that whatever value is present in the trailer record, with or without e/E notation, the sum is compared correctly? Please help me out.
With Regards,
TPK
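One generic approach (a sketch only; the file name, column position, and sample values below are placeholders, not the poster's actual layout) is to let awk coerce the trailer field to a number, which handles plain integers, values with a trailing .00, and e/E notation alike, and then compare both sides after rendering them with %.0f:

```shell
# Illustrative sample: header, two data rows, trailer in E notation
cat > sample.csv <<'EOF'
H,name,date,amount
1,a,20150325,100000000000
2,b,20150325,7.2928624441E+10
T,end,20150325,1.72928624441E+11
EOF

# Sum column 4 of the data rows and compare against the trailer value.
# Adding 0 coerces any numeric form (123, 123.00, 1.23E+11) to a number;
# sprintf("%.0f", ...) renders both sides in plain notation before comparing.
result=$(awk -F',' '
  NR == 1 { next }                      # skip the header row
  { if (seen) sum += prev; prev = $4; seen = 1 }
  END {
    trailer = prev + 0                  # last buffered field = trailer value
    if (sprintf("%.0f", sum) == sprintf("%.0f", trailer))
      print "OK"
    else
      printf "Mismatch: %.0f != %.0f\n", sum, trailer
  }' sample.csv)
echo "$result"
```

Buffering the previous line's field (prev) avoids a second pass to strip the trailer before summing.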
Hi All,
I have a file with 3 columns (string string integer):
a b 1
x y 2
p k 5
y y 4
.....
.....
Question:
I want to get the unique values of column 2 in sorted order (on column 2), along with the sum of the 3rd column of the corresponding rows. E.g. the above file should return the... (6 Replies)
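A common awk idiom for this (a sketch using only the rows shown in the post; the trailing "....." lines are omitted) accumulates the 3rd column into an array keyed on column 2, then sorts the output:

```shell
# Sample rows from the post
cat > groups.txt <<'EOF'
a b 1
x y 2
p k 5
y y 4
EOF

# Sum column 3 per distinct column-2 value, then sort on that value
result=$(awk '{ sum[$2] += $3 } END { for (k in sum) print k, sum[k] }' groups.txt | sort -k1,1)
echo "$result"
```

For the sample rows this prints b 1, k 5, y 6, one pair per line.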
I have a file in the following layout:
201008005946873001846130058030701006131840000000000000000000
201008006784994001154259058033001009527844000000000000000000
201008007323067002418095058034801002418095000000000000000000
201008007697126001722141058029101002214158000000000000000000... (2 Replies)
Hi,
My input files is like this
axis1 0 1 10
axis2 0 1 5
axis1 1 2 -4
axis2 2 3 -3
axis1 3 4 5
axis2 3 4 -1
axis1 4 5 -6
axis2 4 5 1
Now, these are my following tasks
1. Print a first column for every two rows that has the same value followed by a string.
2. Match on the... (3 Replies)
Hi,
I have a table to be imported into R as a matrix or data.frame, but I first need to edit it: several lines share the same identifier (1st column), so I want to sum each column (2nd to nth) for each identifier (1st column).
The input is for example, after sorted:
K00001 1 1 4 3... (8 Replies)
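One way to collapse duplicate identifiers (a sketch; the post shows only the first K00001 row, so the duplicate rows below are hypothetical) is to key an awk array on identifier plus column index:

```shell
# Hypothetical sample: two rows share the identifier K00001
cat > matrix.txt <<'EOF'
K00001 1 1 4 3
K00001 2 0 1 0
K00002 5 5 5 5
EOF

# Sum columns 2..NF per identifier, preserving first-seen order
result=$(awk '{
  if (!($1 in seen)) { seen[$1] = ++n; order[n] = $1 }
  for (i = 2; i <= NF; i++) sum[$1, i] += $i
  nf = NF
} END {
  for (j = 1; j <= n; j++) {
    line = order[j]
    for (i = 2; i <= nf; i++) line = line " " sum[order[j], i]
    print line
  }
}' matrix.txt)
echo "$result"
```

This assumes every row has the same number of columns; the output can be fed straight to read.table in R.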
Hi,
I have a log like the one below:
A 2 5
B 4 1
C 6 8
B 0 1
C 1 0
B 2 3
A 0 0
I want to sum columns 2 and 3 for the rows whose first column matches, e.g. A,
so the result is:
A 2 5 (5 Replies)
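This can be done for every key at once (a sketch over the exact rows shown in the post) by keeping one running total per column:

```shell
cat > app.log <<'EOF'
A 2 5
B 4 1
C 6 8
B 0 1
C 1 0
B 2 3
A 0 0
EOF

# Sum columns 2 and 3 per key in column 1
result=$(awk '{ c2[$1] += $2; c3[$1] += $3 } END { for (k in c2) print k, c2[k], c3[k] }' app.log | sort)
echo "$result"
```

For the sample log this yields A 2 5, B 6 5, C 7 8, matching the poster's expected A line.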
This is part of a KT I am going through.
I am writing a bash script on Linux. I have 2 columns: the 1st is the nth hour (00, 01, 02 ... 23) and the 2nd is the file size.
sample data attached.
The desired output is 3 columns: the nth hour, the number of entries in that hour, and... (3 Replies)
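A sketch of the aggregation (the post's data was attached rather than shown, so the sample below is hypothetical, and the truncated third output column is assumed to be the total size for the hour):

```shell
# Hypothetical sample: hour, file size
cat > hours.txt <<'EOF'
00 120
00 340
01 50
23 900
23 100
EOF

# Per hour: hour, number of entries, and (assumed) total size
result=$(awk '{ cnt[$1]++; tot[$1] += $2 } END { for (h in cnt) print h, cnt[h], tot[h] }' hours.txt | sort)
echo "$result"
```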
I have a file that needs to be summed up using the date column.
I/P:
2017/01/01 a 10
2017/01/01 b 20
2017/01/01 c 40
2017/01/01 a 60
2017/01/01 b 50
2017/01/01 c 40
2017/01/01 a 20
2017/01/01 b 30
2017/01/01 c 40
2017/02/01 a 10
2017/02/01 b 20
2017/02/01 c 30
2017/02/01 a 10... (6 Replies)
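Grouping on the date is one array lookup in awk (a sketch; the post's input is truncated after the last 2017/02/01 line, so only the rows shown are used, and summing the 3rd column per date is an assumption about the desired output):

```shell
cat > daily.txt <<'EOF'
2017/01/01 a 10
2017/01/01 b 20
2017/01/01 c 40
2017/01/01 a 60
2017/01/01 b 50
2017/01/01 c 40
2017/01/01 a 20
2017/01/01 b 30
2017/01/01 c 40
2017/02/01 a 10
2017/02/01 b 20
2017/02/01 c 30
2017/02/01 a 10
EOF

# Sum column 3 per date (column 1)
result=$(awk '{ tot[$1] += $3 } END { for (d in tot) print d, tot[d] }' daily.txt | sort)
echo "$result"
```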
Discussion started by: Booo
LEARN ABOUT DEBIAN
fastx_quality_stats
FASTX_QUALITY_STATS(1)          User Commands          FASTX_QUALITY_STATS(1)
NAME
fastx_quality_stats - FASTX Statistics
DESCRIPTION
usage: fastx_quality_stats [-h] [-N] [-i INFILE] [-o OUTFILE]
Part of FASTX Toolkit 0.0.13.2 by A. Gordon (gordon@cshl.edu)
[-h] = This helpful help screen.
[-i INFILE] = FASTQ input file. default is STDIN.
[-o OUTFILE] = TEXT output file. default is STDOUT.
[-N] = New output format (with more information per nucleotide/cycle).
The *OLD* output TEXT file will have the following fields (one row per column):
column = column number (1 to 36 for a 36-cycles read solexa file)
count = number of bases found in this column.
min = Lowest quality score value found in this column.
max = Highest quality score value found in this column.
sum = Sum of quality score values for this column.
mean = Mean quality score value for this column.
Q1 = 1st quartile quality score.
med = Median quality score.
Q3 = 3rd quartile quality score.
IQR = Inter-Quartile range (Q3-Q1).
lW = 'Left-Whisker' value (for boxplotting).
rW = 'Right-Whisker' value (for boxplotting).
A_Count = Count of 'A' nucleotides found in this column.
C_Count = Count of 'C' nucleotides found in this column.
G_Count = Count of 'G' nucleotides found in this column.
T_Count = Count of 'T' nucleotides found in this column.
N_Count = Count of 'N' nucleotides found in this column.
max-count = max. number of bases (in all cycles)
The *NEW* output format:
cycle (previously called 'column') = cycle number
max-count
For each nucleotide in the cycle (ALL/A/C/G/T/N):
count = number of bases found in this column.
min = Lowest quality score value found in this column.
max = Highest quality score value found in this column.
sum = Sum of quality score values for this column.
mean = Mean quality score value for this column.
Q1 = 1st quartile quality score.
med = Median quality score.
Q3 = 3rd quartile quality score.
IQR = Inter-Quartile range (Q3-Q1).
lW = 'Left-Whisker' value (for boxplotting).
rW = 'Right-Whisker' value (for boxplotting).
SEE ALSO
The quality of this automatically generated manpage might be insufficient. It is suggested to visit
http://hannonlab.cshl.edu/fastx_toolkit/commandline.html
to get a better layout as well as an overview about connected FASTX tools.
fastx_quality_stats 0.0.13.2 May 2012 FASTX_QUALITY_STATS(1)