Aggregation of Huge files

02-28-2014

Hi Friends !!

I am getting a hash-total mismatch when summing one column over a set of very large files:

Command used:

Code:

tail -n +2 <File_Name> |nawk -F"|" -v '%.2f' qq='"' '{gsub(qq,"");sa+=($156<0)?-$156:$156}END{print sa}' OFMT='%.5f'

The file is pipe-delimited, and column 156 is the one used for hash totalling.

File 1:

Record count is 254368

Absolute Sum in DB is 23840949436509.39

Absolute Sum using above script is 23840949436510.18750

File 2:

Record count is 2580100

Absolute Sum in DB is 7305817400402102.5619993295

Absolute Sum using above script is 7305817400403184.00000

Kindly help me resolve this issue, and please suggest a better way to do absolute hash totalling for huge volumes if there is one.

Thanks in advance,
Ravi

Moderator's Comments:

edit by bakunin: you are welcome but you would be even more welcome if you would use these CODE-tags for your code. Thank you for using them yourself from now on.

Last edited by bakunin; 02-28-2014 at 04:10 AM..

Ravichander

02-28-2014

awk uses double-precision (64-bit) floating point numbers, which do not have infinite precision -- they have at best 15 to 17 significant decimal digits, and the rounding error grows as millions of values are added. If you want arbitrary precision like a database provides, try the bc utility.

Corona688

02-28-2014

Hi Corona..

Can you help me with the bc utility for this scenario? I am new to these tools!

Regards,
Ravi

Ravichander

02-28-2014

Assuming that you're using the tail in the command line:

Code:

tail -n +2 <File_Name> |nawk -F"|" -v '%.2f' qq='"' '{gsub(qq,"");sa+=($156<0)?-$156:$156}END{print sa}' OFMT='%.5f'

to discard the first line of your input file because it contains a header that you don't want included in the total (tail -n +2 starts its output at line 2); that the '%.2f' didn't really appear in the command line you executed (since -v '%.2f' is not a variable assignment and would be a syntax error for nawk); that you don't really want the output rounded to five digits after the decimal point (as would be done in your command line by OFMT='%.5f'); and that field #156 in the remaining lines contains a double-quoted string of digits with no more than one period and an optional leading minus sign (which you want ignored), you could try something like:

Code:

nawk -F'|' -v dqANDms='["-]' '
BEGIN { f=156                   # field to total
        printf("0")             # start the bc expression
}
NR > 1 {gsub(dqANDms, "", $f)   # skip the header line; strip quotes and minus signs
        printf("+%s", $f)       # append "+value" to the expression
}
END {   printf("\n")            # end the expression so bc evaluates it
}' <File_Name> | bc

Last edited by Don Cragun; 02-28-2014 at 11:45 AM.. Reason: Fix typos.

Don Cragun

02-28-2014

Quote:

Originally Posted by Ravichander

Hi Corona..
Can you help me with bc utility for this scenario?

That depends on what your scenario is. I don't know yet; all I have is a program which doesn't do what you want...

Corona688

03-06-2014

Hi Don !

Thanks for the workaround. It works fine for small files, but when I run it on large files I get the error below:

Code:

 
0705-001: bundling space exceeded on line 1 stdin

Kindly help me in this regard.

Regards,
Ravichander

Ravichander

03-06-2014

Assuming that the error is coming from bc (the earlier script feeds bc the entire sum as one very long line), you could try giving it one short statement per record instead:

Code:

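awk -F'|' -v dqANDms='["-]' '
BEGIN { f=156                   # field to total
        printf("s=0\n")         # initialise the bc accumulator
}
NR > 1 {gsub(dqANDms, "", $f)   # skip the header line; strip quotes and minus signs
        printf("s+=%s\n",  $f)  # one short bc statement per record
}
END {   printf("s\n")           # make bc print the final total
}' file | bc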

Don Cragun
