Aggregation of Huge files Post: 302892807


Shell Programming and Scripting: Aggregation of Huge files, post 302892807 by Don Cragun on Friday, 14 March 2014, 02:37:02 PM


Quote:

Originally Posted by Ravichander

Hi Don!

Below are the requirements I need to meet with Unix scripting:

1. The number of records may vary from 200,000 to 4,500,000.
2. The 156th column, which has a decimal precision of (38,10), needs to be summed.
3. The file will be pipe-delimited. For now, double quotes won't appear, but they may in the future, so currently we can treat it as purely pipe-delimited.
4. While performing the aggregation, we need to take the absolute sum of the 156th column.
5. The maximum expected precision is (38,10); normally, the 156th column arrives as (24,10).

If the code I have used/provided is erroneous or does not suit the requirements, kindly help me arrive at a command that performs the calculation stated above.

I am finding it quite difficult to pin down the reason that is causing this difference!

Regards,
Ravichander

You didn't answer my question about the length of the longest line in your file! If you have any lines longer than 2048 bytes (including the terminating newline character), nawk may fail.
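One way to answer that question without relying on nawk itself is a small byte-counting pass. The Python sketch below is only an illustration: the function name `longest_line` is hypothetical, and the 2048-byte figure is the one quoted in this post, so check your own nawk documentation for its actual limit. Lengths include the terminating newline, as the post specifies.

```python
from typing import Iterable, Tuple


def longest_line(lines: Iterable[bytes]) -> Tuple[int, int]:
    """Return (1-based line number, length in bytes) of the longest line.

    Lines read from a file opened in binary mode keep their trailing
    b"\\n", so the length counted here includes the newline.
    """
    best_no, best_len = 0, 0
    for number, line in enumerate(lines, start=1):
        if len(line) > best_len:
            best_no, best_len = number, len(line)
    return best_no, best_len
```

For example, `with open("data.txt", "rb") as f: print(longest_line(f))`, where `data.txt` is a placeholder for your input file. Any result above 2048 means the nawk concern applies.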

The number of records doesn't matter for this script.
The code that you provided did NOT calculate the sum of the numbers in the 156th field; it calculated the sum of the absolute values of the numbers in the 156th field!
The quote removal slows down the processing, but doesn't affect the results unless there is a pipe symbol (|) between quotes that is not to be treated as a field separator. If there is any possibility that a | between double quotes (") should not be treated as a field separator, this awk script will not work! If there will never be a | between " characters and there will never be " characters in the 156th field, the script should ignore " characters completely.
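To show why a | between double quotes breaks plain field splitting, here is a small Python comparison. It uses Python's `csv` module as a stand-in technique, not the awk quote-removal approach from this thread. It assumes quotes always wrap whole fields and that an embedded quote is written as `""` (the `csv` default). The helper names are hypothetical.

```python
import csv
from typing import List


def split_naive(line: str) -> List[str]:
    # Treats every | as a separator, even one inside double quotes,
    # so later fields shift to the wrong positions.
    return line.split("|")


def split_quoted(line: str) -> List[str]:
    # csv honours "..." quoting: a | inside quotes stays in its field,
    # and the enclosing quote characters are removed.
    return next(csv.reader([line], delimiter="|", quotechar='"'))
```

With the record `a|"b|c"|12.5`, `split_naive` yields four fields (`a`, `"b`, `c"`, `12.5`). `split_quoted` yields three (`a`, `b|c`, `12.5`). Any field after the quoted one would therefore be read from the wrong column by the naive split.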
The absolute value of the sum is not the same as the sum of the absolute values!!! You need to clearly describe the calculation to be performed!
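The difference is easy to see with three small numbers. The Python sketch below uses hypothetical helper names and computes both quantities with exact decimal arithmetic.

```python
from decimal import Decimal
from typing import Iterable


def absolute_value_of_sum(values: Iterable[Decimal]) -> Decimal:
    # Add the signed values first, then drop the sign of the total.
    return abs(sum(values, Decimal(0)))


def sum_of_absolute_values(values: Iterable[Decimal]) -> Decimal:
    # Drop the sign of each value first, then add them up.
    return sum((abs(v) for v in values), Decimal(0))


values = [Decimal("10.5"), Decimal("-4.25"), Decimal("-7")]
print(absolute_value_of_sum(values))   # 0.75
print(sum_of_absolute_values(values))  # 21.75
```

The two results differ whenever the field contains both positive and negative values. That is why the requirement has to say which one is meant.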
Using bc to calculate the sum of a set of numbers can easily handle sums with a hundred digits before and after the radix character with no loss of precision.
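The same idea, arbitrary-precision decimal arithmetic instead of binary floating point, can be sketched in Python with the `decimal` module. This is not the awk/bc script from the thread. It is a minimal one-pass sketch that assumes plain pipe-delimited records (no quoting) in which every record has at least `field` fields. The function name `sum_field` is hypothetical. One detail matters for (38,10) values: Python's default decimal context rounds arithmetic results to 28 significant digits, so the sketch raises the precision to 100 digits for the duration of the loop.

```python
from decimal import Decimal, localcontext
from typing import Iterable


def sum_field(lines: Iterable[str], field: int = 156, delimiter: str = "|") -> Decimal:
    """Exact sum of a 1-based field from plain delimited records."""
    total = Decimal(0)
    with localcontext() as ctx:
        # The default context rounds arithmetic to 28 significant digits,
        # too few for a (38,10) value; 100 digits leaves ample headroom.
        ctx.prec = 100
        for line in lines:
            fields = line.rstrip("\n").split(delimiter)
            # Decimal(str) converts the text exactly, with no float rounding.
            total += Decimal(fields[field - 1])
    return total
```

For example, `with open("data.txt") as f: total = sum_field(f)`, where `data.txt` is a placeholder. The loop streams one record at a time, so memory use does not grow with the record count. Summing two copies of a 38-digit value such as `1234567890123456789012345678.1234567890` gives `2469135780246913578024691356.2469135780` exactly. The 28-digit default would have rounded that result.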

The script assumes that the contents of field 156 will be a string of digits with an optional leading minus sign (-) and no more than one decimal point character (.). If there is a decimal point character and a minus sign, the minus sign must still be the 1st character in the string. If the contents of field 156 contain more than one minus sign, more than one decimal point, or any other non-numeric characters, the results are unspecified.
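Because malformed values give unspecified results, it can help to check field 156 before summing. The Python sketch below encodes the format described above as a regular expression: an optional leading minus, digits, at most one decimal point, and at least one digit somewhere. The names `NUMBER` and `is_well_formed` are hypothetical. It is a pre-check idea, not part of the script discussed in this thread.

```python
import re

# Optional leading minus, then either digits with an optional fractional
# part ("12", "12.", "12.5") or a bare fraction (".5"). [0-9] is used
# instead of \d so that only ASCII digits are accepted.
NUMBER = re.compile(r"-?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)")


def is_well_formed(text: str) -> bool:
    # fullmatch rejects trailing junk such as "1.2.3" or "5-".
    return NUMBER.fullmatch(text) is not None
```

Records that fail this check could be reported with their line numbers instead of being fed to the sum.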

When extracting data from your database, are you absolutely sure that you are getting the records and the sum that you have in your header in a single transaction? If you are getting the data in one transaction and the sum in another transaction, changes to your database between those two transactions could easily cause the differences you are seeing.


10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Comparing two huge files

Hi, I have two files, file A and file B. File A is an error file and file B is the source file. In the error file, the first line is the actual error and the second line gives information about the record (client ID) that throws the error. I need to compare the first field (which doesn't start with '//') of...

2. UNIX for Dummies Questions & Answers

Difference between two huge files

Hi, as per my requirement, I need to take the difference between two big files (around 6.5 GB) and write the difference to an output file without any line numbers or '<' or '>' in front of each new line. As the DIFF command won't work for big files, I tried to use BDIFF instead. I am getting incorrect...

3. UNIX for Advanced & Expert Users

Huge files manipulation

Hi, I need a fast way to delete duplicate entries from very huge files (>2 GB); these files are in plain text. I tried all the usual methods (awk / sort / uniq / sed / grep ..) but it always ended with the same result (memory core dump). I am using large HP-UX servers. Any advice will...

4. High Performance Computing

Huge Files to be Joined on Ux instead of ORACLE

We have one file (11 million lines) that is being matched with (10 billion) lines. The proof of concept we are trying is to join them on Unix. All files are delimited and they have composite keys. Could Unix be faster than Oracle in this regard? Please advise.

5. Shell Programming and Scripting

Help in locating a word in huge files

hi i receive about 5000 files per day in my system. Each of them are like: cat ABC.april24.dat ABH00001990 01993 409009092 0909 INI iop 9033 AAB0000237893784 8430900 898383 AUS 34349089008 849843 9474822 AAA00003849893498098394 84834 348348439 -438939 IN AAA00004438493893849384...

6. Shell Programming and Scripting

Compare 2 folders to find several missing files among huge amounts of files.

Hi, all: I've got two folders, say, "folder1" and "folder2". Under each, there are thousands of files. It's quite obvious that there are some files missing in each. I just would like to find them. I believe this can be done by "diff" command. However, if I change the above question a...

7. AIX

Copy huge files system

Dear Guys, by using the dd command or any strong command, I’d like to copy huge data from one file system to another file system. Source file system: /sfsapp (250 GB of data). Target file system: /tgtapp. I’d like to copy all these files and directories from /sfsapp to /tgtapp as...

8. Shell Programming and Scripting

Compression - Exclude huge files

I have a DB folder of approximately 60 GB. It has logs ranging from 500 MB to 1 GB. I have an installation which would update the DB. I need to back up this DB folder, just in case my installation FAILS. But I do not need the logs in my backup. How do I exclude them during compression (tar)? ...

9. UNIX for Dummies Questions & Answers

File comparison of huge files

Hi all, I hope you are well. I am very happy to see your contribution. I am eager to become part of it. I have the following question. I have two huge files to compare (almost 3GB each). The files are simulation outputs. The format of the files are as below For clear picture, please see...

10. Shell Programming and Scripting

Aggregation of huge data

Hi Friends, I have a file with sample amount data as follows: -89990.3456 8788798.990000128 55109787.20 -12455558989.90876 I need to exclude the '-' symbol in order to treat all values as absolute ones, and then I need to sum them up. The record count is around 1 million. How...
