12-11-2009
Best Strategy to Process Huge Files
I have a file with 20 million records. I need to read each record and process it.
Which will be faster: Perl, shell, or awk?
And what is the best method to read huge files line by line?
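For a question like this, a line-by-line sketch in each tool is worth comparing side by side; the file name here is a small placeholder, and awk is usually the fastest of the three for simple per-record work:

```shell
# Small sample file standing in for the 20-million-record input
printf 'rec1\nrec2\nrec3\n' > /tmp/records.txt

# awk: streams the file record by record in constant memory
awk '{ count++ } END { print count " records" }' /tmp/records.txt

# Perl: -n wraps the one-liner in a while(<>) read loop
perl -ne '$n++; END { print "$n records\n" }' /tmp/records.txt

# Pure shell while-read loop: portable, but typically the slowest by far
n=0
while IFS= read -r line; do
    n=$((n + 1))
done < /tmp/records.txt
echo "$n records"
```

All three read one record at a time, so memory stays flat regardless of file size; the shell loop pays a heavy per-line interpretation cost, which is why awk or Perl is normally preferred at 20 million records.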
10 More Discussions You Might Find Interesting
1. UNIX for Advanced & Expert Users
Hello !
I have a problem with an apache process that is causing huge load. It starts from time to time - I'm not sure what makes it start because there's nothing in cron, but it appears every few minutes - and when it starts it uses a lot of RAM (up to 1.3GB) and creates a huge load on... (1 Reply)
Discussion started by: Sergiu-IT
2. Shell Programming and Scripting
Hi,
I have two files, File A and File B. File A is an error file and File B is the source file. In the error file, the first line is the actual error and the second line gives information about the record (client ID) that throws the error. I need to compare the first field (which doesn't start with '//') of... (11 Replies)
Discussion started by: kmkbuddy_1983
3. UNIX for Dummies Questions & Answers
Hi,
As per my requirement, I need to take the difference between two big files (around 6.5 GB) and write the difference to an output file without any line numbers or '<' or '>' in front of each new line.
As the diff command won't work for big files, I tried to use bdiff instead.
I am getting incorrect... (13 Replies)
Discussion started by: pyaranoid
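One marker-free approach for files too big for diff is to sort both inputs and use comm, which streams them line by line instead of holding them in memory. A sketch with tiny placeholder files:

```shell
# Small stand-ins for the two 6.5 GB files (comm requires sorted input)
printf 'a\nb\nc\n' > /tmp/fileA
printf 'b\nc\nd\n' > /tmp/fileB

# sort uses an external merge sort on disk, so it handles multi-GB inputs
LC_ALL=C sort /tmp/fileA -o /tmp/fileA
LC_ALL=C sort /tmp/fileB -o /tmp/fileB

# -13 suppresses columns 1 and 2: prints only lines unique to fileB,
# with no line numbers and no '<'/'>' markers
comm -13 /tmp/fileA /tmp/fileB > /tmp/only_in_B
cat /tmp/only_in_B
```

The trade-off is that sorting destroys the original line order, which is often acceptable when the goal is just the set of differing lines.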
4. UNIX for Advanced & Expert Users
Hi, I need a fast way to delete duplicate entries from very huge files (>2 GB); these files are plain text.
I tried all the usual methods (awk / sort / uniq / sed / grep ...) but it always ended with the same result (memory core dump).
I'm using large HP-UX servers.
Any advice will... (8 Replies)
Discussion started by: Klashxx
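For deduplicating multi-GB plain-text files, sort -u is one memory-safe option: unlike awk '!seen[$0]++', sort falls back to an external merge sort with temporary files on disk instead of holding everything in RAM. A small sketch (the -T directory is an assumption; point it at a filesystem with enough free space):

```shell
# Tiny stand-in for a >2 GB file with duplicate lines
printf 'x\ny\nx\nz\ny\n' > /tmp/huge.txt

# -u emits each distinct line once; -T controls where temp files go;
# LC_ALL=C makes byte-order comparison both faster and locale-independent
LC_ALL=C sort -u -T /tmp /tmp/huge.txt > /tmp/huge.dedup
wc -l < /tmp/huge.dedup
```

If the original line order must be preserved, one common trick is to prefix each line with its line number, sort -u on the content field, then re-sort on the number and strip it.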
5. High Performance Computing
We have one file (11 million lines) that is being matched against another (10 billion lines).
The proof of concept we are trying is to join them on Unix.
All files are delimited and they have composite keys.
Could Unix be faster than Oracle in this regard?
Please advise. (1 Reply)
Discussion started by: magedfawzy
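A sorted join along these lines is one way to run that proof of concept; the '|' delimiter and single-field key below are placeholders (a composite key would first be concatenated into one field, e.g. with awk), and both inputs must be sorted on the join field in the same collation:

```shell
# Tiny stand-ins for the 11-million-line and 10-billion-line files
printf '1|alpha\n2|beta\n3|gamma\n' > /tmp/small
printf '1|X\n3|Y\n4|Z\n' > /tmp/big

# join requires both inputs sorted on the join field; sort externally first
LC_ALL=C sort -t'|' -k1,1 /tmp/small -o /tmp/small
LC_ALL=C sort -t'|' -k1,1 /tmp/big -o /tmp/big

# Merge-join on field 1 of each file; runs in a single streaming pass
LC_ALL=C join -t'|' -1 1 -2 1 /tmp/small /tmp/big > /tmp/joined
cat /tmp/joined
```

Once both sides are sorted, join itself is a single sequential pass, so at these volumes the dominant cost is the external sort of the larger file.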
6. Shell Programming and Scripting
Hi, all:
I've got two folders, say, "folder1" and "folder2".
Under each, there are thousands of files.
It's quite obvious that there are some files missing in each; I would just like to find them. I believe this can be done with the "diff" command.
However, if I change the above question a... (1 Reply)
Discussion started by: jiapei100
7. AIX
Hi All
What is the command to check which process IDs have been running for a long time and are consuming the most CPU?
Also, how do I check what a particular PID is running?
For example:
I have a PID 3223722 which has been running for a long time;
if I want to check what this is... (1 Reply)
Discussion started by: sidharthmellam
8. AIX
Dear Guys,
Using the dd command or any robust command, I'd like to copy huge data from one file system to another.
Source file system: /sfsapp
The file system has 250 GB of data.
Target file system: /tgtapp
I'd like to copy all these files and directories from /sfsapp to /tgtapp as... (28 Replies)
Discussion started by: Mr.AIX
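For copying a populated file system tree, a file-level tool such as cp -a (or rsync -a, which can be resumed if interrupted) is usually a better fit than dd, which copies raw blocks and requires the target to match the source layout. A minimal sketch with placeholder paths:

```shell
# Placeholder source tree standing in for /sfsapp
mkdir -p /tmp/sfsapp/dir1
echo data > /tmp/sfsapp/dir1/file1
mkdir -p /tmp/tgtapp

# -a preserves ownership, permissions, timestamps, and symlinks;
# the trailing /. copies the directory's contents, including dotfiles
cp -a /tmp/sfsapp/. /tmp/tgtapp/
cat /tmp/tgtapp/dir1/file1
```

Because the copy is file by file, the target file system can differ in size and block layout from the source, which dd does not allow.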
9. Shell Programming and Scripting
Hi all,
I need help getting the difference between two .csv files.
I have two large .csv files which have an equal number of columns. I need to compare them and write the output to a new file which will contain the differences only.
E.g.
File1.csv
Name, Date, age,number
Sakshi, 16-12-2011, 22, 56
Akash,... (10 Replies)
Discussion started by: Dimple
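One simple way to get rows present in one CSV but not the other is grep with fixed-string, whole-line matching; note that -f loads the first file into memory, so for very large inputs comm on sorted copies scales better. A sketch with made-up sample rows:

```shell
# Made-up sample files modelled on the question's layout
printf 'Name,Date,age,number\nSakshi,16-12-2011,22,56\nAkash,10-10-2010,30,77\n' > /tmp/File1.csv
printf 'Name,Date,age,number\nSakshi,16-12-2011,22,56\nRiya,01-01-2011,25,88\n' > /tmp/File2.csv

# -F fixed strings, -x whole-line match, -f read patterns from File1,
# -v invert: print rows of File2 that do not appear verbatim in File1
grep -vxFf /tmp/File1.csv /tmp/File2.csv > /tmp/diff.csv
cat /tmp/diff.csv
```

Run it the other way around (swapping the two file arguments) to also get the rows unique to File1.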
10. Shell Programming and Scripting
Hi Friends !!
I am facing a hash-total issue while processing a set of very large files:
Command used:
tail -n +2 <File_Name> |nawk -F"|" -v '%.2f' qq='"' '{gsub(qq,"");sa+=($156<0)?-$156:$156}END{print sa}' OFMT='%.5f'
Pipe delimited file and 156 column is for hash totalling.... (14 Replies)
Discussion started by: Ravichander
LEARN ABOUT OSF1
acctprc1
acctprc(8) System Manager's Manual acctprc(8)
NAME
acctprc1, acctprc2, accton - Perform process-accounting procedures
SYNOPSIS
acctprc1 [InFile]
acctprc2
accton [OutFile]
DESCRIPTION
The three acctprc commands, acctprc1, acctprc2, and accton, are used in the runacct shell procedure to produce process-accounting reports.
acctprc1 [InFile]
The acctprc1 command is used to read records from standard input that are in a format defined by the acct structure in the
/usr/include/sys/acct.h header file. This process adds the login names that correspond to user IDs, and then writes corresponding ASCII
records to standard output. For each process, the record format includes the following seven unheaded columns:
1. The user ID, including both traditional and assigned user identification numbers listed in the /etc/passwd file.
2. The login name, which is the one used for the user ID in the /etc/passwd file.
3. The number of seconds the process consumed when executed during prime-time hours. Prime-time and nonprime-time hours are defined in the /usr/sbin/acct/holidays file.
4. The number of seconds the process consumed when executed during nonprime-time hours.
5. The total number of characters transferred.
6. The total number of blocks read and written.
7. The mean memory size (in kilobyte units).
When specified, InFile contains a list of login sessions in a format defined by the utmp structure in the /usr/include/utmp.h header file.
The login session records are sorted according to user ID and login name. When InFile is not specified, acctprc1 gets login names from the
password file /etc/passwd. The information in InFile is used to distinguish different login names that share the same user ID.
acctprc2
The acctprc2 command reads, from standard input, the records written by acctprc1, summarizes them according to user ID and name, and writes
sorted summaries to standard output as total accounting records in the tacct format (see the acctmerg command).
accton [OutFile]
When no parameters are specified with the accton command, account processing is turned off. When you specify an existing OutFile file,
process accounting is turned on, and the kernel adds records to that file. You must specify an OutFile to start process accounting. Many
shell script procedures expect the file name /var/adm/pacct, the standard process-accounting file.
EXAMPLES
To add a user name to each process-accounting record in a binary file and then write these modified binary-file records to an ASCII file
named out.file, enter the following line to an accounting shell script:
/usr/sbin/acct/acctprc1 < /var/adm/pacct >out.file
A user name is added to each record. The raw data in the pacct file is converted to ASCII and added to file out.file. To produce a
total binary accounting record of the ASCII output file out.file produced in example 1, enter the following line to an accounting
shell script:
/usr/sbin/acct/acctprc2 < out.file > /var/adm/acct/nite/daytacct
The resulting binary total accounting file, written in the tacct format, contains records sorted by user ID. This sorted user ID
file is usually merged with other total accounting records when the acctmerg command is run to produce a daily summary
accounting record called /var/adm/acct/sum/daytacct. To turn on process accounting, enter:
/usr/sbin/acct/accton /var/adm/pacct
To turn off process accounting, enter:
/usr/sbin/acct/accton
FILES
/usr/sbin/acct/acctprc1
Specifies the command path.
/usr/sbin/acct/acctprc2
Specifies the command path.
/usr/sbin/acct/accton
Specifies the command path.
RELATED INFORMATION
Commands: acct(8), acctcms(8), acctmerg(8), runacct(8)
Functions: acct(2)