Faster command to remove headers for files in a directory
Shell Programming and Scripting forum, post 302730409 by jim mcnamara, Monday 12th of November 2012, 08:17:26 PM
Did sed win? Or did file caching speed up sed? Modern controllers and the RAM cache (HP-UX included) can cache 100 MB of a single file without really using up system resources.

I vote for caching. The only fair test gives each command its own separate file, so that neither run benefits from data the other has already pulled into cache.
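A rough sketch of such a test follows. The file names, the row count, and the use of `date +%s` are all assumptions made for illustration. Note that freshly written files are probably sitting in cache for both commands, so this only puts them on an equal footing; it is not a cold-cache measurement.

```shell
#!/bin/sh
# Sketch only: file names and row count are made up, and `date +%s`
# (supported by GNU and BSD date, not required by POSIX) is an assumption.
work=$(mktemp -d) || exit 1

# One input file per command, so neither run rereads the other's file
awk 'BEGIN { print "HEADER"; for (i = 1; i <= 200000; i++) print "row", i }' > "$work/for_sed.txt"
cp "$work/for_sed.txt" "$work/for_awk.txt"

# Time sed on its own file (whole-second resolution only)
start=$(date +%s)
sed '1d' "$work/for_sed.txt" > "$work/sed.out"
sed_secs=$(( $(date +%s) - start ))

# Time awk on the other file
start=$(date +%s)
awk 'FNR>1' "$work/for_awk.txt" > "$work/awk.out"
awk_secs=$(( $(date +%s) - start ))

echo "sed: ${sed_secs}s  awk: ${awk_secs}s"

# Both commands should have produced the same headerless output
cmp -s "$work/sed.out" "$work/awk.out" && echo "outputs are identical"
```

With whole-second timing the inputs need to be large before any difference shows up. In shells that provide a `time` keyword (bash, ksh) that is likely the better tool.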

BTW: programs like sed, awk, head, tail, and grep are all highly optimized for their respective jobs. Several external factors distort these kinds of tests: caching, I/O load (I/O request queue length), SAN vs. local disk. So by the time you have run some tests, any time difference between the commands will likely have been eaten up by the time spent testing.

Your best bet is to parallelize and use the CPU and disk I/O to the max. With a quad core you may want to consider 4 simultaneous child processes, for example:

Code:
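cd /directory || exit 1
cnt=0
# Strip the first line of every file, running 4 jobs at a time.
# Note: the for loop splits on whitespace, so file names must not contain spaces.
for fname in $(find . -type f)
do
   # Replace the original only if awk succeeded
   (awk 'FNR>1' "$fname" > "tmp.${cnt}" && mv "tmp.${cnt}" "$fname")  &
   cnt=$(( cnt + 1 ))
   # After every 4th job, wait for the whole batch to finish
   [ $(( cnt % 4 )) -eq 0 ]  && wait
done
wait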
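The loop above splits file names on whitespace. Where GNU or BSD find and xargs are available, a variant along the following lines might cope with arbitrary names. The `-print0`, `-0`, and `-P` options are extensions that may be missing on HP-UX. The function name `strip_headers` and the demo files are hypothetical.

```shell
#!/bin/sh
# Sketch only: -print0, -0 and -P are GNU/BSD extensions, not POSIX,
# and may be missing on older systems such as HP-UX.

# strip_headers DIR: remove the first line of every regular file under DIR,
# running up to 4 awk jobs at once. (Function name is hypothetical.)
strip_headers() {
    list=$(mktemp) || return 1
    # Finish listing before changing anything, so temp files are never picked up
    find "$1" -type f -print0 > "$list"
    xargs -0 -P 4 -n 1 sh -c '
        [ -n "$1" ] || exit 0        # nothing to do if the list was empty
        tmp="$1.tmp.$$"              # temp file beside the original
        awk "FNR>1" "$1" > "$tmp" && mv "$tmp" "$1"
    ' sh < "$list"
    rm -f "$list"
}

# Demo on throwaway files, including a name with a space
demo_dir=$(mktemp -d) || exit 1
printf 'HEADER\nrow 1\nrow 2\n' > "$demo_dir/a.txt"
printf 'HEADER\nrow 1\n'        > "$demo_dir/b c.txt"
strip_headers "$demo_dir"
cat "$demo_dir/a.txt" "$demo_dir/b c.txt"
```

Here xargs keeps 4 jobs running continuously rather than waiting for a batch of 4 to finish, which may help when file sizes vary a lot.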

 

PCAP_LOOP(3PCAP)                                                  PCAP_LOOP(3PCAP)

NAME
       pcap_loop, pcap_dispatch - process packets from a live capture or savefile

SYNOPSIS
       #include <pcap/pcap.h>

       typedef void (*pcap_handler)(u_char *user, const struct pcap_pkthdr *h,
                                    const u_char *bytes);

       int pcap_loop(pcap_t *p, int cnt, pcap_handler callback, u_char *user);
       int pcap_dispatch(pcap_t *p, int cnt, pcap_handler callback, u_char *user);

DESCRIPTION
       pcap_loop() processes packets from a live capture or ``savefile'' until cnt packets are processed, the end of the ``savefile'' is reached when reading from a ``savefile'', pcap_breakloop() is called, or an error occurs. It does not return when live read timeouts occur. A value of -1 or 0 for cnt is equivalent to infinity, so that packets are processed until another ending condition occurs.

       pcap_dispatch() processes packets from a live capture or ``savefile'' until cnt packets are processed, the end of the current bufferful of packets is reached when doing a live capture, the end of the ``savefile'' is reached when reading from a ``savefile'', pcap_breakloop() is called, or an error occurs. Thus, when doing a live capture, cnt is the maximum number of packets to process before returning, but is not a minimum number; when reading a live capture, only one bufferful of packets is read at a time, so fewer than cnt packets may be processed. A value of -1 or 0 for cnt causes all the packets received in one buffer to be processed when reading a live capture, and causes all the packets in the file to be processed when reading a ``savefile''.

       (In older versions of libpcap, the behavior when cnt was 0 was undefined; different platforms and devices behaved differently, so code that must work with older versions of libpcap should use -1, not 0, as the value of cnt.)

       callback specifies a pcap_handler routine to be called with three arguments: a u_char pointer which is passed in the user argument to pcap_loop() or pcap_dispatch(), a const struct pcap_pkthdr pointer pointing to the packet time stamp and lengths, and a const u_char pointer to the first caplen (as given in the struct pcap_pkthdr a pointer to which is passed to the callback routine) bytes of data from the packet.

       The struct pcap_pkthdr and the packet data are not to be freed by the callback routine, and are not guaranteed to be valid after the callback routine returns; if the code needs them to be valid after the callback, it must make a copy of them.

RETURN VALUE
       pcap_loop() returns 0 if cnt is exhausted, -1 if an error occurs, or -2 if the loop terminated due to a call to pcap_breakloop() before any packets were processed. It does not return when live read timeouts occur; instead, it attempts to read more packets.

       pcap_dispatch() returns the number of packets processed on success; this can be 0 if no packets were read from a live capture (if, for example, they were discarded because they didn't pass the packet filter, or if, on platforms that support a read timeout that starts before any packets arrive, the timeout expires before any packets arrive, or if the file descriptor for the capture device is in non-blocking mode and no packets were available to be read) or if no more packets are available in a ``savefile.'' It returns -1 if an error occurs or -2 if the loop terminated due to a call to pcap_breakloop() before any packets were processed.

       If your application uses pcap_breakloop(), make sure that you explicitly check for -1 and -2, rather than just checking for a return value < 0.

       If -1 is returned, pcap_geterr() or pcap_perror() may be called with p as an argument to fetch or display the error text.

SEE ALSO
       pcap(3PCAP), pcap_geterr(3PCAP), pcap_breakloop(3PCAP)

                                24 December 2008                  PCAP_LOOP(3PCAP)
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.