Delete first 100 lines from a BIG File


 
# 1  
06-17-2012

Hi,

I need a Unix command to delete the first n (say 100) lines from a log file, without using any temporary file. I found that sed -i would be useful for this, but it is not supported in my environment (AIX 6.1). The file size is approx. 100 MB.

Thanks in advance.
# 2  
06-17-2012
Hi.

Most versions of sed that support "-i" still use a temporary file:
Code:
`-i[SUFFIX]'
`--in-place[=SUFFIX]'
     This option specifies that files are to be edited in-place.  GNU
     `sed' does this by creating a temporary file and sending output to
     this file rather than to the standard output.(1).

     This option implies `-s'.

     When the end of the file is reached, the temporary file is renamed
     to the output file's original name. 

excerpt from info sed, q.v.

One way to truly rewrite a file in place is to hold the data in memory. Here is one such solution, a short, no-frills perl script. The driving shell script compares inodes, first using sed and then perl. The sed run shows a different inode because of the rename, whereas the perl run keeps the same inode:
Code:
#!/usr/bin/env bash

# @(#) s1	Demonstrate REAL re-write in place if enough memory, perl.

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
edges() { local _f _n _l;: ${1?"edges: need file"}; _f=$1;_l=$(wc -l $_f);
  head -${_n:=3} $_f ; pe "--- ( $_l: lines total )" ; tail -$_n $_f ; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C sed perl

FILE=data1
cp sacred $FILE

pl " Short perl code, p1:"
cat p1
pe "# --- end of perl code"

pl " Results sed:"
pl " Data file $FILE before (inode: $( stat -c " %i" $FILE )):"
head $FILE

sed -i '1,2d' $FILE

pl " Data file $FILE after  (inode: $( stat -c " %i" $FILE )):"
cat $FILE

# perl
cp sacred $FILE

pl " Results perl:"
pl " Data file $FILE before (inode: $( stat -c " %i" $FILE )):"
cat $FILE

./p1 1,2 $FILE

pl " Data file $FILE after  (inode: $( stat -c " %i" $FILE )):"
cat $FILE

exit 0

producing:
Code:
% ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0.8 (lenny) 
bash GNU bash 3.2.39
sed GNU sed version 4.1.5
perl 5.10.0

-----
 Short perl code, p1:
#!/usr/bin/env perl

# @(#) p1	Demonstrate feature (minimal).

use strict;
use warnings;

my ( $debug, $f, $file, @all, $first, $last, $i );

$debug = 1;
$debug = 0;

my ($lines_to_delete) = shift || die " Must supply line numbers.\n";
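( $first, $last ) = split( /,/, $lines_to_delete );
$last = $first unless defined $last;    # a single number deletes one line
$first--; $last--;                      # convert to 0-based indexes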
print " delete lines $first,$last\n" if $debug;
$file = shift || die " Must supply file name.\n";

open( $f, "<", $file ) || die " Cannot open file $file for input.\n";

@all = <$f>;
close($f);

open( $f, ">", $file ) || die " Cannot open file $file for output.\n";
for ( $i = 0; $i <= $#all; $i++ ) {
  print " working on line $i\n" if $debug;
  next if ( $i >= $first and $i <= $last );
  print $f $all[$i];
}

exit(0);

# --- end of perl code

-----
 Results sed:

-----
 Data file data1 before (inode:  334057):
Now is the time
for all good men
to come to the aid
of their country.

-----
 Data file data1 after  (inode:  334060):
to come to the aid
of their country.

-----
 Results perl:

-----
 Data file data1 before (inode:  334060):
Now is the time
for all good men
to come to the aid
of their country.

-----
 Data file data1 after  (inode:  334060):
to come to the aid
of their country.

Both solutions delete lines 1-2 of the file. The sed result is a new file (the renamed temporary), hence the new inode. The perl solution rewrites the same file, but needs enough memory to hold the entire file.
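The driver above reads inodes with `stat -c`, which is a GNU coreutils option and may not be available on AIX. A minimal sketch of the same before/after comparison using only POSIX `ls -i`; both function names are hypothetical, and the sed-plus-mv step mimics what GNU `sed -i` does internally:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the inode number of a file via POSIX `ls -i`
# (GNU `stat -c %i` is not portable to every system).
inode_of() {
  ls -i "$1" | awk '{ print $1 }'
}

# Hypothetical demo: delete lines 1-2 with plain sed plus a rename,
# then report whether the file kept its inode.
demo_rename_changes_inode() {
  local file=$1 before after
  before=$(inode_of "$file")
  sed '1,2d' "$file" > "$file.new" && mv "$file.new" "$file"
  after=$(inode_of "$file")
  if [ "$before" = "$after" ]; then echo "same inode"; else echo "new inode"; fi
}
```

Running `demo_rename_changes_inode data1` should report "new inode", matching the sed result above.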

Best wishes ... cheers, drl
# 3  
06-17-2012
ed or ex can delete the lines in place for you.
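For example, a minimal sketch with ed, assuming the file has at least N lines (ed reports an error otherwise); the function name `ed_delete_head` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete the first N lines of FILE with ed.
# ed writes the edited buffer back through the original path, so the file
# normally keeps its inode (its working copy may still live in a scratch file).
ed_delete_head() {
  local n=$1 file=$2
  # 1,Nd deletes lines 1..N; w writes the buffer back; q quits.
  printf '%s\n' "1,${n}d" 'w' 'q' | ed -s "$file"
}
```

Usage: `ed_delete_head 100 app.log`.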

Regards,
Alister
# 4  
06-17-2012
Hi.

@alister

The GNU ed version tested:
Code:
ed GNU Ed 0.7

seems to use a scratch file. Using strace, one can see even for a 4-line file:
Code:
open("/tmp/tmpfXhei5v", O_RDWR|O_CREAT|O_EXCL, 0600) = 3
unlink("/tmp/tmpfXhei5v")               = 0

It then keeps working in that already-unlinked scratch file, positioning within it with lseek rather than closing and reopening it.

Whether this counts as "without using any temporary file" in the OP's view is unknown ... cheers, drl
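If the requirement is strictly no temporary file and no full copy in memory, one more option (not proposed in this thread) is to shift the data down inside the same file with dd and then cut off the tail. A minimal sketch, assuming POSIX head, wc and dd, and that dd truncates its output at the seek= offset when conv=notrunc is absent (GNU dd does, and POSIX specifies it); the function name is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete the first N lines of FILE in place, with no
# temporary file and without loading the whole file into memory.
# Assumes POSIX head, wc and dd. NOT crash-safe: if interrupted during the
# copy, FILE is left with partly shifted contents.
delete_head_lines() {
  local n=$1 file=$2 hdr size
  # Number of bytes taken up by the first n lines.
  hdr=$(head -n "$n" "$file" | wc -c)
  hdr=$((hdr))                # drop any padding some wc versions print
  [ "$hdr" -gt 0 ] || return 0
  size=$(wc -c < "$file")
  size=$((size))
  # Copy the rest of the file down to offset 0, one hdr-sized block at a
  # time. Reads always stay hdr bytes ahead of writes, so nothing unread is
  # overwritten. conv=notrunc stops dd from truncating the output first.
  dd if="$file" of="$file" bs="$hdr" skip=1 conv=notrunc 2>/dev/null || return 1
  # Cut off the stale tail: without conv=notrunc, dd truncates the output
  # file at the seek= offset (here size - hdr bytes).
  dd if=/dev/null of="$file" bs=1 seek=$((size - hdr)) 2>/dev/null
}
```

Usage: `delete_head_lines 100 app.log`. The inode is kept because the data moves within the same file, but, as pointed out later in the thread for in-memory rewrites, an interruption mid-copy damages the only copy.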
# 5  
06-17-2012
Excellent observation, drl.

Regards,
Alister
# 6  
06-17-2012
Hi.

Here is a generalization of the perl script I posted earlier. It does no sed work itself: it expects text on standard input, saves it in memory, and at end of input writes it to the file of one's choice. Not very exciting, but the key idea is that it lets any utility effectively write in place. It was inspired by sponge, part of moreutils.
Code:
#!/usr/bin/env bash

# @(#) s2	Demonstrate REAL re-write in place if enough memory, perl.

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
edges() { local _f _n _l;: ${1?"edges: need file"}; _f=$1;_l=$(wc -l $_f);
  head -${_n:=3} $_f ; pe "--- ( $_l: lines total )" ; tail -$_n $_f ; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C sed perl absorb-memory

SACRED=${1-sacred}
FILE=data1
cat -n $SACRED > $FILE

pl " Short perl code, absorb-memory $( wc -l < ~/bin/absorb-memory) lines:"
cat ~/bin/absorb-memory | sanitize
pe "# --- end of perl code"

pl " Results sed:"
pl " Data file $FILE before (inode: $( stat -c " %i" $FILE )):"
specimen 3 $FILE

time sed -i '1,2d' $FILE

pl " Data file $FILE after  (inode: $( stat -c " %i" $FILE )):"
specimen 3 $FILE

# perl
cat -n $SACRED > $FILE

pl " Results sed (no -i) into absorb-memory (perl):"
pl " Data file $FILE before (inode: $( stat -c " %i" $FILE )):"
specimen 3 $FILE

time sed '1,2d' $FILE | absorb-memory $FILE

pl " Data file $FILE after  (inode: $( stat -c " %i" $FILE )):"
specimen 3 $FILE

rm -f $FILE
exit 0

producing:
Code:
./s2 /tmp/100-mb.txt 

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0.8 (lenny) 
bash GNU bash 3.2.39
sed GNU sed version 4.1.5
perl 5.10.0
absorb-memory - ( local: RepRev 1.7, ~/bin/absorb-memory, 2012-06-17 )

-----
 Short perl code, absorb-memory 59 lines:
#!/usr/bin/perl

# @(#) absorb-memory	Read STDIN to memory, write to file at EOF, work-alike for sponge.
# $Id: absorb-memory,v 1.7 2012/06/17 19:01:49 drl Exp drl $

## Modification history: when / who / what: most recent at top.
#  Relocate to end if grows too long, or re-sequence.
#
# 2012.06.17 / drl / Version that uses memory only.
#
# 2011.07.15 / drl / Do initial test for write permission, abort
# if output file lacks it.
#
# 2010.11.09 / drl / Rename to absorb, avoiding conflict with
# sponge itself.
#
# 2009.04.06 / drl / original.

use warnings;
use strict;
use Carp;

my ($debug);
$debug = 1;
$debug = 0;

# Avoid hang for argument matching "-version","--version", etc.
exit(0) if @ARGV && $ARGV[0] =~ /-version/;

$/ = undef;    # slurp mode: read all input at once (cf. the -0777 switch)

my ( $file, $f, $memory );
if ( !$ARGV[0] ) { $ARGV[0] = "-"; }
$file = shift;

# Preliminary basic tests on output file.
if ( $file ne "-" ) {
  if ( not -f $file ) {
    croak("not a plain file, $file");
  }
}

$memory = do { local $/; <> };

my ($len) = length($memory);
print STDERR " Length of file in memory variable: $len\n" if $debug;

if ( $file eq "-" ) {
  open( $f, ">-" ) || die " Cannot open STDOUT for writing.\n";
}
else {
  open( $f, ">", $file ) || die " Cannot open file \"$file\" for write.\n";
}
print " debug write - file is :$file:\n" if $debug;

print $f $memory or die " Cannot write to \"$file\": $!\n";
close($f) or die " Cannot close \"$file\": $!\n";

END { close(STDOUT) || die "can't close stdout: $!" }
exit(0);
# --- end of perl code

-----
 Results sed:

-----
 Data file data1 before (inode:  334060):
Edges: 3:0:3 of 1777700 lines in file "data1"
     1	Preliminary Matter.  
     2	
     3	This text of Melville's Moby-Dick is based on the Hendricks House edition.
   ---
1777698	THEY GLIDED BY AS IF WITH PADLOCKS ON THEIR MOUTHS; THE SAVAGE SEA-HAWKS SAILE
1777699	D WITH SHEATHED BEAKS. +ON THE SECOND DAY, A SAIL DREW NEAR, NEARER, AND PIC
1777700	KED ME UP AT LAST. +IT WAS THE DEVIOUS-CRUISING +RACHEL, THAT IN HER RETRACIN

real	0m13.351s
user	0m1.384s
sys	0m11.189s

-----
 Data file data1 after  (inode:  334061):
Edges: 3:0:3 of 1777698 lines in file "data1"
     3	This text of Melville's Moby-Dick is based on the Hendricks House edition.
     4	It was prepared by Professor Eugene F. Irey at the University of Colorado.
     5	Any subsequent copies of this data must include this notice  
   ---
1777698	THEY GLIDED BY AS IF WITH PADLOCKS ON THEIR MOUTHS; THE SAVAGE SEA-HAWKS SAILE
1777699	D WITH SHEATHED BEAKS. +ON THE SECOND DAY, A SAIL DREW NEAR, NEARER, AND PIC
1777700	KED ME UP AT LAST. +IT WAS THE DEVIOUS-CRUISING +RACHEL, THAT IN HER RETRACIN

-----
 Results sed (no -i) into absorb-memory (perl):

-----
 Data file data1 before (inode:  334061):
Edges: 3:0:3 of 1777700 lines in file "data1"
     1	Preliminary Matter.  
     2	
     3	This text of Melville's Moby-Dick is based on the Hendricks House edition.
   ---
1777698	THEY GLIDED BY AS IF WITH PADLOCKS ON THEIR MOUTHS; THE SAVAGE SEA-HAWKS SAILE
1777699	D WITH SHEATHED BEAKS. +ON THE SECOND DAY, A SAIL DREW NEAR, NEARER, AND PIC
1777700	KED ME UP AT LAST. +IT WAS THE DEVIOUS-CRUISING +RACHEL, THAT IN HER RETRACIN

real	0m2.890s
user	0m1.212s
sys	0m1.516s

-----
 Data file data1 after  (inode:  334061):
Edges: 3:0:3 of 1777698 lines in file "data1"
     3	This text of Melville's Moby-Dick is based on the Hendricks House edition.
     4	It was prepared by Professor Eugene F. Irey at the University of Colorado.
     5	Any subsequent copies of this data must include this notice  
   ---
1777698	THEY GLIDED BY AS IF WITH PADLOCKS ON THEIR MOUTHS; THE SAVAGE SEA-HAWKS SAILE
1777699	D WITH SHEATHED BEAKS. +ON THE SECOND DAY, A SAIL DREW NEAR, NEARER, AND PIC
1777700	KED ME UP AT LAST. +IT WAS THE DEVIOUS-CRUISING +RACHEL, THAT IN HER RETRACIN

This is a busy output, but there are some items of interest. First, one can use essentially the same sed command (just omitting the "-i") as with a sed that supports "-i". Second, the in-memory version is noticeably faster here: about 2.9 s real time versus 13.4 s for sed -i, most of the difference being system time. Third, the inodes differ in the sed case, showing that a temporary file was written and then renamed over the original. In the absorb-memory case, the inode stays the same.

The same perl code can turn any standard filter (one that reads STDIN and writes STDOUT) into a tool that effectively processes a file in place.
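The same absorb-then-write idea can be sketched in plain shell, assuming bash and text input (bash variables cannot hold NUL bytes, and the whole input is kept in memory). The function name `soak` is hypothetical; it is neither sponge nor absorb-memory:

```shell
#!/usr/bin/env bash
# Hypothetical sponge-alike: read all of stdin into memory, and only after
# EOF truncate and rewrite FILE, so the inode is preserved.
# Assumes text input: bash variables cannot contain NUL bytes.
soak() {
  local file=$1 content
  # The sentinel "x" protects trailing newlines from $(...) stripping.
  content=$(cat; printf x)
  content=${content%x}
  printf '%s' "$content" > "$file"
}
```

Usage: `sed '1,100d' app.log | soak app.log`. The file is opened for writing only after sed has finished reading it; like absorb-memory, it is truncated before the rewrite, so an interruption at that point loses data.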

Best wishes ... cheers, drl
# 7  
06-17-2012
It seems to me any solution should always use a temporary intermediate file, for safety reasons. If we read the whole file into memory and then write it back to the same file, we risk losing the original if the power fails during the write-back phase.

With a temporary file, the final step is only an mv, which is a plain rename only if the temporary file is on the same file system (for example, in the same directory). So a temporary file in the same directory may be preferable to one in /tmp, for example. If the intermediate file is in /tmp instead, it may be necessary to first rename the original to .bak until the move from /tmp completes, and the safest course is probably to keep the .bak until the user deletes it.
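The procedure described above could be sketched as follows, assuming POSIX tail, ln and mv and a file system that supports hard links; the function name and the `.FILE.tmp.PID` naming are hypothetical choices:

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the result to a temporary file in the SAME
# directory (so mv is a plain rename), keep the original as FILE.bak,
# then rename the temporary over FILE.
delete_head_lines_safe() {
  local n=$1 file=$2 tmp
  tmp="$(dirname "$file")/.$(basename "$file").tmp.$$"
  # tail -n +K prints from line K onward (POSIX).
  if ! tail -n "+$((n + 1))" "$file" > "$tmp"; then
    rm -f "$tmp"
    return 1
  fi
  # A hard link keeps the original data reachable as FILE.bak without copying.
  ln -f "$file" "$file.bak" || { rm -f "$tmp"; return 1; }
  mv -f "$tmp" "$file"
}
```

Usage: `delete_head_lines_safe 100 app.log`. The new file gets default ownership and permissions, not the original's. Also, if a process still has the log open for appending, it keeps writing to the original inode, which is now reachable only as `app.log.bak`.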


10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

How to copy only some lines from very big file?

Dear all, I have stuck with this problem for some days. I have a very big file, this file can not open by vi command. There are 200 loops in this file, in each loop will have one line like this: GWA quasiparticle energy with Z factor (eV) And I need 98 lines next after this line. Is... (6 Replies)
Discussion started by: phamnu

2. Shell Programming and Scripting

Want to extract certain lines from big file

Hi All, I am trying to get some lines from a file i did it with while-do-loop. since the files are huge it is taking much time. now i want to make it faster. The requirement is the file will be having 1 million lines. The format is like below. ##transaction, , , ,blah, blah... (38 Replies)
Discussion started by: mad man

3. UNIX for Dummies Questions & Answers

Delete records from a big file based on some condition

Hi, To load a big file in a table,I have a make sure that all rows in the file has same number of the columns . So in my file if I am getting any rows which have columns not equal to 6 , I need to delete it . Delimiter is space and columns are optionally enclosed by "". This can be ... (1 Reply)
Discussion started by: hemantraijain

4. Shell Programming and Scripting

Delete rows from big file

Hi all, I have a big file (about 6 millions rows) and I have to delete same occurrences, stored in a small file (about 9000 rews). I have tried this: while read line do grep -v $line big_file > ok_file.tmp mv ok_file.tmp big_file done < small_file It works, but is very slow. How... (2 Replies)
Discussion started by: Tibbeche

5. UNIX for Advanced & Expert Users

In a huge file, Delete duplicate lines leaving unique lines

Hi All, I have a very huge file (4GB) which has duplicate lines. I want to delete duplicate lines leaving unique lines. Sort, uniq, awk '!x++' are not working as its running out of buffer space. I dont know if this works : I want to read each line of the File in a For Loop, and want to... (16 Replies)
Discussion started by: krishnix

6. Shell Programming and Scripting

Re: Deleting lines from big file.

Hi, I have a big (2.7 GB) text file. Each lines has '|' saperator to saperate each columns. I want to delete those lines which has text like '|0|0|0|0|0' I tried: sed '/|0|0|0|0|0/d' test.txt Unfortunately, it scans the file but does nothing. file content sample:... (4 Replies)
Discussion started by: dipeshvshah

7. Shell Programming and Scripting

Print #of lines after search string in a big file

I have a command which prints #lines after and before the search string in the huge file nawk 'c-->0;$0~s{if(b)for(c=b+1;c>1;c--)print r;print;c=a}b{r=$0}' b=0 a=10 s="STRING1" FILE The file is 5 gig big. It works great and prints 10 lines after the lines which contains search string in... (8 Replies)
Discussion started by: prash184u

8. Shell Programming and Scripting

How to delete lines in a file that have duplicates or derive the lines that aper once

Input: a b b c d d I need: a c I know how to get this (the lines that have duplicates) : b d sort file | uniq -d But i need opossite of this. I have searched the forum and other places as well, but have found solution for everything except this variant of the problem. (3 Replies)
Discussion started by: necroman08

9. Solaris

delete first 100 lines from a file

I have a file with 28,00,000 lines of rows in this the first 80 lines will be chunks . I want to delete the chunks of 80 lines. I tried tail -f2799920 filename. is there any efficient way to do this. Thanks in advance. (7 Replies)
Discussion started by: salaathi

10. Solaris

delete first 100 lines rather than zero out of file

Hi experts, in my solaris 9 the file- /var/adm/messeages growin too first. by 24 hours 40MB. And always giving the below messages-- bash-2.05# tail -f messages Nov 9 16:35:38 ME1 last message repeated 1 time Nov 9 16:35:38 ME1 ftpd: wtmpx /var/adm/wtmpx No such file or directory Nov 9... (7 Replies)
Discussion started by: thepurple