Delete first 100 lines from a BIG File

06-17-2012

Hi,

I need a unix command to delete the first n (say 100) lines from a log file, without using any temporary file. I found that sed -i is a useful command for this, but it's not supported in my environment (AIX 6.1). The file size is approx. 100MB.

Thanks in advance.

unohu

06-17-2012

Hi.

Most versions of sed that support "-i" will use a temporary file:

Code:

`-i[SUFFIX]'
`--in-place[=SUFFIX]'
     This option specifies that files are to be edited in-place.  GNU
     `sed' does this by creating a temporary file and sending output to
     this file rather than to the standard output.(1).

     This option implies `-s'.

     When the end of the file is reached, the temporary file is renamed
     to the output file's original name. 

excerpt from info sed, q.v.

One way to really re-write in-place is to hold the data in memory. Here's one such solution with a short, no-frills perl script. The driving shell script will compare inodes using first sed and then perl. The sed solution shows that the file is different because of the rename, whereas the perl will keep the same inode:

Code:

#!/usr/bin/env bash

# @(#) s1	Demonstrate REAL re-write in place if enough memory, perl.

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
edges() { local _f _n _l;: ${1?"edges: need file"}; _f=$1;_l=$(wc -l $_f);
  head -${_n:=3} $_f ; pe "--- ( $_l: lines total )" ; tail -$_n $_f ; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C sed perl

FILE=data1
cp sacred $FILE

pl " Short perl code, p1:"
cat p1
pe "# --- end of perl code"

pl " Results sed:"
pl " Data file $FILE before (inode: $( stat -c " %i" $FILE )):"
head $FILE

sed -i '1,2d' $FILE

pl " Data file $FILE after  (inode: $( stat -c " %i" $FILE )):"
cat $FILE

# perl
cp sacred $FILE

pl " Results perl:"
pl " Data file $FILE before (inode: $( stat -c " %i" $FILE )):"
cat $FILE

./p1 1,2 $FILE

pl " Data file $FILE after  (inode: $( stat -c " %i" $FILE )):"
cat $FILE

exit 0

producing:

Code:

% ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0.8 (lenny) 
bash GNU bash 3.2.39
sed GNU sed version 4.1.5
perl 5.10.0

-----
 Short perl code, p1:
#!/usr/bin/env perl

# @(#) p1	Demonstrate feature (minimal).

use strict;
use warnings;

my ( $debug, $f, $file, @all, $first, $last, $i );

$debug = 1;
$debug = 0;

my ($lines_to_delete) = shift || die " Must supply line numbers.\n";
( $first, $last ) = split( /,/, $lines_to_delete );
$first-- ; $last--;
print " delete lines $first,$last\n" if $debug;
$file = shift || die " Must supply file name.\n";

open( $f, "<", $file ) || die " Cannot open file $file for input.\n";

@all = <$f>;
close($f);

open( $f, ">", $file ) || die " Cannot open file $file for output.\n";
for ( $i = 0; $i <= $#all; $i++ ) {
  print " working on line $i\n" if $debug;
  next if ( $i >= $first and $i <= $last );
  print $f $all[$i];
}

exit(0);

# --- end of perl code

-----
 Results sed:

-----
 Data file data1 before (inode:  334057):
Now is the time
for all good men
to come to the aid
of their country.

-----
 Data file data1 after  (inode:  334060):
to come to the aid
of their country.

-----
 Results perl:

-----
 Data file data1 before (inode:  334060):
Now is the time
for all good men
to come to the aid
of their country.

-----
 Data file data1 after  (inode:  334060):
to come to the aid
of their country.

Both solutions delete lines 1 and 2 of the file. The sed result is a new file, renamed from the temporary one. The perl solution rewrites the same file, but requires enough memory to hold the entire file.
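The driving script relies on GNU `stat -c %i`, which may not exist on other systems. A minimal sketch of the same inode comparison using `ls -i` follows; `inode_of` and the demo file names are hypothetical, and `mktemp -d` is assumed to be available:

```shell
#!/usr/bin/env bash
# Sketch: compare inodes after a rename-based edit and after an overwrite.
# inode_of is a hypothetical helper; it uses ls -i rather than GNU stat -c.

inode_of() {
  # Print only the inode number of the file named in $1.
  ls -i -- "$1" | awk '{ print $1 }'
}

demo_dir=$(mktemp -d) || exit 1
f=$demo_dir/data1
printf '%s\n' 'Now is the time' 'for all good men' \
  'to come to the aid' 'of their country.' > "$f"
before=$(inode_of "$f")

# Rename-based edit: write a new file, then move it over the original.
sed '1,2d' "$f" > "$f.new" && mv "$f.new" "$f"
after_rename=$(inode_of "$f")

# Overwrite: hold the (small) result in a variable, then truncate and rewrite.
kept=$(sed '1d' "$f")
printf '%s\n' "$kept" > "$f"
after_overwrite=$(inode_of "$f")

echo "before=$before after_rename=$after_rename after_overwrite=$after_overwrite"
rm -rf "$demo_dir"
```

The inode should change after the rename step and stay the same after the overwrite step, which matches the sed and perl results above.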

Best wishes ... cheers, drl

drl

06-17-2012

ed or ex can delete the lines in place for you.
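As a rough illustration of the ed approach, the commands can be fed on standard input. This is only a sketch: `delete_first_lines` is a hypothetical helper name, and, as noted later in the thread, GNU ed keeps its buffer in a scratch file, so this may not satisfy a strict "no temporary file" requirement:

```shell
#!/usr/bin/env bash
# Sketch: delete the first N lines of a file with ed.
# delete_first_lines is a hypothetical helper name, not part of ed.

delete_first_lines() {
  local file=$1 count=$2
  # "1,Nd" deletes lines 1 through N, "w" writes the buffer back to the
  # file, and "q" quits. The -s option suppresses ed's byte counts.
  # ed reports an error if the file has fewer than N lines.
  printf '%s\n' "1,${count}d" w q | ed -s "$file"
}

# Usage: delete_first_lines /path/to/logfile 100
```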

Regards,
Alister

alister

06-17-2012

Hi.

@alister

The GNU ed code:

Code:

ed GNU Ed 0.7

seems to use a scratch file. Using strace, one can see even for a 4-line file:

Code:

open("/tmp/tmpfXhei5v", O_RDWR|O_CREAT|O_EXCL, 0600) = 3
unlink("/tmp/tmpfXhei5v")               = 0

and it then goes on to use the file, using lseeks and not open/close to position the file.

Whether this counts as "without using any temporary file" in the OP's view is unknown ... cheers, drl

drl

06-17-2012

Excellent observation, drl.

Regards,
Alister

alister

06-17-2012

Hi.

Here is a generalization of the perl script I posted earlier. It does not do any sed work itself. It expects text on STDIN, saves it in memory, and then writes it to the file of one's choice. Not very exciting; however, the key idea is that it allows any utility to write in-place. It was inspired by sponge, part of moreutils.

Code:

#!/usr/bin/env bash

# @(#) s2	Demonstrate REAL re-write in place if enough memory, perl.

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
edges() { local _f _n _l;: ${1?"edges: need file"}; _f=$1;_l=$(wc -l $_f);
  head -${_n:=3} $_f ; pe "--- ( $_l: lines total )" ; tail -$_n $_f ; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C sed perl absorb-memory

SACRED=${1-sacred}
FILE=data1
cat -n $SACRED > $FILE

pl " Short perl code, absorb-memory $( wc -l < ~/bin/absorb-memory) lines:"
cat ~/bin/absorb-memory | sanitize
pe "# --- end of perl code"

pl " Results sed:"
pl " Data file $FILE before (inode: $( stat -c " %i" $FILE )):"
specimen 3 $FILE

time sed -i '1,2d' $FILE

pl " Data file $FILE after  (inode: $( stat -c " %i" $FILE )):"
specimen 3 $FILE

# perl
cat -n $SACRED > $FILE

pl " Results sed (no -i) into absorb-memory (perl):"
pl " Data file $FILE before (inode: $( stat -c " %i" $FILE )):"
specimen 3 $FILE

time sed '1,2d' $FILE | absorb-memory $FILE

pl " Data file $FILE after  (inode: $( stat -c " %i" $FILE )):"
specimen 3 $FILE

rm -f $FILE
exit 0

producing:

Code:

./s2 /tmp/100-mb.txt 

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0.8 (lenny) 
bash GNU bash 3.2.39
sed GNU sed version 4.1.5
perl 5.10.0
absorb-memory - ( local: RepRev 1.7, ~/bin/absorb-memory, 2012-06-17 )

-----
 Short perl code, absorb-memory 59 lines:
#!/usr/bin/perl

# @(#) absorb-memory	Read STDIN to memory, write to file at EOF, work-alike for sponge.
# $Id: absorb-memory,v 1.7 2012/06/17 19:01:49 drl Exp drl $

## Modification history: when / who / what: most recent at top.
#  Relocate to end if grows too long, or re-sequence.
#
# 2012.06.17 / drl / Version that uses memory only.
#
# 2011.07.15 / drl / Do initial test for write permission, abort
# if output file lacks it.
#
# 2010.11.09 / drl / Rename to absorb, avoiding conflict with
# sponge itself.
#
# 2009.04.06 / drl / original.

use warnings;
use strict;
use Carp;

my ($debug);
$debug = 1;
$debug = 0;

# Avoid hang for argument matching "-version","--version", etc.
exit(0) if @ARGV && $ARGV[0] =~ /-version/;

$/ = undef;    # slurp mode: no input record separator

my ( $file, $f, $memory );
if ( !$ARGV[0] ) { $ARGV[0] = "-"; }
$file = shift;

# Preliminary basic tests on output file.
if ( $file ne "-" ) {
  if ( not -f $file ) {
    croak("not a plain file, $file");
  }
}

$memory = do { local $/; <> };

my ($len) = length($memory);
print STDERR " Length of file in memory variable: $len\n" if $debug;

if ( $file eq "-" ) {
  open( $f, ">-" ) || die " Cannot open STDOUT for writing.\n";
}
else {
  open( $f, ">", $file ) || die " Cannot open file \"$file\" for write.\n";
}
print " debug write - file is :$file:\n" if $debug;

print $f "$memory";

END { close(STDOUT) || die "can't close stdout: $!" }
exit(0);
# --- end of perl code

-----
 Results sed:

-----
 Data file data1 before (inode:  334060):
Edges: 3:0:3 of 1777700 lines in file "data1"
     1	Preliminary Matter.  
     2	
     3	This text of Melville's Moby-Dick is based on the Hendricks House edition.
   ---
1777698	THEY GLIDED BY AS IF WITH PADLOCKS ON THEIR MOUTHS; THE SAVAGE SEA-HAWKS SAILE
1777699	D WITH SHEATHED BEAKS. +ON THE SECOND DAY, A SAIL DREW NEAR, NEARER, AND PIC
1777700	KED ME UP AT LAST. +IT WAS THE DEVIOUS-CRUISING +RACHEL, THAT IN HER RETRACIN

real	0m13.351s
user	0m1.384s
sys	0m11.189s

-----
 Data file data1 after  (inode:  334061):
Edges: 3:0:3 of 1777698 lines in file "data1"
     3	This text of Melville's Moby-Dick is based on the Hendricks House edition.
     4	It was prepared by Professor Eugene F. Irey at the University of Colorado.
     5	Any subsequent copies of this data must include this notice  
   ---
1777698	THEY GLIDED BY AS IF WITH PADLOCKS ON THEIR MOUTHS; THE SAVAGE SEA-HAWKS SAILE
1777699	D WITH SHEATHED BEAKS. +ON THE SECOND DAY, A SAIL DREW NEAR, NEARER, AND PIC
1777700	KED ME UP AT LAST. +IT WAS THE DEVIOUS-CRUISING +RACHEL, THAT IN HER RETRACIN

-----
 Results sed (no -i) into absorb-memory (perl):

-----
 Data file data1 before (inode:  334061):
Edges: 3:0:3 of 1777700 lines in file "data1"
     1	Preliminary Matter.  
     2	
     3	This text of Melville's Moby-Dick is based on the Hendricks House edition.
   ---
1777698	THEY GLIDED BY AS IF WITH PADLOCKS ON THEIR MOUTHS; THE SAVAGE SEA-HAWKS SAILE
1777699	D WITH SHEATHED BEAKS. +ON THE SECOND DAY, A SAIL DREW NEAR, NEARER, AND PIC
1777700	KED ME UP AT LAST. +IT WAS THE DEVIOUS-CRUISING +RACHEL, THAT IN HER RETRACIN

real	0m2.890s
user	0m1.212s
sys	0m1.516s

-----
 Data file data1 after  (inode:  334061):
Edges: 3:0:3 of 1777698 lines in file "data1"
     3	This text of Melville's Moby-Dick is based on the Hendricks House edition.
     4	It was prepared by Professor Eugene F. Irey at the University of Colorado.
     5	Any subsequent copies of this data must include this notice  
   ---
1777698	THEY GLIDED BY AS IF WITH PADLOCKS ON THEIR MOUTHS; THE SAVAGE SEA-HAWKS SAILE
1777699	D WITH SHEATHED BEAKS. +ON THE SECOND DAY, A SAIL DREW NEAR, NEARER, AND PIC
1777700	KED ME UP AT LAST. +IT WAS THE DEVIOUS-CRUISING +RACHEL, THAT IN HER RETRACIN

This is busy output, but there are some items of interest. First, one can use essentially the same sed command (omitting the "-i") as for a sed that knows about "-i". Second, the times are noticeably better for the in-memory version (about 2.9 s versus 13.4 s real time). Third, note that the inode changes in the sed case, which shows that the output went to a new file that was then renamed over the original. For the absorb-memory case, the inode is the same.

The same perl code can turn any standard utility (one that reads STDIN and writes STDOUT) into a utility that can do in-place processing.
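For systems without perl, the same absorb idea can be approximated in plain shell. The following is a minimal sketch, not a replacement for absorb-memory or sponge: `absorb_sh` is a hypothetical name, NUL bytes are dropped by command substitution, and holding a 100MB file in a shell variable may be slow:

```shell
#!/usr/bin/env bash
# Sketch of the absorb idea in plain shell: read all of stdin into a
# variable, then write it to the named file. absorb_sh is a hypothetical name.
# Limits: NUL bytes are lost, and memory use is at least the input size.

absorb_sh() {
  local target=$1 data
  # Append a sentinel so trailing newlines survive command substitution.
  data=$(cat; printf 'x')
  # Open the target for writing only after stdin has reached end of file,
  # so the upstream command has already finished reading it.
  printf '%s' "${data%x}" > "$target"
}

# Usage: sed '1,100d' logfile | absorb_sh logfile
```

Because the target is opened with a plain redirection rather than replaced by a rename, the inode stays the same, as in the absorb-memory run above.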

Best wishes ... cheers, drl

drl

06-17-2012

It seems to me any solution should always make use of a temporary intermediate file for safety reasons. If we read the whole file into memory and then write it back to the same file, we run the risk of losing the original in case of a power failure during the write-back phase.

With a temporary file there is only mv involved, which is just a rename if the temporary file is in the same directory on the same file system, so a temporary file in the same directory (instead of /tmp, for example) may be preferable. If we do use /tmp for the intermediate file, a temporary rename of the original to .bak until the move from /tmp completes may be required, and it will probably be safest to keep the .bak until the user deletes it.
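The same-directory approach could look roughly like the sketch below. `safe_delete_first_lines` and the `.trim.XXXXXX` template are hypothetical names, and `mktemp` is assumed to be available:

```shell
#!/usr/bin/env bash
# Sketch: delete the first N lines through a temporary file in the same
# directory, then rename it over the original.
# safe_delete_first_lines is a hypothetical helper name.

safe_delete_first_lines() {
  local file=$1 count=$2 dir tmp
  dir=$(dirname -- "$file")
  # Same directory as the target, so the final mv is a rename within
  # one file system rather than a copy.
  tmp=$(mktemp "$dir/.trim.XXXXXX") || return 1
  if sed "1,${count}d" "$file" > "$tmp"; then
    mv -- "$tmp" "$file"
  else
    # Leave the original untouched if sed failed.
    rm -f -- "$tmp"
    return 1
  fi
}

# Usage: safe_delete_first_lines /path/to/logfile 100
```

Note that the result takes the temporary file's owner and mode, and a process that still has the old log open keeps writing to the old inode.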

Scrutinizer
