Reading ALL BUT the first and last line of a huge file


 
# 1  
Old 03-30-2016

Hi.

Pardon me if I'm posting a duplicate thread, but...
I have a text file with over 150 million records; the file size is in the range of MB (close to a GB).
The requirement is to read ALL the lines except the FIRST LINE, which is the file header, and the LAST LINE, which is its trailer record.

What is the most OPTIMUM way to do it?
I'm aware that the sed solution will take a significantly long time to process such a huge file, hence I'm not opting for it.

Please advise.

Thanks.

Warm Regards,
Kumarjit.
# 2  
Old 03-30-2016
You're not really giving much information, but you could always start by keeping a count of the records being processed and throwing out the first and last.

Code:
totalrecs=$(wc -l < inputfile)

Will give you the total number of records in the file. So, ...

Code:
 
recordCount=0
totalrecs=$(wc -l < filename)
while IFS= read -r rec
do
  ((recordCount+=1))
  if [[ $recordCount -eq 1 ]] ; then
     continue
  fi
  if [[ $recordCount -eq $totalrecs ]] ; then
     break
  fi
# ... your other processing goes here
done < filename   # redirect here rather than "cat filename | while ...", so the loop runs in the current shell
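For what it's worth, the same effect is possible in a single pass (a sketch, not from the original post; `filename` is a placeholder): buffer one line so the trailer is read but never printed, and the extra `wc -l` pass disappears:

```shell
# Single-pass sketch: discard the header, then always print the
# *previous* line. The last line read (the trailer) stays in the
# buffer and is never emitted.
{
  IFS= read -r header            # consume and discard the header
  IFS= read -r prev              # prime the one-line buffer
  while IFS= read -r line; do
    printf '%s\n' "$prev"        # safe: at least one more line follows
    prev=$line
  done
} < filename
```

This also handles the degenerate two-line file (header plus trailer) correctly, printing nothing.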


Last edited by jazzman58; 03-30-2016 at 10:23 AM.. Reason: Bad example
# 3  
Old 03-30-2016
try also:
Code:
l=$(wc -l < infile); awk -v l="$l" 'NR>1 && NR<l' infile > newfile
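If GNU coreutils are available (an assumption about the platform; `head -n -1` is a GNU extension, not POSIX), a plain pipeline needs no line count at all:

```shell
# tail -n +2 starts output at line 2 (drops the header);
# GNU head -n -1 prints everything except the last line (drops the trailer).
tail -n +2 infile | head -n -1 > newfile
```

On systems without GNU head, `tail -n +2 infile | sed '$d'` does the same job portably.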

# 4  
Old 03-30-2016
I'm not sure why the sed solution (which one, BTW?) should take significantly longer than the other ones posted. Would you mind posting some comparisons?
# 5  
Old 03-30-2016
Tested with a 2 GB file (excluding writing to a file, which should be similar for all approaches):
Code:
$ time sed '1d;$d' greptestin1 > /dev/null

real	0m29.835s
user	0m29.186s
sys	0m0.591s
$ time awk 'NR>2{print p}{p=$0}' greptestin1 > /dev/null    # BSD awk

real	1m44.183s
user	1m43.627s
sys	0m0.481s
$ time mawk 'NR>2{print p}{p=$0}' greptestin1 > /dev/null

real	0m14.982s
user	0m14.463s
sys	0m0.498s
$ time gawk 'NR>2{print p}{p=$0}' greptestin1 > /dev/null

real	0m24.682s
user	0m24.210s
sys	0m0.414s
$ time gawk4 'NR>2{print p}{p=$0}' greptestin1 > /dev/null

real	0m27.621s
user	0m27.173s
sys	0m0.419s

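All of the awk one-liners above rely on the same one-line buffering trick: every line is stored in `p`, and printing only starts at `NR>2`, so line 1 (the header) is never printed and the final line (the trailer) is read into `p` but never flushed. A quick check on a tiny sample:

```shell
# Print each line's predecessor starting from line 3:
# the header is skipped, the trailer is buffered but never printed.
printf 'header\na\nb\ntrailer\n' | awk 'NR>2{print p}{p=$0}'
# prints:
# a
# b
```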
# 6  
Old 03-30-2016
If the shell reads the first line, then the filter needs one condition fewer:
Code:
time { read header; sed '$d'; } < greptestin1 > /dev/null
time { read header; perl -pe '{exit if eof}'; } < greptestin1 > /dev/null
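To see what that read/sed combination does, here is a check on a tiny sample (the file name is illustrative): `read` consumes the header from the shared file descriptor, then sed sees only the remaining lines and `$d` deletes the last one.

```shell
# read takes line 1 off the shared input; sed '$d' drops the final line.
printf 'header\na\nb\ntrailer\n' > sample.txt
{ read -r header; sed '$d'; } < sample.txt
# prints:
# a
# b
rm -f sample.txt
```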

# 7  
Old 03-30-2016
Then I get these results:
Code:
$ time { read header; sed '$d'; } < greptestin1 > /dev/null

real	0m31.812s
user	0m30.796s
sys	0m0.658s

$ time { read header; perl -pe '{exit if eof}'; } < greptestin1 > /dev/null

real	0m20.205s
user	0m19.719s
sys	0m0.472s

$ time perl -ne 'print unless ($.==1 || eof)' greptestin1 > /dev/null

real	0m20.225s
user	0m19.600s
sys	0m0.490s
