Reading ALL BUT the first and last line of a huge file


 
# 1  
Old 03-30-2016

Hi.

Pardon me if I'm posting a duplicate thread, but...
I have a text file with over 150 million records; the file size is in the range of MB (close to a GB).
The requirement is to read ALL the lines except the FIRST LINE, which is the file header, and the LAST LINE, which is its trailer record.

What is the most OPTIMUM way to do it?
I'm aware that the sed solution will take a significantly long time to process such a huge file, hence I'm not opting for it.

Please advise.

Thanks.

Warm Regards,
Kumarjit.
# 2  
Old 03-30-2016
You're not really giving much information, but you could always start by keeping a count of the records being processed and throwing out the first and last.

Code:
totalrecs=$(wc -l < inputfile)

Will give you the total number of records in the file. So, ...

Code:
 
recordCount=0
totalrecs=$(wc -l < filename)
while IFS= read -r rec
do
  ((recordCount+=1))
  if [[ $recordCount -eq 1 ]] ; then
     continue
  fi
  if [[ $recordCount -eq $totalrecs ]] ; then
     break
  fi
# ... your other processing goes here
done < filename   # redirect here rather than "cat filename | while ...", so the loop runs in the current shell
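For what it's worth, the same effect is possible in a single pass (a sketch, not from the original post; `filename` is a placeholder): buffer one line so the trailer is read but never printed, and the extra `wc -l` pass disappears:

```shell
# Single-pass sketch: discard the header, then always print the
# *previous* line. The last line read (the trailer) stays in the
# buffer and is never emitted.
{
  IFS= read -r header            # consume and discard the header
  IFS= read -r prev              # prime the one-line buffer
  while IFS= read -r line; do
    printf '%s\n' "$prev"        # safe: at least one more line follows
    prev=$line
  done
} < filename
```

This also handles the degenerate two-line file (header plus trailer) correctly, printing nothing.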


Last edited by jazzman58; 03-30-2016 at 10:23 AM.. Reason: Bad example
# 3  
Old 03-30-2016
try also:
Code:
l=$(wc -l < infile); awk -v l="$l" 'NR>1 && NR<l' infile > newfile
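If GNU coreutils are available (an assumption about the platform; `head -n -1` is a GNU extension, not POSIX), a plain pipeline needs no line count at all:

```shell
# tail -n +2 starts output at line 2 (drops the header);
# GNU head -n -1 prints everything except the last line (drops the trailer).
tail -n +2 infile | head -n -1 > newfile
```

On systems without GNU head, `tail -n +2 infile | sed '$d'` does the same job portably.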

# 4  
Old 03-30-2016
I'm not sure why the sed solution (which one, BTW?) should take significantly longer than the other ones posted. Would you mind posting some comparisons?
# 5  
Old 03-30-2016
Tested with a 2 GB file (excluding writing to a file, which should be similar for all approaches):
Code:
$ time sed '1d;$d' greptestin1 > /dev/null

real	0m29.835s
user	0m29.186s
sys	0m0.591s
$ time awk 'NR>2{print p}{p=$0}' greptestin1 > /dev/null    # BSD awk

real	1m44.183s
user	1m43.627s
sys	0m0.481s
$ time mawk 'NR>2{print p}{p=$0}' greptestin1 > /dev/null

real	0m14.982s
user	0m14.463s
sys	0m0.498s
$ time gawk 'NR>2{print p}{p=$0}' greptestin1 > /dev/null

real	0m24.682s
user	0m24.210s
sys	0m0.414s
$ time gawk4 'NR>2{print p}{p=$0}' greptestin1 > /dev/null

real	0m27.621s
user	0m27.173s
sys	0m0.419s

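All of the awk one-liners above rely on the same one-line buffering trick: every line is stored in `p`, and printing only starts at `NR>2`, so line 1 (the header) is never printed and the final line (the trailer) is read into `p` but never flushed. A quick check on a tiny sample:

```shell
# Print each line's predecessor starting from line 3:
# the header is skipped, the trailer is buffered but never printed.
printf 'header\na\nb\ntrailer\n' | awk 'NR>2{print p}{p=$0}'
# prints:
# a
# b
```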
# 6  
Old 03-30-2016
If the shell reads the first line, then the filter needs one condition fewer:
Code:
time { read header; sed '$d'; } < greptestin1 > /dev/null
time { read header; perl -pe '{exit if eof}'; } < greptestin1 > /dev/null
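To see what that read/sed combination does, here is a check on a tiny sample (the file name is illustrative): `read` consumes the header from the shared file descriptor, then sed sees only the remaining lines and `$d` deletes the last one.

```shell
# read takes line 1 off the shared input; sed '$d' drops the final line.
printf 'header\na\nb\ntrailer\n' > sample.txt
{ read -r header; sed '$d'; } < sample.txt
# prints:
# a
# b
rm -f sample.txt
```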

# 7  
Old 03-30-2016
Then I get these results:
Code:
$ time { read header; sed '$d'; } < greptestin1 > /dev/null

real	0m31.812s
user	0m30.796s
sys	0m0.658s

$ time { read header; perl -pe '{exit if eof}'; } < greptestin1 > /dev/null

real	0m20.205s
user	0m19.719s
sys	0m0.472s

$ time perl -ne 'print unless ($.==1 || eof)' greptestin1 > /dev/null

real	0m20.225s
user	0m19.600s
sys	0m0.490s
