Quickest way to get the total number of lines in a file
# 8  
Old 10-01-2012
Code:
sed -n '$=' input_file

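For what it's worth, several other standard tools will report the same count. A minimal sketch (the `/tmp` sample path is just for illustration):

```shell
# Create a small throwaway sample file (path is hypothetical)
printf 'a\nb\nc\n' > /tmp/input_file

sed -n '$=' /tmp/input_file             # sed: print the line number of the last line
awk 'END { print NR }' /tmp/input_file  # awk: NR holds the record count at end of input
grep -c '' /tmp/input_file              # grep: count lines matching the empty pattern
```

One caveat: on an empty file, `awk` and `grep -c` print `0`, while `sed -n '$='` prints nothing at all.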
This User Gave Thanks to msabhi For This Post:
# 9  
Old 10-01-2012
I think wc -l file is an efficient way of counting lines compared to the other methods discussed above.
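One small wrinkle worth knowing: when given a file argument, `wc -l` echoes the file name after the count. A quick sketch (the sample path is made up):

```shell
# Throwaway two-line file for demonstration (path is hypothetical)
printf 'one\ntwo\n' > /tmp/sample.txt

wc -l /tmp/sample.txt    # prints the count followed by the file name
wc -l < /tmp/sample.txt  # reading from stdin prints the bare count only
```

The redirection form is handy when you want just the number for use in a script variable.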
This User Gave Thanks to pamu For This Post:
# 10  
Old 10-01-2012
If you start testing methods on a file, be aware of the effect of file caching by the OS and disk controllers. You will get completely bogus results if you are not aware of this. I/O wait time is the biggest time consumer: disks are at the very best ten times slower than memory unless you have an SSD.

Pretend you try sed and get this answer:
Code:
time sed -n '$=' input_file
real    0m2.098s
user    0m0.516s
sys     0m0.338s

Great - that took 2.098 seconds of wall time.
Let's try wc -l
Code:
time wc -l input_file
real    0m0.778s
user    0m0.416s
sys     0m0.338s

Wow. wc -l was faster.

No. A lot of the file data was still in cache, so there was no I/O wait. Why? Because you ran against the same file. As you read through a file, the system attempts to cache all or part of it, depending on available resources.

The file data in the cache slowly goes away as other users read and write the same disk; after a while the file is no longer cached. How long that takes, I cannot say. Solaris will use part of free memory as a file cache, as will Linux. Add to this what the disk controller caches, and large chunks of really huge files can sit in memory.

SAN storage behaves in a similar way, but is a lot more complex. SAN is generally slower than direct-attached disk, though some systems have faster directio options. The fastest storage is raw disk, which bypasses the filesystem and the kernel's filesystem-support code; Oracle will do this for its database files if configured.

You can also tune the filesystem itself.

If you need to speed up file I/O on a desktop, look into an SSD.
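To make the cold-versus-warm point concrete, here is one way to sketch a fairer comparison. Assumptions: Linux, and root access for the page-cache flush (without root, the flush step is simply skipped and both runs will be warm):

```shell
# Build a throwaway test file (a serious test needs a file much larger than RAM)
seq 1 100000 > /tmp/input_file

sync                                    # flush dirty pages to disk first
if [ -w /proc/sys/vm/drop_caches ]; then
    echo 3 > /proc/sys/vm/drop_caches   # Linux, root only: empty the page cache
fi

time wc -l /tmp/input_file              # cold cache: dominated by disk I/O
time wc -l /tmp/input_file              # warm cache: usually much faster
```

The second run reads mostly from the page cache, which is exactly the effect that skews naive benchmarks.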
These 2 Users Gave Thanks to jim mcnamara For This Post:
# 11  
Old 10-01-2012
Quote:
Originally Posted by jim mcnamara
If you start testing methods on a file, be aware of the effect of file caching by the OS and disk controllers. You will get completely bogus results if you are not aware of this. [...]
Thank you so much for the detailed explanation. I've always wondered why I sometimes get a faster response and other times a much slower one when running the same command on a file. Now I know. Thanks a million.
# 12  
Old 10-01-2012
Quote:
Originally Posted by jim mcnamara
If you start testing methods on a file, be aware of the effect of file caching by the OS and disk controllers. You will get completely bogus results if you are not aware of this. [...]
Yeah, I thought so when I tested and saw the varying times. Very good food for thought, Jim. Thanks.
# 13  
Old 10-01-2012
Hi.

See also the post at https://www.unix.com/shell-programmin...ines-file.html for some additional timings ... cheers, drl
This User Gave Thanks to drl For This Post: