Which cut command is more efficient?


 
# 8  
Old 03-23-2011
Quote:
Originally Posted by sumoka
combining cat & cut will spawn an extra process and use more CPU, which is fine for smaller files.
Not so much "fine" as "negligible".
Quote:
In case of bigger files, as in my case, it is better to run the cut command directly on the file. This will result in optimum CPU utilisation.
Correct. It's a bad habit in general -- test data tends to be small, so the problem isn't apparent; you only run into trouble when you make it do real work.
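As a minimal sketch of the two forms being compared (the filename is a placeholder, not from the thread): both produce identical output, but the pipeline spawns an extra process and copies every byte through a pipe.

```shell
# Build a tiny sample file (a real comparison would use a large one).
printf 'a,b,c\nd,e,f\n' > /tmp/sample.csv

# Extra process: cat reads the file and copies it through a pipe to cut.
cat /tmp/sample.csv | cut -d, -f2

# One process: cut reads the file directly.
cut -d, -f2 /tmp/sample.csv
```

On a genuinely large file, running `time` on each form shows the pipeline consuming more total CPU even when the wall-clock times are similar.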
This User Gave Thanks to Corona688 For This Post:
# 9  
Old 03-23-2011
@Corona688
Yes, 36 cores (9x4). CPU power not an issue. Regularly running over 30,000 concurrent processes.

The bottleneck when reading large files is invariably the disc system, closely followed by the software. This is where reading moderate-size files with "cat" scores over the read handling in some unix utilities. I recognise that "cut" is actually one of the better ones.

For the advanced user with large data files I am not averse to using "dd" or "cpio" (or both) to read from the disc in an optimum manner.

On a single core system running ancient unix it was very important to minimise the number of concurrent processes. This is really not the case nowadays unless you happen to be running unix on a home system.

Back to the O/P.
The conventional answer is that running more processes is less efficient. On a modern large system with multiple processors (i.e. the norm) it can be more efficient to run a pipeline of multiple efficient processes than to run a single inefficient process.
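To sketch that point (the log file and field positions below are hypothetical): each stage of a pipeline is a separate process, so on a multi-core machine the stages can genuinely execute at the same time, with the pipes between them acting as buffers.

```shell
# Generate a little demo data (a stand-in for a large log file).
printf 'ERROR disk1\nINFO ok\nERROR disk1\nERROR disk2\n' > /tmp/demo.log

# Three concurrent processes; on a multi-processor system the kernel
# can schedule grep, cut, and sort on different CPUs simultaneously.
grep '^ERROR' /tmp/demo.log | cut -d' ' -f2 | sort | uniq -c
```

Whether the overlap pays off depends on whether each stage does enough work to keep a CPU busy; a pipeline of trivial stages is still dominated by the pipe copies.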
The "Useless use of cat" brigade have clearly never used a modern computer, where apparent inefficiencies are absorbed by proper use of the software and hardware as a team.
By applying lateral thought we can deduce that hardware design evolution is actually targeted towards making inefficient processes efficient. We can take advantage of that by tactical use of previously-inefficient processes.

Nuff said.

Last edited by methyl; 03-23-2011 at 07:07 PM.. Reason: spellin, verbosity
# 10  
Old 03-23-2011
Quote:
Originally Posted by methyl
For the advanced user with large data files I am not averse to using "dd" or "cpio" (or both) to read from the disc in an optimum manner.
That hardly compares to the 'cat' being run here, untuned and untunable. You're doubling the amount of work done for a <1% improvement in speed -- and with 30,000 concurrent processes, that's time something else probably could've used.
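For contrast, a tuned read via dd lets the user pick the block size explicitly, unlike a bare cat (the 1 MiB figure and filename here are illustrative, not from the thread):

```shell
# Stand-in for a genuinely large file.
printf 'a,b,c\nd,e,f\n' > /tmp/big.csv

# dd reads in fixed 1 MiB blocks and feeds cut through a pipe;
# the block size is under the user's control, which is what
# distinguishes this from an untuned cat in the same position.
dd if=/tmp/big.csv bs=1048576 2>/dev/null | cut -d, -f2
```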
Quote:
On a single core system running ancient unix it was very important to minimise the number of concurrent processes. This is really not the case nowadays unless you happen to be running unix on a home system.
In my early shell-scripting days I wrote scripts that I'm sure would need your 36 processors to function with any efficiency. Having CPU power to waste hardly makes it a good idea to do so.
# 11  
Old 03-24-2011
can you suggest one solid link or book that covers these basics...?
# 12  
Old 03-27-2011
Quote:
In my early shell-scripting days I wrote scripts that I'm sure would need your 36 processors to function with any efficiency. Having CPU power to waste hardly makes it a good idea to do so.
My stats do not show a waste of CPU power. They show a CPU power saving from using "cat", because "cut" is less efficient at reading files. However, in a single-stream environment, loading multiple processes into a long pipeline would have been a performance disaster.

In my early days of unix I dealt with system crashes caused by, for example: too many concurrent processes, too many forks, disc buffer overload, and mysterious kernel crashes. It's hard to even generate these situations on modern systems after an initial large-scale kernel build.


There is a very good O'Reilly book, "System Performance Tuning", but bear in mind that it does not cover very large unix systems adequately.
# 13  
Old 03-28-2011
Quote:
Originally Posted by methyl
My stats do not show waste of CPU power. They show a CPU power saving by using "cat" because "cut" is less efficient at reading files.
What makes you conclude that "cut" is less efficient at reading files than "cat"?
# 14  
Old 03-28-2011
Quote:
Originally Posted by methyl
My stats do not show waste of CPU power. They show a CPU power saving by using "cat" because "cut" is less efficient at reading files.
So reading the file, writing it to a pipe, and reading from the pipe, using two separate CPUs simultaneously, is more efficient than reading it once and using it once? If your CPU benchmarks show that this uses less CPU then, frankly, they're wrong. Less total real time, maybe, but nothing in that reduces the amount of CPU cut uses -- adding more commands can only add more CPU utilization.

The only performance benefit I can see is the pipe effectively acts as a read-ahead buffer, albeit a highly expensive one. With the power expended for that 1% performance improvement, how much more actual work could have been done instead by running two instances of cut on different data sets?

---------- Post updated at 11:08 AM ---------- Previous update was at 10:58 AM ----------

Quote:
Originally Posted by shamrock
What makes you arrive at the fact that..."cut" is less efficient at reading files than cat.
cut has to read line by line. cat can just read and write huge blocks.
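A rough way to feel that difference without tracing system calls (a bash sketch; the filename and line count are arbitrary): the shell's read builtin is an extreme case of a line-by-line reader, while cat moves the same data in a few large buffered reads.

```shell
# Generate a file with many short lines.
seq 1 5000 > /tmp/lines.txt

# Line-by-line reader: one loop iteration per line of input.
line_by_line() {
  while IFS= read -r line; do :; done < /tmp/lines.txt
}

# Block reader: cat pulls the whole file in a few large buffered reads.
block_copy() {
  cat /tmp/lines.txt > /dev/null
}

time line_by_line
time block_copy
```

The gap here overstates the case against cut, which buffers its input far better than a shell loop does, but the direction of the effect is the same: per-line work costs more than per-block work.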

Last edited by Corona688; 03-28-2011 at 02:14 PM..