Explanation for interesting sed behaviour?


 
# 1  
09-24-2009

This is my first post, so hi to you all. I have browsed these forums in the past, and what a great community and resource this is! Thanks to all the contributors... I look forward to being able to give something back.

In the meantime, I have a little conundrum concerning sed. My very simple script is as follows ...

Code:
for file in `find . -type f -print | xargs file | grep ASCII | cut -d: -f1`
do
    cat $file | sed 's/phrase/substitute/' > $file
done

I know this isn't the best way of doing things, and I now use sed's -i switch to achieve the same end, which works fine.
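For comparison, here is the in-place form as a minimal sketch, assuming GNU sed and `mktemp` are available (the scratch file `sample.txt` is made up). GNU sed's -i writes its output to a temporary file in the same directory and then renames it over the original, so the input is never truncated while it is still being read. BSD and macOS sed also have -i but require a backup-suffix argument (e.g. `sed -i '' 's/phrase/substitute/' file`), so the exact spelling is not portable.

```shell
#!/bin/sh
# Work in a throwaway directory so nothing real is touched.
demo_dir=$(mktemp -d)
printf 'a phrase here\nanother phrase\n' > "$demo_dir/sample.txt"

# GNU sed syntax: -i with no suffix edits the file in place (no backup).
sed -i 's/phrase/substitute/' "$demo_dir/sample.txt"

cat "$demo_dir/sample.txt"
```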

When this version is run, it appears to randomly leave the odd file empty (i.e. zero bytes in size). I've run it on a directory containing literally only two files ... the first few runs go fine, then suddenly one of the files becomes zero bytes. The other follows some random (small) number of runs later.

Problem is, I'm being hassled to provide an explanation as to why this happens. My guess is that it has something to do with the interaction between the tools and the OS (Linux) and the way the files are streamed between cat, sed, and the redirection, but I don't have any real evidence to back this up.

I was hoping somebody here would be able to provide a more concrete explanation of why I might be seeing this behaviour.

Many thanks in advance.

Gavin
# 2  
09-24-2009
I think you've got yourself a nice little race condition. The shell starts both sides of the pipe at the same time, and setting up the '> $file' redirection for sed opens the file and thus truncates it. In most cases, cat has already read the whole file by then. But every so often, be it because of the file size, scheduling, or cosmic rays, it's not fast enough, and the file is truncated before cat has had a chance to read it (in part or fully).
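The worst case of this race can be reproduced on purpose by delaying cat. This is only a demonstration sketch (the scratch file `race.txt` is made up, and `mktemp` is assumed to be available): the pipeline has the same shape as the original one-liner, but while the delayed cat is still sleeping, the shell has already opened the file for sed's output. In practice the file is therefore empty by the time cat reads it, although strictly speaking the outcome still depends on scheduling.

```shell
#!/bin/sh
demo_dir=$(mktemp -d)
target="$demo_dir/race.txt"
printf 'some phrase\n' > "$target"

# Same shape as the original one-liner, with cat deliberately delayed.
# The child that will exec sed opens "$target" with truncation first,
# so cat later finds an empty file and sed writes nothing back.
{ sleep 1; cat "$target"; } | sed 's/phrase/substitute/' > "$target"

# Reports the size of the file in bytes.
wc -c < "$target"
```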

I'd suggest you rewrite it to this:
Code:
mv "${file}" "${file}.TMP"
sed 's/phrase/substitute/' "${file}.TMP" > "${file}"
rm "${file}.TMP"

Unlike the '-i' switch, this works with every version of sed, since -i is an extension that not all versions of sed support.
# 3  
09-24-2009
Code:
for file in `find . -type f -print | xargs file | grep ASCII | cut -d: -f1`; do
     perl -pi -e 's/phrase/substitute/g' "$file"
done
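The backtick loop used here (and in the original script) splits find's output on whitespace, so a name such as `my notes.txt` becomes two bogus arguments; `xargs file` has a similar problem. A sketch that keeps each name intact lets find pass the names straight to a small inline shell script. It assumes `file` and `perl` are installed; `file -b` (type only, no file name) stands in for the thread's `grep ASCII | cut` step, and the scratch directory and file name are made up.

```shell
#!/bin/sh
# Scratch tree with a file name containing a space.
demo_dir=$(mktemp -d)
printf 'one phrase, two phrase\n' > "$demo_dir/my notes.txt"

# find hands the names to sh as separate arguments ("$@"),
# so whitespace inside a name is preserved.
find "$demo_dir" -type f -exec sh -c '
    for f do
        # file -b prints only the type, e.g. "ASCII text".
        case $(file -b "$f") in
            *ASCII*) perl -pi -e "s/phrase/substitute/g" "$f" ;;
        esac
    done
' sh {} +

cat "$demo_dir/my notes.txt"
```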

# 4  
09-24-2009
I like to do something like this:

Code:
sed 's/phrase/substitute/' "${file}" > "${file}.TMP" && /bin/mv -f "${file}.TMP" "${file}"

The '&&' makes the second part (mv -f) execute only if the first part succeeded, so a failed sed never overwrites the original file.
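One side effect of both temp-file approaches (this one and the mv-first version) is that the file ending up with the original name is freshly created, so it gets the default mode from the umask rather than the original's permissions; an executable script, for example, loses its execute bit. A minimal sketch of one way around that, using a hypothetical helper name `safe_subst`: create the temp file with `cp -p` first, then overwrite its contents, which keeps the copied mode.

```shell
#!/bin/sh
# safe_subst is a hypothetical helper name, not a standard command.
# Usage: safe_subst FILE
safe_subst() {
    tmp="$1.TMP"
    # cp -p gives the temp file the original's mode; redirecting sed's
    # output into an existing file truncates it but keeps that mode.
    cp -p "$1" "$tmp" &&
        sed 's/phrase/substitute/' "$1" > "$tmp" &&
        mv -f "$tmp" "$1" || {
            # Any failure: drop the temp file and leave the original alone.
            rm -f "$tmp"
            return 1
        }
}

demo_dir=$(mktemp -d)
printf '#!/bin/sh\necho phrase\n' > "$demo_dir/script.sh"
chmod 755 "$demo_dir/script.sh"
safe_subst "$demo_dir/script.sh"
ls -l "$demo_dir/script.sh"
```

Running it on an executable file should leave the execute bit in place, which the plain `sed ... > file.TMP && mv` form does not guarantee.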
# 5  
09-24-2009
Thanks guys for the quick replies.

In response to pludi's explanation, I have been working through a thought experiment with pencil and paper to figure out the sequence of events in this line of code:
cat $file | sed 's/phrase/substitute/' > $file
I can understand how this might result in truncated files (which I have also seen). Essentially, if the redirection begins writing back to the file before cat has finished reading it, I can see how we could lose the end of the file.

But that still doesn't explain (in my mind at least) how I could end up with a file of zero bytes. Surely, for that to happen, the redirection would have had to open the file before cat had even started reading it? Is that possible?

Unfortunately, my knowledge of process scheduling and file IO in Linux is extremely limited so I'm not entirely sure.

Gavin
# 6  
09-24-2009
I'm no expert, either, by any means, but here's my interpretation of it:
  1. The shell fork()s off a new process, redirects stdout to a pipe, and then exec()s cat
  2. Meanwhile, since the forked process runs in parallel, a second process is fork()ed off; it has stdin redirected to the read end of the same pipe and stdout redirected to the file (opening it for writing truncates it), and then exec()s sed
  3. If the first exec is delayed for any reason, it's possible that the file redirection (and thus the truncation) takes place before cat can even start to read the file. When cat gets around to reading it, it sees an empty file.
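Step 2 can be checked in isolation: the shell carries out the output redirection before the command is exec'd, so the file is truncated whether or not the command ever runs. A minimal sketch, using a deliberately nonexistent command name (`no_such_command_here`, hypothetical) and a made-up scratch file:

```shell
#!/bin/sh
demo_dir=$(mktemp -d)
target="$demo_dir/victim.txt"
printf 'precious data\n' > "$target"

# The redirection happens first, so "$target" is truncated even though
# the command is never found; the "not found" message goes to /dev/null.
no_such_command_here > "$target" 2>/dev/null || true

# Reports the size of the file in bytes.
wc -c < "$target"
```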
# 7  
09-24-2009
Many thanks pludi ... that makes more sense now. I had wondered about the parallelism of the statement but wasn't entirely sure how it would be treated.

I think I had assumed that the scheduler would understand the implied dependency of the output process on the input process, but maybe it isn't that clever.

Gavin