sed and cut behaving differently


 
# 8  
Old 05-01-2010
Just try stripping off the first 2 characters with sed and with cut, then compare the results.
# 9  
Old 05-01-2010
Quote:
Originally Posted by amicon007
But



is behaving differently and removing more than expected characters.
I cannot confirm that; both behave as they should:
Code:
$ cut -c3- sample.conf > sample-cut.conf
$ sed 's/^..//' sample.conf > sample-sed.conf
$ diff sample-cut.conf sample-sed.conf
$

# 10  
Old 05-01-2010
I do see a difference:
Quote:
$ diff sample-cut.conf sample-sed.conf
1,10c1,10
< 147405037|44846|44846|8705|20100401000000|20100516000000|20100408220743|20100523235959|20100408220743|||20100326014658|S|15092360154
< 26537555|44849|44849|8705|20100401000000|20100516000000||||||20100326014658|S|15077793658
< 230042230|44857|44857|8705|20100401000000|20100516000000||||||20100326014658|S|15098928810
< 43398728|44848|44848|8705|20100401000000|20100516000000|20100401092126|20100516235959|20100401092126|||20100326014658|S|15080179924
< 236218845|44848|44848|8705|20100401000000|20100516000000||||||20100326014658|S|15100098523
< 22029612|44859|44859|8705|20100401000000|20100516000000|20100402165043|20100517235959|20100402165043|||20100326014658|S|15077092386
< 242395460|44846|44846|8705|20100401000000|20100516000000||||||20100326014658|S|15100863598
< 121527978|44846|44846|8705|20100401000000|20100516000000||||||20100326014658|S|15088997374
< 254748690|44846|44846|8705|20100401000000|20100516000000||||||20100326014658|S|15103592530
< 146234438|44846|44846|8705|20100401000000|20100516000000|20100415163904|20100530235959|20100415163904|||20100326014658|S|15092152331
---
> 47405037|44846|44846|8705|20100401000000|20100516000000|20100408220743|20100523235959|20100408220743|||20100326014658|S|15092360154
> 6537555|44849|44849|8705|20100401000000|20100516000000||||||20100326014658|S|15077793658
> 30042230|44857|44857|8705|20100401000000|20100516000000||||||20100326014658|S|15098928810
> 3398728|44848|44848|8705|20100401000000|20100516000000|20100401092126|20100516235959|20100401092126|||20100326014658|S|15080179924
> 36218845|44848|44848|8705|20100401000000|20100516000000||||||20100326014658|S|15100098523
> 2029612|44859|44859|8705|20100401000000|20100516000000|20100402165043|20100517235959|20100402165043|||20100326014658|S|15077092386
> 42395460|44846|44846|8705|20100401000000|20100516000000||||||20100326014658|S|15100863598
> 21527978|44846|44846|8705|20100401000000|20100516000000||||||20100326014658|S|15088997374
> 54748690|44846|44846|8705|20100401000000|20100516000000||||||20100326014658|S|15103592530
> 46234438|44846|44846|8705|20100401000000|20100516000000|20100415163904|20100530235959|20100415163904|||20100326014658|S|15092152331
I am on SunOS.
# 11  
Old 05-01-2010
Obviously your sed doesn't like the leading characters of the input lines, which are "^".
Just try
Code:
sed 's/^.//' sample

If I were you, I'd create a fresh file with a couple of test lines like the following:
Code:
^@abcd
^@1234
^@9999
^@wxyz

and run sed again and see what happens.
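One caveat: typing the two characters ^ and @ into a file is not the same thing as a real NUL byte, which is what editors usually display as ^@. Below is a minimal sketch of building a test file that really starts each line with a NUL byte and then checking what is in it. It assumes a POSIX-style printf, plus od and tr; the scratch file name nul-sample is made up for illustration.

```shell
# Hypothetical scratch file: each line begins with a real NUL byte (octal 000).
tmpdir=$(mktemp -d)
sample="$tmpdir/nul-sample"
printf '\000Aabcd\n\000B1234\n' > "$sample"

# od -c prints a NUL as \0, so the dump shows what the file really contains.
od -c "$sample"

# Count NUL bytes: delete every byte that is not NUL, then count what is left.
nul_count=$(tr -cd '\000' < "$sample" | wc -c | tr -d ' ')
echo "NUL bytes: $nul_count"
```

Running sed and cut against a file built this way tests the same kind of input as the original sample.conf, rather than a file with printable ^ and @ characters.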
# 12  
Old 05-01-2010
It does recognize them:
Quote:
$ cat sam2
^@abcd
^@1234
^@9999
^@wxyz
$ sed 's/^.//' sam2
@abcd
@1234
@9999
@wxyz
There seems to be some difference in how my sed and cut handle binary characters.
# 13  
Old 05-01-2010
Hi.

I ended up with a long script that compared the results of cut, 3 variations of sed, and the binary editor bbe. I did these on Linux and Solaris 10 (but bbe omitted on Solaris). I used cmp as the first test, and then diff for the detailed comparison. On Linux, GNU diff bails out quickly, simply saying that the binary files differ.

The sed variations that I used were:
Code:
sed 's/^..//' ...
sed 's/^.\{2\}//' ...
sed 's/^.{2}//' ...

They failed on both Linux and Solaris:
Code:
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0

OS, ker|rel, machine: SunOS, 5.10, i86pc

I used the cut output as the standard. The bbe output compared successfully.
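The cmp-then-diff comparison described above can be sketched roughly as follows. This is not the original script. It uses two hand-made files (hypothetical names a and b) in place of the real cut and sed outputs, and it diffs od -c dumps so that diff works on text instead of reporting only that binary files differ.

```shell
# Two small stand-in files that differ in one byte after a leading NUL.
tmpdir=$(mktemp -d)
printf '\000Aabcd\n' > "$tmpdir/a"
printf '\000Babcd\n' > "$tmpdir/b"

# Step 1: cmp -s is a quick yes/no test that is safe on binary data.
if cmp -s "$tmpdir/a" "$tmpdir/b"; then
    verdict=identical
else
    verdict=different
    # Step 2: dump both files as text and diff the dumps for the details.
    od -c "$tmpdir/a" > "$tmpdir/a.od"
    od -c "$tmpdir/b" > "$tmpdir/b.od"
    diff "$tmpdir/a.od" "$tmpdir/b.od" || true
fi
echo "$verdict"
```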

Observations:

1) mixed-mode files are not best-practice

2) cut knows about bytes, and appears to use its byte "knowledge" as its character knowledge

3) sed is advertised as:
Code:
sed - stream editor for filtering and transforming text

not necessarily collections of arbitrary bytes in mixed-mode files

4) At the center where I worked, we said that we could make processes (almost) as fast as you desired as long as you didn't care about the results. If you get good results in a reasonable time from a particular process, then use it. You may end up wasting more time trying to find the fastest method than if you just let the original process run. This is a case of premature optimization, combined with the notion that people's time is the most expensive resource (in most cases).

5) I did not time bbe, but it might be as fast as sed -- it is:
Code:
bbe is a sed-like editor for binary files. It performs binary transformations on the blocks of input stream.

I can post the script and results, however, as I said, they are lengthy. As usual, it is possible that I have incorporated an error of some kind, but in general I agree with the OP ... cheers, drl
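To illustrate the byte-versus-text point in observations 2 and 3, here is a small sketch that asks both tools to work on bytes. It assumes a cut that supports -b and a sed that honors LC_ALL=C. The input file (hypothetical name bytes) deliberately contains no NUL bytes, so it only shows the byte-oriented handling and says nothing about the NUL problem itself.

```shell
# Hypothetical input: two bytes that are not valid UTF-8, then ordinary text.
tmpdir=$(mktemp -d)
printf '\377\376abc\n' > "$tmpdir/bytes"

# cut -b selects bytes explicitly; under LC_ALL=C sed treats each byte as one character.
out_cut=$(LC_ALL=C cut -b3- "$tmpdir/bytes")
out_sed=$(LC_ALL=C sed 's/^..//' "$tmpdir/bytes")

echo "cut: $out_cut"
echo "sed: $out_sed"
```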
# 14  
Old 05-01-2010
A quick peek at sample.conf shows that the first byte in the lines is a null byte. It's a safe bet that what you are seeing is a side effect of C string functions interpreting a null byte as end of string.

For experimentation's sake, does the discrepancy persist if you substitute a 001 byte for the 000 bytes?
Code:
tr \\000 \\001 < withnull > withoutnull

If it does not, mystery solved. If it does, weird. Could that sed implementation be filtering out control characters?

Regards,
Alister
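The experiment suggested above could be run end to end along these lines. This is only a sketch: the test data is invented (two leading NUL bytes per line), the withnull and withoutnull names follow the tr command above, and whether cut and sed agree on the NUL version depends on the implementations installed.

```shell
# Build a small file with leading NUL bytes, then a copy with 001 bytes instead.
tmpdir=$(mktemp -d)
withnull="$tmpdir/withnull"
withoutnull="$tmpdir/withoutnull"
printf '\000\000abcd\n\000\0001234\n' > "$withnull"
tr '\000' '\001' < "$withnull" > "$withoutnull"

# Strip the first two characters with both tools and compare byte for byte.
for f in "$withnull" "$withoutnull"; do
    cut -c3- "$f" > "$f.cut"
    sed 's/^..//' "$f" > "$f.sed"
    if cmp -s "$f.cut" "$f.sed"; then
        echo "$(basename "$f"): cut and sed agree"
    else
        echo "$(basename "$f"): cut and sed differ"
    fi
done
```

If only the withnull case differs, that points at NUL handling in that sed rather than at control characters in general.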