Need help in SED


 
# 1  
Old 12-02-2007

Hi,

I am quite new to the Bash shell and Unix.

I would like to use SED to replace characters in my files.
The input I have is the output of an earlier SED command that removes all whitespace, which is why "charc" appears where the original had "char c". Assume my file contains the following input:

addstr(c,outset,j,maxset){
charc;
char*outset;
int*j;
}

I want to tokenise "*", "," and ";" as separate tokens, replacing each one with the character "s" every time it is seen. The brackets ( ) and { } should also each become one "s". Every other character should become the character "t". The type keywords char, int and str should each be replaced as a group by a single "d".

The output I am looking for is:

ttttttststtttttststtttttss
dts
dstttttts
dsts

After that, I plan to count the number of t, s and d tokens on each line.

I looked for SED tutorials on Google, but none of them covers this specific case in detail. Could anyone help?


Much appreciated.

Thanks.

Rgrds,
Jason
# 2  
Old 12-02-2007
Start by taking the full words to tokenise and exchange them for something 'safe' that subsequent replacements won't hit.
Then replace the remaining 'word' characters with your t token.
The symbols come next; then, lastly, replace the 'safe' symbol with the d token.
Basic syntax you want:

sed 's/thing/token/g'

Untested, but try the following:
Code:
sed 's/(char|int|str)/\@/g' < file_to_convert | sed 's/[\w]/t/g' | sed 's/[\*\,\;\(\)\{\}]/s/g' | sed 's/\@/d/g'


Last edited by Smiling Dragon; 12-02-2007 at 11:20 PM.. Reason: Too many seds
# 3  
Old 12-02-2007
An alternative might be to create a little script to just count the things you want without bothering to monkey with it first - pretty much just a big case with a few smarts to understand how to break the bits up.
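
One way to read this suggestion, as a rough sketch rather than the poster's actual script: awk's gsub() returns the number of replacements it made, so deleting each class of token from a scratch copy of the line gives the counts directly. The name count_tokens is made up for illustration, and the keyword and symbol lists are taken from the first post. It uses awk rather than a shell case statement, which is a swap on my part.

```shell
# count_tokens is a hypothetical helper: it reads lines on standard input and
# prints how many t, s and d tokens each line would produce.
count_tokens() {
  awk '{
    line = $0
    # Keywords first, so their letters are not counted as t tokens.
    d = gsub(/char|int|str/, "", line)
    # Then the punctuation that maps to s.
    s = gsub(/[(){}*,;]/, "", line)
    # Whatever letters, digits and underscores remain map to t.
    t = gsub(/[A-Za-z0-9_]/, "", line)
    printf "t=%d s=%d d=%d\n", t, s, d
  }'
}

printf '%s\n' 'charc;' 'char*outset;' 'int*j;' | count_tokens
```

Like the sed approach, this counts the "str" inside "addstr" as a keyword, so longer names would still need special handling.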
# 4  
Old 12-02-2007
Hi again,

One more thing,

The command you suggested contains "file_to_convert". May I know what this is for?


Thanks.

-Jason

Hi Smiling Dragon.

Thanks for the reply. I am actually using

cat bla.txt| sed 's/(char|int|str)/\@/g' | sed 's/[\w]/t/g' | sed 's/[\*\,\;\(\)\{\}]/s/g' | sed 's/\@/d/g'>blabla.txt

And I get this:

addstrscsoutsetsjsmaxsets
charcs
charsoutsets
intsjs

However, the function name "addstr" and the other characters were not converted.

One thing I do not understand is:
sed 's/(char|int|str)/\@/g' | sed 's/[\w]/t/g'
This is meant to replace char, int and str with "@", right? And what does "w" mean in this case?

In your previous reply, you suggested taking the full words to tokenise and exchanging them for something 'safe' that subsequent replacements won't hit. How can int, char and str be recognised? For example, "charc" should be recognised as "char" and "c" separately.

One more thing: in the previous example, could I just delete the function name right before the "(" so that the tokenisation becomes simpler? Can that be done using SED? If so, how?

Hope to hear from you soon! Thanks.


-Jason

Last edited by ahjiefreak; 12-03-2007 at 12:07 AM.. Reason: Missing info
# 5  
Old 12-03-2007
The file_to_convert reference is just a way to get your input stream in there. I didn't know whether you were bringing it in on STDIN or as a file passed to your script. Substitute your filename for file_to_convert, or omit it entirely if you intend to pass the file in on STDIN.
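
To illustrate the point about input, here is a small sketch of three ways to feed the same file to sed. The wrapper name convert and the scratch file are made up for the example; the substitution is only a stand-in for the real one.

```shell
# convert is a hypothetical wrapper; "$@" passes any file names through to sed.
# With no arguments, sed reads standard input.
convert() {
  sed 's/int/@/g' "$@"
}

scratch=$(mktemp)                 # stand-in for file_to_convert
printf 'int*j;\n' > "$scratch"

convert "$scratch"                # file name as an argument
convert < "$scratch"              # file redirected to standard input
cat "$scratch" | convert          # file piped in, as with "cat bla.txt | sed ..."

rm -f "$scratch"
```

All three calls give sed the same text, so which one to use is mostly a matter of how the surrounding script receives its data.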

Add 'addstr' and any other things to match to the regex looking for char etc. Put longer names first so that 'addstr' gets matched before 'str' for example.

It's looking like sed doesn't like my (thing|otherthing) regex syntax. This should be something you'll be able to debug though. If not, just use multiple sed calls (ie sed 's/char/\@/g' | sed 's/int/\@/g' etc).
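
For what it's worth, the likely reason is that sed uses basic regular expressions by default, where ( ) and | are ordinary characters, so 's/(char|int|str)/.../' looks for that literal text. A sketch of the same four steps in a single call, assuming a sed that accepts -E for extended regular expressions (GNU and BSD sed both do) and assuming "@" never appears in the input; the name tokenise is made up:

```shell
# tokenise is a hypothetical helper that reads standard input.
# The -e expressions run in the order given, on every line.
tokenise() {
  sed -E \
    -e 's/(char|int|str)/@/g' \
    -e 's/[[:alnum:]_]/t/g' \
    -e 's/[(){}*,;]/s/g' \
    -e 's/@/d/g'
}

printf '%s\n' 'charc;' 'char*outset;' 'int*j;' | tokenise
```

This still treats the "str" inside "addstr" as a keyword, which is why longer names need to be handled before the short ones.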

\w is meant to match all alphanumeric characters. If your version of sed doesn't support it, you can use the tr command to look for [a-zA-Z] instead.
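
A short sketch of why the earlier command left the letters alone: in a POSIX bracket expression the backslash is an ordinary character, so [\w] stands for "a backslash or a literal w" rather than "any word character". A POSIX character class inside the brackets is the portable spelling. The sample text is made up.

```shell
# Bracketed \w: only backslashes and the letter w are candidates for replacement.
printf 'wow_1\n' | sed 's/[\w]/t/g'

# Portable alternative: letters, digits and underscore all become t.
printf 'wow_1\n' | sed 's/[[:alnum:]_]/t/g'
```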

The idea behind exchanging those words like char etc for another symbol is to prevent exactly what you are describing. We don't want the substitution of all characters to the 't' token to convert the letters making up char. And we can't change it to 't' yet as that's a letter too. It would be simpler to use tokens that are not normally present in the file, that way you could switch each one in turn without needing 'placeholder' tokens to protect the integrity of the information.

I would think it's fine to remove the function name, but it depends on the main script calling this section. I seldom use ksh functions in small scripts like this.
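
If the question was about stripping the name "addstr" from the input text itself, a possible sketch is below. It assumes the name starts the line and consists only of letters, digits and underscores; strip_name is a made-up helper name.

```shell
# strip_name is hypothetical: it drops a leading identifier that is followed
# directly by "(", and keeps the "(" itself. Other lines pass through unchanged.
strip_name() {
  sed 's/^[[:alnum:]_]*(/(/'
}

printf '%s\n' 'addstr(c,outset,j,maxset){' 'charc;' | strip_name
```

In a basic regular expression the "(" is a literal character, so no escaping is needed here.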
# 6  
Old 12-03-2007
Hi,

I tried using

cat b.txt |tr '[a-zA-Z*]' 't'|sed 's/char/\@/g' |sed 's/int/\@/g' |sed 's/str/\@/g'|tr '[a-zA-Z]' 't'|sed 's/==/`/g'|sed 's/>=/`/g'|sed 's/<=/`/g'| sed 's/[\*\,\;\(\)\{\}\+\-\&\=\/\<\>\!\&\||\#]/`/g'>c.txt

The input file (b.txt):-

addstr(c,outset,j,maxset){
charc;
char*outset;
int*j;
int maxset;
}

The output file (c.txt) :-

tttttt`t` tttttt` t` tttttt`
tttt t`
tttt ttttttt`
ttt tt`
ttt tttttt`
`

I still could not figure out how to write a regex that differentiates between charstr and char. I added |tr '[a-zA-Z*]' 't'| at the start, hoping that charstr would be recognised as part of token "t" rather than as char (@).

"charc" also causes confusion as to whether it is the keyword char (@) or part of another word (t). Previously I ran sed on b.txt to remove all the whitespace, so "char c" became "charc". Should I not have removed it in the first place?

Does anyone have an idea how to get around this problem?

Much appreciated! Thanks.
-Jason
# 7  
Old 12-03-2007
As I said before, you need to add the additional substitutions (such as charc) before the smaller ones (such as char) to prevent it matching the wrong thing. The tr needs to be after the step to replace the keywords with @ symbols.
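
A sketch of that ordering, under some assumptions: "addstr" is the only longer name that needs protecting, "@" does not occur in the input, and the symbols map to "s" as in the first post. The name tokenise_ordered is made up. The longer name is turned straight into a run of t tokens, which is safe because a run of t's cannot contain char, int or str.

```shell
# tokenise_ordered is a hypothetical helper that reads standard input.
tokenise_ordered() {
  # 1. longer names first, 2. keywords to the '@' placeholder
  sed -e 's/addstr/tttttt/g' \
      -e 's/char/@/g' -e 's/int/@/g' -e 's/str/@/g' |
    # 3. only now turn every remaining letter into t
    tr 'a-zA-Z' '[t*]' |
    # 4. symbols to s, then the placeholder to d
    sed -e 's/[(){}*,;]/s/g' -e 's/@/d/g'
}

printf '%s\n' 'addstr(c,outset,j,maxset){' 'char*outset;' 'int maxset;' | tokenise_ordered
```

Running tr first, as in the earlier pipeline, turns the keywords into t's before sed ever sees them, which is why no d tokens appeared in c.txt.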