sed and punctuations


 
# 1  
Old 04-07-2013

I want to remove all punctuation except for a few characters of my choice. For example, I want to keep # and remove every other punctuation mark. Can this script be modified to do that? Please help.

Code:
#!/bin/bash

echo "Enter the type of file: "
read f

a=(*.$f)

for (( i = 0; i < ${#a[@]}; i++ ))   # ${#a[@]} is the number of files; ${#a[i]} is the length of one name
do
echo "$i    ${a[i]}"
done

echo "Enter the filenumber for processing:"
read j

cat ${a[$j]} | sed -e 's/\([[:punct:]]\)//g' > revised.txt

# 2  
Old 04-07-2013
Try changing:
Code:
cat ${a[$j]} | sed -e 's/\([[:punct:]]\)//g' > revised.txt

to:
Code:
sed -e 's/#/^g/g;s/\([[:punct:]]\)//g;s/^g/#/g' "${a[$j]}" > revised.txt

where the ^g in both cases is a control character of your choice that does not otherwise appear in your input file. (It first changes all #s to the control character of your choice, then deletes all punctuation characters, and then changes the control characters you chose back to #s.)
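One way to avoid typing the control character by hand is to let bash produce it. The following is only a sketch: it assumes bash (for the $'\a' quoting), assumes BEL (Ctrl-G) never occurs in the input, and the function name strip_punct_keep_hash is made up for illustration.

```shell
#!/bin/bash
# bel holds the BEL character (Ctrl-G); assumed not to appear in the input.
bel=$'\a'

# Hypothetical helper: reads stdin, keeps '#', deletes all other punctuation.
strip_punct_keep_hash() {
    # 1) hide every '#' behind BEL, 2) delete punctuation, 3) restore '#'
    sed -e "s/#/$bel/g" -e 's/[[:punct:]]//g' -e "s/$bel/#/g"
}

printf '%s\n' 'a#b, c! (d) #e' | strip_punct_keep_hash   # prints: a#b c d #e
```

BEL is a control character, not a punctuation character, so the middle step leaves it alone.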
These 2 Users Gave Thanks to Don Cragun For This Post:
# 3  
Old 04-07-2013
further improvements

I have made the code more flexible in terms of choices. Can it be made better? Suggestions are most welcome.

Code:
for i in $k
do
cat ${a[$j]} | sed 's/${k[$i]}/^g/g;s/\([[:punct:]]\)//g;s/^g/${k[$i]}/g' > revised.txt
done

declare -a k=('#' '@')

There is an error somewhere in it, because it removes all punctuation.

Last edited by ambijat; 04-08-2013 at 12:10 AM..
# 4  
Old 04-08-2013
I have some suggestions to help make it work.
1) There is no need to write \([[:punct:]]\); [[:punct:]] is all you need, because the captured group is never used. This makes the code a little simpler to read and understand.

2) I think using a control character such as ^G is problematic. It complicates editing and viewing files: is it Ctrl-G or the two characters "^G"? It is better to stay text-based, so I use a unique text marker instead.

3) Enclosing the sed script in 'single quotes' means the shell never expands variables such as ${k[$i]}, so sed receives them as literal text and the script cannot work correctly. Use "double quotes" or some other method so that the shell expands the variables.
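To see the quoting problem from point 3 in isolation, here is a small sketch. The variable names keep and mark and the function names single and double are only illustrative, and the marker is assumed not to occur in the input.

```shell
#!/bin/bash
keep=':'               # the punctuation character to preserve
mark='ZZZZKEEPZZZZ'    # assumed absent from the input

# Single quotes: sed gets the literal text $keep and $mark, which match nothing.
single() { sed 's/$keep/$mark/g; s/[[:punct:]]//g; s/$mark/$keep/g'; }

# Double quotes: the shell expands the variables before sed runs.
double() { sed "s/$keep/$mark/g; s/[[:punct:]]//g; s/$mark/$keep/g"; }

printf '%s\n' 'a:b;c' | single   # prints: abc   (the ':' is lost too)
printf '%s\n' 'a:b;c' | double   # prints: a:bc
```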

Here is an exercise that shows it is possible to remove all the punctuation, except one particular punctuation, similar to what the previous respondent posted, but I think simpler and less prone to error.
Code:
$ cat punct.txt
Bb2~`!@#$%^&*()_-+={[}]|:;"'<,>.?/

Code:
$ cat punct.sh
mark=ZZZZZZZZZZZZZZZZZZZZ

echo "Deleting punctuation, except for :"
sed "s/:/$mark/g; s/[[:punct:]]//g; s/$mark/:/g" < punct.txt

echo "Deleting punctuation, except for ;"
sed "s/;/$mark/g; s/[[:punct:]]//g; s/$mark/;/g" < punct.txt

Code:
$ ./punct.sh
Deleting punctuation, except for :
Bb2:
Deleting punctuation, except for ;
Bb2;

# 5  
Old 04-08-2013
What I need is several iterations, one for each punctuation character that I choose to keep, so that none of them is deleted. In that case I believe a loop would be necessary. How can we do it?
# 6  
Old 04-08-2013
Quote:
Originally Posted by ambijat
What I need is several iterations, one for each punctuation character that I choose to keep, so that none of them is deleted. In that case I believe a loop would be necessary. How can we do it?
I must not understand what you're proposing to do here. If you start with a file that contains the punctuation characters comma and period (that you want to keep) and other punctuation characters you want to delete, surely you understand that if you remove all punctuation characters except comma and then remove all punctuation characters except period, all punctuation characters will be gone. Once you have removed all punctuation characters except comma, another time through the loop can't put back the periods that have already been removed AND will also delete the commas that you saved the first time through the loop!

You have at least two choices for solving a problem like this:
  1. Get a list of all possible punctuation characters in the locale you're using. Delete the list of punctuation characters you want to keep from this list. Remove all of the remaining punctuation characters from your input file. Or,
  2. Replace each punctuation character you want to keep with one or more non-null, non-punctuation characters that do not otherwise appear in your input file and are distinct from the replacements for other punctuation characters you want to keep. Then delete all of the remaining punctuation characters. Then convert all of the punctuation replacement strings back to the original punctuation characters. (NOTE: If you use replacement strings that are longer than the punctuation character being replaced AND you have input lines that are close to LINE_MAX bytes long, the conversions may fail.)
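The first choice can be sketched roughly as follows. This is not from the thread: it assumes bash, covers only the printable ASCII range (codes 33 to 126), and the function name build_delete_set is hypothetical. The delete list is written as octal escapes so that characters such as \, - and ] are not misread by tr.

```shell
#!/bin/bash
# Hypothetical helper: print a tr(1) set containing every ASCII punctuation
# character except those listed in $1, encoded as \ooo octal escapes.
build_delete_set() {
    local keep=$1 list='' code oct ch
    for (( code = 33; code <= 126; code++ )); do
        oct=$(printf '%03o' "$code")   # e.g. 35 -> 043
        ch=$(printf "\\$oct")          # the character itself
        # Add it to the delete list only if it is punctuation and not kept.
        if [[ $ch == [[:punct:]] && $keep != *"$ch"* ]]; then
            list+="\\$oct"
        fi
    done
    printf '%s' "$list"
}

delete_set=$(build_delete_set '#@')
printf '%s\n' 'a#b@c, d! e.' | tr -d "$delete_set"   # prints: a#b@c d e
```

Because nothing is substituted and restored, this variant needs no marker strings and does not lengthen any input line.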
This User Gave Thanks to Don Cragun For This Post:
# 7  
Old 04-08-2013
If I understand you correctly, try this:
Code:
$ cat punct.txt
Bb2~`!@#$%^&*()_-+={[}]|:;"'<,>.?/

Code:
$ cat punct.sh
echo "Deleting punctuation, except for @ and #"
rm -f s1.sed s2.sed control_file

echo "UNIQMARK01 @" >> control_file
echo "UNIQMARK02 #" >> control_file

while read mark sym; do
  echo "s/$sym/$mark/g" >> s1.sed
  echo "s/$mark/$sym/g" >> s2.sed
done < control_file

echo "s/[[:punct:]]//g" >> s1.sed
sed -f s1.sed -f s2.sed < punct.txt

rm -f s1.sed s2.sed control_file

Code:
$ ./punct.sh
Deleting punctuation, except for @ and #
Bb2@#
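The same idea can probably be written without the temporary files by collecting the sed expressions in bash arrays. This is a sketch under assumptions: it needs bash, the function name keep_punct is made up, the UNIQMARKnn strings are assumed not to occur in the input, and the kept characters must not be special to sed (for example /, ., *, [, \ or &).

```shell
#!/bin/bash
# Hypothetical helper. Usage: keep_punct CHAR... < input
keep_punct() {
    local hide=() restore=() i=0 sym mark
    for sym in "$@"; do
        mark=$(printf 'UNIQMARK%02d' "$i")   # one distinct marker per character
        hide+=(-e "s/$sym/$mark/g")          # punctuation -> marker
        restore+=(-e "s/$mark/$sym/g")       # marker -> punctuation
        i=$((i + 1))
    done
    # Hide the kept characters, delete the remaining punctuation, restore.
    sed "${hide[@]}" -e 's/[[:punct:]]//g' "${restore[@]}"
}

printf '%s\n' 'user@host #1 (ok), done!' | keep_punct '@' '#'
# prints: user@host #1 ok done
```

All substitutions run in a single sed pass, so each kept character is hidden before the deletion step and none of them is lost.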


Last edited by hanson44; 04-08-2013 at 04:00 AM.. Reason: typo