Remove Special Characters Within Text


 
# 1
04-14-2014

Hi,

I have a "|"-delimited file exported from a database.
One column in the file holds descriptions/comments entered by application users, and some of these values contain "Control-M" (carriage return) and "New Line" characters in the middle of the text.
When I export the data, the embedded newline splits such a record into two lines, so the load fails. Here is an example:

1|user|i am happy.
2|user|i live
in india.
3|user|i am male.
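One quick way to see where the sample above breaks is to count the "|"-separated fields on each line, since a complete record here has exactly 3. A minimal sketch, assuming the stray Control-M sits right before the embedded newline (the function name report_fragments is made up for illustration):

```shell
#!/bin/sh
# Sketch: list lines that do not have the expected number of "|" fields.
# Assumption: a complete record has exactly N fields (N is the first argument).
report_fragments() {
    awk -F'|' -v want="$1" 'NF != want { printf "line %d has %d field(s)\n", NR, NF }'
}

# Recreate the sample from the question; \r stands for the embedded Control-M.
printf '1|user|i am happy.\n2|user|i live\r\nin india.\n3|user|i am male.\n' | report_fragments 3
```

On this sample it should report only line 3 ("in india."): the first half of the broken record still has 3 fields, so it is the short continuation line that gives the split away.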

I tried to replace these characters, but I could not restrict the replacement to the 3rd column, so the legitimate ones were replaced as well.
Basically, I want to search for a particular special character within the string in the nth column.

I'd appreciate any input on this.
# 2
04-14-2014
In your example, it appears that there is an extra <CR> and <LF> in the 3rd field, before the end of the field/line. Is that what you are trying to eliminate?

Perhaps this will get you thinking:
Code:
$ echo "jim|jones|hairy" | sed 's/$/|~/' | awk 'BEGIN {FS = OFS = "|"} {gsub("\151", "e", $3); print}'
jim|jones|haery|~

I appended a | delimiter and a ~ character to the end of each line; this may be needed so that only the CR and LF characters within the 3rd field are changed.
The \151 is the octal code for the "i" in "hairy". So find the octal codes for CR and LF (\015 and \012) and try the above on your example.
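A sketch of the CR half of that suggestion, assuming the comment is the 3rd field (strip_cr_field3 is a hypothetical name). One caveat: with awk's default record separator the LF (\012) can never be found inside $3, because the newline has already ended the record before the fields are split. Removing the CR alone therefore still leaves the record in two pieces.

```shell
#!/bin/sh
# Sketch: delete carriage returns (octal \015, i.e. Control-M) from field 3 only.
# Assumption: "|"-delimited input whose free-text comment is the 3rd field.
strip_cr_field3() {
    awk 'BEGIN { FS = OFS = "|" }
         NF >= 3 { gsub("\015", "", $3) }   # only touch lines that have a 3rd field
         { print }'
}

# Sample from the question, with the embedded CR before the line break.
printf '1|user|i am happy.\n2|user|i live\r\nin india.\n' | strip_cr_field3
```

On the sample, the CR disappears but "in india." is still printed on its own line, which is why the later replies join lines instead.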

Last edited by joeyg; 04-14-2014 at 11:16 AM.
# 3
04-14-2014
Yes, they are in the middle of the text.
# 4
04-14-2014
Assuming you want to get rid of embedded carriage returns and change embedded newlines to spaces, you could try something like:
Code:
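awk '
BEGIN {	FS = OFS = "|"}
{	gsub(/\r/, "")}				# strip every carriage return (Control-M)
NF==3 {	if(out != "") print out		# a 3-field line starts a new record: print the previous one
	out = $0
}
NF==1 {	out = out " " $0}			# a 1-field line continues the current record
END {	if(out != "") print out}		# print the last buffered record
' input_file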

If you're using a Solaris/SunOS system, use /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk instead of /usr/bin/awk.
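The script above recognises a continuation line by NF==1, which works because the comment is the last field. If the free-text column sits somewhere in the middle (the "nth column" case from the question), a continuation line has more than one field, so a different test is needed. One sketch, assuming every complete record has a known number of fields and the comment itself never contains "|": keep appending lines until the joined text reaches that field count. The function name rejoin_records and the 4-column sample are hypothetical.

```shell
#!/bin/sh
# Sketch: rejoin records split by newlines inside a middle column.
# Assumptions: every complete record has exactly N "|"-separated fields
# (N is the first argument) and the free-text column never contains "|".
rejoin_records() {
    awk -v want="$1" '
    BEGIN { FS = "|" }
    {
        gsub(/\r/, "")                          # drop Control-M characters
        buf = (buf == "") ? $0 : (buf " " $0)   # add this line to the pending record
        $0 = buf                                # re-split the pending text on "|"
        if (NF >= want) {                       # all fields present: record complete
            print
            buf = ""
        }
    }
    END { if (buf != "") print buf }            # flush an incomplete trailing record
    '
}

# Hypothetical 4-column layout where column 3 holds the comment.
printf '1|user|i am happy.|x\n2|user|i live\r\nin india.|y\n3|user|i am male.|z\n' | rejoin_records 4
```

On that sample it should print three lines, the middle one being 2|user|i live in india.|y.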
# 5
04-14-2014
Going by your description, i.e. that a <CR> character marks an unwanted line split (which implies that the line completing a record carries no <CR>), this might do:
Code:
awk '/\r$/{sub(/\r/," ");getline X;$0=$0 X}1' file
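The one-liner above glues on exactly one following line. If a comment can contain several Control-M/newline pairs, wrapping getline in a while loop keeps joining until the text no longer ends in <CR>. A sketch under the same assumption as that reply (only broken lines end in <CR>, so it would not suit a file whose every line ends in CRLF); join_cr_lines is a hypothetical name.

```shell
#!/bin/sh
# Sketch: keep joining while the text still ends in a carriage return,
# so a comment containing several line breaks is rebuilt into one record.
# Assumption: only broken lines end in <CR>; normal lines end in a bare <LF>.
join_cr_lines() {
    awk '{
        while (/\r$/ && (getline cont) > 0) {   # split record: fetch the next line
            sub(/\r$/, " ")                     # turn the trailing CR into a space
            $0 = $0 cont                        # glue the continuation back on
        }
        print
    }'
}

printf '1|user|i am happy.\n2|user|i live\r\nin india.\n3|user|a\r\nb\r\nc\n' | join_cr_lines
```

Here the last record, split over three lines, should come out as 3|user|a b c.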

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

How to remove special characters?

Hi Gurus, I have a file which contains some Unicode characters like "ü". I want to replace them with other characters. I searched the internet and found the command sed "s/ü/-/g", but I don't know how to type ü on the Unix command line. Please help me with this. Thanks in advance. (7 Replies)
Discussion started by: ken6503

2. Shell Programming and Scripting

How to remove some special characters in a string?

Hi, I have a string like this: ="Lookup Procedure" but I want the output like this: Lookup Procedure. The characters =," should be removed. Please suggest a solution. Regards, Madhuri (2 Replies)
Discussion started by: srimadhuri

3. Shell Programming and Scripting

remove special characters

Hello all, I am writing Perl code and I wish to remove special characters from text. I wish to remove all extended ASCII characters. If the list of special characters is huge, how can I do this using the substitute command s/specialcharacters/null/g? I really want to code like... (3 Replies)
Discussion started by: vasuarjula

4. UNIX for Dummies Questions & Answers

Files with special characters - how to remove

Hi, I have a directory containing a file with special characters in its name. Can someone please advise how to remove the file, preferably with rm -i? Thanks in advance. Listing is as below: {oracle}> ls -1b bplog.bkup.001 bplog.bkup.002 bplog.bkup.003 bplog.bkup.004... (1 Reply)
Discussion started by: newbie_01

5. Shell Programming and Scripting

Remove special characters from text file

Hi All, I am trying to remove all special characters ().,/\~!@#%^$*&^_- and others from a tab-delimited file. I am using the following code. while read LINE do echo $LINE | tr -d '=;:`"<>,./?!@#$%^&(){}'|tr -d "-"|tr -d "'" | tr -d "_" done < trial.txt > output.txt Problem ... (10 Replies)
Discussion started by: kkb

6. UNIX for Dummies Questions & Answers

How to Remove Special Characters

Dear Members, We have a file which contains some special characters. I need to replace these special characters with a newline character (\n). The special character is \x85. I am not sure what this character means and how we can remove it. Any input is greatly appreciated. Thanks... (5 Replies)
Discussion started by: sandeep_1105

7. Shell Programming and Scripting

How to remove special characters from each line?

Hello, is there a simpler way to remove special characters (color codes) from each line in a log file? I use sed as in the example below, but I think there should be a simpler way to achieve the same result: $ cat -vet file1 ^, , , , Maybe convert the file somehow? ... (5 Replies)
Discussion started by: majormark

8. UNIX for Dummies Questions & Answers

Remove directory that has special Characters

Hi All, I have written a script that creates a new directory within the shell program, and if a parameter isn't passed in, it creates a strange directory name by mistake. So I have a directory named "-_12" and I am unable to remove it. I tried removing it using double quotes and many other ways. I have... (12 Replies)
Discussion started by: datherriault

9. Shell Programming and Scripting

remove special characters from text using PERL

Hi, I am stuck with a problem here. Suppose I have a variable which is assigned a string containing special characters, for example: $a="abcdef^bbwk#kdbcd@"; I have to remove the special characters using Perl. The text is assigned to the variable implicitly. How can I do it? (1 Reply)
Discussion started by: agarwal

10. UNIX for Dummies Questions & Answers

remove special and unicode characters

Hi, How do I remove the lines where special characters or Unicode characters appear? The following query does work but I wonder if there is a better way. cat test.txt | egrep -v '\)|#|,|&|-|\(|\\|\/|\.' The following lines show that my query is incomplete. Warning: The word "*Khan" is... (1 Reply)
Discussion started by: shantanuo