Need help to delete special characters exists only at the end of the each record in UNIX file?


 
# 15  
Old 01-30-2019
Quote:
Originally Posted by RudiC
Difficult to believe...
If we simply assume that the fifteenth field is also the last one:

Code:
awk 'BEGIN {FS="|"; OFS="|"} {gsub("[^a-zA-Z0-9 ]", "", $NF)} 1' file


Moderator's Comments:
Mod Comment Please use CODE tags as required by forum rules!


--- Post updated at 09:19 ---

Code:
awk 'BEGIN {FS="|"; OFS="|"} {sub("[\r]$", ""); gsub("[^a-zA-Z_0-9 ]", "", $15)} 1' file

gsub("[^[:alnum:][:space:]]"

Last edited by nezabudka; 01-30-2019 at 05:33 AM..
# 16  
Old 01-30-2019
Of course it is being removed. You refuse to define what you believe are special characters and the code that has been supplied assumes that all non-alphanumeric characters except (in some cases) the <vertical-bar> character are special (and that includes <space>). And you refuse to tell us whether ^M represents the two characters ^ and M or represents a single <carriage-return> character.

If you keep telling us that our code isn't working without answering our questions, we'll continue to make bad guesses about what you really mean and we'll all continue to be frustrated.

What command (including the utility name and the options you gave it) did you use to display the sample input and output you showed us in post #13 in this thread?

How do you expect a line containing three field delimiters to have twenty fields? The data you showed us in post #13 can't possibly be related to the problem you're trying to solve in this thread.

Please show us some representative sample input data and then show us the output you are hoping to get from that sample input.

Please answer our questions and help us help you! If you continue to refuse to answer our questions, it is obvious that we won't be able to guess at what you're really trying to do and we are all just wasting our time trying to help you.
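The ^M question can be settled without exposing any real data by looking at how each record ends. The following is only a sketch: classify_ending is a hypothetical helper name that was not posted in this thread, and it assumes an awk that understands \r inside a regular expression.

```shell
# Hypothetical helper (not from the thread): report how each input record ends.
classify_ending() {
    awk '
        /\r$/  { print "carriage-return"; next }   # one CR character (0x0D)
        /\^M$/ { print "caret-and-M";     next }   # two printable characters
               { print "neither" }
    '
}

printf 'X \r\n' | classify_ending   # prints: carriage-return
printf 'X ^M\n' | classify_ending   # prints: caret-and-M
printf 'X \n'   | classify_ending   # prints: neither
```

Running od -c on a few lines of the file shows the same thing at the byte level: a carriage return is displayed as \r, while the two-character sequence is displayed as a separate ^ and M.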
# 17  
Old 01-30-2019
Hi Don,

Except for space, all non-numeric characters are treated as special characters. ^M represents two characters.

Thanks
Rakesh

--- Post updated at 07:10 AM ---

The actual data contains only 16 fields; I have shown only sample data here because I cannot show the actual data.

--- Post updated at 07:12 AM ---

I understand that I should show the actual sample data, but I cannot do that here.
# 18  
Old 01-30-2019
I never asked you to show us actual data; I only asked you to show us representative sample data. You could easily have taken a couple of examples of real data and replaced every occurrence of a numeric character with a "9" and replaced every occurrence of a lowercase alphabetic character with an "x" and every occurrence of an uppercase alphabetic character with an "X" (except for things like <circumflex><capital-latin-M>, where you should have left the "M" as it appears in your real data).

Why you would need to tell us that there are 20 fields and that you want to remove special characters from field 15 when there are only 15 fields in your actual data makes absolutely no sense to me. Why you would explicitly want us to remove the characters "^" and "M" from the 15th field (which is also the last field in your actual data) and then remove any characters that are not numeric or <space> characters from the same field makes no sense to me. Since "^" and "M" are both non-<space> and non-<digit> characters, removing what you are calling "special" characters from the 15th field will remove "^" and "M" from the end of the record without adding any special code to just remove those two characters.

If the above accurately describes your real data and what you are trying to do to it (i.e., remove all characters that are not <space> and are not numeric from the last field of each record where each record contains 15 fields), then all you need is something like:
Code:
awk 'BEGIN {FS="|"; OFS="|"} {gsub("[^ [:digit:]]", "", $15)} 1'

If you want to remove all characters that are not <space> and are not numeric from the 15th field in each record where each record contains 15 or more fields, and you also want to remove the two-character sequence ^M from the end of each input record, then you need something more like:
Code:
awk 'BEGIN {FS="|"; OFS="|"} {sub(/\^M$/, ""); gsub("[^ [:digit:]]", "", $15)} 1' file
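
As a quick check of the second form, here is a minimal sketch run against a made-up 15-field record. The function name clean_field15 and the sample record are assumptions for illustration only. The sketch writes the digit class as 0-9 instead of [:digit:] so that it also runs on older awk versions that lack POSIX character classes, and it matches the literal caret with \^ in a regular-expression constant.

```shell
# Hypothetical wrapper around the awk approach described above.
clean_field15() {
    awk 'BEGIN { FS = "|"; OFS = "|" }
         {
             sub(/\^M$/, "")            # drop a literal caret followed by M at the end of the record
             gsub(/[^ 0-9]/, "", $15)   # keep only digits and spaces in field 15
         }
         1'
}

# Made-up record with 15 fields; the last field is "12 3#x^M".
printf '%s\n' 'f1|f2|f3|f4|f5|f6|f7|f8|f9|f10|f11|f12|f13|f14|12 3#x^M' | clean_field15
# prints: f1|f2|f3|f4|f5|f6|f7|f8|f9|f10|f11|f12|f13|f14|12 3
```

Note that assigning to $15 makes awk rebuild the record with OFS, which is why OFS is set to the same delimiter as FS.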

# 19  
Old 02-06-2019
Hi Don Cragun/RudiC/nezabudka/All,

As you suggested, please see the sample data below, with the characters of the actual data replaced:

Code:
XXXX99999999|x99999999999|999999999|X|X|99999999|99999999|X|99999999|99999999|99999999|9999999  999999|||X ^M
XXXX99999999|X99999999999|999999999|X|X|99999999|99999999|X|99999999|99999999|99999999|||X ^M
XXXX99999999|X99999999999|9999999999999|X|X|99999999|99999999|X|99999999|99999999|99999999||||X ^M
XXXX99999999|X99999999999|999999999|X|X|99999999|99999999|X|99999999|99999999|99999999||||X ^M
XXXX99999999|X9999999999999|999999999|X|X|99999999|99999999|X|99999999|99999999|99999999||||X ^M

Thanks
Rakesh Puli

Last edited by vgersh99; 02-06-2019 at 02:35 PM.. Reason: Code tags, please!
# 20  
Old 02-06-2019
Thank you. That is a good start.

We now know that your input looks like what you described in post #1 in this thread and that it does not look like what you described in post #10 (where there were 20 fields and two of those fields were to be changed) and it does not look like what you described in post #13 (where there were 4 fields and absolutely nothing was supposed to be changed) and it does not look like what you described in post #17 update 1 (where there were 16 fields).

In addition to the number of fields changing, your description of what characters are to be considered "special" seems to vary from post to post.

If what you want to do is:
  • remove all adjacent special characters at the end of each record,
  • where a character is considered special if it is not a numeric character and it is not a <space> character,
  • the data you want to process is located in a file named file, and
  • you want the results of the above conversion to be written to standard output
then the following command should do what you want:
Code:
sed 's/[^ [:digit:]]*$//' file

This will not be sufficient if any other field is to be modified and it will not be sufficient if any other character is not to be treated as special.

Since you still have not shown us what output you hope to produce from the above sample input, we have no way of knowing which of your many different statements about what changes are to be made is the correct set of requirements. The above code produces the output:
Code:
XXXX99999999|x99999999999|999999999|X|X|99999999|99999999|X|99999999|99999999|99999999|9999999  999999|||X 
XXXX99999999|X99999999999|999999999|X|X|99999999|99999999|X|99999999|99999999|99999999|||X 
XXXX99999999|X99999999999|9999999999999|X|X|99999999|99999999|X|99999999|99999999|99999999||||X 
XXXX99999999|X99999999999|999999999|X|X|99999999|99999999|X|99999999|99999999|99999999||||X 
XXXX99999999|X9999999999999|999999999|X|X|99999999|99999999|X|99999999|99999999|99999999||||X

Note that the above output includes a single <space> character at the end of each line.

Last edited by Don Cragun; 02-06-2019 at 05:06 PM.. Reason: Add note.
# 21  
Old 02-07-2019
Hi Don,

I consider a character special if it is not alphanumeric and not a space. Your code treats only characters that are non-numeric and not a space as special. Please correct it to alphanumeric. I think I may have stated it wrongly in the previous post; sorry about that. I have only 15 fields in each record as of now, but I mentioned 20 fields per record in post #13 because the requirement might change in the future.

Thanks
Rakesh
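
With the definition in the post above (a character is special if it is neither alphanumeric nor a space), simply swapping [:digit:] for [:alnum:] in the sed command would leave the trailing ^M in place, because M is itself alphanumeric and so ends the run of special characters. A possible adjustment, offered only as a sketch, is to remove the literal two-character ^M first and then strip the remaining trailing special characters. The function name strip_trailing_specials and the sample records are assumptions for illustration.

```shell
# Hypothetical helper: strip a trailing literal ^M, then any trailing run of
# characters that are neither alphanumeric nor a space.
strip_trailing_specials() {
    sed -e 's/\^M$//' -e 's/[^ [:alnum:]]*$//' "$@"
}

# Made-up records in the style of the sample data.
printf '%s\n' 'XXXX99999999|X|X ^M' | strip_trailing_specials   # prints: XXXX99999999|X|X  (with a trailing space)
printf '%s\n' 'abc|12#%^M'          | strip_trailing_specials   # prints: abc|12
```

One caveat: the vertical bar is also non-alphanumeric, so a record whose last fields are empty would lose its trailing delimiters. If that matters, adding | to the bracket expression, as in [^ |[:alnum:]], keeps the delimiters intact.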