Removal Extended ASCII using awk


 
# 1  
Old 01-01-2015

Hi All,

I am trying to remove Extended ASCII characters (SELECTIVE - passed as argument) using Awk based on ad hoc basis. Can you please let me know how to do it? I have to implement this using awk only.

Thanks & Regards
# 2  
Old 01-01-2015
Is this another homework assignment?

What have you tried so far?

What do you mean by Extended ASCII? Are you trying to remove a single character? Are you trying to remove individually specified characters with each character specified as a separate argument? Are you trying to remove a string of characters? Are you trying to remove individual characters included in a single argument string?

What do you mean by Awk(sic) based on ad hoc basis?
# 3  
Old 01-01-2015
Hi Don

This is part of a script enhancement. The script would take ASCII values as input arguments, generally Extended ASCII (i.e. values >= 128), and remove the corresponding characters from the input file.

Since the part of the script that I need to modify is an awk script, I have to implement this within awk itself rather than with other commands such as tr or sed.
# 4  
Old 01-01-2015
I asked 8 questions. You partially answered one of them (generically, but not specifically for this assignment).

Unless you convince us that this is not a homework assignment, show us that you have made an attempt at solving this, show us the part of your existing awk script that you're trying to modify, show us that you have some idea of what your input arguments need to look like, and provide us with some sample input and output for your script; this thread will be closed.

We are here to help you learn how to write code using the tools available on UNIX and Linux systems to perform various tasks. We are not here to act as your unpaid programming staff trying to guess at what you're trying to do, coaxing descriptions of the tasks that need to be performed out of you, and then designing and writing your code for you. And we most certainly are not here to do your homework assignments for you!

Last edited by Don Cragun; 01-01-2015 at 08:43 PM.. Reason: Fix typo.
# 5  
Old 01-02-2015
Quote:
Unless you convince us that this is not a homework assignment, show us that you have made an attempt at solving this
This is not a homework assignment. It is part of a script which I am currently modifying. I am not very familiar with awk. I can do the same using tr or sed. I want to know if there is any function in awk that can do something similar. I was using the sub/gsub functions, but the manual only describes how to replace a pattern. Here I am not looking for a specific pattern, but a match of ANY of the characters.

Quote:
show us the part of your existing awk script that you're trying to modify,
The script is on a client's secured network and cannot be copied.

Quote:
show us that you have some idea of what your input arguments need to look like, and provide us with some sample input and output for your script;
The input arguments would be a range of ASCII values and/or comma-separated ASCII values.

Code:
eg: 128-140, 145, 147

If any of the input ASCII values appears in any line of the input file, it has to be replaced with an empty string.

Suppose I have this input:

Code:
testing_Š_testing

I need this output:

Code:
testing__testing

# 6  
Old 01-02-2015
It appears that your strings are UTF-8, not extended ASCII. Furthermore, printing your strings through od shows that the byte values that you said you wanted to remove are not present in your input string or output string samples:
Code:
printf '%s' 'testing_Š_testing' | od -t cu1
printf '%s' 'testing__testing' | od -t cu1

shows us that the unsigned decimal byte values of the two bytes you want to remove are 197 and 160:
Code:
0000000    t   e   s   t   i   n   g   _   Š  **   _   t   e   s   t   i
          116 101 115 116 105 110 103  95 197 160  95 116 101 115 116 105
0000020    n   g                                                        
          110 103                                                        
0000022
printf '%s' 'testing__testing' | od -t cu1
0000000    t   e   s   t   i   n   g   _   _   t   e   s   t   i   n   g
          116 101 115 116 105 110 103  95  95 116 101 115 116 105 110 103
0000020

If you are working with UTF-8 input and want "extended ASCII" output (where you may be removing 1 or more bytes out of a multi-byte UTF-8 character, but might not be removing complete characters), you may end up with an unintelligible mess. If you want to remove a specific set of UTF-8 characters, that is easy to do. If you want to remove all non-(7-bit)ASCII characters, that is easy to do on some systems (depending on how well your version of awk handles locales and multi-byte characters).
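As a rough sketch of the two "easy" cases just described (this is not the poster's script, which was never shown), the hypothetical helper functions below use gsub() to delete either every byte with the high-order bit set or a fixed list of whole UTF-8 characters. The function names are made up for illustration, and the sketch assumes an awk that accepts octal escapes in regular expressions. The alternation `Š|š` is used instead of a bracket expression so that the match still covers complete characters when awk is treating its input as single bytes.

```shell
# Delete every byte with the high-order bit set (decimal 128-255).
# LC_ALL=C asks awk to treat the input as single bytes rather than
# multi-byte characters; \200 and \377 are octal for 128 and 255.
strip_high_bytes() {
    LC_ALL=C awk '{ gsub(/[\200-\377]/, ""); print }'
}

# Delete a fixed list of whole characters, written literally.
# Each alternative matches the complete byte sequence of one character.
strip_listed_chars() {
    awk '{ gsub(/Š|š/, ""); print }'
}
```

With the sample line from this thread, `printf '%s\n' 'testing_Š_testing' | strip_high_bytes` should print `testing__testing`, and so should `strip_listed_chars`.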

What OS (including version) and shell are you using?

What Locale are you using when you run this script?

Is it OK to just remove all bytes from your input stream that have the high order bit set? If not, is there a specific list of UTF-8 characters you want to remove? If not, and you really want to remove individual bytes from strings containing multi-byte characters, this may be hard to do in some versions of awk.

You said you know how to do what you want using sed. Show us the sed substitute command that does what you want and we can show you how to easily change that into an awk sub() or gsub() function call.
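To illustrate the kind of translation meant here, the sketch below pairs a made-up sed command with an awk equivalent: a sed `s/re//` (first match only) corresponds to sub(), and `s/re//g` (every match) corresponds to gsub(). The patterns and the function names `via_sed` and `via_awk` are assumptions chosen only for the example; they are not taken from anyone's script.

```shell
# sed version: drop the first underscore, then drop every digit.
via_sed() {
    sed -e 's/_//' -e 's/[0-9]//g'
}

# awk version of the same two substitutions:
#   s/_//       -> sub(/_/, "")       (first match on the line)
#   s/[0-9]//g  -> gsub(/[0-9]/, "")  (all matches on the line)
# Both functions act on $0 by default; print writes the edited line.
via_awk() {
    awk '{ sub(/_/, ""); gsub(/[0-9]/, ""); print }'
}
```

A bracket expression such as `[0-9]` is what gives a "match any one of these characters" pattern, in sed and awk alike.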
# 7  
Old 01-02-2015
Hi Don,

I want to remove any character specified as argument (decimal ascii value).

eg. For values 128-140, 145, 147


I am trying to implement the equivalent of the code below in awk:
Code:
tr -d '\145\147\128-\140' < InputFileName > OutputFileName

OR

Code:
cat InputFileName  | sed -e 's/\d145//g' -e 's/\d147//g'  -e s'/\d128-\d140//g' > OutputFileName

I am making these changes using korn shell (Version AJM 93t+ 2010-06) on Linux OS (2.6.18)
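One possible way to get this behaviour inside awk is sketched below, under stated assumptions: the list of decimal values arrives as a single string such as "128-140, 145, 147", the input is to be treated as single bytes (hence LC_ALL=C), and the function name `remove_bytes` and the variable name `spec` are invented for the example. It builds a lookup table of bytes to drop and copies every other byte through unchanged. As noted earlier in the thread, deleting individual bytes from UTF-8 input can leave partial characters behind.

```shell
# remove_bytes SPEC
# SPEC is a comma-separated list of decimal byte values and ranges,
# e.g. "128-140, 145, 147". Reads stdin, writes the filtered text to stdout.
remove_bytes() {
    LC_ALL=C awk -v spec="$1" '
    BEGIN {
        gsub(/[ \t]/, "", spec)               # ignore blanks in the list
        n = split(spec, part, ",")
        for (i = 1; i <= n; i++) {
            lo = hi = part[i]                 # single value by default
            if (split(part[i], r, "-") == 2) { lo = r[1]; hi = r[2] }
            for (c = lo + 0; c <= hi + 0; c++)
                drop[sprintf("%c", c)] = 1    # byte with decimal value c
        }
    }
    {
        out = ""
        len = length($0)
        for (i = 1; i <= len; i++) {          # copy bytes not in the table
            ch = substr($0, i, 1)
            if (!(ch in drop)) out = out ch
        }
        print out
    }'
}
```

For example, `remove_bytes '197, 160' < InputFileName > OutputFileName` would remove the two bytes that od reported for the sample line. The per-byte loop is slow on large files but avoids having to escape characters that are special inside a bracket expression. For comparison, tr reads backslash escapes as octal, so the decimal list 128-140, 145, 147 would be written there as `'\200-\214\221\223'`.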

Last edited by tostay2003; 01-02-2015 at 05:45 AM..