Removal Extended ASCII using awk

01-01-2015

Registered User

73, 0

Join Date: Aug 2007

Last Activity: 1 September 2016, 12:26 PM EDT

Posts: 73

Thanks Given: 7

Thanked 0 Times in 0 Posts

Hi All,

I am trying to remove (selectively, passed as an argument) Extended ASCII using Awk based on adhoc basis. Can you please let me know how to do it? I have to implement this using awk only.

Thanks & Regards

tostay2003

01-01-2015

Registered User

12,315, 4,560

Join Date: Jul 2012

Last Activity: 22 November 2019, 4:29 PM EST

Location: San Jose, CA, USA

Posts: 12,315

Thanks Given: 952

Thanked 4,560 Times in 3,818 Posts

Is this another homework assignment?

What have you tried so far?

What do you mean by Extended ASCII? Are you trying to remove a single character? Are you trying to remove individually specified characters with each character specified as a separate argument? Are you trying to remove a string of characters? Are you trying to remove individual characters included in a single argument string?

What do you mean by Awk(sic) based on ad hoc basis?

Don Cragun

01-01-2015

Hi Don

This is part of a script enhancement. The script would take ASCII values as input arguments, generally Extended ASCII (i.e. ASCII values >= 128), and remove them from the input file.

Since the part of the script that I need to modify is an awk script, I have to implement this within awk itself instead of using other commands such as tr or sed.

tostay2003

01-01-2015

I asked 8 questions. You partially answered one of them (generically, but not specifically for this assignment).

Unless you convince us that this is not a homework assignment, show us that you have made an attempt at solving this, show us the part of your existing awk script that you're trying to modify, show us that you have some idea of what your input arguments need to look like, and provide us with some sample input and output for your script; this thread will be closed.

We are here to help you learn how to write code using the tools available on UNIX and Linux systems to perform various tasks. We are not here to act as your unpaid programming staff trying to guess at what you're trying to do, coaxing descriptions of the tasks that need to be performed out of you, and then designing and writing your code for you. And we most certainly are not here to do your homework assignments for you!

Last edited by Don Cragun; 01-01-2015 at 08:43 PM.. Reason: Fix typo.

Don Cragun

01-02-2015

Quote:

Unless you convince us that this is not a homework assignment, show us that you have made an attempt at solving this

This is not a homework assignment. It is part of a script which I am currently modifying. I am not very familiar with awk. I can do the same using tr or sed. I want to know if there is any function in awk that can do something similar. I was using the sub/gsub functions, but the manual only describes how to replace a pattern. Here I am not looking for a specific pattern, but for a match of ANY of the characters.

Quote:

show us the part of your existing awk script that you're trying to modify,

The script is on a client's secured network and cannot be copied.

Quote:

show us that you have some idea of what your input arguments need to look like, and provide us with some sample input and output for your script;

The input arguments would be a range of ASCII values and/or comma-separated ASCII values.

Code:

eg: 128-140, 145, 147

If any of the input ASCII values appears in any line of the input file, it has to be replaced with an empty string.

Suppose I have this input:

Code:

testing_Š_testing

I need the output as

Code:

testing__testing

tostay2003

01-02-2015

It appears that your strings are UTF-8, not extended ASCII. Furthermore, printing your strings through od shows that the byte values that you said you wanted to remove are not present in your input string or output string samples:

Code:

printf '%s' 'testing_Š_testing' | od -t cu1
printf '%s' 'testing__testing' | od -t cu1

shows us that the unsigned decimal byte values of the two bytes you want to remove are 197 and 160:

Code:

0000000    t   e   s   t   i   n   g   _   Š  **   _   t   e   s   t   i
          116 101 115 116 105 110 103  95 197 160  95 116 101 115 116 105
0000020    n   g                                                        
          110 103                                                        
0000022
printf '%s' 'testing__testing' | od -t cu1
0000000    t   e   s   t   i   n   g   _   _   t   e   s   t   i   n   g
          116 101 115 116 105 110 103  95  95 116 101 115 116 105 110 103
0000020
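
To see why those two bytes are a single UTF-8 character rather than two "extended ASCII" characters, here is a small sketch of the decoding arithmetic in shell. The variable names b1, b2 and cp are only illustrative; the byte values come from the od output above.

```shell
# Decode the two-byte UTF-8 sequence 197 160 (decimal) by hand.
b1=197
b2=160

# A two-byte UTF-8 sequence has the bit layout 110xxxxx 10yyyyyy:
# keep the low 5 bits of the first byte and the low 6 bits of the second.
cp=$(( ((b1 & 31) << 6) | (b2 & 63) ))

# Print the resulting code point in the usual U+XXXX notation.
printf 'U+%04X\n' "$cp"
```

The result is U+0160 (LATIN CAPITAL LETTER S WITH CARON), so removing only one of the two bytes would leave an invalid sequence behind.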

If you are working with UTF-8 input and want "extended ASCII" output (where you may be removing 1 or more bytes out of a multi-byte UTF-8 character, but might not be removing complete characters), you may end up with an unintelligible mess. If you want to remove a specific set of UTF-8 characters, that is easy to do. If you want to remove all non-(7-bit)ASCII characters, that is easy to do on some systems (depending on how well your version of awk handles locales and multi-byte characters).
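
For the last case, a minimal sketch, assuming it is acceptable to drop every byte with the high-order bit set and that awk is run in the C locale so that it works byte by byte. The function name strip_high_bit is hypothetical, and how bracket ranges over bytes above 127 behave can differ between awk versions, so treat this as something to test on your system rather than a guaranteed answer.

```shell
# Hypothetical helper: delete every byte in the range 128-255 from stdin.
strip_high_bit() {
    # LC_ALL=C asks awk to treat the input as single bytes, not UTF-8.
    # \200 and \377 are octal for decimal 128 and 255.
    LC_ALL=C awk '{ gsub(/[\200-\377]/, ""); print }'
}
```

With the sample string from this thread, `printf 'testing_\305\240_testing\n' | strip_high_bit` removes both bytes of the multi-byte character.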

What OS (including version) and shell are you using?

What locale are you using when you run this script?

Is it OK to just remove all bytes from your input stream that have the high order bit set? If not, is there a specific list of UTF-8 characters you want to remove? If not, and you really want to remove individual bytes from strings containing multi-byte characters, this may be hard to do in some versions of awk.

You said you know how to do what you want using sed. Show us the sed substitute command that does what you want and we can show you how to easily change that into an awk sub() or gsub() function call.
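
As a rough illustration of that translation (using a plain ASCII pattern chosen only for the example, not the pattern from this thread): a sed s/pattern//g command generally maps to an awk gsub() call on the current record followed by a print. sed uses basic regular expressions by default and awk uses extended ones, so more complex patterns may need adjusting.

```shell
# sed form: delete every match of a bracket expression on each line.
printf 'a1b22c\n' | sed 's/[0-9]//g'

# awk form of the same edit: gsub() modifies $0 in place,
# and print then writes the modified record.
printf 'a1b22c\n' | awk '{ gsub(/[0-9]/, ""); print }'
```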

Don Cragun

01-02-2015

Hi Don,

I want to remove any character specified as an argument (decimal ASCII value).

E.g., for values 128-140, 145, 147

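I am trying to implement the equivalent of the code below (tr takes octal escapes, so decimal 128-140, 145 and 147 are written as \200-\214, \221 and \223):

Code:

tr -d '\200-\214\221\223' < InputFileName > OutputFileName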

OR

Code:

LC_ALL=C sed -e 's/[\d128-\d140\d145\d147]//g' InputFileName > OutputFileName

I am making these changes using the Korn shell (Version AJM 93t+ 2010-06) on Linux OS (2.6.18).

Last edited by tostay2003; 01-02-2015 at 05:45 AM..

tostay2003
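
The thread as captured ends without an awk answer. What follows is only a sketch of one way the stated requirement could be met, under these assumptions: the input is processed byte by byte (awk run in the C locale), and the argument is a list such as 128-140,145,147 of decimal byte values and ranges. The function name strip_bytes and all variable names are hypothetical and not taken from the original script. Instead of building a regular expression, it builds a lookup table of bytes to drop and copies every other byte through.

```shell
# Hypothetical helper: strip_bytes SPEC < input > output
# SPEC is a comma-separated list of decimal byte values and ranges.
strip_bytes() {
    LC_ALL=C awk -v spec="$1" '
    BEGIN {
        gsub(/[ \t]/, "", spec)          # tolerate "128-140, 145, 147"
        n = split(spec, item, ",")
        for (i = 1; i <= n; i++) {
            lo = hi = item[i]            # single value by default
            if (split(item[i], r, "-") == 2) { lo = r[1]; hi = r[2] }
            for (c = lo + 0; c <= hi + 0; c++)
                drop[sprintf("%c", c)] = 1   # byte with decimal value c
        }
    }
    {
        out = ""
        len = length($0)
        for (i = 1; i <= len; i++) {     # copy each byte unless listed
            ch = substr($0, i, 1)
            if (!(ch in drop))
                out = out ch
        }
        print out
    }'
}
```

Usage would look like `strip_bytes '128-140,145,147' < InputFileName > OutputFileName`. Note that with the sample line from this thread that list changes nothing, because the character to remove is the byte pair 197 160; `strip_bytes '197,160'` does turn the sample into testing__testing, but it would also remove those bytes from any other multi-byte character that contains them.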
