Identify extended ascii characters in a file


 
# 1  
07-16-2014

Hi,

Is there a way to identify the lines in a file that contain extended ASCII characters and display them?

For instance, I have a file abc.txt with the following data:

Code:
aaa|bbb|111|This is first line
aaa|bbb|222|This is secõnd line
aaa|bbb|333|This is third line
aaa|bbb|444|This is foùrth line

Since the 2nd and 4th lines contain extended ASCII characters (õ and ù), I would like to print those lines. Any ideas?
# 2  
07-16-2014
Code:
perl -ne 'print "$_" if /[\x80-\xFF]/' abc.txt
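For comparison, roughly the same byte test can be done with grep alone, with line numbers added to the output. This is a minimal sketch assuming a grep that compares single bytes under LC_ALL=C (GNU and BSD grep behave this way); the function name lines_with_high_bytes is hypothetical. The perl one-liner works on bytes too, because perl reads input as raw bytes unless told otherwise (e.g. with -C).

```shell
#!/bin/sh
# Hypothetical helper: print "lineno:line" for every line that contains
# at least one byte in the range 0x80-0xFF (octal 200-377).
lines_with_high_bytes() {
    # printf expands the octal escapes into the raw bytes 0x80 and 0xFF,
    # producing the bracket expression [<0x80>-<0xFF>].
    # LC_ALL=C makes grep work byte by byte, so multibyte sequences
    # (valid or not) cannot trigger locale errors.
    LC_ALL=C grep -n "$(printf '[\200-\377]')" "$@"
}

# Usage: lines_with_high_bytes abc.txt
```

On the sample abc.txt this should report lines 2 and 4, each prefixed with its line number.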

# 3  
07-17-2014
Thanks for the solution. It works perfectly fine. There is one more issue, though. I'm trying to get the number of bytes, the number of lines and the maximum line length using the command below.
Code:
wc -l -c -L abc.txt

But it fails whenever it comes across a line containing extended ASCII characters, with the following error:
Code:
Invalid or incomplete multibyte or wide character

Is there any way to make sure it doesn't fail? I don't want to remove the extended ASCII characters from the lines. One way I can think of is to create a temp file, replace the extended ASCII characters with a normal one (e.g. "a") and then run the command, but I'm not familiar with how to do the replacement.

---------- Post updated 07-17-14 at 12:32 AM ---------- Previous update was 07-16-14 at 11:27 PM ----------

Looking around the net, I found a couple of approaches along the same lines as the suggested perl solution.

To find the lines that contain extended ASCII characters:
Code:
LC_ALL=C grep -P "[\x80-\xFF]" abc.txt > abc1.txt

To replace the extended ASCII characters with a single-byte character (a):
Code:
LC_ALL=C sed 's/[\x80-\xFF]/a/g' abc.txt > abc2.txt
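Where GNU sed's \xHH escapes are not available (they are a GNU extension), tr can do the same byte-for-byte replacement. A minimal sketch with a hypothetical function name; "[a*]" is the POSIX tr notation for repeating "a" as often as needed to match the length of the first set. Because õ and ù are each stored as two bytes in UTF-8, each one becomes "aa" (the sed command behaves the same way), so line lengths measured on the result are byte counts, not character counts.

```shell
#!/bin/sh
# Hypothetical helper: replace every byte in 0x80-0xFF (octal 200-377)
# with the letter "a", reading stdin and writing stdout.
replace_high_bytes() {
    # LC_ALL=C makes tr treat each byte as one character.
    LC_ALL=C tr '\200-\377' '[a*]'
}

# Usage: replace_high_bytes < abc.txt > abc2.txt
```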

Once that is done, wc can be run on the resulting file.

This approach seems to work fine for me. Please suggest a better approach if there is one.
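One way to skip the temporary file, sketched under the assumption that awk's length() counts bytes in the C locale (true of gawk and mawk): take the line and byte counts from wc with LC_ALL=C, and compute the longest line with awk. wc -L is left out because it reports display width, which depends on the locale. The function name file_stats is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print line count, byte count and the length of the
# longest line (in bytes) of the file given as $1, without a temp copy.
file_stats() {
    # In the C locale wc works byte by byte; tr strips the blank padding
    # that some wc implementations (e.g. macOS) print before the number.
    lines=$(LC_ALL=C wc -l < "$1" | tr -d ' ')
    bytes=$(LC_ALL=C wc -c < "$1" | tr -d ' ')
    # length($0) excludes the newline; "max + 0" prints 0 for an empty file.
    maxlen=$(LC_ALL=C awk '{ if (length($0) > max) max = length($0) } END { print max + 0 }' "$1")
    printf 'lines=%s bytes=%s maxlen=%s\n' "$lines" "$bytes" "$maxlen"
}

# Usage: file_stats abc.txt
```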
# 4  
07-17-2014
Just a quick longhand example; OSX 10.7.5, default bash terminal:-
Code:
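#!/bin/bash
# len_line.sh
text='secõnd'
# '%s' keeps printf from treating the text itself as a format string.
printf '%s' "$text" > /tmp/data
# Character length (bash counts characters according to the locale).
printf 'Character length = %d\n\n' "${#text}"
# Real length in bytes.
printf 'Real length ='
wc -c < /tmp/data
echo ""
# Hexdump as proof.
hexdump -C < /tmp/data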

Results:-
Code:
Last login: Thu Jul 17 07:59:20 on ttys000
AMIGA:barrywalker~> ./len_line.sh
Character length = 6

Real length =       7

00000000  73 65 63 c3 b5 6e 64                              |sec..nd|
00000007
AMIGA:barrywalker~> _
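The two numbers differ because bash's ${#text} counts characters according to the current locale (6 in a UTF-8 terminal), while wc -c always counts bytes (õ is c3 b5, two bytes). If the character count is needed without relying on the locale, one sketch is to count only the bytes that start a character: in valid UTF-8 the continuation bytes are exactly 0x80-0xBF (octal 200-277), so deleting them leaves one byte per character. The function names byte_len and char_len are hypothetical, and the input is assumed to be valid UTF-8.

```shell
#!/bin/sh
# Hypothetical helpers; char_len assumes the input is valid UTF-8.
byte_len() {
    # wc -c counts bytes; tr removes any padding wc prints before the number.
    printf '%s' "$1" | LC_ALL=C wc -c | tr -d ' '
}

char_len() {
    # Drop UTF-8 continuation bytes (0x80-0xBF), then count what is left.
    printf '%s' "$1" | LC_ALL=C tr -d '\200-\277' | LC_ALL=C wc -c | tr -d ' '
}

text=$(printf 'sec\303\265nd')   # "secõnd" written as its raw UTF-8 bytes
echo "bytes=$(byte_len "$text") chars=$(char_len "$text")"
# prints: bytes=7 chars=6
```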

---------- Post updated at 12:24 PM ---------- Previous update was at 08:11 AM ----------

Using Cygwin, default bash terminal, manually:-
Code:
AMIGA:~> text='secõnd'
AMIGA:~> printf '%s' "$text" | wc -c
7
AMIGA:~> chars=$(printf '%s' "$text" | wc -c)
AMIGA:~> echo "$chars"
7
AMIGA:~> _
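The OSX run printed the count with leading blanks ("Real length =       7"), while Cygwin's wc printed a bare 7. When the count is captured in a variable, arithmetic expansion is one portable way to drop any such padding. A minimal sketch; the variable is named bytes here because wc -c counts bytes, not characters.

```shell
#!/bin/sh
text=$(printf 'sec\303\265nd')   # "secõnd" as raw UTF-8 bytes
# $(( ... )) evaluates the (possibly blank-padded) output of wc -c as a
# number, so "$bytes" holds just the digits on BSD/macOS and GNU alike.
bytes=$(( $(printf '%s' "$text" | wc -c) ))
echo "$bytes"
# prints: 7
```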


Last edited by wisecracker; 07-17-2014 at 08:25 AM. Reason: See above.


10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Print byte position of extended ascii character

Hello, I am on AIX. When I encounter extended ASCII characters and special characters in a file, I need to print the byte position, the actual character and the line number. Is there a simple command that can give me this result? Thanks in advance (38 Replies)
Discussion started by: rosebud123

2. Shell Programming and Scripting

Extended ASCII Characters keep on getting reintroduced to text files

I am working with a log file that I am trying to clean up by removing non-English ASCII characters. I am using Bash via Cygwin on Windows. Before I start I set: export LC_ALL=C I clean it up by removing all non-English ASCII characters with the following command; grep -v $''... (4 Replies)
Discussion started by: lewk

3. Shell Programming and Scripting

Removal Extended ASCII using awk

Hi All, I am trying to remove selected Extended ASCII characters (passed as an argument) using awk on an ad hoc basis. Can you please let me know how to do it? I have to implement this using awk only. Thanks & Regards (14 Replies)
Discussion started by: tostay2003

4. Programming

How to read extended ASCII characters from stdin?

Hi, I want to read extended ASCII characters from keyboard using c language on unix/linux. How to read extended characters from keyboard or by copy-paste in terminal irrespective of locale set in the system. I want to read the input characters from keyboard, store it in an array or some local... (3 Replies)
Discussion started by: sanzee007

5. Shell Programming and Scripting

Search and Replace Extended Ascii Characters

We are getting extended Ascii characters in the input file and my requirement is to search and replace them with a space. I am using the following command LANG=C sed -e 's// /g' It is doing a good job, but in some cases it is replacing the extended characters with two spaces. So my input... (12 Replies)
Discussion started by: ysvsr1

6. AIX

Printing extended ASCII

Hi All, I'm trying to send extended ascii characters to my HP2055 as part of PCL printer control codes. What I want to do is select a bar code font, print the bar code and reset the printer to the default font. Selecting the bar code font works good. Printing the bar code goes almost ok too. ... (5 Replies)
Discussion started by: petervg

7. UNIX for Advanced & Expert Users

Processing extended ascii character file names in UNIX (BASH scripts)

Hi, I have an accented letter (ö) in a script for an Installer. It's a file name. This is not working and I'm told to try using the octal value for the extended ASCII character. Does anyone know how to do this? If I had the word "filförval", can I just put in the value between the letters, like... (9 Replies)
Discussion started by: peli

8. Shell Programming and Scripting

extended ascii problem

Hi, I would like to check whether text files contain extended ASCII characters or not. I really don't have any idea how to start; your kind help would be very much appreciated. Thanks. (7 Replies)
Discussion started by: smooth

9. HP-UX

Hex characters of ascii file

Hi, what's the command, or the method, to display the hexadecimal characters of an ASCII file? Thanks, Bud (2 Replies)
Discussion started by: budrito

10. Programming

Extended ascii

Hi all, I would like to change the extended ASCII code (128-255). I tried changing LC_ALL and LANG in the current session (using values from locale -a), with no success. Thanks. (0 Replies)
Discussion started by: avis