Post 302872709 by owwow14 on Saturday 9th of November 2013 01:18:17 PM
Grep to remove non-ASCII characters

I have an encoding problem that I need to solve.
I have a 4-column tab-separated file, and I need to remove all of the lines that contain the string 'vis-à-vis'.

Code:
achiever-n    vis-à-vis+ns-j+vp    oppose-v    1
achiever-n    vis-à-vis+ns-the+vg    assess-v    1
administrator-n    vis-à-vis+n-the+n    position-n    1
adobe-n    vis-à-vis+n-a-j+n-a-j    ad-n    1

That is, if my file contains 4 lines with 'vis-à-vis', all 4 of them should be filtered out.
How can I do this with a grep one-liner?
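
One possible approach, offered as a minimal sketch rather than a definitive answer: grep's -v option prints the lines that do not match, and -F treats the pattern as a fixed string instead of a regular expression. The function name drop_lines_containing and the file names in the usage comment are hypothetical. The sketch assumes the 'à' typed on the command line is in the same encoding as the file (for example, both UTF-8); if the encodings differ, the string will not match.

```shell
# Print every line of a file that does NOT contain a given literal string.
# -v inverts the match; -F treats the pattern as a fixed string, not a regex.
# (drop_lines_containing is a hypothetical helper name, not a standard tool.)
drop_lines_containing() {
    # $1 = literal string to filter on, $2 = input file
    grep -vF -e "$1" -- "$2"
}

# Example usage (hypothetical file names):
#   drop_lines_containing 'vis-à-vis' input.tsv > filtered.tsv
```

Writing to a new file instead of redirecting back onto the input avoids truncating the data before grep has read it.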

---------- Post updated at 01:18 PM ---------- Previous update was at 01:09 PM ----------

Alternatively, I need something that removes all non-ASCII characters,

or something that does the opposite of this grep:

Code:
grep --color='auto' -P -n '[^\x00-\x7F]' file

I have tried:
Code:
grep --color='auto' -P -n '![^\x00-\x7F]' file

with no success
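
A hedged sketch of two ways this could be approached. Inside a regular expression, `!` is an ordinary character, so it does not negate the bracket expression; grep inverts a match with the -v option instead. The function names ascii_lines_only and strip_non_ascii are hypothetical, and the first one assumes a grep built with -P (PCRE) support, as GNU grep commonly is. Setting LC_ALL=C is an assumption made here so that both tools work byte by byte regardless of the file's encoding.

```shell
# Keep only the lines made up entirely of ASCII bytes.
# LC_ALL=C makes grep work on bytes; -v inverts the original match, so lines
# containing any byte outside \x00-\x7F are dropped.
# (ascii_lines_only is a hypothetical helper name.)
ascii_lines_only() {
    LC_ALL=C grep -v -P '[^\x00-\x7F]' -- "$1"
}

# Alternative: keep every line, but delete the non-ASCII bytes inside it.
# tr -c complements the set (everything outside octal 000-177) and -d deletes it.
# (strip_non_ascii is a hypothetical helper name.)
strip_non_ascii() {
    LC_ALL=C tr -cd '\000-\177' < "$1"
}

# Example usage (hypothetical file names):
#   ascii_lines_only file > ascii_lines.txt
#   strip_non_ascii file > ascii_bytes.txt
```

The first function drops whole lines, which matches the "opposite of this grep" request; the second keeps all lines and shortens them, so 'vis-à-vis' would become 'vis--vis' rather than disappear.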

Last edited by owwow14; 11-09-2013 at 02:23 PM..
 
