Filtering out Non-Lingual characters


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Filtering out Non-Lingual characters
# 1  
Old 03-11-2013
Filtering out Non-Lingual characters

In one of our project requirements , we will be
SCANNING ALL RECORDS OF AN INPUT TEXT FILE AND WILL BE FILTERING OUT RECORDS WHICH CONTAINS NON-LINGUAL CHARACTERS

What's meant by this requirement is that we will be retaining records that contains alphabets used in any
language , like English , French, German , Spanish, etc., numbers and any other symbols which can be keyed in from
and ANSII compliants keyboard
.

Any record containing anything other than the ones mentioned above will be treated as a BAD/Garbage records and will be
writtern in a seperate txt file.


I need to create a Unix script for it.
How to achieve this objective ?
Thanks
Kumarjit
# 2  
Old 03-11-2013
You may want to consider character classes in regexes. man regex:
Quote:
Within a bracket expression, the name of a character class enclosed in "[:" and ":]" stands for the list of all characters belonging
to that class. Standard character class names are:

alnum digit punct
alpha graph space
blank lower upper
cntrl print xdigit

These stand for the character classes defined in wctype(3). A locale may provide others. A character class may not be used as an endpoint of a range.
Your locale may influence which chars are treated as e.g. alpha etc.
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Outputting characters after a given string and reporting the characters in the row below --sed

I have this fastq file: @M04961:22:000000000-B5VGJ:1:1101:9280:7106 1:N:0:86 GGGGGGGGGGGGCATGAAAACATACAAACCGTCTTTCCAGAAATTGTTCCAAGTATCGGCAACAGCTTTATCAATACCATGAAAAATATCAACCACACCA +test-1 GGGGGGGGGGGGGGGGGCCGGGGGFF,EDFFGEDFG,@DGGCGGEGGG7DCGGGF68CGFFFGGGG@CGDGFFDFEFEFF:30CGAFFDFEFF8CAF;;8... (10 Replies)
Discussion started by: Xterra
10 Replies

2. Shell Programming and Scripting

Remove first 2 characters and last two characters of each line

here's what im trying to do. i have a file containing lines similar to this: data.txt: 1hsRmRsbHRiSFZNTTA1dlEyMWFkbU5wUW5CSlIyeDFTVU5SYjJOSFRuWmpia0ZuWXpKV2FHTnRU 1lKUnpWMldrZFZaMG95V25oYQpSelEyWTBka2QyRklhSHBrUjA1b1kwUkJkd3BOVXpWM1lVaG5k... (5 Replies)
Discussion started by: SkySmart
5 Replies

3. Shell Programming and Scripting

Help need to convert bi-lingual files in sub-title format

I have a large number of files in the standard subtitle format with the additional proviso that the files are bi-lingual i.e. English and a second language: in this case Hindi. A small sample is given below: 00 04 07 08 00 04 11 00 I mean very high fever... He even vomited. 00 04 07 08 00... (6 Replies)
Discussion started by: gimley
6 Replies

4. Shell Programming and Scripting

Reducing multiple entries in a tri-lingual dictionary to single entries

Dear all, I am editing a tri-lingual dictionary for open source which has the following data structure English headwords <Tab>Devanagari Headwords<Tab>PersoArabic headwords as in the example below to mark, to number अंगणु (اَنگَڻُ) The English headword entry has at times more than one word,... (2 Replies)
Discussion started by: gimley
2 Replies

5. Shell Programming and Scripting

Filtering

Hi I am interested in DNS resolving a set of sites and each time the output is different- $ host www.yahoo.com www.yahoo.com is an alias for fd-fp3.wg1.b.yahoo.com. fd-fp3.wg1.b.yahoo.com is an alias for ds-fp3.wg1.b.yahoo.com. ds-fp3.wg1.b.yahoo.com is an alias for... (1 Reply)
Discussion started by: jamie_123
1 Replies

6. AIX

Need help with filtering

Hi!! I have a bit of a task here and filtering/scripting not my strongest. I have to collect info of approx 1100 hdiskpower.so i have appended all the hdisk into a text file and i need it to run the command lscfg -vl to confirm if the drive is symmetrix. here's what i have so far at... (3 Replies)
Discussion started by: vpundit
3 Replies

7. Shell Programming and Scripting

sed replacing specific characters and control characters by escaping

sed -e "s// /g" old.txt > new.txt While I do know some control characters need to be escaped, can normal characters also be escaped and still work the same way? Basically I do not know all control characters that have a special meaning, for example, ?, ., % have a meaning and have to be escaped... (11 Replies)
Discussion started by: ijustneeda
11 Replies

8. Shell Programming and Scripting

Replace special characters with Escape characters?

i need to replace the any special characters with escape characters like below. test!=123-> test\!\=123 !@#$%^&*()-= to be replaced by \!\@\#\$\%\^\&\*\(\)\-\= (8 Replies)
Discussion started by: laknar
8 Replies

9. Shell Programming and Scripting

Please help me to do some filtering

I have to grep a pattern. scenario is like :- Suppose "/etc/sec/one" is a string, i need to check if this string contains "one" using any utility something like if /etc/sec/one | grep ; then Thanks in advance Renjesh Raju (3 Replies)
Discussion started by: Renjesh
3 Replies

10. Shell Programming and Scripting

How to replace characters with random characters

I've got a file (numbers.txt) filled with numbers and I want to replace each one of those numbers with a new random number between 0 and 9. This is my script so far: #!/bin/bash rand=$(($RANDOM % 9)) sed -i s//$rand/g numbers.txtThe problem that I have is that it replaces each number with just... (2 Replies)
Discussion started by: hellocatfood
2 Replies
Login or Register to Ask a Question