Regex to identify word in second position on a line


 
# 1  
Old 04-21-2013
Regex to identify word in second position on a line

I am looking for a regex that matches a word in second position on a line. The word in question is या.
I tried the following Perl expressions, but neither worked:
Code:
[^[:word:]] या 
or
^\W या

Both returned no matches.
A sample file is given below:
Code:
देना या सौंपना=delegate
तह जमना या जमाना=film
झुकना या झुकाना=slant
घुलना या घोलना=dissolve
घिसना या घिसाना=grate
खुले आम या प्रकट=avow
एड़ लगाना या देना=spur
उभरना या उभारना=heave
उचकाना या झाड़ना=shrug
आना या कुछ करना=mistime
हो जाना या होना=orient
होना या हो जाना=double
लिखना या लगाना=preface
बल या शिकन पड़ना=cockle

I would like the REGEX to single out all lines where the word या occurs in second position.
Many thanks for the help.
# 2  
Old 04-21-2013
I am somewhat handicapped by the script. Maybe it is time I learned it. Anyway, maybe the following explanations will help you solve it:

[^[:word:]] means a single character that is NOT a word character.
^[[:word:]] means a single word character at start of line.

^\W means a NON-word character at beginning of line.
^\w means a word character at beginning of line.
# 3  
Old 04-21-2013
Maybe too simplistic an approach, but try
Code:
$ awk '$2=="या"' file
देना या सौंपना=delegate
झुकना या झुकाना=slant
घुलना या घोलना=dissolve
घिसना या घिसाना=grate
उभरना या उभारना=heave
उचकाना या झाड़ना=shrug
आना या कुछ करना=mistime
होना या हो जाना=double
लिखना या लगाना=preface
बल या शिकन पड़ना=cockle
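
The same idea can be parameterised if another position is needed. This is only a sketch: pos and word are illustrative variable names passed in with -v, and the four-line sample is a subset of the file from post #1.

```shell
# A few lines from the sample file in post #1.
sample='देना या सौंपना=delegate
तह जमना या जमाना=film
खुले आम या प्रकट=avow
बल या शिकन पड़ना=cockle'

# $pos is the field at position pos after awk's default whitespace split;
# a line is printed when that field equals word exactly.
second=$(printf '%s\n' "$sample" | awk -v pos=2 -v word='या' '$pos == word')
third=$(printf '%s\n' "$sample" | awk -v pos=3 -v word='या' '$pos == word')

printf 'second position:\n%s\n' "$second"
printf 'third position:\n%s\n' "$third"
```

Because awk compares whole fields, no regex anchoring or word-boundary handling is involved.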

# 4  
Old 04-21-2013
Many thanks, the awk script worked, but I am still curious about finding a regex that identifies the position of a word in a string.
# 5  
Old 04-21-2013
Quote:
Originally Posted by gimley
... I am still curious about finding a regex to identify the position of a word in a string
Try:
Code:
perl -ne 'print if /^(\S+)\sया(\s|$)/'

or
Code:
perl -ne 'print if /^((\S+)\s+){1}या(\s|$)/'

The word boundary \b does not seem to work here.

Last edited by Scrutinizer; 04-21-2013 at 03:38 PM..
# 6  
Old 04-21-2013
Quote:
Originally Posted by Scrutinizer
The word boundary \b does not seem to fly here..
\b will work with the -CD command-line switch. But, then you'll need to be careful about what \w and \W will match.
# 7  
Old 04-21-2013
If the environment variables are correctly set, I believe that use locale; or -Mlocale would inform perl on how to interpret character classes.

Please do not mistake me for a competent perl hacker.

EDIT: I was just skimming through perllocale. Wow. What a mess. Long story short, my suggestion may not be safe.

Regards,
Alister

Last edited by alister; 04-21-2013 at 06:23 PM..