Top Forums Shell Programming and Scripting Matching only the strings I provide - sed Post 302964384 by jozo95 on Thursday 14th of January 2016 05:07:54 AM
Old 01-14-2016
Linux Matching only the strings I provide - sed - SOLVED

Hello..

I am currently learning sed and have run into some trouble.

I wrote this command:
Code:
sed -ne 's/[^-<>]*\([-><]\{2\}[stockholm,paris,tokyo]*[-><]\{2\}[stockholm,paris,tokyo]*[-><]\{2\}[stockholm,paris,tokyo]*\).*\([-><]\{2\}[stockholm,paris,tokyo]*[-><]\{2\}[stockholm,paris,tokyo]*[-><]\{2\}[stockholm,paris,tokyo]*\).*/\1\2/p'

and some of the output I get is:

Code:
->stockholm->paris<-stockholmpi<-tokyo->paris<-stockholmpi
->stockholm<-stockholm->tokyo<-tokyo<-paris->stockholmtao
->paris<-stockholm<-tokyo<-paris<-tokyo->stockholm
<-tokyo<-stockholm->tokyo<-tokyo->stockholm->paris

As you can see, the last city does not end cleanly with stockholm/paris/tokyo, because my pattern also matches the extra letters that follow it. How would I change my pattern to avoid this?

I tried (stockholm|tokyo|paris), but then I don't get the last city; for stockholmpi, for example, it should be stockholm only.
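
One possible reason the (stockholm|tokyo|paris) attempt failed, assuming it was typed unescaped and without the -E option: in sed's default basic regular expressions, ( | ) are ordinary characters, so the pattern looks for that literal text. The sketch below shows the difference; the function names bre_try and ere_try and the test line are made up for illustration.

```shell
line='foo<-stockholmpi'

# Basic regex (sed's default): ( | ) are literal characters, so this
# searches for the text "<-(stockholm|paris|tokyo)" and finds nothing.
bre_try() { printf '%s\n' "$1" | sed -n 's/.*<-(stockholm|paris|tokyo).*/matched/p'; }

# Extended regex (-E): the same text is a real group with alternation.
ere_try() { printf '%s\n' "$1" | sed -E -n 's/.*<-(stockholm|paris|tokyo).*/<-\1/p'; }

bre_try "$line"   # prints nothing
ere_try "$line"   # prints <-stockholm
```

Without -E, GNU sed accepts the escaped form \(stockholm\|paris\|tokyo\) instead; note that \| is a GNU extension rather than something every sed supports.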

EDIT: Here is some of the data I use:

Code:
Wed3.14153<-paris<-stockholm->tokyo'->paris<-stockholm->parisphi$$
fubartao<-tokyo<-stockholm<-tokyoJul->paris->tokyo<-parisRed3.14153
$chi<-tokyo<-paris<-stockholmMar->tokyo<-stockholm->tokyoGreen 
Feb3.14153<-tokyo->tokyo<-parisBLACK<-paris<-tokyo->tokyoMar 
1011102.8<-stockholm<-tokyo<-tokyoblah<-stockholm<-stockholm<-tokyo3.14153001111
taoBLACK<-tokyo->paris->paris ->stockholm<-paris->stockholmThu3.14153
MayJun<-paris->paris<-stockholmSun->stockholm->tokyo->stockholm011011Green
NILLNULL->tokyo<-paris<-parisSep->stockholm->tokyo<-parisJunFri
AugFeb->stockholm<-stockholm->parisBLACK<-tokyo<-paris<-tokyoVOIDpi
 <-paris->paris->parisfoo->stockholm->paris->stockholm$NULL
chi3.14153<-paris<-paris<-tokyofoo<-stockholm<-paris->stockholm`100110
foo$$<-tokyo<-stockholm<-stockholm101101<-paris<-tokyo<-tokyo"Purple
fubarPurple->tokyo<-paris->paris ->tokyo<-paris<-tokyo`3.14
BlueMay->paris->stockholm<-stockholmVOID->stockholm->paris<-tokyoYellowphi
0101002.8<-tokyo->paris<-tokyotao<-tokyo<-tokyo->stockholmfooNULL
RedWed->paris->paris<-stockholmNILL<-tokyo<-paris->tokyoPurple 
100100$$$->paris->paris<-tokyo001011<-paris->paris->tokyoMonSep
Jan010001->paris->paris<-stockholmAug->tokyo<-paris->stockholmPurpleSep
->paris->paris<-tokyoblah<-stockholm<-stockholm<-paris010001tao
Purplefubar->stockholm<-paris->tokyoDec->paris->stockholm->tokyo$3.1415
010001->paris<-stockholm->tokyoVOID->tokyo<-stockholm<-tokyoMarFeb
SunFri->tokyo->paris<-tokyoJan->paris<-stockholm->tokyoWHITEMon


EDIT After RudiC's post:

Okay, so the logic behind this pattern is:
1. It starts with a '->' or a '<-' followed by a city, for example ->tokyo.
2. After the city comes another arrow followed by another city, for example ->tokyo->paris.
3. Then again an arrow followed by a city, for example ->tokyo->paris<-tokyo.
4. Then some random text comes in between. If you look at the last line of the data I've posted, you can see that after "->tokyo->paris<-tokyo" comes "Jan", which is random text; we don't want this.
5. Then the same pattern appears again: three more arrow-and-city pairs.

This is the ideal result: ->tokyo->paris<-tokyo->paris<-stockholm->tokyo
Which I do get on this specific line, but on some other lines I get output like this:
Code:
 ->stockholm->paris<-stockholmpi<-tokyo->paris<-stockholmpi

Here the third city has two extra letters (pi), and so does the last city. That is because in my pattern I write:
Code:
[stockholm,paris,tokyo]*

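which is a bracket expression: it matches any run of the listed characters, so it also accepts the 'p' and 'i' (both letters of paris) after the city name.

Now, how would I force sed to choose between the exact strings I provided, which are stockholm, paris and tokyo?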
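
To see why this happens, here is a small sketch (the function name set_match and the inputs are made up for illustration). The brackets define a set of single characters, here a c h i k l m o p r s t y and the comma, and the * repeats that set, so any word built from those letters is accepted:

```shell
# [stockholm,paris,tokyo]* matches any run of characters from the set,
# not one of the three words.
set_match() { printf '%s\n' "$1" | sed -n 's/^\(->[stockholm,paris,tokyo]*\).*/\1/p'; }

set_match '->stockholmpi$$'   # prints ->stockholmpi
set_match '->tokyoparis99'    # prints ->tokyoparis
set_match '->smith99'         # prints ->smith
```

The last call shows that even a word that is not a city at all passes, because every letter of "smith" happens to be in the set.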


EDIT: Solved it by using escaped parentheses with \| alternation for the last city. Here is the solution:

Code:
sed -ne 's/[^-<>]*\([-><]\{2\}[stockholm,paris,tokyo]*[-><]\{2\}[stockholm,paris,tokyo]*[-><]\{2\}[stockholm,paris,tokyo]*\).*\([-><]\{2\}[stockholm,paris,tokyo]*[-><]\{2\}[stockholm,paris,tokyo]*[-><]\{2\}\(stockholm\|paris\|tokyo\)\{1\}\).*/{Phil}2053,\1{5872Phil}\2[->->]/p' datasets/q14target.txt
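
The solution above replaces only the last bracket expression; the other five cities are still matched as character sets. As a rough sketch of how the same idea could be applied to all six cities, assuming a sed that supports -E: the names extract_routes, arrow, city and triple are made up here, the arrow is narrowed to exactly -> or <- (the original accepts any two of - < >), and the output is just the two triples without the extra text from the replacement above.

```shell
# Print the two "arrow city arrow city arrow city" runs of each line,
# dropping the text before, between and after them.
extract_routes() {
    arrow='(->|<-)'
    city='(stockholm|paris|tokyo)'
    triple="($arrow$city){3}"
    # Group 1 is the first triple, group 5 the second; groups 2-4 and 6-8
    # are the inner arrow/city groups.
    sed -E -n "s/[^-<>]*($triple).*($triple).*/\1\5/p" "$@"
}

# The last line of the sample data:
printf '%s\n' 'SunFri->tokyo->paris<-tokyoJan->paris<-stockholm->tokyoWHITEMon' | extract_routes
# prints ->tokyo->paris<-tokyo->paris<-stockholm->tokyo
```

Called with a file name, for example extract_routes datasets/q14target.txt, it reads that file instead of standard input. Like the original pattern, it assumes the text before the first arrow contains no -, < or > characters.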


Last edited by jozo95; 01-14-2016 at 10:28 AM..
 
