Matching only the strings I provide - sed


 
# 1  
Old 01-14-2016
Linux Matching only the strings I provide - sed - SOLVED

Hello..

I am currently learning sed and have run into some trouble.

I wrote this command:
Code:
sed -ne 's/[^-<>]*\([-><]\{2\}[stockholm,paris,tokyo]*[-><]\{2\}[stockholm,paris,tokyo]*[-><]\{2\}[stockholm,paris,tokyo]*\).*\([-><]\{2\}[stockholm,paris,tokyo]*[-><]\{2\}[stockholm,paris,tokyo]*[-><]\{2\}[stockholm,paris,tokyo]*\).*/\1\2/p'

and some of the output I get is:

Code:
->stockholm->paris<-stockholmpi<-tokyo->paris<-stockholmpi
->stockholm<-stockholm->tokyo<-tokyo<-paris->stockholmtao
->paris<-stockholm<-tokyo<-paris<-tokyo->stockholm
<-tokyo<-stockholm->tokyo<-tokyo->stockholm->paris

As you can see, the output does not always end with stockholm, paris or tokyo, because my pattern also matches the extra letters that follow the city. How should I change my pattern to avoid this?

I tried (stockholm|tokyo|paris), but then I don't get the last city (stockholmpi, for example, should be stockholm only).
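
One possible reason the alternation attempt failed, assuming the parentheses were written unescaped and sed was run without -E: in a basic regular expression, ( | ) are literal characters, so the pattern looks for the literal text "(stockholm|tokyo|paris)". A minimal sketch of the difference (the sample line is a shortened, hypothetical one in the style of the data):

```shell
# Hypothetical sample in the style of the posted data.
line='->tokyo<-parisJunFri'

# Basic regex (sed default): ( | ) are literal, so nothing matches and nothing is printed.
bre=$(printf '%s\n' "$line" | sed -n 's/(stockholm|paris|tokyo)/[&]/gp')

# Extended regex (-E): ( | ) form a real alternation, so each city is bracketed.
ere=$(printf '%s\n' "$line" | sed -nE 's/(stockholm|paris|tokyo)/[&]/gp')

echo "basic:    '$bre'"
echo "extended: '$ere'"
```

In a basic regex, GNU sed accepts the escaped form \(stockholm\|paris\|tokyo\) for the same alternation.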

EDIT: Here is some of the data I use:

Code:
Wed3.14153<-paris<-stockholm->tokyo'->paris<-stockholm->parisphi$$
fubartao<-tokyo<-stockholm<-tokyoJul->paris->tokyo<-parisRed3.14153
$chi<-tokyo<-paris<-stockholmMar->tokyo<-stockholm->tokyoGreen 
Feb3.14153<-tokyo->tokyo<-parisBLACK<-paris<-tokyo->tokyoMar 
1011102.8<-stockholm<-tokyo<-tokyoblah<-stockholm<-stockholm<-tokyo3.14153001111
taoBLACK<-tokyo->paris->paris ->stockholm<-paris->stockholmThu3.14153
MayJun<-paris->paris<-stockholmSun->stockholm->tokyo->stockholm011011Green
NILLNULL->tokyo<-paris<-parisSep->stockholm->tokyo<-parisJunFri
AugFeb->stockholm<-stockholm->parisBLACK<-tokyo<-paris<-tokyoVOIDpi
 <-paris->paris->parisfoo->stockholm->paris->stockholm$NULL
chi3.14153<-paris<-paris<-tokyofoo<-stockholm<-paris->stockholm`100110
foo$$<-tokyo<-stockholm<-stockholm101101<-paris<-tokyo<-tokyo"Purple
fubarPurple->tokyo<-paris->paris ->tokyo<-paris<-tokyo`3.14
BlueMay->paris->stockholm<-stockholmVOID->stockholm->paris<-tokyoYellowphi
0101002.8<-tokyo->paris<-tokyotao<-tokyo<-tokyo->stockholmfooNULL
RedWed->paris->paris<-stockholmNILL<-tokyo<-paris->tokyoPurple 
100100$$$->paris->paris<-tokyo001011<-paris->paris->tokyoMonSep
Jan010001->paris->paris<-stockholmAug->tokyo<-paris->stockholmPurpleSep
->paris->paris<-tokyoblah<-stockholm<-stockholm<-paris010001tao
Purplefubar->stockholm<-paris->tokyoDec->paris->stockholm->tokyo$3.1415
010001->paris<-stockholm->tokyoVOID->tokyo<-stockholm<-tokyoMarFeb
SunFri->tokyo->paris<-tokyoJan->paris<-stockholm->tokyoWHITEMon


EDIT After RudiC's post:

Okay, so the logic behind this pattern is:
1. It starts with a '->' or a '<-' followed by a city, for example ->tokyo.
2. After the city comes another arrow followed by another city, for example ->tokyo->paris.
3. Then again an arrow followed by a city, for example ->tokyo->paris<-tokyo.
4. Then some random text comes in between. In the last line of the data I've posted, "->tokyo->paris<-tokyo" is followed by "Jan", which is random text; we don't want this.
5. Then the same pattern of three arrows and cities appears again.

This is the ideal result: ->tokyo->paris<-tokyo->paris<-stockholm->tokyo
I do get this on that specific line, but on some other lines I get output like this:
Code:
 ->stockholm->paris<-stockholmpi<-tokyo->paris<-stockholmpi

Here the third city has two extra letters (pi), and so does the last city. That is because in my pattern I write:
Code:
[stockholm,paris,tokyo]*

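which is a bracket expression: it matches any run of the listed characters, not the three words, so it also accepts the 'p' and 'i' that occur in paris.

Now, how would I force sed to choose only between the exact strings I provided, which are stockholm, paris and tokyo?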
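
To illustrate the point about the bracket expression, here is a small sketch (the input words are made up for the demonstration; "pi", "tao" and "phi" mimic the random text in the data). Every word built only from the letters inside the brackets is accepted in full:

```shell
# [stockholm,paris,tokyo] is a set of single characters (plus the comma),
# so any word made up only of those characters matches from start to end.
matches=$(printf '%s\n' stockholm pi tao phi Jan |
  sed -n '/^[stockholm,paris,tokyo]*$/p' | paste -sd ' ' -)

# "Jan" is rejected only because J and n are not in the set.
echo "$matches"
```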


EDIT: Solved it by using parentheses. Here is the solution:

Code:
sed -ne 's/[^-<>]*\([-><]\{2\}[stockholm,paris,tokyo]*[-><]\{2\}[stockholm,paris,tokyo]*[-><]\{2\}[stockholm,paris,tokyo]*\).*\([-><]\{2\}[stockholm,paris,tokyo]*[-><]\{2\}[stockholm,paris,tokyo]*[-><]\{2\}\(stockholm\|paris\|tokyo\)\{1\}\).*/{Phil}2053,\1{5872Phil}\2[->->]/p' datasets/q14target.txt
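
The command above uses the alternation only for the last city. A minimal sketch of the same idea applied to all six cities, assuming a sed that supports -E (GNU sed does) and a plain \1\5 replacement instead of the one above; the function name extract_routes and the helper variables are hypothetical, chosen for this illustration only:

```shell
# Hypothetical helper: prints the two arrow/city triples of each input line.
extract_routes() {
  arrow='(->|<-)'
  city='(stockholm|paris|tokyo)'
  # One triple = exactly three arrow+city units.
  triple="($arrow$city){3}"
  # Groups are numbered by opening parenthesis: \1 is the whole first triple
  # (\2, \3, \4 are its inner groups), and \5 is the whole second triple.
  sed -nE "s/^[^<>-]*($triple).*($triple).*/\1\5/p"
}

# Two lines taken from the posted data.
printf '%s\n' \
  'AugFeb->stockholm<-stockholm->parisBLACK<-tokyo<-paris<-tokyoVOIDpi' \
  'SunFri->tokyo->paris<-tokyoJan->paris<-stockholm->tokyoWHITEMon' |
  extract_routes
```

Because each city must match one of the three words exactly, trailing text such as "pi" or "WHITEMon" is left to the final .* and dropped.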


Last edited by jozo95; 01-14-2016 at 10:28 AM..
# 2  
Old 01-14-2016
That spec is not too helpful. Please describe the desired results and the logic/algorithm to achieve them.
# 3  
Old 01-14-2016
Quote:
Originally Posted by RudiC
That spec is not too helpful. Please describe the desired results and the logic/algorithm to achieve them.
Edited my post, thanks.
# 4  
Old 01-14-2016
Quote:
This is the ideal result: ->tokyo->paris<-tokyoJan->paris<-stockholm->tokyo
Why that? Is "Jan" acceptable?

I'm afraid that without a list of acceptable cities, there's no chance to remove random text.
# 5  
Old 01-14-2016
Quote:
Originally Posted by RudiC
Why that? Is "Jan" acceptable?

I'm afraid that without a list of acceptable cities, there's no chance to remove random text.
Sorry, only "stockholm", "tokyo" and "paris" are acceptable. I made a mistake in the example, sorry for that.
# 6  
Old 01-14-2016
I don't think sed can solve this efficiently. How about awk? Try
Code:
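awk '
# Build a lookup array C whose indices are the accepted city names.
BEGIN   {for (n=split("stockholm paris tokyo", T); n>0; n--) C[T[n]] = n
        }
        # Fields are split on "-", so in this data each field holds at most one city.
        {for (i=1; i<=NF; i++)  {V = ""
                                 # Remember the city (if any) contained in this field.
                                 for (c in C) if ($i ~ c) V = c
                                 # Replace the first run of non-arrow characters with that city.
                                 sub (/[^<>]+/, V, $i)
                                 printf "%s%s", $i, (i<NF)?FS:""
                                }
         printf "\n"
        }
' FS="-" file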
<-paris<-stockholm->tokyo->paris<-stockholm->paris
<-tokyo<-stockholm<-tokyo->paris->tokyo<-paris
<-tokyo<-paris<-stockholm->tokyo<-stockholm->tokyo
<-tokyo->tokyo<-paris<-paris<-tokyo->tokyo
<-stockholm<-tokyo<-tokyo<-stockholm<-stockholm<-tokyo
<-tokyo->paris->paris->stockholm<-paris->stockholm
.
.
.

---------- Post updated at 12:44 ---------- Previous update was at 12:42 ----------

Or even
Code:
awk '
BEGIN   {for (n=split("stockholm paris tokyo", T); n>0; n--) C[T[n]] = n
        }
        {for (i=1; i<=NF; i++)  {V = ""
                                 for (c in C) if ($i ~ c) V = c
                                 sub (/[^<>]+/, V, $i)
                                }
        }
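# Each successful sub() on a field rebuilds $0 using OFS; the always-true pattern 1 then prints the record.
1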
' FS="-" OFS="-" file

# 7  
Old 01-14-2016
Quote:
Originally Posted by RudiC
I don't think sed can solve this efficiently. How about awk?

I kinda need to use sed for this assignment :/