Replacing a string with its substring


 
# 8  
Old 08-22-2011
Sorry, I just noticed that the one-liner posted above works only if the text within the double brackets is either:
(a) a single string with no embedded "|"s, or
(b) exactly two strings separated by exactly one "|"

So, cases like the following:

Code:
[[abc|def|ghi]]
[[abc|def|ghi|jkl]]

will not be matched, and hence will not be altered by the script.
An example follows (I've modified your data a bit):

Code:
$
$
$ cat f8
There are many types and traditions of anarchism, some of which are [[mutually exclusive]].
Strains of anarchism have been divided into the categories of [[social anarchism|social]]
and [[individualist anarchism]] or similar dual classifications. Anarchism is often
considered to be a radical [[left-wing]] ideology, and much of [[anarchist economics]]
and [[anarchist law|anarchist legal philosophy]] reflect [[anti-statism|anti-statist|non-statist]]
interpretations of [[anarcho-communism|communism]], [[collectivist anarchism|collectivism]],
[[anarcho-syndicalism|syndicalism|blah|BLAH]] or [[participatory economics]].
$
$
$ # Old script that does NOT work for more than 2 delimited tokens within double-brackets
$ perl -plne 's/\[\[[^|]*?\|*([^|]*?)\]\]/$1/g' f8
There are many types and traditions of anarchism, some of which are mutually exclusive.
Strains of anarchism have been divided into the categories of social
and individualist anarchism or similar dual classifications. Anarchism is often
considered to be a radical left-wing ideology, and much of anarchist economics
and anarchist legal philosophy reflect [[anti-statism|anti-statist|non-statist]]
interpretations of communism, collectivism,
[[anarcho-syndicalism|syndicalism|blah|BLAH]] or participatory economics.
$
$

The fix for this is to modify the regex so that it:
(a) matches any characters, including "|"s, up to the last "|" inside the brackets (if there is one)
(b) matches that last "|" character (if it exists at all)
(c) matches the remainder, which does not include "|", and captures it as $1

Something like this:

Code:
$
$ cat f8
There are many types and traditions of anarchism, some of which are [[mutually exclusive]].
Strains of anarchism have been divided into the categories of [[social anarchism|social]]
and [[individualist anarchism]] or similar dual classifications. Anarchism is often
considered to be a radical [[left-wing]] ideology, and much of [[anarchist economics]]
and [[anarchist law|anarchist legal philosophy]] reflect [[anti-statism|anti-statist|non-statist]]
interpretations of [[anarcho-communism|communism]], [[collectivist anarchism|collectivism]],
[[anarcho-syndicalism|syndicalism|blah|BLAH]] or [[participatory economics]].
$
$ # New script that should work
$ perl -plne 's/\[\[.*?\|*([^|]*?)\]\]/$1/g' f8
There are many types and traditions of anarchism, some of which are mutually exclusive.
Strains of anarchism have been divided into the categories of social
and individualist anarchism or similar dual classifications. Anarchism is often
considered to be a radical left-wing ideology, and much of anarchist economics
and anarchist legal philosophy reflect non-statist
interpretations of communism, collectivism,
BLAH or participatory economics.
$
$

tyler_durden
# 9  
Old 08-22-2011
That's good, Tyler. I had also tried it only with a single | delimiter.

Thank you for correcting that. I am looking for a good Perl XML parser package to develop my own Wikipedia XML parser. Could you help me with this if you have some knowledge of Perl XML parsers?

As I am new to Perl, I do not know where to search for packages. Any help is appreciated.

Regards
Satheesh