Perl regular expression - To match a Dynamic URL


 
# 1  
Old 03-23-2011

Hello All,

I need to match a dynamic URL, extract each directory and the page name, and store them in separate variables. The solution must be a Perl-style regular expression only, because it will be used in Informatica's REG_EXTRACT function.

Example Input URLs:
Code:
1) www-example-com/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/index.html
2) www-example-com/dir1/dir2/dir3/dir4/index.html
3) www-example-com/dir1/dir2/index.html
4) www-example-com/dir1/dir2/dir3/dir4/dir5/dir6/index.html

Requirements:
1) The maximum depth I am looking for in a URL is 10 levels of subdirectories.
2) I will have 12 variables: one stores the website name, one stores the page name, and the remaining 10 are reserved for the directory names.
If a directory is not present, it is treated as null. A few examples to consider:

1) URL 1
INPUT: www-example-com/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/index.html
OUTPUT (the match for the first URL should be):
Code:
web_site=www-example-com
V_DIR1=dir1
V_DIR2=dir2
V_DIR3=dir3
V_DIR4=dir4
V_DIR5=dir5
V_DIR6=dir6
V_DIR7=dir7
V_DIR8=dir8
V_DIR9=dir9
V_DIR10=dir10
web_page=index.html

2) URL 2
INPUT: www-example-com/dir1/dir2/dir3/dir4/index.html
OUTPUT:
Code:
web_site=www-example-com
V_DIR1=dir1
V_DIR2=dir2
V_DIR3=dir3
V_DIR4=dir4
V_DIR5=null
V_DIR6=null
V_DIR7=null
V_DIR8=null
V_DIR9=null
V_DIR10=null
web_page=index.html


3) URL 3
INPUT: www-example-com/dir1/dir2/index.html
OUTPUT:
Code:
web_site=www-example-com
V_DIR1=dir1
V_DIR2=dir2
V_DIR3=null
V_DIR4=null
V_DIR5=null
V_DIR6=null
V_DIR7=null
V_DIR8=null
V_DIR9=null
V_DIR10=null
web_page=index.html


Note: The dots in the website name have been replaced with hyphens (www-example-com).

Last edited by Franklin52; 03-23-2011 at 06:21 AM.. Reason: Please use code tags
# 2  
Old 03-23-2011
Could this help you?
Code:
#!/usr/bin/perl
use strict;
use warnings;

while (my $line = <DATA>) {
    chomp $line;
    # Split on "/": first field is the site, last is the page,
    # everything in between is a directory.
    my @flds = split m{/}, $line;
    my $site = shift @flds;
    # Take the page off the end before printing the directories, so a
    # full 10-directory URL does not lose its page name.
    my $page = pop @flds;
    print "web_site=$site\n";
    for my $i (1 .. 10) {
        my $dir = $flds[$i - 1];
        print "V_DIR$i=", (defined $dir ? $dir : "NULL"), "\n";
    }
    print "web_page=$page\n";
}
__DATA__
www-example-com/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/index.html
www-example-com/dir1/dir2/dir3/dir4/index.html
www-example-com/dir1/dir2/index.html
www-example-com/dir1/dir2/dir3/dir4/dir5/dir6/index.html

Or, with a regex:
Code:
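#!/usr/bin/perl
use strict;
use warnings;

# $format[$n] matches exactly $n slash-terminated components
# (the site plus $n - 1 directories).
my @format = map { '([^/]+)/' x $_ } 0 .. 11;

while (my $line = <DATA>) {
    chomp $line;
    # Try the deepest layout first: site + 10 directories + page.
    for my $j (reverse 1 .. 11) {
        my $pat = $format[$j];
        my @m = $line =~ m{^$pat([^/]+)\z} or next;
        my $site = shift @m;    # first capture is the site
        my $page = pop @m;      # last capture is the page
        print "web_site=$site\n";
        for my $k (1 .. 10) {
            my $dir = $m[$k - 1];
            print "V_DIR$k=", (defined $dir ? $dir : "NULL"), "\n";
        }
        print "web_page=$page\n";
        last;
    }
}
__DATA__
www-example-com/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/index.html
www-example-com/dir1/dir2/dir3/dir4/index.html
www-example-com/dir1/dir2/index.html
www-example-com/dir1/dir2/dir3/dir4/dir5/dir6/index.html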


Last edited by pravin27; 03-23-2011 at 06:56 AM..
# 3  
Old 03-23-2011
A line processor:
Code:
echo 'www-example-com/dir1/dir2/dir3/dir4/index.html' | perl -lne '
    @a = split m{/};
    $site = shift @a;          # first field is the site
    $page = pop @a;            # last field is the page
    print "web_site=$site";
    for $i (1 .. 10) {         # start at 1 so V_DIR1 is not skipped
        $dir = $a[$i - 1];
        print "V_DIR$i=", (defined $dir ? $dir : "NULL");
    }
    print "web_page=$page";'

Code:
web_site=www-example-com
V_DIR1=dir1
V_DIR2=dir2
V_DIR3=dir3
V_DIR4=dir4
V_DIR5=NULL
V_DIR6=NULL
V_DIR7=NULL
V_DIR8=NULL
V_DIR9=NULL
V_DIR10=NULL
web_page=index.html
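
Since the original question asks for a single Perl-style pattern rather than a script, here is a minimal sketch of one regex with ten optional directory groups. Group 1 is the site, groups 2 to 11 are the directories, and group 12 is the page. The helper name split_url is only for illustration, and whether Informatica's REG_EXTRACT accepts this exact syntax is an assumption that has not been checked here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One optional, non-capturing wrapper per directory level (10 levels).
my $dir_part = '(?:/([^/]+))?' x 10;
# Group 1 = site, groups 2..11 = directories, group 12 = page.
my $url_re = qr{^([^/]+)$dir_part/([^/]+)\z};

# Hypothetical helper: returns "name=value" strings, or an empty
# list when the URL does not fit the site/dirs/page layout.
sub split_url {
    my ($url) = @_;
    my @m = $url =~ $url_re or return;
    my @out = ("web_site=$m[0]");
    for my $i (1 .. 10) {
        # Unmatched optional groups come back undefined.
        push @out, "V_DIR$i=" . (defined $m[$i] ? $m[$i] : 'null');
    }
    push @out, "web_page=$m[11]";
    return @out;
}

print "$_\n" for split_url('www-example-com/dir1/dir2/index.html');
```

Because each directory group is optional and the final "/page" part is required, the directories fill from the left and the unused groups stay empty. A URL with more than 10 directories does not match at all, since the pattern is anchored at both ends. If REG_EXTRACT is used with this pattern, each value would presumably be pulled out by its group number, one call per variable.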
