07-24-2008
text manipulation and pattern matching
Hi guys,
I need help:
I started receiving automatic emails containing download information. The problem is that these emails arrive in a rich (HTML) format (I have no control over this), so the important information is buried under a bunch of mumbo-jumbo. To complicate things even further, I need to automate the download process too, so I need to somehow identify and extract the exact path to the file and forward it for further processing.
The relevant part of the email looks something like this:
more_blah_before
style=3D"font-size: 11px; margin-top: 0px; margin-right: 0px; =
margin-bottom: 0px; margin-left: 0px; padding-top: 0px; padding-right: =
0px; padding-bottom: 0px; padding-left: 0px; ">Software</td><td =
style=3D"font-size: 11px; margin-top: 0px; margin-right: 0px; =
margin-bottom: 0px; margin-left: 0px; padding-top: 0px; padding-right: =
0px; padding-bottom: 0px; padding-left: 0px; "><a =
href=3D"afp://server.company.com/del/e/QQ888-9999/Q=
Q888-9999-3/QQ888-9999-3.dmg">del/QQ888-9999/QQ888-9999-3</a></td=
></tr><tr style=3D"vertical-align: top; margin-top: 0px; margin-right: =
0px; margin-bottom: 0px; margin-left: 0px; padding-top: 0px; =
padding-right: 0px; padding-bottom: 0px; padding-left: 0px; "><td =
style=3D"font-size: 11px; margin-top: 0px; margin-right: 0px; =
margin-bottom: 0px; margin-left: 0px; padding-top: 0px; padding-right: =
more_blah_after
So the part that I need to extract from here is:
afp://server.company.com/del/e/QQ888-9999/QQ888-9999-3/QQ888-9999-3.dmg
The problem is that the path to the file is split across lines with a trailing "=", so that would have to be removed somehow (if present).
I am also not sure how to remove anything present before afp:// (like href=3D" in this case) or anything present after .dmg (">del/QQ888-9999/QQ888-9999-3</a></td= in this case).
Any help would be appreciated.
Thank you
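One possible approach is sketched below in PHP, the language of the man page further down. It assumes the raw message body is available as a string that is still quoted-printable encoded. In that encoding, a "=" at the end of a line is a soft line break and "=3D" encodes a literal "=". Decoding the body with PHP's quoted_printable_decode() therefore rejoins the split path. A regular expression can then take everything from afp:// up to .dmg. The function name extract_afp_path() and the sample input are illustrative only.

```php
<?php
// Sketch: pull the afp:// .dmg link out of a quoted-printable HTML email body.
// Assumption: $raw holds the still-encoded body; extract_afp_path() is a
// hypothetical helper name.
function extract_afp_path(string $raw): ?string
{
    // quoted_printable_decode() drops soft line breaks ("=" at end of line)
    // and turns escapes such as "=3D" back into "=".
    $html = quoted_printable_decode($raw);

    // Match from "afp://" up to the first ".dmg". The class [^"<>\s] keeps the
    // match inside the href value, so nothing before afp:// or after .dmg is kept.
    if (preg_match('~afp://[^"<>\s]+?\.dmg~', $html, $m)) {
        return $m[0];
    }
    return null; // no afp:// link found
}

// Sample input modelled on the excerpt in the question.
$raw = "href=3D\"afp://server.company.com/del/e/QQ888-9999/Q=\r\n"
     . "Q888-9999-3/QQ888-9999-3.dmg\">del/QQ888-9999/QQ888-9999-3</a></td=\r\n"
     . "></tr>";

echo extract_afp_path($raw), "\n";
```

For the sample above this prints afp://server.company.com/del/e/QQ888-9999/QQ888-9999-3/QQ888-9999-3.dmg. If one message can contain several links, preg_match_all() with the same pattern would return all of them.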
LEARN ABOUT PHP
preg_replace_callback
PREG_REPLACE_CALLBACK(3) 1 PREG_REPLACE_CALLBACK(3)
preg_replace_callback - Perform a regular expression search and replace using a callback
SYNOPSIS
mixed preg_replace_callback (mixed $pattern, callable $callback, mixed $subject, [int $limit = -1], [int &$count])
DESCRIPTION
The behavior of this function is almost identical to preg_replace(3), except for the fact that instead of $replacement parameter, one
should specify a $callback.
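The following minimal comparison uses a made-up subject string. preg_replace(3) can only insert a fixed replacement template, optionally with back-references such as $0. preg_replace_callback(3) instead lets arbitrary PHP code compute each replacement from the match.

```php
<?php
// Sketch: the same search done with preg_replace() and preg_replace_callback().
$subject = "width=10 height=20";

// Fixed template: wrap every number in brackets ($0 is the whole match).
$bracketed = preg_replace('/\d+/', '[$0]', $subject);

// Callback: double every number, which a template cannot express.
$doubled = preg_replace_callback(
    '/\d+/',
    function (array $matches): string {
        return (string) ($matches[0] * 2);
    },
    $subject
);

echo $bracketed, "\n"; // width=[10] height=[20]
echo $doubled, "\n";   // width=20 height=40
```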
PARAMETERS
o $pattern
- The pattern to search for. It can be either a string or an array with strings.
o $callback
- A callback that will be called and passed an array of matched elements in the $subject string. The callback should return the
replacement string. This is the callback signature:
string handler (array $matches)
You'll often need the $callback function for a preg_replace_callback(3) in just one place. In
this case you can use an anonymous function to declare the callback within the call to preg_replace_callback(3). By doing it this
way you have all information for the call in one place and do not clutter the function namespace with a callback function's name
not used anywhere else.
Example #1
preg_replace_callback(3) and anonymous function
<?php
/* a unix-style command line filter to convert uppercase
 * letters at the beginning of paragraphs to lowercase */
$fp = fopen("php://stdin", "r") or die("can't read stdin");
// fgets() returns false at end of input, which ends the loop
while (($line = fgets($fp)) !== false) {
    // lowercase the first word character that follows each <p> tag
    $line = preg_replace_callback(
        '|<p>\s*\w|',
        function ($matches) {
            return strtolower($matches[0]);
        },
        $line
    );
    echo $line;
}
fclose($fp);
?>
o $subject
- The string or an array with strings to search and replace.
o $limit
- The maximum possible replacements for each pattern in each $subject string. Defaults to -1 (no limit).
o $count
- If specified, this variable will be filled with the number of replacements done.
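The short sketch below exercises the parameters above with an illustrative subject string and pattern. Named subpatterns appear in $matches under both their name and their number. $limit caps the number of replacements, and $count reports how many were actually made.

```php
<?php
// Sketch: what the callback receives, and how $limit and $count behave.
$subject = "a1 b2 c3";
$seen = []; // collects each $matches array passed to the callback

$out = preg_replace_callback(
    '/(?<letter>[a-z])(?<digit>\d)/',
    function (array $m) use (&$seen): string {
        $seen[] = $m;
        return strtoupper($m['letter']) . $m['digit'];
    },
    $subject,
    2,      // $limit: replace at most two matches
    $count  // $count: filled with the number of replacements made
);

echo $out, "\n";                               // A1 B2 c3
echo $count, "\n";                             // 2
echo implode(',', array_keys($seen[0])), "\n"; // 0,letter,1,digit,2
```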
RETURN VALUES
preg_replace_callback(3) returns an array if the $subject parameter is an array, or a string otherwise. On errors the return value is NULL.
If matches are found, the new subject will be returned, otherwise $subject will be returned unchanged.
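Here is a minimal sketch of the return-value rules, using arbitrary sample strings. An array subject produces an array with its keys preserved, and a string subject with no match comes back unchanged.

```php
<?php
// Sketch: the return type follows the type of $subject.
$upper = function (array $m): string {
    return strtoupper($m[0]);
};

// Array subject: an array comes back, with the original keys.
$list = preg_replace_callback('/x/', $upper, ['a' => 'xyz', 'b' => 'abc']);
var_dump($list === ['a' => 'Xyz', 'b' => 'abc']); // bool(true)

// String subject with no match: the original string is returned unchanged.
$same = preg_replace_callback('/x/', $upper, 'abc');
var_dump($same); // string(3) "abc"
```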
CHANGELOG
+--------+--------------------------------+
|Version | Description                    |
+--------+--------------------------------+
| 5.1.0  | The $count parameter was added |
+--------+--------------------------------+
EXAMPLES
Example #2
preg_replace_callback(3) example
<?php
// this text was used in 2002
// we want to get this up to date for 2003
$text = "April fools day is 04/01/2002\n";
$text.= "Last christmas was 12/24/2001\n";
// the callback function
function next_year($matches)
{
    // as usual: $matches[0] is the complete match
    // $matches[1] the match for the first subpattern
    // enclosed in '(...)' and so on
    return $matches[1].($matches[2]+1);
}
echo preg_replace_callback(
    "|(\d{2}/\d{2}/)(\d{4})|",
    "next_year",
    $text);
?>
The above example will output:
April fools day is 04/01/2003
Last christmas was 12/24/2002
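The same callback technique applies to the quoted-printable email in the question at the top of this page. Each =XX escape can be replaced by the byte its two hex digits encode. This is a simplified sketch for illustration only. decode_qp() is a hypothetical name, and it handles only soft line breaks and =XX escapes. PHP's built-in quoted_printable_decode() is the more complete option.

```php
<?php
// Simplified quoted-printable decoder built on preg_replace_callback().
// Hypothetical helper: handles soft line breaks and =XX hex escapes only,
// not every corner of the encoding.
function decode_qp(string $text): string
{
    // 1. Remove soft line breaks: "=" at the end of a line (CRLF or LF).
    $text = preg_replace('/=\r?\n/', '', $text);

    // 2. Replace every =XX escape with the byte it encodes;
    //    the callback receives the two hex digits in $m[1].
    return preg_replace_callback(
        '/=([0-9A-Fa-f]{2})/',
        function (array $m): string {
            return chr(hexdec($m[1]));
        },
        $text
    );
}

echo decode_qp("href=3D\"afp://server.company.com/del/e/QQ888-9999/Q=\nQ888-9999-3/QQ888-9999-3.dmg\""), "\n";
// href="afp://server.company.com/del/e/QQ888-9999/QQ888-9999-3/QQ888-9999-3.dmg"
```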
Example #3
preg_replace_callback(3) using recursive structure to handle encapsulated BB code
<?php
$input = "plain [indent] deep [indent] deeper [/indent] deep [/indent] plain";
function parseTagsRecursive($input)
{
    $regex = '#\[indent]((?:[^[]|\[(?!/?indent])|(?R))+)\[/indent]#';
    if (is_array($input)) {
        $input = '<div style="margin-left: 10px">'.$input[1].'</div>';
    }
    return preg_replace_callback($regex, 'parseTagsRecursive', $input);
}
$output = parseTagsRecursive($input);
echo $output;
?>
SEE ALSO
PCRE Patterns, preg_quote(3), preg_replace(3), preg_last_error(3), Anonymous functions, information about the callback type.
PHP Documentation Group PREG_REPLACE_CALLBACK(3)