07-24-2008
text manipulation and pattern matching
Hi guys,
I need help:
I started receiving automatic emails containing download information. The problem is that these emails arrive in a rich format (I have no control over this), so the important information is buried under a bunch of mumbo-jumbo. To complicate things even further, I need to automate the download process too, so I need to somehow identify and extract the exact path to the file and forward it for further processing.
the relevant part of the email looks something like this:
more_blah_before
style=3D"font-size: 11px; margin-top: 0px; margin-right: 0px; =
margin-bottom: 0px; margin-left: 0px; padding-top: 0px; padding-right: =
0px; padding-bottom: 0px; padding-left: 0px; ">Software</td><td =
style=3D"font-size: 11px; margin-top: 0px; margin-right: 0px; =
margin-bottom: 0px; margin-left: 0px; padding-top: 0px; padding-right: =
0px; padding-bottom: 0px; padding-left: 0px; "><a =
href=3D"afp://server.company.com/del/e/QQ888-9999/Q=
Q888-9999-3/QQ888-9999-3.dmg">del/QQ888-9999/QQ888-9999-3</a></td=
></tr><tr style=3D"vertical-align: top; margin-top: 0px; margin-right: =
0px; margin-bottom: 0px; margin-left: 0px; padding-top: 0px; =
padding-right: 0px; padding-bottom: 0px; padding-left: 0px; "><td =
style=3D"font-size: 11px; margin-top: 0px; margin-right: 0px; =
margin-bottom: 0px; margin-left: 0px; padding-top: 0px; padding-right: =
more_blah_after
so the part that I need to extract from here is
afp://server.company.com/del/e/QQ888-9999/QQ888-9999-3/QQ888-9999-3.dmg
the problem is that the path to the file is split across lines by an "=" at the end of the line, so that would have to be removed somehow (if present)
also I am not sure how to remove anything that comes before afp:// (like href=3D" in this case) or anything that comes after .dmg (like ">del/QQ888-9999/QQ888-9999-3</a></td= in this case)
any help would be appreciated
thank you
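One possible approach, sketched in Python: the trailing "=" and the "=3D" in the sample look like quoted-printable encoding (a trailing "=" is a soft line break, "=3D" is an encoded "="), so decoding the body first should rejoin the split path, and a regular expression can then pull out everything from afp:// through .dmg. The function name extract_afp_links and the sample string are made up for illustration, and the sketch assumes the body really is quoted-printable and that every wanted link starts with afp:// and ends in .dmg.

```python
import quopri
import re
from typing import List

# Assumption: a link starts at "afp://" and runs to the first ".dmg",
# without crossing quotes, whitespace, or angle brackets.
AFP_DMG = re.compile(r'afp://[^"\s<>]+?\.dmg')


def extract_afp_links(raw_body: str) -> List[str]:
    """Return every afp://...dmg link found in a quoted-printable body."""
    # Decoding removes the soft line breaks ("=" at end of line)
    # and turns "=3D" back into a plain "=".
    decoded = quopri.decodestring(raw_body.encode("utf-8"))
    text = decoded.decode("utf-8", errors="replace")
    return AFP_DMG.findall(text)


# Shortened version of the fragment from the email above.
sample = (
    '0px; padding-bottom: 0px; padding-left: 0px; "><a =\n'
    'href=3D"afp://server.company.com/del/e/QQ888-9999/Q=\n'
    'Q888-9999-3/QQ888-9999-3.dmg">del/QQ888-9999/QQ888-9999-3</a></td=\n'
    '></tr>'
)

for link in extract_afp_links(sample):
    print(link)
# prints afp://server.company.com/del/e/QQ888-9999/QQ888-9999-3/QQ888-9999-3.dmg
```

If the whole message (headers included) is available rather than just the body, Python's standard email package can do the decoding step instead, which would also cover messages that arrive base64-encoded.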