The UNIX and Linux Forums  


  #1 (permalink)  
Old 04-02-2008
ropers ropers is offline
Registered User
  
 

Join Date: Dec 2001
Location: Dublin
Posts: 48
How do I feed numbers from awk(1) to tail(1)?

Hello,

I am too daft to remember how to properly feed numbers that I've extracted with awk(1) to tail(1).

The actual question is probably much simpler than the context, but let me give you the context anyway:

I've just received some email that was sent with MS Outlook and arrived in my mailbox garbled. It was supposed to contain .jpeg images, but it just contained garbled text. Looking at the raw source of the email, I realised that for some reason the .jpg files had been uuencoded but not uudecoded. Here is a truncated part of the email:


Code:
Delivered-To: ropers
(...)
Received: from mail.gmx.net (mail.gmx.net)
        by mx.google.com (...)
Received-SPF: pass (google.com: permitted sender)
Authentication-Results: mx.google.com; spf=pass (google.com: permitted (...)
Received: (qmail invoked by alias); 02 Apr 2008 11:47:01 -0000
Received: from p57B96BD3.dip.t-dialin.net (EHLO babe) by mail.gmx.net (mp006) with SMTP; 02 Apr 2008 13:47:01 +0200
(...)
From: "GRopers"
To: "'Ropers'" 
Subject: Pictures
Date: Wed, 2 Apr 2008 13:46:58 +0200
X-Mailer: Microsoft Office Outlook, Build 11.0.5510
(...)
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.3198
X-Y-GMX-Trusted: 0

begin 666 IMG_1232.jpg
M_]C_X `02D9)1@`!`0$`M "T``#_X1'317AI9@``24DJ``@````/``\!`@`&
M````P@```! !`@`6````R ```!H!!0`!````W@```!L!!0`!````Y@```"@!
M`P`!`````@```#(!`@`4````[@```!,"`P`!`````0````$0`P`!``````L`
(...)
MLVBN)"$(3.<X_2@1!8-Y=OM/8FIO,))/-4[,$QMUQO-8NL>+-/TNX>VG:1YD
MZJB]/QI6`I?$J3_B70*3C,G'Y&O,;E]D;$<UVOCB^%[IEA.B.B2$L XP:X.]
M?Y"O<UFU>11!;Y+$C[Q[GM4^#GDDFH[9=J$XY-2$TGN CU&QISFF4(!">6IK
6?=H8YW4UONTP!N@Y[44UN@^E%,#_V0``
`
end
(...)

Now the email contains just a bunch of jpg pics. I figured out that I could uudecode(1) the pics by saving the email's raw source text as /tmp/tmp.mail and issuing:


Code:
uudecode /tmp/tmp.mail

That worked, but only sort of. It extracted the first JPG, but only the first one, and there are several jpegs in that file. I then found that I can grep for the "begin" string that all the jpeg files start with (see above email excerpt), and what's more, I can tell grep(1) to print me the line numbers for each of the "begin" lines it spits out:


Code:
grep -n begin < /tmp/tmp.mail

The result is that grep prints this:


Code:
35:begin 666 IMG_1232.jpg
587:begin 666 IMG_1229.jpg
1154:begin 666 IMG_1221.jpg
2012:begin 666 IMG_1217.jpg
2938:begin 666 IMG_1215.jpg
4034:begin 666 IMG_1192.jpg
4538:begin 666 IMG_1190.jpg
5227:begin 666 IMG_1189.jpg
5644:begin 666 IMG_1188.jpg
6280:begin 666 IMG_1185.jpg
6891:begin 666 IMG_1184.jpg
7733:begin 666 IMG_1183.jpg
8237:begin 666 IMG_1247.jpg
9134:begin 666 IMG_1244.jpg
9826:begin 666 IMG_1242.jpg
10613:begin 666 IMG_1238.jpg
11297:begin 666 IMG_1237.jpg
11893:begin 666 IMG_1235.jpg
12325:begin 666 IMG_1234.jpg
13217:begin 666 IMG_1233.jpg

So far so good. Now I want to use awk(1) to extract only the line numbers. I currently use:


Code:
grep -n begin < /tmp/tmp.mail | awk -F : '{print $1}'

In case you're wondering, -F specifies the field separator character to be the colon, meaning awk will print only the stuff before the ":". Now I've got a list of the line numbers at which the respective uuencoded jpeg files start:


Code:
35
587
1154
2012
2938
4034
4538
5227
5644
6280
6891
7733
8237
9134
9826
10613
11297
11893
12325
13217
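
As a side note, the grep and awk steps can probably be folded into a single awk call, because awk keeps the current line number in its built-in NR variable. A minimal sketch, assuming every uuencoded block starts with "begin " at the very beginning of a line; the function name begin_lines is made up for illustration:

```shell
# begin_lines FILE: print the line number of every line starting with "begin ".
# begin_lines is a hypothetical helper name; NR is awk's built-in line counter.
begin_lines() {
    awk '/^begin /{print NR}' "$1"
}

# Usage: begin_lines /tmp/tmp.mail
```

Anchoring the pattern with ^ also keeps header lines that merely contain the word "begin" out of the list.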

Now I can use tail(1) to feed the jpegs starting at these lines to uudecode(1) for decoding. Because uudecode ignores everything after the first jpeg it encounters, I don't need to locate the end of each individual jpeg; I should be able to simply use tail(1) to make uudecode see everything from line 35, then from line 587, then 1154, and so on.

I can successfully do this manually by issuing e.g.:


Code:
tail -n +1154 /tmp/tmp.mail | uudecode

Here, tail will list the contents of /tmp/tmp.mail from line 1154 to the end of the file (EOF), and uudecode will decode the first (and only the first) jpeg file it sees, which is the one starting at line 1154.
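
The off-by-one is easy to get wrong here: `tail -n +N` starts its output at line N itself; it does not skip N lines. A tiny sketch on made-up input rather than the real mail file:

```shell
# Four sample lines stand in for the mail file; +3 means "start at line 3".
from_line_3=$(printf '%s\n' one two three four | tail -n +3)
printf '%s\n' "$from_line_3"
# prints "three" and "four", one per line
```

So the line numbers reported by `grep -n` can be passed to tail unchanged.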

Of course, I could now simply be stupid and issue the same command again and again, manually iterating through the line numbers I have, but there has to be a better way, and I want to learn (and thus be able to be lazy in the future).

I tried defining a variable $LINE and feeding that to tail, but I just could not figure out how to properly glue the awk and tail commands together. My last attempts left me with a $LINE variable that contained all the line numbers on one line, and of course tail interpreted the extra numbers as file names and bitterly complained. This probably comes down to something really simple, but I just could not figure it out.
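
One possible way to glue the two together, sketched here under the assumption of a POSIX shell: instead of storing all the numbers in one variable, let `while read` hand them to tail one per iteration. The function name decode_each and its optional second argument are invented for this sketch; the second argument only exists so the loop can be tried out with a harmless command in place of uudecode.

```shell
# decode_each FILE [DECODER]
# Runs DECODER (default: uudecode) once per "begin" line, feeding it the file
# from that line to EOF. decode_each and DECODER are hypothetical names.
decode_each() {
    mail=$1
    decoder=${2:-uudecode}
    grep -n '^begin ' "$mail" | awk -F : '{print $1}' |
    while read -r line; do
        # "$line" holds exactly one number, so tail gets a single argument.
        tail -n +"$line" "$mail" | "$decoder"
    done
}

# Usage: decode_each /tmp/tmp.mail
```

The single-variable attempt most likely failed because a variable holding all the numbers expands to several words, and tail treats every word after the first as a file name. Reading one number per loop iteration avoids that.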

Any help would be very much appreciated.

PS: The "666" in the email excerpt, in case you're wondering, is just the permissions of the extracted file (rw-rw-rw-). No need to get all Christian about it.
  #2 (permalink)  
Old 04-02-2008
unilover unilover is offline
Registered User
  
 

Join Date: Mar 2008
Location: Toronto, Canada
Posts: 66
try this:

Code:
egrep -n 'zcat|touch' preou*|\
awk -F: '{print $1}'|\
paste -sd ",\n" -|\
sed 's=\(.*\)=sed -n "\1p" mymail | uudecode='|\
sh

You can run each consecutive piped command on its own to see what it produces and how the complete pipe does the job.
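
To see what the middle of that pipe does, it may help to feed it a few numbers by hand. In the sketch below, `paste -s -d ',\n'` joins the incoming lines while alternating between a comma and a newline as separator, so consecutive numbers end up paired as START,END; sed then wraps each pair in a command line. The function name gen_commands is made up, and the generated commands are only printed, not piped into sh.

```shell
# gen_commands FILE: read line numbers on stdin, pair them up, and print one
# 'sed -n "START,ENDp" FILE | uudecode' command per pair (nothing is executed).
# gen_commands is a hypothetical helper; FILE must not contain '#' or '&'.
gen_commands() {
    paste -s -d ',\n' - |
    sed 's#.*#sed -n "&p" '"$1"' | uudecode#'
}

printf '%s\n' 35 585 587 1152 | gen_commands mymail
# prints:
#   sed -n "35,585p" mymail | uudecode
#   sed -n "587,1152p" mymail | uudecode
```

The pairing only makes sense if the numbers strictly alternate between begin and end lines; any stray match shifts every pair after it.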
  #3 (permalink)  
Old 04-02-2008
unilover unilover is offline
Registered User
  
 

Join Date: Mar 2008
Location: Toronto, Canada
Posts: 66
Sorry! "zcat|touch" preou* was in my test!!

You should have "begin|end" mymail.txt
  #4 (permalink)  
Old 04-02-2008
Franklin52 Franklin52 is offline Forum Staff  
Moderator
  
 

Join Date: Feb 2007
Posts: 4,334
Another approach:


Code:
awk '/begin 666/{f=1;close(name);name=$3;next}f{print > name}' /tmp/tmp.mail
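
Note that this one-liner appears to copy the still-encoded text lines into files named after each picture; nothing in it runs uudecode. A variant sketch that keeps the begin line and stops at end, so that each piece is a complete uuencoded block that could be handed to uudecode afterwards. The function name split_uu and the partNN.uu file names are assumptions for illustration, and the sketch assumes the end lines carry no trailing carriage return.

```shell
# split_uu FILE: write each begin...end block of FILE to part01.uu, part02.uu,
# ... in the current directory. split_uu and the partNN.uu names are made up.
split_uu() {
    awk '
        /^begin [0-7]+ / { n++; out = sprintf("part%02d.uu", n); copying = 1 }
        copying          { print > out }
        /^end$/          { if (copying) close(out); copying = 0 }
    ' "$1"
}

# Usage: split_uu /tmp/tmp.mail && for p in part*.uu; do uudecode "$p"; done
```

Text between an end line and the next begin line is skipped, so stray mail content does not end up in the pieces.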

Regards
  #5 (permalink)  
Old 04-02-2008
ropers ropers is offline
Registered User
  
 

Join Date: Dec 2001
Location: Dublin
Posts: 48
Thanks for your replies, unilover and Franklin52. I intend to work through both of your suggestions, to learn from them. I am of course aware that there are probably dozens or hundreds of possible solutions to this problem; but I'm also trying to take things one step at a time and figure out what was missing in my attempts.

Right now I'm looking at unilover's solution, and I'm a little bit stumped:

I've tried the first part of his solution, and this is what I initially got:


Code:
grep -n -E "begin|end" /tmp/tmp.mail
11:Received-SPF: pass (google.com: domain of gropers@xxx.xxx designates 192.168.64.20 as permitted sender) client-ip=192.168.64.20;
12:Authentication-Results: mx.google.com; spf=pass (google.com: domain of gropers@xxx.xx designates 192.168.64.20 as permitted sender) smtp.mail=gropers@xxx.xxx
35:begin 666 IMG_1232.jpg
585:end
587:begin 666 IMG_1229.jpg
1152:end
1154:begin 666 IMG_1221.jpg
2010:end
2012:begin 666 IMG_1217.jpg
2936:end
2938:begin 666 IMG_1215.jpg
4032:end
4034:begin 666 IMG_1192.jpg
4536:end
4538:begin 666 IMG_1190.jpg
5225:end
5227:begin 666 IMG_1189.jpg
5642:end
5644:begin 666 IMG_1188.jpg
6278:end
6280:begin 666 IMG_1185.jpg
6889:end
6891:begin 666 IMG_1184.jpg
7731:end
7733:begin 666 IMG_1183.jpg
8235:end
8237:begin 666 IMG_1247.jpg
9132:end
9134:begin 666 IMG_1244.jpg
9824:end
9826:begin 666 IMG_1242.jpg
10611:end
10613:begin 666 IMG_1238.jpg
11295:end
11297:begin 666 IMG_1237.jpg
11891:end
11893:begin 666 IMG_1235.jpg
12323:end
12325:begin 666 IMG_1234.jpg
13215:end
13217:begin 666 IMG_1233.jpg
13765:end

Note the output lines 11 and 12, which don't contain "begin" or "end". (Yes, I changed the email addresses and IP addresses, but I did that in /tmp/tmp.mail, in which lines 11 and 12 really don't contain either string.)

It also didn't matter whether I used grep -E or egrep, or single or double quotes.

At long last, I finally figured out what tripped up grep: It turns out that lines 11 and 12 both contained carriage returns (\r). vi confirmed this by showing the familiar ^M characters. I then did :%s/\r//g in vi and saved the file, after which grep worked as expected, and the lines 11 and 12 were no longer included in its output. So I know what made the error occur. What I don't understand is why this error occurred. Why would extraneous carriage returns cause grep to include these lines?

Many thanks for your help.
  #6 (permalink)  
Old 04-02-2008
ropers ropers is offline
Registered User
  
 

Join Date: Dec 2001
Location: Dublin
Posts: 48
Arrgh!!! Never mind the aforesaid, I've just figured things out -- it turns out I was wrong when I wrote that lines 11 and 12 don't contain "begin" or "end". Both lines contain the word "sender".

The carriage returns were a total red herring. I was wrong when I thought that removing them had fixed things. It turns out that when I tested grep after removing them I had only grepped "begin" and not "begin|end".

So it's probably not a good idea to grep for "end" in this case without throwing away the email headers first. In my initial --only partially successful-- approach this was also unnecessary, because tail lends itself really well to clipping off the upper parts of the file without even looking at them (and uudecode ignores anything beyond the first uuencoded file, so the rest can be left as is).
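
One way around the "sender" false positives, without stripping the headers first, might be to anchor both patterns to the start of the line. A sketch, assuming the begin lines have the usual "begin MODE NAME" shape and that the end lines carry no trailing carriage return; marker_lines is a made-up name:

```shell
# marker_lines FILE: list only lines that look like uuencode delimiters,
# with their line numbers. "^begin [0-7]+ " requires an octal mode after
# "begin"; "^end$" rejects words such as "sender" that merely contain "end".
marker_lines() {
    grep -n -E '^begin [0-7]+ |^end$' "$1"
}

# Usage: marker_lines /tmp/tmp.mail
```

With only real delimiters in the list, the begin and end numbers should alternate cleanly.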

On a more positive note, I've found that
Code:
 grep -n -E "begin|end" /tmp/tmp.mail | awk -F : '{print $1}' | paste -s -d ",\n" - | sed 's=\(.*\)=sed -n "\1p" /tmp/tmp.mail | uudecode='| sh -

does indeed work -- though it complains:
Code:
uudecode: stdin: No `begin' line

which is entirely understandable, because the command line
Code:
sed -n "11,12p" /tmp/tmp.mail | uudecode

that's generated and passed to sh is bogus, and of course line 11 has no "begin" in it.

I still don't fully understand the nested sed stuff though. I'll try some more and/or come back with more questions. Also, if someone has a hint to get my initial approach with awk and tail to work, that would be really cool.

But again, many thanks so far.

Last edited by ropers; 04-02-2008 at 09:51 PM..
The UNIX and Linux Forums Content Copyright ©1993-2009. All Rights Reserved.Ad Management by RedTyger
