How do I feed numbers from awk(1) to tail(1)?


 
# 1  
Old 04-02-2008

Hello,

I am too daft to remember how to properly feed numbers that I've extracted with awk(1) to tail(1).

The actual question is probably a lot simpler than the context, but let me give you the context anyway:

I've just received some email that was sent with MS Outlook and arrived in my mailbox garbled. It was supposed to contain .jpeg images, but it just contained garbled text. Looking at the raw source of the email, I realised that for some reason the .jpg files had been uuencoded but not uudecoded. Here is a truncated part of the email:

Code:
Delivered-To: ropers
(...)
Received: from mail.gmx.net (mail.gmx.net)
        by mx.google.com (...)
Received-SPF: pass (google.com: permitted sender)
Authentication-Results: mx.google.com; spf=pass (google.com: permitted (...)
Received: (qmail invoked by alias); 02 Apr 2008 11:47:01 -0000
Received: from p57B96BD3.dip.t-dialin.net (EHLO babe) by mail.gmx.net (mp006) with SMTP; 02 Apr 2008 13:47:01 +0200
(...)
From: "GRopers"
To: "'Ropers'" 
Subject: Pictures
Date: Wed, 2 Apr 2008 13:46:58 +0200
X-Mailer: Microsoft Office Outlook, Build 11.0.5510
(...)
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.3198
X-Y-GMX-Trusted: 0

begin 666 IMG_1232.jpg
M_]C_X `02D9)1@`!`0$`M "T``#_X1'317AI9@``24DJ``@````/``\!`@`&
M````P@```! !`@`6````R ```!H!!0`!````W@```!L!!0`!````Y@```"@!
M`P`!`````@```#(!`@`4````[@```!,"`P`!`````0````$0`P`!``````L`
(...)
MLVBN)"$(3.<X_2@1!8-Y=OM/8FIO,))/-4[,$QMUQO-8NL>+-/TNX>VG:1YD
MZJB]/QI6`I?$J3_B70*3C,G'Y&O,;E]D;$<UVOCB^%[IEA.B.B2$L XP:X.]
M?Y"O<UFU>11!;Y+$C[Q[GM4^#GDDFH[9=J$XY-2$TGN CU&QISFF4(!">6IK
6?=H8YW4UONTP!N@Y[44UN@^E%,#_V0``
`
end
(...)

Now the email contains just a bunch of jpg pics. I figured out that I could uudecode(1) the pics by saving the email's raw source text as /tmp/tmp.mail and issuing:

Code:
uudecode /tmp/tmp.mail

That worked, but only sort of. It extracted the first JPG, but only the first one, and there are several jpegs in that file. I then found that I can grep for the "begin" string that all the jpeg files start with (see above email excerpt), and what's more, I can tell grep(1) to print me the line numbers for each of the "begin" lines it spits out:

Code:
grep -n begin < /tmp/tmp.mail

The result is that grep prints this:

Code:
35:begin 666 IMG_1232.jpg
587:begin 666 IMG_1229.jpg
1154:begin 666 IMG_1221.jpg
2012:begin 666 IMG_1217.jpg
2938:begin 666 IMG_1215.jpg
4034:begin 666 IMG_1192.jpg
4538:begin 666 IMG_1190.jpg
5227:begin 666 IMG_1189.jpg
5644:begin 666 IMG_1188.jpg
6280:begin 666 IMG_1185.jpg
6891:begin 666 IMG_1184.jpg
7733:begin 666 IMG_1183.jpg
8237:begin 666 IMG_1247.jpg
9134:begin 666 IMG_1244.jpg
9826:begin 666 IMG_1242.jpg
10613:begin 666 IMG_1238.jpg
11297:begin 666 IMG_1237.jpg
11893:begin 666 IMG_1235.jpg
12325:begin 666 IMG_1234.jpg
13217:begin 666 IMG_1233.jpg
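
A side note on the pattern: a plain "begin" matches that string anywhere on a line, so a header or body line containing, say, "beginning" would be listed as well. Below is a minimal sketch of a stricter match. The sample file and its contents are invented for illustration and merely stand in for /tmp/tmp.mail.

```shell
# Build a small stand-in for /tmp/tmp.mail (contents are invented).
demo=$(mktemp)
printf '%s\n' \
    'Subject: Pictures from the beginning of April' \
    '' \
    'begin 666 IMG_0001.jpg' \
    '(encoded lines)' \
    'end' > "$demo"

# Unanchored: also matches "beginning" in the Subject header.
loose=$(grep -n 'begin' "$demo")

# Anchored: only lines that start with "begin", a space, and an octal mode.
strict=$(grep -n '^begin [0-7][0-7]* ' "$demo")

printf 'loose:\n%s\nstrict:\n%s\n' "$loose" "$strict"
rm -f "$demo"
```

With the real mail, the anchored pattern would be used the same way: grep -n '^begin [0-7][0-7]* ' /tmp/tmp.mail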

So far so good. Now I want to use awk(1) to extract only the line numbers. I currently use:

Code:
grep -n begin < /tmp/tmp.mail | awk -F : '{print $1}'

In case you're wondering, -F specifies the field separator character to be the colon, meaning awk will print only the stuff before the ":". Now I've got a list of the line numbers at which the respective uuencoded jpeg files start:

Code:
35
587
1154
2012
2938
4034
4538
5227
5644
6280
6891
7733
8237
9134
9826
10613
11297
11893
12325
13217
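
As an aside, the grep and awk steps can probably be collapsed into one: awk keeps the current line number in its built-in NR variable, so it can do the matching and the numbering itself. A minimal sketch, using an invented sample file in place of /tmp/tmp.mail:

```shell
# Invented sample data: two tiny fake "attachments" after a header.
demo=$(mktemp)
printf '%s\n' 'To: someone' '' \
    'begin 666 a.jpg' 'M...' 'end' '' \
    'begin 666 b.jpg' 'M...' 'end' > "$demo"

# NR is awk's current line number, so no grep -n and no field splitting is needed.
starts=$(awk '/^begin [0-7]+ /{ print NR }' "$demo")
printf '%s\n' "$starts"
rm -f "$demo"
```

The output is one line number per line, in the same shape as the grep/awk output above.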

Now I can use tail(1) to feed the jpegs starting at these lines to uudecode(1) for decoding. Because uudecode ignores everything after the first uuencoded file it encounters, I don't need to locate the end of each individual jpeg; I should be able to simply use tail(1) to make uudecode see everything from line 35, then from line 587, then from line 1154, and so on.

I can successfully do this manually by issuing e.g.:

Code:
tail -n +1154 /tmp/tmp.mail | uudecode

Here, tail will list the contents of /tmp/tmp.mail from line 1154 to the end of the file (EOF), and uudecode will decode the first (and only the first) jpeg file it sees, which is the one starting at line 1154.
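
The plus sign matters here. A small sketch on an invented five-line file shows the difference between "start at line N" and "the last N lines":

```shell
demo=$(mktemp)
printf '%s\n' one two three four five > "$demo"

# "+4" means "start at line 4 and continue to the end of the file".
from_four=$(tail -n +4 "$demo")

# Without the plus sign, 4 means "the last 4 lines".
last_four=$(tail -n 4 "$demo")

printf 'tail -n +4:\n%s\n\ntail -n 4:\n%s\n' "$from_four" "$last_four"
rm -f "$demo"
```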

Of course, I could now simply be stupid and issue the same command again and again, manually iterating through the line numbers I have, but there has to be a better way, and I want to learn (and thus be able to be lazy in the future).

I tried defining a variable $LINE and feeding that to tail, but I just could not figure out how to properly glue the awk and tail commands together. My last attempts resulted in me having a $LINE variable that contained all the line numbers on one line, and of course tail interpreted these as extraneous file names and bitterly complained. This probably runs down to something really simple, but I just could not figure it out.
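
One way to glue the two together, sketched under assumptions: read the numbers one per line in a while/read loop, so that tail is handed a single number per invocation rather than the whole list. The sample file and the decode function are stand-ins invented here so that the sketch runs without real image data; for real use, decode would simply call uudecode.

```shell
# Invented sample data: two tiny fake "attachments" after a header.
demo=$(mktemp)
printf '%s\n' 'To: someone' '' \
    'begin 666 a.jpg' 'M...' 'end' '' \
    'begin 666 b.jpg' 'M...' 'end' > "$demo"

# Stand-in for uudecode: just report which "begin" line it was handed.
# For real data, replace the function body with: uudecode
decode() { head -n 1; }

# Each line number arrives in $line on its own, so tail gets exactly one number.
decoded=$(
    grep -n '^begin ' "$demo" | awk -F : '{print $1}' |
    while read -r line; do
        tail -n +"$line" "$demo" | decode
    done
)
printf '%s\n' "$decoded"
rm -f "$demo"
```

A variable holding every number at once expands to several words, which is why tail treated all but the first as file names; the loop avoids that by consuming one line at a time.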

Any help would be very much appreciated.

PS: The "666" in the email excerpt, in case you're wondering, is just the permissions of the extracted file (rw-rw-rw-). No need to get all Christian about it.
# 2  
Old 04-02-2008
try this:
Code:
egrep -n 'zcat|touch' preou*|\
awk -F: '{print $1}'|\
paste -sd ",\n" -|\
sed 's=\(.*\)=sed -n "\1p" mymail | uudecode='|\
sh

You can run each consecutive piped command on its own to see what it produces and how the complete pipe does the job.
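
To illustrate the two middle stages in isolation, here is a small sketch that feeds four made-up line numbers (begin, end, begin, end) through paste and sed and prints the result instead of piping it to sh. Only the numbers are invented; the paste and sed invocations are the ones from the pipe above.

```shell
# Four numbers standing in for the grep/awk output (begin, end, begin, end).
# paste -s joins all lines, cycling through the delimiters "," and newline,
# which turns every two numbers into one "start,end" pair.
pairs=$(printf '%s\n' 35 585 587 1152 | paste -s -d ',\n' -)
printf '%s\n' "$pairs"

# sed wraps each pair in a "sed -n ...p" print command for that line range.
cmds=$(printf '%s\n' "$pairs" | sed 's=\(.*\)=sed -n "\1p" mymail | uudecode=')
printf '%s\n' "$cmds"
```

The final sh stage then simply executes each generated line as a command.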
# 3  
Old 04-02-2008
Sorry! The "zcat|touch" preou* part was left over from my test!!

You should have "begin|end" mymail.txt instead.
# 4  
Old 04-02-2008
Another approach:

Code:
awk '/begin 666/{f=1;close(name);name=$3;next}f{print > name}' /tmp/tmp.mail

Regards
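
Note that the one-liner above copies the still-encoded lines into files named after each image, so a decoding step is still needed afterwards. A related sketch, not Franklin52's exact command: the variant below keeps the begin and end lines and writes each encoded block to its own file, so that every piece can be handed to uudecode separately. The .uu suffix, the sample data, and the temporary directory are assumptions made for illustration.

```shell
# Invented sample data in a scratch directory.
work=$(mktemp -d)
printf '%s\n' 'To: someone' '' \
    'begin 666 a.jpg' 'M...' 'end' '' \
    'begin 666 b.jpg' 'M...' 'end' > "$work/mail"

# Start a new output file at each "begin", copy lines into it, stop after "end".
( cd "$work" && awk '
    /^begin [0-7]+ / { out = $3 ".uu" }
    out != ""        { print > out }
    /^end$/          { if (out != "") close(out); out = "" }
' mail )

parts=$(cd "$work" && ls *.uu)
lines_a=$(wc -l < "$work/a.jpg.uu" | tr -d ' ')
printf '%s\n' "$parts"
# Each piece could then be decoded on its own, e.g.: uudecode a.jpg.uu
rm -rf "$work"
```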
# 5  
Old 04-02-2008
Thanks for your replies, unilover and Franklin52. I intend to work through both of your suggestions, to learn from them. I am of course aware that there are probably dozens or hundreds of possible solutions to this problem; but I'm also trying to take things one step at a time and figure out what was missing in my attempts.

Right now I'm looking at unilover's solution, and I'm a little bit stumped:

I've tried the first part of his solution, and this is what I initially got:

Code:
grep -n -E "begin|end" /tmp/tmp.mail
11:Received-SPF: pass (google.com: domain of gropers@xxx.xxx designates 192.168.64.20 as permitted sender) client-ip=192.168.64.20;
12:Authentication-Results: mx.google.com; spf=pass (google.com: domain of gropers@xxx.xx designates 192.168.64.20 as permitted sender) smtp.mail=gropers@xxx.xxx
35:begin 666 IMG_1232.jpg
585:end
587:begin 666 IMG_1229.jpg
1152:end
1154:begin 666 IMG_1221.jpg
2010:end
2012:begin 666 IMG_1217.jpg
2936:end
2938:begin 666 IMG_1215.jpg
4032:end
4034:begin 666 IMG_1192.jpg
4536:end
4538:begin 666 IMG_1190.jpg
5225:end
5227:begin 666 IMG_1189.jpg
5642:end
5644:begin 666 IMG_1188.jpg
6278:end
6280:begin 666 IMG_1185.jpg
6889:end
6891:begin 666 IMG_1184.jpg
7731:end
7733:begin 666 IMG_1183.jpg
8235:end
8237:begin 666 IMG_1247.jpg
9132:end
9134:begin 666 IMG_1244.jpg
9824:end
9826:begin 666 IMG_1242.jpg
10611:end
10613:begin 666 IMG_1238.jpg
11295:end
11297:begin 666 IMG_1237.jpg
11891:end
11893:begin 666 IMG_1235.jpg
12323:end
12325:begin 666 IMG_1234.jpg
13215:end
13217:begin 666 IMG_1233.jpg
13765:end

Note the output lines 11 and 12, which don't contain "begin" or "end". (Yes, I changed the email addresses and IP addresses, but I did that in /tmp/tmp.mail, in which lines 11 and 12 really don't contain either string.)

It also didn't matter whether I used grep -E or egrep, or single or double quotes.

At long last, I finally figured out what tripped up grep: it turns out that lines 11 and 12 both contained carriage returns (\r). vi confirmed this by showing the familiar ^M characters. I then did :%s/\r//g in vi and saved the file, after which grep worked as expected, and lines 11 and 12 were no longer included in its output. So I know what made the error occur. What I don't understand is why it occurred. Why would extraneous carriage returns cause grep to include these lines?

Many thanks for your help.
# 6  
Old 04-02-2008
Arrgh!!! Never mind the above, I've just figured things out. It turns out I was wrong when I wrote that lines 11 and 12 don't contain "begin" or "end". Both lines contain the word "sender", which contains "end".

The carriage returns were a total red herring. I was wrong when I thought that removing them had fixed things. It turns out that when I tested grep after removing them, I had only grepped for "begin" and not for "begin|end".

So it's probably not a good idea to grep for "end" in this case without throwing away the email headers first. In my initial (only partially successful) approach this was unnecessary anyway, because tail lends itself really well to clipping off the upper part of the file without even looking at it (and uudecode ignores anything beyond the first uuencoded file, so the rest can be left as is).
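
An alternative to stripping the headers, sketched here under assumptions: anchor the pattern to whole lines, so that "end" only matches a line consisting of nothing but "end". The sample file is invented for illustration.

```shell
# Invented sample: one header line containing "sender", then one fake attachment.
demo=$(mktemp)
printf '%s\n' \
    'Received-SPF: pass (permitted sender)' \
    '' \
    'begin 666 a.jpg' 'M...' 'end' > "$demo"

# Unanchored: "end" also matches inside "sender" on line 1.
loose=$(grep -n -E 'begin|end' "$demo" | awk -F : '{print $1}' | paste -s -d ',' -)

# Anchored: a full "begin MODE NAME" line, or a line that is exactly "end".
strict=$(grep -n -E '^(begin [0-7]+ .*|end)$' "$demo" | awk -F : '{print $1}' | paste -s -d ',' -)

printf 'loose: %s\nstrict: %s\n' "$loose" "$strict"
rm -f "$demo"
```

One caveat: if the attachment lines themselves end in carriage returns, the "$" anchor would not match until those are removed.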

On a more positive note, I've found that
Code:
 grep -n -E "begin|end" /tmp/tmp.mail | awk -F : '{print $1}' | paste -s -d ",\n" - | sed 's=\(.*\)=sed -n "\1p" /tmp/tmp.mail | uudecode='| sh -

does indeed work -- though it complains:
Code:
uudecode: stdin: No `begin' line

which is entirely understandable, because the command line
Code:
sed -n "11,12p" /tmp/tmp.mail | uudecode

that's generated and passed to sh is bogus, and of course line 11 has no "begin" in it.

I still don't fully understand the nested sed stuff though. I'll try some more and/or come back with more questions. Also, if someone has a hint to get my initial approach with awk and tail to work, that would be really cool.

But again, many thanks so far.

Last edited by ropers; 04-02-2008 at 09:51 PM..
# 7  
Old 04-04-2008
I think I've sort of cracked it. I've understood the gist of unilover's solution, and I've managed to incorporate part of his approach into my initial attempted solution. Now I've got a working solution that's based on what I tried initially and on unilover's solution. And look mom, no error messages!

Here it is:
Code:
grep -n begin /tmp/tmp.mail | awk -F : '{print $1}' | sed 's:\(.*\):tail -n +\1 /tmp/tmp.mail | uudecode:' | sh -

So first grep lists all the lines in /tmp/tmp.mail that contain "begin", and it prefixes the lines it prints with their respective line numbers in /tmp/tmp.mail (that's what -n does).

Then awk throws away everything except the line numbers.

Then sed replaces each line number with "tail -n +\1 /tmp/tmp.mail | uudecode", where "\1" is replaced by the respective line number. Because this string contains slashes ("/"), we're not using / as the delimiter in the substitute command; we're using an arbitrary other character instead (":" in our case). We could also escape them like so
Code:
sed 's/\(.*\)/tail -n +\1 \/tmp\/tmp.mail | uudecode/'

, but we're too lazy for that.
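
A quick way to check that the two spellings are equivalent is to run both on a single number and compare what they print, without passing anything to sh:

```shell
# Same substitution, once with ":" as the delimiter and once with escaped slashes.
with_colon=$(echo 35 | sed 's:\(.*\):tail -n +\1 /tmp/tmp.mail | uudecode:')
with_slash=$(echo 35 | sed 's/\(.*\)/tail -n +\1 \/tmp\/tmp.mail | uudecode/')

printf '%s\n' "$with_colon"
[ "$with_colon" = "$with_slash" ] && echo "both forms produce the same command"
```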

Then each of the "tail -n +\1 /tmp/tmp.mail | uudecode" strings is passed to sh, so that it is executed as a command instead of merely being printed to standard output.
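
Since the generated text gets executed, it can be worth printing it first and only adding the sh stage once it looks right. The sketch below also shows a possible shortcut, which is an assumption on my part rather than part of the solution above: awk alone can find the lines and build the commands, using its built-in NR and FILENAME variables. The sample file is invented.

```shell
# Invented sample data: two tiny fake "attachments" after a header.
demo=$(mktemp)
printf '%s\n' 'To: someone' '' \
    'begin 666 a.jpg' 'M...' 'end' '' \
    'begin 666 b.jpg' 'M...' 'end' > "$demo"

# Dry run: build one command per "begin" line and just print them.
cmds=$(awk '/^begin [0-7]+ /{ printf "tail -n +%d \"%s\" | uudecode\n", NR, FILENAME }' "$demo")
printf '%s\n' "$cmds"

# Once the output looks right, the same awk command could be piped to: sh
rm -f "$demo"
```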

The tail and uudecode commands work as I described in my earlier post.

I guess that leaves me to try and crack Franklin52's solution next.

Last edited by ropers; 04-06-2008 at 09:33 AM..