awk: Print fields between two delimiters on separate lines and send to variables Post: 302686047


Post 302686047 by tay9000 on Tuesday 14th of August 2012 05:23:36 AM


Thanks a lot for your help; I've learned a bit. The script is now basically working, as shown below (I still need to clean it up a little). The one big problem left is that I would like to loop over the output of find "/home/tay/spam-all/spam" -iname "*.gz" -mtime 1 | xargs ls -t instead of over the entire directory. Keep in mind that I want to process the files in order of file creation date, with the newest ending up at the top of the output file.

I've tried both for FILE in `find "/home/tay/spam-all/spam" -iname "*.gz" -mtime 1 | xargs ls -t` and for FILE in $Spams, but either way the script treats the whole output as a single filename and then errors out saying the filename is too long.
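One likely cause, assuming the script is otherwise as posted below: IFS is set to ":" before the loop starts, so the output of the command substitution is never split on newlines and the whole listing becomes one word. The following is only a minimal sketch of a workaround. The demo directory and file names are made up for illustration, and -mtime 1 is left out so that the demo files match.

```shell
#!/bin/sh
# Hypothetical demo directory standing in for $SpamDir, with two files of known age.
DemoDir=`mktemp -d /tmp/spam-demo.XXXXXX`
touch -t 201208130100 "$DemoDir/older.gz"
touch -t 201208130200 "$DemoDir/newer.gz"

# Split the command output on newlines only. The word list is expanded once,
# when the loop starts, so the loop body is free to change IFS afterwards.
IFS='
'
Seen=
for FILE in `find "$DemoDir" -iname "*.gz" | xargs ls -t`
do
        IFS=":"                 # what the header parsing in the script expects
        Name=`basename "$FILE"`
        Seen="$Seen$Name "      # record the order in which files are visited
done

echo "$Seen"
rm -rf "$DemoDir"
```

This still breaks on file names containing spaces or newlines, because xargs splits its input on whitespace; for message IDs without whitespace that may be acceptable.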

The other, minor issue is that I need to add <table> and </table> to the beginning and end of each output file. I am thinking about having the script go through each file once it has finished writing all of them and add the tags at the beginning and end. I am sure I can figure out some way to do that, but I haven't gotten there yet.

One last thing! I would like to shorten $Subject to 70 characters. I will probably end up using sed for that unless you can suggest a better way.

Thanks a lot!

Code:

#!/bin/sh

SpamDir='/home/tay/spam-all/spam'
WorkingDir='/tmp/spam-summary'
Spams=`find "/home/tay/spam-all/spam" -iname "*.gz" -mtime 1 | xargs ls -t`

IFS=":"

# Loop on the files directly, instead of doing loops on line numbers
for FILE in ${SpamDir}/*.gz
do
        # Clear out variables
        From=
        To=
        Subject=
        Score=

        # Changed this to get the basename of the file.
        ID=`basename "$FILE"`

        # Your time functions look okay
        TimeEpoch=`ls -lh -D %s "$FILE" | awk '{print $6}'`
        TimeHuman=`date -r $TimeEpoch +"%Y-%m-%d %l:%M %p"`

        # Decompress file once instead of 9 times
        zcat "$FILE" > /tmp/$$

        # Read and process lines from the decompressed file one by one
        while read LINE
        do
                IFS=":" # Split on : so $1=X-Envelope-From, $2=<spammer@vnyu.com>
                set -- $LINE
                # If line has a : in it, save the header, then get rid of $1
                if [ "$#" -gt 1 ]
                then
                        HEADER="$1"
                        shift
                fi

                # Split on spaces, commas, and <>
                IFS="<>, "
                # Split <spammer@vnyu.com>, <whatever@...> into $1=spammer@vnyu.com, $2=whatever@..., etc
                set -- $1

                case "$HEADER" in
                X-Envelope-From) From=`echo "$From $@" | sed 's/<//g'`;;
                X-Envelope-To)     To=`echo "$To $@" | sed 's/<//g;s/[	]//g'`;;
                Subject)              Subject=`echo "$Subject $@" | sed 's/<//g'`;;
                X-Spam-Score)     Score="$@" ;;
                esac
        done < /tmp/$$

echo "$From	$Subject	$Score	$TimeHuman	$ID	$To" #Debug Output

        set -- $To

        # Append one table row per recipient (IFS is still "<>, " here)
        for i in $To
        do
                echo "<tr><td>$From</td><td>$Subject</td><td>$Score</td><td>$TimeHuman</td><td>$ID</td></tr>" >> $WorkingDir/$i
        done

done

rm -f /tmp/$$

---------- Post updated 2012-08-14 at 02:23 AM ---------- Previous update was 2012-08-13 at 08:49 PM ----------

Okay, I got two of the three done. To shorten the subject line, I changed the Subject case to:

Code:

Subject)              Subject=`echo "$Subject $@" | sed 's/<//g' | cut -c -70`;;
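As an alternative to cut, printf with a precision can do the truncation without an extra pipeline stage. This is just a sketch with a made-up sample subject; note that, as far as I know, the precision counts bytes in most shells, so multi-byte characters may be cut differently than with cut -c.

```shell
#!/bin/sh
# Hypothetical sample subject: 70 digits followed by a tail that should be dropped.
Ten=0123456789
Subject="$Ten$Ten$Ten$Ten$Ten$Ten$Ten and this tail is cut off"

# %.70s prints at most the first 70 characters (bytes) of the argument.
Short=`printf '%.70s' "$Subject"`

echo "$Short"
```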

To add <table> at the beginning and </table> at the end of each file, the end of the script now looks like this. There are probably better ways to accomplish this, but it works! =P

Code:

        for i in $To
        do
                echo "<tr><td>$From</td><td>$Subject</td><td>$Score</td><td>$TimeHuman</td><td>$ID</td></tr>" >> $WorkingDir/$i
        done

done

        for SUMMARY in $WorkingDir/*.com
        do
                text="<table>"; exec 3<> $SUMMARY && awk -v TEXT="$text" 'BEGIN {print TEXT}{print}' $SUMMARY >&3
                echo '</table>' >> $SUMMARY
        done
rm -f /tmp/$$
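The exec 3<> approach above reads and rewrites the same file at once, which, as far as I can tell, only works while awk has buffered the whole input before the output overwrites it. A more conservative sketch writes the wrapped copy to a temporary file and then moves it into place. The working directory and the summary file name here are hypothetical stand-ins for $WorkingDir and its contents.

```shell
#!/bin/sh
# Hypothetical working directory with one summary file, standing in for $WorkingDir.
WorkDir=`mktemp -d /tmp/summary-demo.XXXXXX`
echo '<tr><td>row</td></tr>' > "$WorkDir/user@example.com"

for SUMMARY in "$WorkDir"/*.com
do
        # Build the wrapped copy next to the original, then replace the original.
        {
                echo '<table>'
                cat "$SUMMARY"
                echo '</table>'
        } > "$SUMMARY.tmp" && mv "$SUMMARY.tmp" "$SUMMARY"
done

Result=`cat "$WorkDir/user@example.com"`
echo "$Result"
rm -rf "$WorkDir"
```

As in the original loop, the *.com pattern only picks up recipients whose address ends in .com; a plain * would cover every summary file.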

Last edited by tay9000; 08-14-2012 at 12:59 AM..

