awk: Print fields between two delimiters on separate lines and send to variables Post: 302685015

Sponsored Content

Top Forums Shell Programming and Scripting awk: Print fields between two delimiters on separate lines and send to variables Post 302685015 by Corona688 on Friday 10th of August 2012 02:01:49 PM

08-10-2012

Registered User

I'm not sure how writing random variable names into a file is going to help you figure out which variable names to use later, either.

Other problems with this script that I've spotted on first blush include...

Why run zcat | head 999 times, to read 999 lines? The shell is capable of reading lines one by one with read in a while loop.

Why read lines to get filenames, you can just do a loop over *.gz very easily.

Processing single lines with awk is like using an orbiting laser weapon to light a campfire, an awful lot of effort and expense to accomplish something simple. awk is meant to process thousands of lines at a go.

If you find yourself using awk | sed | grep, you might as well just use awk. awk is a power-tool which can accomplish all three in one operation, not a glorified cut.

And all of what you're doing here can be done in a basic shell without externals. Especially useful is set, which can be used to set your $1 $2 ... variables, like so:

Code:

set -- a b c
echo $1 # should print a
echo $2 # should print b

IFS=":." # Will split on one or more of any of these characters.
VAR="1:2.3:.4:.:5"
set -- $VAR

echo $1 # Should print 1
echo $2 # should print 2

I'd try reducing the script to something like this:

Code:

#!/bin/sh

SpamDir='/home/tay/spam'
WorkingDir='/tmp/spam-summary'

IFS=":"

# Loop on the files directly, instead of doing loops on line numbers
for FILE in ${SpamDir}/*.gz
do
        # Clear out variables
        From=
        To=
        Subject=
        Score=

        # ? What is column 9 on your ls -l ?
        ID=`ls -lh $Mail | awk '{print $9}'`

        # Your time functions look okay
        TimeEpoch=`ls -lh -D %s "$FILE" | awk '{print $6}'`
        TimeHuman=`date -r $TimeEpoch +"%Y-%m-%d %l:%M %p"`

        # Decompress file once instead of 9 times
        zcat "$FILE" > /tmp/$$

        # Read and process lines from the decompressed file one by one
        while read LINE
        do
                IFS=":" # Split on : so $1=X-Envelope-From, $2=<spammer@vnyu.com>
                set -- $LINE
                # If line has a : in it, save the header, then get rid of $1
                if [ "$#" -gt 1 ]
                then
                        HEADER="$1"
                        shift
                fi

                # Split on spaces, commas, and <>
                IFS="<>, "
                # Split <spammer@vnyu.com>, <whatever@...> into $1=spammer@vnyu.com, $2=whatever@..., etc
                set -- $1

                case "$HEADER" in
                X-Envelope-From) From="$From $@" ;;
                X-Envelope-To)     To="$To $@" ;;
                Subject)              Subject="$@" ;;
                X-Spam-Score)     Score="$@" ;;
                esac
        done < /tmp/$$

        echo
        echo "Processed $FILE"
        echo "To:$To"
        echo "From:$From"
        echo "Subject:$Subject"
        echo "Score:$Score"
done

rm -f /tmp/$$

This User Gave Thanks to Corona688 For This Post:

Corona688

View Public Profile for Corona688

Visit Corona688's homepage!

Find all posts by Corona688

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

trying to print selected fields of selected lines by AWK

I am trying to print 1st, 2nd, 13th and 14th fields of a file of line numbers from 29 to 10029. I dont know how to put this in one code. Currently I am removing the selected lines by awk 'NR==29,NR==10029' File1 > File2 and then doing awk '{print $1, $2, $13, $14}' File2 > File3 Can...

2. Shell Programming and Scripting

extract nth line of all files and print in output file on separate lines.

Hello UNIX experts, I have 124 text files in a directory. I want to extract the 45678th line of all the files sequentialy by file names. The extracted lines should be printed in the output file on seperate lines. e.g. The input Files are one.txt, two.txt, three.txt, four.txt The cat of four...

3. Shell Programming and Scripting

Compare Tab Separated Field with AWK to all and print lines of unique fields.

Hi. I have a tab separated file that has a couple nearly identical lines. When doing: sort file | uniq > file.new It passes through the nearly identical lines because, well, they still are unique. a) I want to look only at field x for uniqueness and if the content in field x is the...

4. Shell Programming and Scripting

awk print header as text from separate file with getline

I would like to print the output beginning with a header from a seperate file like this: awk 'BEGIN{FS="_";print ((getline < "header.txt")>0)} { if (! ($0 ~ /EL/ ) print }" input.txtWhat am i doing wrong?

5. Shell Programming and Scripting

Print only lines where fields concatenated match strings

Hello everyone, Maybe somebody could help me with an awk script. I have this input (field separator is comma ","): 547894982,M|N|J,U|Q|P,98,101,0,1,1 234900027,M|N|J,U|Q|P,98,101,0,1,1 234900023,M|N|J,U|Q|P,98,54,3,1,1 234900028,M|H|J,S|Q|P,98,101,0,1,1 234900030,M|N|J,U|F|P,98,101,0,1,1...

6. Shell Programming and Scripting

How to print 1st field and last 2 fields together and the rest of the fields after it using awk?

Hi experts, I need to print the first field first then last two fields should come next and then i need to print rest of the fields. Input : a1,abc,jsd,fhf,fkk,b1,b2 a2,acb,dfg,ghj,b3,c4 a3,djf,wdjg,fkg,dff,ggk,d4,d5 Expected output: a1,b1,b2,abc,jsd,fhf,fkk...

7. Shell Programming and Scripting

awk sort based on difference of fields and print all fields

Hi I have a file as below <field1> <field2> <field3> ... <field_num1> <field_num2> Trying to sort based on difference of <field_num1> and <field_num2> in desceding order and print all fields. I tried this and it doesn't sort on the difference field .. Appreciate your help. cat...

8. UNIX for Beginners Questions & Answers

How to count lines of CSV file where 2 fields match variables?

I'm trying to use awk to count the occurrences of two matching fields of a CSV file. For instance, for data that looks like this... Joe,Blue,Yes,No,High Mike,Blue,Yes,Yes,Low Joe,Red,No,No,Low Joe,Red,Yes,Yes,Low I've been trying to use code like this... countvar=`awk ' $2~/$color/...

9. Shell Programming and Scripting

awk to print line is values between two fields in separate file

I am trying to use awk to find all the $3 values in file2 that are between $2 and $3 in file1. If a value in $3 of file2 is between the file1 fields then it is printed along with the $6 value in file1. Both file1 and file2 are tab-delimited as well as the desired output. If there is nothing to...

10. Shell Programming and Scripting

awk to print lines based on text in field and value in two additional fields

In the awk below I am trying to print the entire line, along with the header row, if $2 is SNV or MNV or INDEL. If that condition is met or is true, and $3 is less than or equal to 0.05, then in $7 the sub pattern :GMAF= is found and the value after the = sign is checked. If that value is less than...