Sponsored Content
Top Forums Shell Programming and Scripting Problem in comparing 2 files string by string Post 302545864 by agama on Tuesday 9th of August 2011 09:43:52 PM
Old 08-09-2011
Here's the code with comments in hopes that you can figure out what is going on. I added a sort (some awk versions don't have a built-in sort function, so I wrote one that shouldn't be too inefficient based on the number of rows/columns you've indicated.

Code:
#!/usr/bin/env ksh

awk -F "," -v c1=${1:-1} -v c2=${2:-2} '
    function sort_order(  what, l,       i, j )
    {
        for( i = 0; i < l; i++ )        # big pass across array
        {
            big = 0;                      # set first entry as the largest value
            for( j = 1; j < l-i; j++ )       # small pass (up to length - i)
                if( what[j] > what[big] )  # if this value larger, mark it 
                    big = j;
            if( big != j-1 )              # if largest is not the last checked
            {
                hold = what[j-1];      # swap the last for the largest
                what[j-1] = what[big];
                what[big] = hold;
            }
        }

    }

    {
        if( $(c2) == "" )                   # if value in c2 is missing, we dont count
            next;

        if( $(c1) == "" )
            $(c1) = "BLANK";                # easy eycatcher for empty field

        if( !seen_1[$(c1)]++ )
            order_1[o1idx++] = $(c1);       # order each c1 was observed

        if( !seen_2[$(c2)]++ )
            order_2[o2idx++] = $(c2);       # order each c2 was observed

        count[$(c1),$(c2)]++;               # count the number of times the pair (c1,c2) appear together
    }

    END {
        sort_order( order_1, length( order_1 ) );       # sort values from c1, and c2
        sort_order( order_2, length( order_2 ) );

        printf( "%15s ", "COLA/COLB" );         # print the header line using the order_2 list
        for( j=0; j < o2idx; j++ )
            printf( "%15s ", order_2[j] );      # %15s will align columns based on a width of 15
        printf( "\n" );                         # new line for first row of matrix

        for( i = 0; i < o1idx; i++ )            # print matrix -- for each row (c1 value)
        {
            printf( "%15s ", order_1[i] );      # print the c1 value (again width of 15)
            for( j=0; j < o2idx; j++ )          # for each column (c2 values)
                printf( "%15d ", count[order_1[i],order_2[j]] );    # print each (width 15 again)

            printf( "\n" );                     # end the row by printing a newline
        }
    }
'

exit

As for it not lining up correctly, that seems odd. Each field is printed with a width of 15 characters (%15d or %15s) and provided that none of the data is more than 15 characters wide it should space out nicely. You might need to reduce the setting (from 15 to 8 or something like that) in order to suit your needs. The programme also prints the word "BLANK" for empty data (in column A) and does not count anything if column B is missing. I assumed that missing columns were adjacent comas (,,).

For testing I ran this small set of data:

Code:
6000,1000,100,5000,3000,100,1000,1000,2000,4000
2000,7000,4000,,8000,9000,3000,100,1000,2000
4000,6000,6000,9000,100,7000,8000,7000,9000,8000
4000,4000,5000,,8000,1000,6000,6000,5000,9000
9000,5000,8000,100,6000,4000,8000,8000,9000,7000
7000,1000,100,7000,8000,4000,6000,2000,5000,6000
100,5000,6000,3000,5000,5000,100,4000,4000,6000
7000,9000,4000,6000,4000,8000,8000,6000,6000,9000
9000,1000,4000,,5000,1000,1000,2000,9000,8000
7000,1000,5000,1,5000,3000,100,3000,4000,1000

to ensure sorting and blanks worked. With the command test_script 4 5 <test_data the following was the output:

Code:
      COLA/COLB             100            3000            4000            5000            6000            8000 
              1               0               0               0               1               0               0 
            100               0               0               0               0               1               0 
           3000               0               0               0               1               0               0 
           5000               0               1               0               0               0               0 
           6000               0               0               1               0               0               0 
           7000               0               0               0               0               0               1 
           9000               1               0               0               0               0               0 
          BLANK               0               0               0               1               0               2

The columns are 15 characters wide, and the BLANK value will sort out last if the data in the input is all numeric. You can replace "BLANK" with " " if you just want a space in the output. Unfortunately, I cannot say what might be happening if your columns are misaligned as it works for me in my test environment.

Hope this has been of some value.
This User Gave Thanks to agama For This Post:
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

sed problem - replacement string should be same length as matching string.

Hi guys, I hope you can help me with my problem. I have a text file that contains lines like this: 78 ANGELO -809.05 79 ANGELO2 -5,000.06 I need to find all occurences of amounts that are negative and replace them with x's 78 ANGELO xxxxxxx 79... (4 Replies)
Discussion started by: amangeles
4 Replies

2. Shell Programming and Scripting

Extracting a string from one file and searching the same string in other files

Hi, Need to extract a string from one file and search the same in other files. Ex: I have file1 of hundred lines with no delimiters not even space. I have 3 more files. I should get 1 to 10 characters say substring from each line of file1 and search that string in rest of the files and get... (1 Reply)
Discussion started by: mohancrr
1 Replies

3. Shell Programming and Scripting

problem in comparing numeric with string

Hi all, I am having a problem in comparing numeric value with string. I have a variable in my script which gets the value dynamically. It can be a numeric value or a string. I have to do separate task based on its value numeric or sting variable VARIABLE. I grep FILE_COUNT and obtained... (7 Replies)
Discussion started by: naren_0101bits
7 Replies

4. Shell Programming and Scripting

Problem comparing String using IF stmt

Hi frnds Im facing an issues while trying to compare string using IF stmt, my code is: chkMsgName=`Service Fee Detail` if then if then if then echo "Valid File Ready for processing" fi fi ... (5 Replies)
Discussion started by: balesh
5 Replies

5. Shell Programming and Scripting

Parsing a long string string problem for procmail

Hi everyone, I am working on fetchmail + procmail to filter mails and I am having problem with parsing a long line in the body of the email. Could anyone help me construct a reg exp for this string below. It needs to match exactly as this string. GetRyt... (4 Replies)
Discussion started by: cwiggler
4 Replies

6. UNIX for Dummies Questions & Answers

Comparing a String variable with a string literal in a Debian shell script

Hi All, I am trying to to compare a string variable with a string literal inside a loop but keep getting the ./testifstructure.sh: line 6: #!/bin/sh BOOK_LIST="BOOK1 BOOK2" for BOOK in ${BOOK_LIST} do if then echo '1' else echo '2' fi done Please use next... (1 Reply)
Discussion started by: daveu7
1 Replies

7. Shell Programming and Scripting

grep exact string from files and write to filename when string present in file

I am attempting to grep an exact string from a series of files within a directory and append that output to the filename when it is present in the file. I've been after this all day with no luck. Thanks for your help in advance :wall:. (4 Replies)
Discussion started by: JC_1
4 Replies

8. Shell Programming and Scripting

How to append a string by comparing another string?

Hi , I have one file like BUD,BDL BUDCAR BUD,BDL BUDLAMP ABC,CDF,KLT ABISKAR ABC,CDF,KLT CORNEL ABC,CDF,KLT KANNAD JKL,HNM,KTY,KJY JAGAN JKL,HNM,KTY,KJY HOUSE JKL,HNM,KTY,KJY KATAK JKL,HNM,KTY,KJY KOLKA The o/p should be like BUD,BDL BUDCAR,BUDLAMP ABC,CDF,KLT... (4 Replies)
Discussion started by: jagdishrout
4 Replies

9. Shell Programming and Scripting

Grep string in files and list file names that contain the string

Hi, I have a list of zipped files. I want to grep for a string in all files and get a list of file names that contain the string. But without unzipping them before that, more like using something like gzcat. My OS is: SunOS test 5.10 Generic_142900-13 sun4u sparc SUNW,SPARC-Enterprise (8 Replies)
Discussion started by: apenkov
8 Replies

10. UNIX for Advanced & Expert Users

Help comparing string, please

Good morning, I need compare this string. if || || ; then But this line not work, somebody can say me what is the error. Thank you for advanced. (5 Replies)
Discussion started by: systemoper
5 Replies
All times are GMT -4. The time now is 10:38 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy