Problem in comparing 2 files string by string


 
# 8  
08-11-2011
It's almost done, friend

Hi Agama,

It's almost done. The only issues left are:
1) Blanks are not getting printed.
2) Some of the columns are entirely empty, i.e. all their values are zero, and are not needed.
3) In some cases the blank row also has values for horizontal fields. I want those values added to the vertical entry 255. I am pasting the output for your reference.

###############################

Matrix: 1
COLA/COLB 5065 5294 5672 5673 8059 8505 8508 8549 8587 8600 8632 8649 8706 8709 8725 8739 8748 8778 8798 8822 9008 9404
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
0 127 1140 1 79 84574 17436 23729 47678 9758 36855 102332 40437 34852 11229 15867 0 26227 36353 120463 1099 2504 10
1 0 0 0 0 148 32 143 80 30 128 207 89 100 16 26 0 78 116 262 4 1 0
3 0 0 0 0 53 52 9 1002 1 233 111 36 357 124 23 0 62 117 1548 2 10 0
17 0 0 0 0 3126 747 2010 3406 600 3579 9757 2633 1966 457 1526 0 3294 2582 14497 299 495 0
19 25 169 0 31 6408 1426 2931 80023 625 4306 17609 4860 3354 1083 2089 0 2818 4125 10834 120 260 0
28 0 0 0 0 0 0 0 2 0 0 0 0 0 1 0 0 0 0 0 0 0 0
34 0 3 0 0 2928 736 860 2130 230 1625 2901 927 2976 842 572 0 625 2186 4527 27 23 0
38 0 0 0 0 2 0 0 26 0 6 2 13 4 1 5 0 2 2 7 1 1 0
131 0 0 0 0 7 0 1 0 0 0 10 6 1 0 1 0 0 5 16 0 4 0
133 0 6 0 21 3367 719 1188 3778 231 1014 3209 1943 1228 492 717 0 1513 1426 4204 113 74 0
137 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
138 0 0 0 0 127 43 98 122 14 36 201 144 131 52 46 0 12 210 485 0 0 0
254 2 31 0 0 5386 928 1584 4428 621 2071 5126 2576 2324 633 1064 0 2022 2798 5221 80 116 0
255 0 4 0 0 1162 366 428 917 202 555 1408 471 782 190 332 0 396 617 1777 9 40 0

Matrix: 2
COLA/COLB 5065 5294 5672 5673 8059 8505 8508 8549 8587 8600 8632 8649 8706 8709 8725 8739 8748 8778 8798 8822 9008 9404
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 4038 0 1309 583 302 0 0 108 228 0 171 30293 208 341 0 11120 5149 0
1 0 0 0 0 275 0 0 2 10 0 0 4 4 0 24 1383 4 16 0 468 305 0
3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 47 0 0
17 0 0 0 0 5 0 0 0 0 0 0 0 7 0 0 11 9 0 0 3 1 0
19 0 0 0 0 1828 0 24 27 31 0 0 0 23 0 13 439 20 61 0 1203 75 0
28 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 20 0 0
34 0 0 0 0 87 0 165 0 22 0 0 22 2 0 0 2813 3 23 0 360 115 0
38 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
131 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
133 0 0 0 0 825 0 43 75 27 0 0 15 85 0 105 1985 12 242 0 999 364 0
137 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 18 0 0 0 0 1 0
138 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
254 0 0 0 0 175 0 31 7 8 0 0 4 8 0 4 473 3 10 0 239 85 0
255 0 0 0 0 41 0 9 1 3 0 0 0 3 0 2 138 2 1 0 74 14 0

Matrix: 3
COLA/COLB 5065 5294 5672 5673 8059 8505 8508 8549 8587 8600 8632 8649 8706 8709 8725 8739 8748 8778 8798 8822 9008 9404
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 987 0 0 0 413 3492 0
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0
3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
17 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10 0 0 0 1 132 0
28 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
34 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
38 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
131 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
133 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0
137 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 2 7 0
138 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
254 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 0 0 0 3 15 0
255 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 0 0 0 1 24 0

Matrix: 5
COLA/COLB 5065 5294 5672 5673 8059 8505 8508 8549 8587 8600 8632 8649 8706 8709 8725 8739 8748 8778 8798 8822 9008 9404
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 1772 126 877 1056 256 1413 3749 544 1549 481 152 0 998 746 4304 115 97 0
1 0 0 0 0 0 1 0 1 0 1 5 0 0 2 0 0 1 1 5 0 0 0
3 0 0 0 0 0 0 0 168 0 0 0 0 79 0 0 0 0 0 30 12 0 0
17 0 0 0 0 70 46 2 40 1 1 120 12 147 47 2 0 5 66 185 2 3 0
19 0 0 0 0 71 24 33 44 17 65 122 32 35 19 26 0 42 24 186 0 3 0
28 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
34 0 0 0 0 2 0 50 1 0 0 2 0 9 6 1 0 0 0 16 0 0 0
38 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
131 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
133 0 0 0 0 107 9 39 22 4 53 112 36 98 21 6 0 37 72 133 2 2 0
137 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
138 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
254 0 0 0 0 88 36 29 34 11 37 107 28 47 11 12 0 29 32 58 2 2 0
255 0 0 0 0 100 14 40 95 7 25 153 27 48 11 4 0 28 51 181 0 3 0

Matrix: 6
COLA/COLB 5065 5294 5672 5673 8059 8505 8508 8549 8587 8600 8632 8649 8706 8709 8725 8739 8748 8778 8798 8822 9008 9404
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 210 0 0 0 0 0 0 0 0 0 0 6463 0 0 0 1025 566 0
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10 0 0
3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
17 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 0 0 0 62 0 0
28 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
34 0 0 0 0 9 0 0 0 0 0 0 0 0 0 0 466 0 0 0 40 38 0
38 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 51 0 0 0 3 5 0
131 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
133 0 0 0 0 7 0 0 0 0 0 0 0 0 0 0 245 0 0 0 120 75 0
137 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 6 0 0 0 0 0 0
138 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
254 0 0 0 0 4 0 0 0 0 0 0 0 0 0 0 103 0 0 0 11 10 0
255 0 0 0 0 24 0 0 0 0 0 0 0 0 0 0 730 0 0 0 81 69 0

###################

Here the topmost vertical entry is for "blank", but the blank label is not getting printed. Also, for the horizontal value 8798 there is 1 count in the blank row; I want it added to vertical entry 255, i.e. the last entry in the vertical column.
Column 8739 in the horizontal header has all entries zero (e.g. in Matrix 1); I need to remove it.
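The script posted in reply does not implement issue 3. A minimal sketch of one way to do it, assuming c1 is field 1 and c2 is field 2 of comma-separated input (the function name fold_blank_counts is hypothetical): a record with an empty c1 is counted under row 255 instead of under a separate blank row. In the posted script the equivalent change would be assigning "255" instead of "BLANK" to an empty $(c1); whether that should happen before or after the existing $13 special case for 255 is a business rule the thread does not settle.

```shell
#!/bin/sh
# Hypothetical sketch: count (c1, c2) pairs from comma-separated input on stdin,
# folding records whose c1 field is empty into row "255".
# Assumed layout: c1 is field 1, c2 is field 2.
fold_blank_counts() {
    awk -F "," -v c1=1 -v c2=2 '
        $c2 == "" { next }                       # no c2 value: nothing to count
        {
            row = ($c1 == "") ? "255" : $c1      # blank c1 is added to row 255
            count[row "," $c2]++
        }
        END {
            for (k in count)                     # "for in" order is unspecified...
                print k, count[k]
        }
    ' | sort -t, -k1,1n -k2,2n                   # ...so sort rows, then columns, numerically
}

# usage example: prints "19,8059 1" and "255,8798 2"
printf '255,8798\n,8798\n19,8059\n,\n' | fold_blank_counts
```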

################################################
Also, dear friend, could you give me some ideas on how to learn to write such nice, effective, short code? I am a newbie and don't know much about the C language. Any links or PDFs would be very much appreciated.

Thanks in advance.
# 9  
08-12-2011
Tweaks to the code to eliminate, in each matrix, rows and columns that are all zeros, and maybe help with blanks.
Code:
#!/usr/bin/env ksh

# reads the CSV data from standard input
# $1 == c1 == vertical column (default 31)
# $2 == c2 == horizontal column (default 36)
# $3 == matrix id col, midc (default 49)
#
awk -F "," -v c1="${1:-31}" -v c2="${2:-36}" -v midc="${3:-49}" '
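    # selection sort, ascending; last, j, big and hold are locals
    function sort_order( what, l,    last, j, big, hold )
    {
        for( last = l - 1; last > 0; last-- )
        {
            big = 0;                            # find the largest value in what[0..last]
            for( j = 1; j <= last; j++ )
                if( what[j] > what[big] )
                    big = j;
            if( big != last )                   # and move it to what[last]
            {
                hold = what[last];
                what[last] = what[big];
                what[big] = hold;
            }
        }
    }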

    {
        gsub( " ", "", $0 );            # if there are truly blanks (, ,) and not nils (,,) then ditch them
        gsub( "\t", "", $0 );

        mid = $(midc);                      # pick up the matrix id
        if( ! mseen[mid]++ )
            order_m[moidx++] = mid;         # capture for print at end

        if( $(c2) == "" )                   # if value in c2 is missing, we do not count it
            next;

        if( $(c1) == "" )
            $(c1) = "BLANK";                # easy eyecatcher for empty field

        if( $(c1) == "255" && $13 != "0"  )     # special case: c1 is 255 and $13 is non-zero, count as though c1 was 0
            $(c1) = "0";

        if( !seen_1[$(c1)]++ )              # record c1 after the special case above, so a row
            order_1[o1idx++] = $(c1);       # that only gets counts through it is still printed

        if( !seen_2[$(c2)]++ )
            order_2[o2idx++] = $(c2);       # order each c2 was observed

        count[mid,$(c1),$(c2)]++;           # count the number of times the pair (c1,c2) appear together
        row_nonz[mid,$(c1)] = 1;            # flag row as having nonzero value for this matrix
        col_nonz[mid,$(c2)] = 1;            # flag col as having nonzero value for this matrix
    }

    function print_matrix( mid,     j, i )
    {
        printf( "%15s ", "COLA/COLB" );         # print the header line using the order_2 list
        for( j=0; j < o2idx; j++ )
            if( col_nonz[mid,order_2[j]] )          # print only if in this matrix we have a nonzero in the column
                printf( "%15s ", order_2[j] );      # %15s will align columns based on a width of 15
        printf( "\n" );                         # new line for first row of matrix

        for( i = 0; i < o1idx; i++ )            # print matrix -- for each row (c1 value)
        {
            if( row_nonz[mid,order_1[i]] )                  # only if there was a non-zero value in the row
            {
                printf( "%15s ", order_1[i] );      # print the c1 value (again width of 15)
                for( j=0; j < o2idx; j++ )          # for each column (c2 values)
                    if( col_nonz[mid,order_2[j]] )  # print only if in this matrix we have a nonzero in the column
                        printf( "%15d ", count[mid,order_1[i],order_2[j]] );    # print each (width 15 again)

                printf( "\n" );                     # end the row by printing a newline
            }
        }
    }

    END {
        sort_order( order_1, o1idx );       # sort values from c1, c2 and the matrix ids; the
        sort_order( order_2, o2idx );       # counters replace length(array), which is not
        sort_order( order_m, moidx );       # supported by every awk

        for( i = 0; i < moidx; i++ )
        {
            printf( "\nMatrix: %s\n", order_m[i] );     # header for the matrix
            print_matrix( order_m[i] );
        }
    }
'


I'm a bit confused about how blanks should be handled. I've added a small change that might help. If it doesn't, could you post or attach 20 or so lines of your CSV input that include some rows with blanks? Right now I am assuming that column A (31) may contain a blank value, which should be represented by a row labeled "BLANK". I also assume that if column B (36) is blank, the record is skipped.
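A minimal sketch of those two assumptions, stripped down from the posted script; the function name count_with_blanks and the column positions (A as field 1, B as field 2) are hypothetical. Spaces and tabs are removed first, a record with an empty column B is skipped, and an empty column A is counted under its own BLANK row.

```shell
#!/bin/sh
# Hypothetical sketch of the blank-handling assumptions on CSV data from stdin.
# Assumed layout: column A (vertical, c1) is field 1, column B (horizontal, c2) is field 2.
count_with_blanks() {
    awk -F "," -v c1=1 -v c2=2 '
        { gsub(/[ \t]/, "") }                    # ", ," becomes ",," as in the posted script
        $c2 == "" { next }                       # assumption: blank column B is not counted
        {
            if ($c1 == "")
                $c1 = "BLANK"                    # assumption: blank column A gets a BLANK row
            count[$c1 "," $c2]++
        }
        END { for (k in count) print k, count[k] }
    ' | LC_ALL=C sort                            # fixed byte order: digits sort before letters
}

# usage example: prints "19,8059 1" and "BLANK,8798 2"
printf ' ,8798\n19,8059\n19,\n,8798\n' | count_with_blanks
```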


As for learning to write code, my advice is to practice. Start with simple problems (this one is certainly not simple!) and go slowly. Look at some of the posts on this site and study the code offered to solve each problem. At first, just try to understand what the program does; then work towards reading the problem, writing the code yourself, and comparing it with what was posted.

The code above is awk rather than C. These are two good sites for learning awk:
Awk - A Tutorial and Introduction - by Bruce Barnett
Gawk: Effective AWK Programming - GNU Project - Free Software Foundation (FSF)

The second site has several different "presentations" of the same material.
# 10  
08-12-2011
Thanks again

Thanks again for your post.

OK, I will try it again in my live environment and post back as soon as possible.

One more thing: everything I try has to be run in my live environment, which can sometimes be risky.

I have installed Ubuntu in a virtual machine on Windows, but it has only bash, Ubuntu's default shell.

Can I have a test bed like yours to test all this code at home?

Apart from that, the VM's features aren't good either, as I have to scroll every time I look for help on the Net.

Any suggestions, dear friend?
# 11  
08-20-2011
Thanks a lot

Thanks, dear friend agama; it's working perfectly now.

The (#) comments also helped me a lot in understanding your very valuable post.

Thanks again.

GR88...

Last edited by jitendra.pat04; 08-20-2011 at 05:53 AM..
# 12  
08-20-2011
Quote:
Originally Posted by jitendra.pat04
One more thing: everything I try has to be run in my live environment, which can sometimes be risky.

I have installed Ubuntu in a virtual machine on Windows, but it has only bash, Ubuntu's default shell.

Can I have a test bed like yours to test all this code at home?
Agreed, testing in your live environment is risky, but sometimes unavoidable.

I don't install Windows on my computers, just some form of UNIX. I currently have FreeBSD running on a 10-year-old laptop and Linux on my newer laptop; between the two, I can try most things in a couple of environments. If you can find an older (though not too old) PC that someone will give you, or that isn't too expensive, I'd suggest installing Linux directly on it. You'll then have a system you can experiment with, without worrying about messing something up.

As for only having bash in your VM: most scripts will run under both Korn shell and bash. I prefer Korn shell because it has several capabilities that bash doesn't. The script in this thread, while marked ksh in the #! line, would run just fine under bash, as it's all awk. If ksh isn't installed, the #!/usr/bin/env ksh line will fail, so start the script explicitly with bash instead.

Glad you've gotten it working!