Challenging Compare and validate question -- plus speed.


 
# 15  
Old 05-23-2006
If you can treat your Metadata file as the 'authoritative' source of metadata definitions, and your 'detailedData' file as the one that can vary:
Code:
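#!/usr/bin/ksh

nawk '
  # first file (DetailFile.txt): remember every partner name found in $2
  FNR==NR{
     detail[$2]
     next
  }
  # second file (MetadataFile.txt): $2 passes if it starts with any detail name
  {
     for( i in detail)
       if ( substr($2, 1, length(i)) == i ) {
          printf("Meta [%s] found in Detail-- Successful\n",  $2)
          next
       }
     printf("Meta [%s] NOT found in Detail-- Failed\n",  $2)
     _ex=1
  }
  # exit non-zero if any metadata record had no match
  END { exit(_ex)}' DetailFile.txt MetadataFile.txt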
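
To see what the prefix test does, here is a small sketch with made-up sample data. The file contents and the two-column layout (record type, then partner name) are assumptions, and plain `awk` stands in for `nawk`. A metadata name counts as found when it starts with any name seen in the detail file:

```shell
#!/bin/sh
# Hypothetical sample files; only field 2 (the partner name) matters.
tmp=${TMPDIR:-/tmp}/prefix.$$
mkdir -p "$tmp"
printf 'D ORBITZ\nD AIRTRAN\n' > "$tmp/DetailFile.txt"
printf 'M ORBITZ_US\nM FRONTIER\n' > "$tmp/MetadataFile.txt"

out=$(awk '
  FNR==NR { detail[$2]; next }        # first file: collect detail names
  {
     for (i in detail)                # second file: prefix match against each name
        if (substr($2, 1, length(i)) == i) {
           printf("Meta [%s] found in Detail-- Successful\n", $2)
           next
        }
     printf("Meta [%s] NOT found in Detail-- Failed\n", $2)
     _ex=1
  }
  END { exit(_ex+0) }' "$tmp/DetailFile.txt" "$tmp/MetadataFile.txt")
rc=$?

echo "$out"
echo "exit status: $rc"
rm -rf "$tmp"
```

With this sample, ORBITZ_US is reported as found because it begins with ORBITZ, FRONTIER is reported as failed, and the exit status is 1.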

# 16  
Old 05-24-2006
Thank you vgersh99...

I have been working with the script for a couple of days and it was working fine...

A new file came in today and the script did not abort. The reason is:

Metadata records have:
PHP Code:
Metadata partner name [ORBITZ] found in Detail-- Successful
Metadata partner name [AIRTRAN] found in Detail-- Successful
Metadata partner name [FRONTIER] found in Detail-- Successful
Metadata partner name [BESTWESTERN] found in Detail-- Successful
But the Detail records have:
PHP Code:
ORBITZ
AIRTRAN
FRONTIER
BESTWESTERN
MIDWEST 
There were additional records for MIDWEST. Is there any way the script can be modified to accommodate this enhancement?

If a partner is not present in the Metadata records but is present in Detail, the script should abort.

Please advise...
# 17  
Old 05-24-2006
Quote:
Originally Posted by madhunk
Thank you vgersh99...

I have been working with the script for a couple of days and it was working fine...

A new file came in today and the script did not abort. The reason is:

Metadata records have:
PHP Code:
Metadata partner name [ORBITZ] found in Detail-- Successful
Metadata partner name [AIRTRAN] found in Detail-- Successful
Metadata partner name [FRONTIER] found in Detail-- Successful
Metadata partner name [BESTWESTERN] found in Detail-- Successful
But the Detail records have:
PHP Code:
ORBITZ
AIRTRAN
FRONTIER
BESTWESTERN
MIDWEST 
There were additional records for MIDWEST. Is there any way the script can be modified to accommodate this enhancement?
OK, but there was no METAdata record for 'MIDWEST'. The task was: find ONLY the METAdata records for which there was a corresponding record in the DETAIL file.
Quote:
Originally Posted by madhunk
If a partner is not present in the Metadata records but is present in Detail, the script should abort.

Please advise...
I don't understand what you're asking.....
I suggest you take the most recent version of what's been implemented already, try to understand it, and figure out how to adjust it based on your varying input data patterns.

Last edited by vgersh99; 05-24-2006 at 04:19 PM..
# 18  
Old 05-25-2006
I did play with the script and tried to change it...

In your script before, it compares the metadata file with the detail file.

There was a change in the requirement and I wanted to use the detail file as the standard and compare it with the metadata file. I did change the order of the files when calling the script.

Somehow something is going wrong....
# 19  
Old 05-25-2006
Quote:
Originally Posted by madhunk
I did play with the script and tried to change it...

In your script before, it compares the metadata file with the detail file.

There was a change in the requirement and I wanted to use the detail file as the standard and compare it with the metadata file. I did change the order of the files when calling the script.

Somehow something is going wrong....
So the 'new' requirement is: if $2 in the 'detail' file does NOT appear as $2 in the 'metadata' file - then abort?
# 20  
Old 05-25-2006
Yes....I tried to switch the order of the files in calling -- like this

Code:
nawk '
  FNR==NR{
     detail[$2]
     next
  }
  {
     if ( $2 in detail)
        printf("Metadata partner name [%s] found in Detail-- Successful\n",  $2)
     else {
        printf("Metadata partner name [%s] NOT found in Detail-- Failed\n",  $2)
        _ex=1
     }
  }
  END { exit(_ex)}' ${METADATA_FILE} ${DETAIL_FILE}

The problem is: I only have 5 records in Metadata file but I have 13 Million in the Detail file.

If $2 is there in the Detail file but not in the Metadata file, then I am getting this huge output of all the records.
# 21  
Old 05-25-2006
Quote:
Originally Posted by madhunk
Yes....I tried to switch the order of the files in calling -- like this

Code:
nawk '
  FNR==NR{
     detail[$2]
     next
  }
  {
     if ( $2 in detail)
        printf("Metadata partner name [%s] found in Detail-- Successful\n",  $2)
     else {
        printf("Metadata partner name [%s] NOT found in Detail-- Failed\n",  $2)
        _ex=1
     }
  }
  END { exit(_ex)}' ${METADATA_FILE} ${DETAIL_FILE}

The problem is: I only have 5 records in Metadata file but I have 13 Million in the Detail file.

If $2 is there in the Detail file but not in the Metadata file, then I am getting this huge output of all the records.
Well, it sounds like everything is good: everything in META appears in Detail.

Are you sure you have any mismatches?
I'd suggest creating a small sample pair of 'mis-matched' files and running the script against them.
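
A minimal sketch of such a test, assuming whitespace-separated records with the partner name in field 2. The sample file names and contents are made up for illustration, and plain `awk` is used here (substitute `nawk` on Solaris). This variant also reports each unmatched partner only once, so a multi-million-line detail file does not flood the output:

```shell
#!/bin/sh
# Hypothetical sample data: field 1 is a record type, field 2 the partner name.
tmp=${TMPDIR:-/tmp}/cmp.$$
mkdir -p "$tmp"
printf 'M ORBITZ\nM AIRTRAN\n' > "$tmp/meta.txt"
printf 'D ORBITZ\nD MIDWEST\nD AIRTRAN\nD MIDWEST\n' > "$tmp/detail.txt"

out=$(awk '
  FNR==NR { meta[$2]; next }          # first file: remember metadata partners
  !($2 in meta) && !seen[$2]++ {      # second file: report each unknown partner once
     printf("Detail partner name [%s] NOT found in Metadata-- Failed\n", $2)
     _ex=1
  }
  END { exit(_ex+0) }' "$tmp/meta.txt" "$tmp/detail.txt")
rc=$?

echo "$out"
echo "exit status: $rc"
rm -rf "$tmp"
```

With the sample data above, MIDWEST appears twice in the detail file but is reported once, and the exit status is 1. Matching records print nothing, which keeps the output proportional to the number of distinct missing partners rather than the number of detail lines.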