I am planning to automate the comparison of data between a few tables in two different databases (Teradata and SQL Server).
Below is the approach I have in mind. Please suggest any improvements or modifications:
1) The SQL Server file has more records, so I have to eliminate the duplicates based on the unique key and select only a few sample records for comparison. The file is pipe delimited. I am thinking of using the sort -u command to eliminate the duplicates and then selecting a few sample records based on the unique key.
2) Fetch the unique keys from the SQL Server file into a new input file, then use the Teradata BTEQ utility to export the records corresponding to those unique keys.
3) Use the comp command to do a column-wise comparison of the tables. Is there any way to highlight the columns that do not match, the way Excel lets you compare columns and highlight the values with a "FALSE" result?
4) This needs to be done for around 10 tables and the structure of the tables is similar in both the databases.
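Steps 1 and 2 above could be sketched roughly as follows. This is only an illustration in Python rather than shell, assuming the unique key is the first pipe-delimited field; the function name `dedupe_and_sample`, its parameters, and the sample rows are made up for the example. Note that plain `sort -u` only drops lines that are identical in full, whereas `sort -u -t'|' -k1,1` compares just the key field.

```python
import random


def dedupe_and_sample(lines, key_index=0, sample_size=1000, seed=0, delimiter="|"):
    """Keep the first record seen for each key, then pick a repeatable sample.

    Returns (keys, records): the sampled keys in sorted order and the
    matching records in the same order.
    """
    first_by_key = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        key = line.split(delimiter)[key_index].strip()
        # setdefault keeps the first record for a key and ignores later ones
        first_by_key.setdefault(key, line)

    keys = sorted(first_by_key)
    if len(keys) > sample_size:
        # A fixed seed makes the sample reproducible between runs
        keys = sorted(random.Random(seed).sample(keys, sample_size))
    return keys, [first_by_key[k] for k in keys]


# Hypothetical data: key 101 is an exact duplicate, key 102 repeats with a different value
rows = ["101|Alice|NY", "102|Bob|LA", "101|Alice|NY", "103|Carol|TX", "102|Bob|SF"]
keys, records = dedupe_and_sample(rows, sample_size=1000)
print("\n".join(keys))     # one key per line, usable as the input file for the export step
print("\n".join(records))
```

The list of keys, written one per line, would play the role of the "new input file" in step 2; how it is fed to BTEQ is not shown here.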
I don't see how you get rid of all duplicates by selecting on a few samples???
Are you trying to compare databases? Or, are you trying to compare text files that have been extracted from those databases?
You say the sql server file is larger. How many records are in each database? How many fields are in each database?
If you extracted each entire database into a pipe delimited text file, how many records (lines) would appear in each file and how many characters would be in the longest line in each file?
Please show us a sanitized sample of the records you want to compare and show us the output you want to produce from those input samples? (Please use CODE tags when you post the sample input and output files.)
I don't see how you get rid of all duplicates by selecting on a few samples???
- The duplicates will be eliminated first and then sample records will be selected.
Are you trying to compare databases? Or, are you trying to compare text files that have been extracted from those databases?
- text files extracted from databases (around 10 tables)
You say the sql server file is larger. How many records are in each database? How many fields are in each database?
- each of the 10 tables have different structure. On an average around 10-15 records and I would select a sample of around 1000 records from each file.
If you extracted each entire database into a pipe delimited text file, how many records (lines) would appear in each file and how many characters would be in the longest line in each file?
- the records in each file would vary from around 10K - 10L records. The sample would be around 1000 records and each file would have somewhere around 1000 characters
Please show us a sanitized sample of the records you want to compare and show us the output you want to produce from those input samples? (Please use CODE tags when you post the sample input and output files.)
- I can't provide the print of records due to the company policy but it would be something as below :
Quote:
Originally Posted by Rahul Raj
I don't see how you get rid of all duplicates by selecting on a few samples???
- The duplicates will be eliminated first and then sample records will be selected.
Are you trying to compare databases? Or, are you trying to compare text files that have been extracted from those databases?
- text files extracted from databases (around 10 tables)
You say the sql server file is larger. How many records are in each database? How many fields are in each database?
- each of the 10 tables have different structure. On an average around 10-15 records and I would select a sample of around 1000 records from each file.
OK. Each file contains 10 to 15 records and out of those 15 records you want to select 1000??????
Quote:
Originally Posted by Rahul Raj
If you extracted each entire database into a pipe delimited text file, how many records (lines) would appear in each file and how many characters would be in the longest line in each file?
- the records in each file would vary from around 10K - 10L records. The sample would be around 1000 records and each file would have somewhere around 1000 characters
10K usually means 10,000 or 10,240; I have no idea what 10L means.
And, if a sample file contains ~1,000 lines and the total file size is 1,000 characters, each line consists entirely of a <newline> line terminator and there is no data to compare??????
Quote:
Originally Posted by Rahul Raj
Please show us a sanitized sample of the records you want to compare and show us the output you want to produce from those input samples? (Please use CODE tags when you post the sample input and output files.)
- I can't provide the print of records due to the company policy but it would be something as below :
Please show us some data with actual values similar to what would appear in your files, and the actual results that should be produced from that sample data. Without data that we can feed into scripts, we cannot check whether the code we're suggesting produces the results you are trying to produce; you are asking us to write code with both hands tied behind our backs. I have absolutely no idea from the above how we are supposed to determine what you consider to be a mismatched record.
With what you have told us, the only thing I can suggest is to use the diff utility to compare pairs of files.
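For readers who would rather script it, a comparable line-by-line comparison can be sketched with Python's standard `difflib` module. This is only an illustration: the function name `diff_records`, the file labels, and the sample rows are made up, and the records are sorted first on the assumption that line order does not matter.

```python
import difflib


def diff_records(source_lines, target_lines,
                 source_name="sqlserver.txt", target_name="teradata.txt"):
    """Return unified-diff lines between two lists of records (empty if equal)."""
    # Sorting first means a difference in extraction order is not reported
    return list(difflib.unified_diff(
        sorted(source_lines), sorted(target_lines),
        fromfile=source_name, tofile=target_name, lineterm=""))


# Hypothetical pipe-delimited records; only key 102 differs
source = ["101|Alice|NY", "102|Bob|LA", "103|Carol|TX"]
target = ["103|Carol|TX", "101|Alice|NY", "102|Bob|SF"]
for line in diff_records(source, target):
    print(line)
```

Like `diff`, this reports whole lines that differ; it does not say which column within the line is responsible.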
The way you formatted post #5, it isn't clear whether the field names are on line 1 of your input files (as they are in your output file) or on line 2 of your input files. And, even after removing leading and trailing whitespace characters from all of the fields in your sample input files, there are absolutely no matching fields in your sample data. The names on the first two data lines do not match and the data in the other fields on the last line do not match, so from your sample input, I would assume that you want output something like:
You said the order of the fields could vary between files. But, assuming that the 1st line in each file contains the headings and that the first field is always the field to be used as the matching field, the following seems to do what you want even if the other fields are in random order:
If you have the files Source1:
and Source2:
(note that the order of the last two fields is switched between these files) and you run the script with:
you get the output:
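Separately from the script and sample files referred to above, the same idea can be sketched in Python. It makes the same assumptions: line 1 of each file holds the field names, the first field is the matching key, and the remaining fields may appear in any order. The function names, the `MISSING_IN_*` markers, and the sample data are all made up for illustration.

```python
def parse_table(lines, delimiter="|"):
    """Return (field_names, {key: {field_name: value}}) from header-first lines."""
    rows = [[cell.strip() for cell in line.rstrip("\n").split(delimiter)]
            for line in lines if line.strip()]
    header, data = rows[0], rows[1:]
    # The first field of each row is treated as the matching key
    return header, {row[0]: dict(zip(header, row)) for row in data}


def compare_tables(source_lines, target_lines, delimiter="|"):
    """List (key, field, source_value, target_value) for every mismatch."""
    src_header, src = parse_table(source_lines, delimiter)
    _, tgt = parse_table(target_lines, delimiter)
    report = []
    for key, src_row in src.items():
        tgt_row = tgt.get(key)
        if tgt_row is None:
            report.append((key, "MISSING_IN_TARGET", None, None))
            continue
        # Fields are looked up by name, so column order may differ between files
        for field in src_header[1:]:
            s, t = src_row.get(field), tgt_row.get(field)
            if s != t:
                report.append((key, field, s, t))
    for key in tgt:
        if key not in src:
            report.append((key, "MISSING_IN_SOURCE", None, None))
    return report


# Hypothetical files: the last two fields are switched in the second one
source1 = ["id|name|city", "1|Ann|Paris", "2|Bob|Rome", "3|Cy|Oslo"]
source2 = ["id|city|name", "1|Paris|Ann", "2|Milan|Bob", "4|Lima|Dee"]
for key, field, s, t in compare_tables(source1, source2):
    print(key, field, s, t, sep="|")
```

Each reported tuple names the key and the column that differs, which is the closest plain-text equivalent to highlighting the "FALSE" cells in Excel.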