I need to compare two flat files (ASCII format), say file OLD and file NEW. Both have the same structure: they are |-delimited files with a few million records (lines) each. Each file has the same set of columns and the same set of key columns (e.g. the 3rd and 5th columns of the record). I need to compare all the records in OLD and NEW on the basis of the values in the key columns and store the result in a DELTA file:
If the key set is present in NEW but not in OLD, append 'I' to the record (NEW file record) and store it in DELTA.
If the key set is present in OLD but not in NEW, append 'D' to the record (OLD file record) and store it in DELTA.
If the same key set is present in both OLD and NEW, append 'U' to the record (NEW file record) and store it in DELTA.
A record (row) can have any number of fields (columns). The key set will be provided as input to the program.
Examples, please. This sounds like something diff could handle perfectly if the lines consisted of keys only. How about cutting the keys into temp files, diffing these, and going back to the original files with the result?
DELTA
One more condition: records that are identical in OLD and NEW must not be included in the DELTA file (see the example above, where the 1st line is the same in both files).
I tried sorting on the key columns and then finding the difference, but I am not good enough in UNIX.
Let me try to paraphrase your requirement:
If lines are identical, skip.
If different, but keys are the same: output NEW's line to DELTA, adding a "U".
If keys differ, output NEW's line adding "I" and output OLD's line adding "D".
What are the files' sorting criteria? I'm afraid we're going to lose sync once deviations occur. What is the maximum number of lines that can lie between two lines with identical key pairs?
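The rules paraphrased above can also be applied without relying on sort order, by indexing OLD on its key columns and then streaming NEW. Below is a minimal Python sketch of that idea, not a definitive solution. The function names `key_of` and `compute_delta` are made up for illustration. It assumes that key sets are unique within each file, that OLD fits in memory, and that the flag is appended as one extra delimited field (the thread does not say how the flag should be attached).

```python
def key_of(line, key_cols, delim="|"):
    """Return the key tuple for a record; key_cols are 1-based column numbers."""
    fields = line.split(delim)
    return tuple(fields[c - 1] for c in key_cols)


def compute_delta(old_lines, new_lines, key_cols, delim="|"):
    """Build the DELTA records from two iterables of lines.

    Assumptions (not stated in the thread): keys are unique within each
    file, and the flag is appended as a trailing delimited field.
    """
    # Index every OLD record by its key set.
    old_by_key = {}
    for line in old_lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        old_by_key[key_of(line, key_cols, delim)] = line

    delta = []
    for line in new_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # pop() removes matched keys, so only OLD-only records remain later.
        old_line = old_by_key.pop(key_of(line, key_cols, delim), None)
        if old_line is None:
            delta.append(line + delim + "I")   # key only in NEW
        elif old_line != line:
            delta.append(line + delim + "U")   # same key, record changed
        # identical records are skipped

    # Whatever is left in the index exists only in OLD.
    for old_line in old_by_key.values():
        delta.append(old_line + delim + "D")
    return delta
```

Open file objects can be passed directly as `old_lines` and `new_lines`, since iterating over a file yields its lines, and the key columns would be given as e.g. `(3, 5)`. If OLD is too large to hold in memory, sorting both files on the key columns and merging them would be the alternative.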