Difference of two data files & writing to an outfile.


Login or Register for Dates, Times and to Reply

 
Thread Tools Search this Thread
# 1  
Difference of two data files & writing to an outfile.

Hi Everyone,
I have two files i.e. one file2 contains today's data and the other file1 contains Yesterday's data.
The data in the files contains 226 columns and the data for the coulums separated by a Pipe "|" delimiter.
Now, I have 4 Primary keys (coulumns) by which I have to compare file2 and file1 and generate a diff file that is the data that is not present in file2 but present in file1 and vice versa.
The sample of the data that contains in file1 is ( 1 record):
Code:
START-OF-FILE
FILENAME=fixedincome_bo_namr.dif
DATA=bo
REGION=namr
TYPE=dif
PROGRAMNAME=getdata
DATEFORMAT=yyyymmdd
START-OF-FIELDS
# Security description
TICKER
CPN
MATURITY
SERIES
NAME
SHORT_NAME
ISSUER_INDUSTRY
MARKET_SECTOR_DES
CPN_FREQ
CPN_TYP
MTY_TYP
CALC_TYP_DES
DAY_CNT
MARKET_ISSUE
COUNTRY
CRNCY
COLLAT_TYP
AMT_ISSUED
AMT_OUTSTANDING
MIN_PIECE
MIN_INCREMENT
PAR_AMT
LEAD_MGR
EXCH_CODE
REDEMP_VAL
# Issuance information
ANNOUNCE_DT
FIRST_SETTLE_DT
FIRST_CPN_DT
INT_ACC_DT
ISSUE_DT
ISSUE_PX
# Identifiers
ID_EUROCLEAR
ID_XTRAKTER
ID_SEDOL1
ID_SEDOL2
ID_CEDEL
ID_WERTPAPIER
ID_ISIN
ID_DUTCH
ID_VALOREN
ID_FRENCH
ID_COMMON
ID_JAPAN
ID_BELGIUM
ID_DANISH
ID_AUSTRIAN
ID_LUXEMBOURG
ID_SWEDISH
ID_NORWAY
ID_JAPAN_COMPANY
ID_SPAIN
ID_ITALY
ID_BB_COMPANY
ID_BB_SECURITY
ID_CUSIP
# Schedules
NXT_CALL_DT
NXT_CALL_PX
NXT_PAR_CALL_DT
NXT_PUT_DT
NXT_PUT_PX
NXT_PAR_PUT_DT
NXT_CPN_DT
NXT_SINK_DT
NXT_REFUND_DT
# Ratings
RTG_JCR
# Floaters
REFIX_FREQ
NXT_REFIX_DT
RESET_IDX
# Inflation-indexed bonds
# Preferreds
PFD_DVD_PAY_DT
PFD_EX_DVD_DT
PFD_RST_DVD
# Convertibles
CV_COMMON_TICKER
CV_COMMON_TICKER_EXCH
CV_CNVS_RATIO
CV_UNTIL
CV_CNVS_FEXCH_RT
CV_PROV_PX
CALLABLE
SINKABLE
PUTABLE
ID_BB_PARENT_CO
PARENT_COMP_NAME
PARENT_COMP_TICKER
CNTRY_OF_INCORPORATION
BASIC_SPREAD
INDUSTRY_SECTOR
INDUSTRY_GROUP
INDUSTRY_SUBGROUP
COUNTRY_GUARANTOR
CNTRY_OF_DOMICILE
SECURITY_DES
144A_FLAG
FLT_BENCH_MULTIPLIER
SECURITY_TYP
CV_START_DT
GUARANTOR_NAME
PREV_CPN_DT
NXT_SINK_AMT
CALL_DISCRETE
PUT_DISCRETE
MAKE_WHOLE_CALL
ID_BB_UNIQUE
LONG_COMP_NAME
REDEMP_CRNCY
CPN_CRNCY
DTC_ELIGIBLE
STRUCTURED_NOTE
PCT_PAR_QUOTED
PCS_QUOTE_TYP
IS_UNIT_TRADED
IS_REVERSE_CONVERTIBLE
TRADE_CRNCY
BEARER
REGISTERED
CALLED
CALLED_DT
ISSUER
CALL_FEATURE
PUT_FEATURE
PENULTIMATE_CPN_DT
FLT_CPN_CONVENTION
CUR_CPN
FLOATER
TRADE_STATUS
CDR_COUNTRY_CODE
CDR_SETTLE_CODE
SEASONING_STATUS
FINAL_MATURITY
PRVT_PLACE
CALC_TYP
REMOVAL_REASON
IS_PERPETUAL
IS_REG_S
CALLED_PX
DEFAULTED
GILTS_EX_DVD_DT
MOST_RECENT_REPORTED_FACTOR
NXT_FACTOR_DT
OID_BOND
DELIVERY_TYP
ID_SEDOL3
ID_SEDOL4
ID_SEDOL5
SEDOL1_COUNTRY_ISO
SEDOL2_COUNTRY_ISO
SEDOL3_COUNTRY_ISO
SEDOL4_COUNTRY_ISO
SEDOL5_COUNTRY_ISO
ID_MIC1
ID_MIC2
ID_MIC3
ID_MIC4
ID_MIC5
CV_SH_PAR
DUAL_CRNCY
EXTENDIBLE
EXCHANGEABLE
IS_SOFT_CALL
CV_MANDATORY_CNVS
EU_SAVINGS_DIRECTIVE
ID_CUSIP_REAL
INDUSTRY_SUBGROUP_NUM
SECURITY_TYP2
LAST_REFIX_DT
ISO_COUNTRY_GUARANTOR
DTC_REGISTERED
CALL_PARTIAL
CV_CNVS_PX
IS_CURRENT_GOVT
FIRST_CALL_DT_ISSUANCE
UNDL_ID_BB_UNIQUE
EST_CPN_FLAG
ID_BB_GUARANTOR
IS_DAY_PAYER
STEPUP_CPN
STEPUP_DT
CALC_MATURITY
ID_EXCH_SYMBOL
CREDIT_ENHANCEMENTS
INSURANCE_STATUS
JUNIOR
SENIOR
FLT_PAY_DAY
FLT_DAYS_PRIOR
INFLATION_LINKED_INDICATOR
DAYS_TO_SETTLE
TYPE_OF_BOND
REFERENCE_INDEX
BASE_CPI
CFI_CODE
CPN_FREQ_YLD_CNV
DAY_PAYER_FREQ
EX_DIV_DAYS
EX_DIV_CALENDAR
CONTINGENT_CONVERSION
CONTRIB_DATA_INDICATOR
SECURITY_FACTORABLE
ID_BB_GLOBAL
END-OF-FIELDS
TIMESTARTED=Wed Jun  8 18:33:51 EDT 2011
  230 START-OF-DATA
  231 CP5101987 Corp|-1|198|C|7.300000|20110607|EAB|EUROPEAN AMERICAN BANK|LNB-CALL12/97|BANK|Corp|2|STEP CPN|CALLABLE|MULTI-STEP BOND|1|US DOMESTIC|US|USD|DE      POSIT NOTES|760000.00|.00|10000.0000|1000.0000|1000.00|LAS-sole|NOT LISTED|100.00000|19960607|19960607|19961207|19960607|19960607|100.000000| | | | | |       |US29874AZZ55| | | | | | | | | | | | | | |225433|500231|29874AZZ5| | | | | | | | | |N.A.| | | | | | | | | | | | |Y|N|N|197879|Citibank NA|8156Z|US| |Fin      ancial|Banks|Money Center Banks|N.A.|US|C 7.3 06/07/11|N| |US DOMESTIC| |N.A.| | |Y| |N|COCP5101987|European American Bank|USD|USD|Y|Y|Y|1|N|N|USD|N|N|Y      |19971207|EUROPEAN AMERICAN BANK|Semi-Annual| |19970607| | |N|N|US|US|Does Not Apply|20110607|N|421|MATURED|N|N|100.000000|N| |.000000000| |N|DTC| | | |      N.A.|N.A.|N.A.|N.A.|N.A.| | | | | | |N|N|N|N| |Grandfathered|29874AZZ5|145| | |N.A.|N|FULL (ONLY)| |N|19970607| | | |N| | |20110607| | |N|N|N| | |N|3| |       | |N.A.|2| | | | |N|N|BBG00048KQJ7|

The sample of the data that contains in file2 (Todays file) is ( 1 record):
Code:
START-OF-FILE
FILENAME=fixedincome_bo_namr.dif
DATA=bo
REGION=namr
TYPE=dif
PROGRAMNAME=getdata
DATEFORMAT=yyyymmdd
START-OF-FIELDS
# Security description
TICKER
CPN
MATURITY
SERIES
NAME
SHORT_NAME
ISSUER_INDUSTRY
MARKET_SECTOR_DES
CPN_FREQ
CPN_TYP
MTY_TYP
CALC_TYP_DES
DAY_CNT
MARKET_ISSUE
COUNTRY
CRNCY
COLLAT_TYP
AMT_ISSUED
AMT_OUTSTANDING
MIN_PIECE
MIN_INCREMENT
PAR_AMT
LEAD_MGR
EXCH_CODE
REDEMP_VAL
# Issuance information
ANNOUNCE_DT
FIRST_SETTLE_DT
FIRST_CPN_DT
INT_ACC_DT
ISSUE_DT
ISSUE_PX
# Identifiers
ID_EUROCLEAR
ID_XTRAKTER
ID_SEDOL1
ID_SEDOL2
ID_CEDEL
ID_WERTPAPIER
ID_ISIN
ID_DUTCH
ID_VALOREN
ID_FRENCH
ID_COMMON
ID_JAPAN
ID_BELGIUM
ID_DANISH
ID_AUSTRIAN
ID_LUXEMBOURG
ID_SWEDISH
ID_NORWAY
ID_JAPAN_COMPANY
ID_SPAIN
ID_ITALY
ID_BB_COMPANY
ID_BB_SECURITY
ID_CUSIP
# Schedules
NXT_CALL_DT
NXT_CALL_PX
NXT_PAR_CALL_DT
NXT_PUT_DT
NXT_PUT_PX
NXT_PAR_PUT_DT
NXT_CPN_DT
NXT_SINK_DT
NXT_REFUND_DT
# Ratings
RTG_JCR
# Floaters
REFIX_FREQ
NXT_REFIX_DT
RESET_IDX
# Inflation-indexed bonds
# Preferreds
PFD_DVD_PAY_DT
PFD_EX_DVD_DT
PFD_RST_DVD
# Convertibles
CV_COMMON_TICKER
CV_COMMON_TICKER_EXCH
CV_CNVS_RATIO
CV_UNTIL
CV_CNVS_FEXCH_RT
CV_PROV_PX
CALLABLE
SINKABLE
PUTABLE
ID_BB_PARENT_CO
PARENT_COMP_NAME
PARENT_COMP_TICKER
CNTRY_OF_INCORPORATION
BASIC_SPREAD
INDUSTRY_SECTOR
INDUSTRY_GROUP
INDUSTRY_SUBGROUP
COUNTRY_GUARANTOR
CNTRY_OF_DOMICILE
SECURITY_DES
144A_FLAG
FLT_BENCH_MULTIPLIER
SECURITY_TYP
CV_START_DT
GUARANTOR_NAME
PREV_CPN_DT
NXT_SINK_AMT
CALL_DISCRETE
PUT_DISCRETE
MAKE_WHOLE_CALL
ID_BB_UNIQUE
LONG_COMP_NAME
REDEMP_CRNCY
CPN_CRNCY
DTC_ELIGIBLE
STRUCTURED_NOTE
PCT_PAR_QUOTED
PCS_QUOTE_TYP
IS_UNIT_TRADED
IS_REVERSE_CONVERTIBLE
TRADE_CRNCY
BEARER
REGISTERED
CALLED
CALLED_DT
ISSUER
CALL_FEATURE
PUT_FEATURE
PENULTIMATE_CPN_DT
FLT_CPN_CONVENTION
CUR_CPN
FLOATER
TRADE_STATUS
CDR_COUNTRY_CODE
CDR_SETTLE_CODE
SEASONING_STATUS
FINAL_MATURITY
PRVT_PLACE
CALC_TYP
REMOVAL_REASON
IS_PERPETUAL
IS_REG_S
CALLED_PX
DEFAULTED
GILTS_EX_DVD_DT
MOST_RECENT_REPORTED_FACTOR
NXT_FACTOR_DT
OID_BOND
DELIVERY_TYP
ID_SEDOL3
ID_SEDOL4
ID_SEDOL5
SEDOL1_COUNTRY_ISO
SEDOL2_COUNTRY_ISO
SEDOL3_COUNTRY_ISO
SEDOL4_COUNTRY_ISO
SEDOL5_COUNTRY_ISO
ID_MIC1
ID_MIC2
ID_MIC3
ID_MIC4
ID_MIC5
CV_SH_PAR
DUAL_CRNCY
EXTENDIBLE
EXCHANGEABLE
IS_SOFT_CALL
CV_MANDATORY_CNVS
EU_SAVINGS_DIRECTIVE
ID_CUSIP_REAL
INDUSTRY_SUBGROUP_NUM
SECURITY_TYP2
LAST_REFIX_DT
ISO_COUNTRY_GUARANTOR
DTC_REGISTERED
CALL_PARTIAL
CV_CNVS_PX
IS_CURRENT_GOVT
FIRST_CALL_DT_ISSUANCE
UNDL_ID_BB_UNIQUE
EST_CPN_FLAG
ID_BB_GUARANTOR
IS_DAY_PAYER
STEPUP_CPN
STEPUP_DT
CALC_MATURITY
ID_EXCH_SYMBOL
CREDIT_ENHANCEMENTS
INSURANCE_STATUS
JUNIOR
SENIOR
FLT_PAY_DAY
FLT_DAYS_PRIOR
INFLATION_LINKED_INDICATOR
DAYS_TO_SETTLE
TYPE_OF_BOND
REFERENCE_INDEX
BASE_CPI
CFI_CODE
CPN_FREQ_YLD_CNV
DAY_PAYER_FREQ
EX_DIV_DAYS
EX_DIV_CALENDAR
CONTINGENT_CONVERSION
CONTRIB_DATA_INDICATOR
SECURITY_FACTORABLE
ID_BB_GLOBAL
END-OF-FIELDS
 TIMESTARTED=Thu Jun  9 18:34:19 EDT 2011
 230 START-OF-DATA
 231 9999X01M9 Govt|-1|198|XIB|0|20110707| |WI TSY BILL|WI TSY BILL|USGN|Govt| |NONE|NORMAL|DISCOUNT|2|US GOVT|US|USD| |31782000000|31782000000|100|100| | | |     100.00000| | |20110106| |20110609|0.005000| | | | | | | | | | | | | | | | | | | | | |349057|13714872|9127952X8| | | | | | | | | | | | | | | | | | | | | |      |N|N|N|218252|United States of America|3352Z|US| |Government|Sovereign|Sovereign| |US|XIB 07/07/11| | |US GOVERNMENT| | | | | | |N.A.|GV9999X01M9|United      States Treasury Bill - WI Post Auction|USD|USD| | |Y|2|N| |USD|N.A.|N.A.|N| |WI TSY BILL| | | | | |N|Y|US|US| |20110707|N.A.|5| | | | | | | | |Y| | | |      |N.A.|N.A.|N.A.|N.A.|N.A.| | | | | | | | |N.A.|N| |Non-Grandfathered|9999X01M9|459|Bill| | | | | |N| | | | |N| | |20110707| | | | | | | |N|1| | | | | | |      | | | |N|BBG001CSH9Y7|

Since I have 4 primary keys, I though of doing the code using Hashes.
Could someone please help me out. I am looking for a desperate help on this.
Really appreciated if someone could help out with some ideas.
# 2  
Do you want to find the differences file1 and file2 for the line starting with 231?
If yes, what are the keys?
# 3  
Hi getmmg,

Yes. Thats correct I have to start finding the differences from line 231.

The primary keys are as follows:
Code:
ID_ISIN
ID_CUSIP
ID_BB_UNIQUE
SECURITY DES

Thanks a lot for your reply.

Last edited by Yogesh Sawant; 06-13-2011 at 04:19 PM.. Reason: added code tags
# 4  
Your primary keys have the following sequence in the record:
Code:
49:ID_ISIN
66:ID_CUSIP
119:ID_BB_UNIQUE
188:UNDL_ID_BB_UNIQUE

To extract the primary key values from the file:
Code:
egrep '|' input_file | cut -d'|' -f49,66,119,188

Then you can compare the extracted values.
# 5  
Hi Shell_life,

Thanks for your reply.

So once I extract the primary keys using your command, then I have to push them to a hash ( keys & values) and then start comparing ?

Code:
 
egrep '|' input_file | cut -d'|' -f49,66,119,188

Like The Outfile should contain, the records that exits in file1 but doesn't exist in file2 and vice versa.

I am sorry Could you please explain me a little bit more.
# 6  
The result should be output to a file:
Code:
egrep '|' Input_Yesterday | cut -d'|' -f49,66,119,188 > File_Yesterday
egrep '|' Input_Today | cut -d'|' -f49,66,119,188 > File_Today

After having these two files, you can use either "diff" or "cmp" commands.

Choose which command works best for you.
# 7  
Hi Shell_life,

Thanks a lot for your reply.

Once I get the two files after running your commands and then I would use the diff() command to write to an outfile.

But the outfile contains only four columns.

For example:


The below record exists in file1.

Code:
 231 CP5101987 Corp|-1|198|C|7.300000|20110607|EAB|EUROPEAN AMERICAN BANK|LNB-CALL12/97|BANK|Corp|2|STEP CPN|CALLABLE|MULTI-STEP BOND|1|US DOMESTIC|US|USD|DE      POSIT NOTES|760000.00|.00|10000.0000|1000.0000|1000.00|LAS-sole|NOT LISTED|100.00000|19960607|19960607|19961207|19960607|19960607|100.000000| | | | | |       |US29874AZZ55| | | | | | | | | | | | | | |225433|500231|29874AZZ5| | | | | | | | | |N.A.| | | | | | | | | | | | |Y|N|N|197879|Citibank NA|8156Z|US| |Fin      ancial|Banks|Money Center Banks|N.A.|US|C 7.3 06/07/11|N| |US DOMESTIC| |N.A.| | |Y| |N|COCP5101987|European American Bank|USD|USD|Y|Y|Y|1|N|N|USD|N|N|Y      |19971207|EUROPEAN AMERICAN BANK|Semi-Annual| |19970607| | |N|N|US|US|Does Not Apply|20110607|N|421|MATURED|N|N|100.000000|N| |.000000000| |N|DTC| | | |      N.A.|N.A.|N.A.|N.A.|N.A.| | | | | | |N|N|N|N| |Grandfathered|29874AZZ5|145| | |N.A.|N|FULL (ONLY)| |N|19970607| | | |N| | |20110607| | |N|N|N| | |N|3| |       | |N.A.|2| | | | |N|N|BBG00048KQJ7|

If the above record doesn't exists in file 2 ( by comapring the Primary keys ) then the complete record should write in the outfile.

So, after running the below script:

Code:
egrep '|' Input_Yesterday | cut -d'|' -f49,66,119,188 > File_Yesterday
egrep '|' Input_Today | cut -d'|' -f49,66,119,188 > File_Today

and after using the diff(), the put file contains only the four columns where as we need the entire record in the outfile.

Could you please suggest me how can I proceed further and whether this would be easier using Shell or perl.
Login or Register for Dates, Times and to Reply

Previous Thread | Next Thread
Thread Tools Search this Thread
Search this Thread:
Advanced Search

Test Your Knowledge in Computers #99
Difficulty: Easy
Unix has a flat file structure which allows easier access and maintenance of data.
True or False?

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

Difference between & and nohup &

Hi All, Can anyone please help me understanding what the difference between the below two? 1. script.sh & 2. nohup script.sh & (2 Replies)
Discussion started by: Anupam_Halder
2 Replies

2. Shell Programming and Scripting

Help on writing data from 2 different files to one based on a common factor

Hello all, I have 2 text files. For example: File1.txt contains data A B C D ****NEXT**** X Y Z ****NEXT**** L M N and File2.txt contains data (13 Replies)
Discussion started by: vat1kor
13 Replies

3. Shell Programming and Scripting

awk help: Match data fields from 2 files & output results from both into 1 file

I need to take 2 input files and create 1 output based on matches from each file. I am looking to match field #1 in both files (Userid) and create an output file that will be a combination of fields from both file1 and file2 if there are any differences in the fields 2,3,4,5,or 6. Below is an... (5 Replies)
Discussion started by: ambroze
5 Replies

4. Shell Programming and Scripting

search & merg data from 3 files

i have 3 files which contains as below (example): yy-mm-dd hh:mm:sec lat lon depth mag 2006-01-01 23:17:26.80 39.8405 41.8795 2.0 3.3 2006-01-06 00:10:26.80 39.9570 41.2130 5.0 3.3 2006-01-06 06:59:02.10 39.4099 44.6065 10.0 3.7 2006-01-06 13:49:52.70... (4 Replies)
Discussion started by: oreka18
4 Replies

5. Shell Programming and Scripting

Copying the Header & footer Information to the Outfile.

Hi I am writing a perl script which checks for the specific column values from a file and writes to the OUT file. So the feed file has a header information and footer information. I header information isaround107 lines i.e. Starts with START-OF-FILE ....... so on .... ... (11 Replies)
Discussion started by: filter
11 Replies

6. Shell Programming and Scripting

Sort a the file & refine data column & row format

cat file1.txt field1 "user1": field2:"data-cde" field3:"data-pqr" field4:"data-mno" field1 "user1": field2:"data-dcb" field3:"data-mxz" field4:"data-zul" field1 "user2": field2:"data-cqz" field3:"data-xoq" field4:"data-pos" Now i need to have the date like below. i have just... (7 Replies)
Discussion started by: ckaramsetty
7 Replies

7. Shell Programming and Scripting

How to combine 2 files and output the unique & difference?

Hi Guys, I have two input files and I want to combine them and get the unique values and differences and put them into one file. See below desired output file. Inputfile1: 1111111 2222222 3333333 7860068 7860069 7860071 7860072 Inputfile2: 4444444 (4 Replies)
Discussion started by: pinpe
4 Replies

8. UNIX for Dummies Questions & Answers

Reading and writing data to and from multiple files

Hi, I have several text files. One main file contains the detail data, other have some information to extract data from the main file, and some are empty files. Examples are shown below: The main file look like: MainFile.txt >Header1 data1...data1... >Header2 data2...data2... ... ...... (2 Replies)
Discussion started by: Fahmida
2 Replies

9. Shell Programming and Scripting

Need help in writing a script to create a new text file with specific data from existing two files

Hi, I have two text files. Need to create a third text file extracting specific data from first two existing files.. Text File 1: Format contains: SQL*Loader: Release 10.2.0.1.0 - Production on Wed Aug 4 21:06:34 2010 some text ............so on...and somwhere text like: Record 1:... (1 Reply)
Discussion started by: shashi143ibm
1 Replies

10. UNIX Desktop Questions & Answers

what is the difference between Unix & linux, what are the advantages & disadvantages

ehe may i know what are the difference between Unix & Linux, and what are the advantages of having Unix as well as disadvantages of having Unix or if u dun mind i am dumb do pls tell me what are the advantages as well as the disadvantages of having linux as well. thanks (1 Reply)
Discussion started by: cybertechmkteo
1 Replies

Featured Tech Videos