I want to find the difference between two files, only for the header (column names)


 
# 29  
Old 09-15-2014
@MadeInGermany

Thanks. I changed the locale settings and it is working now.

The one remaining problem shows up when some columns have no values, so that a row contains runs of bare separators such as ;;;; .

In that case the code drops the last ";" at the end of some rows. I have provided an example below: the left side is the input and the right side is the output, where you can see that one ";" has been removed by the program.

[Images: input sample (left) and program output (right)]

Also, a column added at the end is not reported in mismatch.txt; columns are reported only when they are added in the middle. Can you please let me know how to fix the missing ";" at the end of some rows, and how to report added columns whether they appear at the beginning, in the middle, or at the end?


Merlin Joseph

Last edited by Merlin Joseph; 09-15-2014 at 09:22 AM..
# 30  
Old 09-15-2014
The printed columns are determined by the matches between the line in declare.txt and the first line in feed.txt.
So I guess you must add one (or three) more matching fields to declare.txt.
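
The script being discussed is not shown in this part of the thread. As a minimal sketch of the matching idea only, assuming declare.txt holds the expected column names on a single ";"-separated line, something like the following could select the columns. The function name filter_columns and its four arguments are hypothetical, chosen for illustration.

```shell
#!/bin/sh
# Hypothetical helper, not the script from this thread.
# Keeps only the feed columns whose header name appears in the declaration
# file, and reports every other header name as "Column N: name".
# usage: filter_columns DECLARE FEED OUT REPORT
filter_columns() {
    declare_file=$1
    feed_file=$2
    out_file=$3
    report_file=$4

    # Start clean so the report exists only when extra columns are found.
    rm -f "$report_file"

    awk -F';' -v report="$report_file" '
        # First file (declaration): remember every declared column name.
        NR == FNR {
            for (i = 1; i <= NF; i++) declared[$i] = 1
            next
        }
        # Header of the feed: decide once which column positions to keep.
        FNR == 1 {
            n = 0
            for (i = 1; i <= NF; i++) {
                if ($i in declared) keep[++n] = i
                else print "Column " i ": " $i > report
            }
        }
        # Every feed line, header included: print the kept positions joined
        # by ";". Empty fields are printed too, so trailing separators of
        # kept columns survive.
        {
            line = ""
            for (j = 1; j <= n; j++) line = line (j > 1 ? ";" : "") $(keep[j])
            print line
        }
    ' "$declare_file" "$feed_file" > "$out_file"
}
```

Called as `filter_columns declare.txt feed.txt feed2.txt mismatch.txt`, this reports an undeclared column wherever it sits in the header (beginning, middle, or end). When every header name is declared, feed2.txt comes out identical to feed.txt and no mismatch.txt is written. The sketch builds each output line by hand instead of assigning to fields, so awk never rebuilds the record and empty trailing fields are not lost.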
# 31  
Old 09-16-2014
@MadeInGermany

But all the columns I declared are found in the input file feed.txt, and they match.

This is about the scenario where no new columns have been added to feed.txt:

In that case the program should simply write feed.txt out under a different name, feed2.txt, with all its data and columns intact.

Please advise.

Merlin Joseph

Last edited by Merlin Joseph; 09-16-2014 at 06:16 AM..
# 32  
Old 09-17-2014
@RudiC,

I have also tested your script. It works well the first time, but when it is executed a second time it removes the values from most of the columns. It also creates a "mismatch.txt" even when no columns have been added to "feed.txt".

I found these entries in mismatch.txt even though I had not added any new columns or values to feed.txt (the input file) for this test.

Column 2: image
Column 3: id
Column 4: showcase_id
Column 5: showcase_name
Column 6: showcase_zip_code
Column 7: vehicle_id
Column 8: carmodel_id
Column 9: make_id
Column 10: external_mpn
Column 11: carmodel_name
Column 12: make_name
Column 13: colour
Column 14: colour_alias
Column 15: colour_secondary_name
Column 16: colour_secondary_alias
Column 17: trim
Column 18: catalog_price
Column 19: customer_bonus
Column 20: reprise_bonus
Column 21: legal_notice
Column 22: rebate_legal_notice
Column 23: eco_bonus
Column 24: price
Column 25: customer_benefit_percent
Column 26: options
Column 27: code
Column 28: name
Column 29: matriculation_on
Column 30: status
Column 31: updated_at
Column 32: created_at
Column 33: mileage
Column 34: vehicle_acceleration_0_100kph
Column 35: vehicle_air_conditioning
Column 36: vehicle_body_type
Column 37: vehicle_co2_emission_level
Column 38: vehicle_combined_fuel_economy
Column 39: vehicle_cylinders_count
Column 40: vehicle_doors_count
Column 41: vehicle_driven_wheels
Column 42: vehicle_emission_standard
Column 43: vehicle_engine_capacity
Column 44: vehicle_extra_urban_fuel_economy
Column 45: vehicle_fiscal_horse_power
Column 46: vehicle_height
Column 47: vehicle_kerbweight
Column 48: vehicle_length
Column 49: vehicle_max_load_capacity
Column 50: vehicle_name
Column 51: vehicle_primary_fuel_tank_capacity
Column 52: vehicle_primary_fuel_type
Column 53: vehicle_published_hp_metric
Column 54: vehicle_seats_count
Column 55: vehicle_secondary_fuel_type
Column 56: vehicle_type
Column 57: vehicle_luggage_capacity
Column 58: vehicle_tvs_tax
Column 59: vehicle_transmission_shiftable
Column 60: vehicle_trim_name
Column 61: vehicle_trim_code
Column 62: vehicle_urban_fuel_economy
Column 63: vehicle_vehicle_warranty_kms
Column 64: vehicle_vehicle_warranty_months
Column 65: vehicle_width_excluding_mirrors
Column 66: vehicle_transmission_type
Column 67: vehicle_engine_name
Column 68: vehicle_engine_code
Column 69: vehicle_rsi_id
Column 70: image_1
Column 71: image_2
Column 72: image_3
Column 73: image_4
Column 74: image_5

It skips the first column name, "url", and when executed more than once it also removes the column values from the output.

Ex: offre-vehicule-neuf/make/rav4-124-d-4d-2wd-life_157127/chartres_1766?_krg=link&_krk=xyz&_krt=HREV30D003458;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;;;;;;;;;;;;;;;;;;;;;;


Only the delimiters remain.

Thanks

Merlin Joseph
# 33  
Old 09-17-2014
Did you check to see if the output files produced by your script contain the header lines?
# 34  
Old 09-17-2014
@Dan Cragun,

Yes. When I checked, the output file keeps just one column with its values and drops all the other columns and their values. The header in the output file looks like this:

url;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

Surprisingly, the script provided by RudiC works the first time and then fails to generate the correct output.

The requirement is to report the columns (one or more) that have been added anywhere in feed.txt (the input file) and are missing from declare.txt (where the column names are declared for comparison).

An output file (feed2.txt, for example) must be generated in both scenarios, whether or not new columns are found. If new columns are found, create a file "mismatch.txt" to report them, and remove them together with their values when writing the output file "feed2.txt". If no new columns have been added to the input file, the output stays the same as the input (feed.txt = feed2.txt).

Please help.

Thanks
Merlin Joseph

Last edited by Merlin Joseph; 09-18-2014 at 02:07 AM..
# 35  
Old 09-18-2014
Quote:
Originally Posted by Merlin Joseph
@MadeInGermany

But all the columns I declared are found in the input file feed.txt, and they match.

This is about the scenario where no new columns have been added to feed.txt:

In that case the program should simply write feed.txt out under a different name, feed2.txt, with all its data and columns intact.

Please advise.

Merlin Joseph
declare.txt and the header of feed.txt must be identical (including the field contents); then feed.txt won't be changed.
That is how I understood the original requirement.
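
A quick way to test that precondition before running anything else is to compare the two first lines directly. This is only a sketch: the function name headers_match is hypothetical, and it assumes "identical" means the same names in the same order.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only when the first line of both files
# is exactly the same string.
# usage: headers_match DECLARE FEED
headers_match() {
    [ "$(head -n 1 "$1")" = "$(head -n 1 "$2")" ]
}
```

For example, `headers_match declare.txt feed.txt && cp feed.txt feed2.txt` would cover the case where nothing was added.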

---------- Post updated at 12:43 PM ---------- Previous update was at 11:00 AM ----------

Also, the number of fields in feed.txt must be consistent.
In your previous example the header lacks "updated_at" and "created_at", so the script does not work correctly (it strips off the last two columns):
Code:
awk -F\; '{print NF}' feed.txt
72
74
74
74
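
Building on the NF check above, a small sketch that names the offending lines instead of printing every count. The function name check_field_counts is hypothetical, and the header is assumed to be the first line of the file.

```shell
#!/bin/sh
# Hypothetical helper: list every line whose field count differs from the
# header's, and exit non-zero if any such line exists.
# usage: check_field_counts FEED
check_field_counts() {
    awk -F';' '
        # The first line sets the expected number of fields.
        FNR == 1 { expected = NF; next }
        NF != expected {
            printf "line %d: %d fields, header has %d\n", FNR, NF, expected
            bad = 1
        }
        # Exit status 1 when at least one line disagreed, 0 otherwise.
        END { exit bad + 0 }
    ' "$1"
}
```

With the counts shown above (72 for the header, 74 for each data row), every data line would be listed, which points at the header as the line to fix.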
