Compare two csv's with column based


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Compare two csv's with column based
# 1  
Old 03-09-2017
Compare two csv's with column based

Hi,

I am having below two CSV's
Code:
col_1,col_2,col_3
1,2,4
1,3,6

col_1,col_3,col2,col_5,col_6
1,2,3,4,5
1,6,3,,,

I need to compare based on the columns where the mismatch is

expected output
Code:
col_1,col_2,col_3
1,2,4

the above output is compared based on column since the column names are shuffled among other csv

here 1,3,6 got matached because it is same as other 2.csv

i tried using diff but not able to handle with coulmns also used join

Code:
diff 1.csv 2.csv
join -t, <(sort 1.csv) <(sort 2.csv)

# 2  
Old 03-09-2017
Hi.

Is col2 in col_1,col_3,col2,col_5,col_6 a typo which should be col_2, or is this a different column name? ... cheers, drl
# 3  
Old 03-09-2017
Not sure this is the most elegant solution, but the result is as desired. Try
Code:
awk '
NR == FNR       {if (NR == 1)   {print
                                 for (MX=n=NF; n>0; n--) REF[$n]=n
                                }
                 else           TMP[NR] = $0
                 next
                }

FNR == 1        {for (n=NF; n>0; n--) if ($n in REF) CMP[n]=REF[$n]
                 next
                }

                {TSTR = TFS = ""
                 split ("", IT)
                 for (n=NF; n>0; n--)   if (CMP[n]) IT[CMP[n]] = $n
                 for (i=1; i<=MX; i++)  {TSTR = TSTR TFS IT[i]
                                         TFS  = FS
                                        }
                 if (TSTR != TMP[FNR]) print TMP[FNR]
                }


' FS="," file[12]
col_1,col_2,col_3
1,2,4

# 4  
Old 03-10-2017
Hi.

I agree that as soon as one knows that fields are involved, one should consider awk.

However, if you are going to be dealing with csv files frequently, you might want to consider csv-specific utilities. Here's an example of one, along with a field-specific utility.
Code:
#!/usr/bin/env bash

# @(#) s1       Demonstrate field differences, csv-format file.

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
LC_ALL=C ; LANG=C ; export LC_ALL LANG
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
em() { pe "$*" >&2 ; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C csvfix dwdiff

pl " Input data files:"
head data1 data2

csvfix order -fn col_1,col_2,col_3 data1 >new1
csvfix order -fn col_1,col_2,col_3 data2 >new2

pl " Results, csvfix to generate named column files in order:"
head new1 new2

pl " Results, delimited-word diff:"
dwdiff -3 -d',' new1 new2

exit 0

producing:
Code:
$ ./s1 

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 3.16.0-4-amd64, x86_64
Distribution        : Debian 8.7 (jessie) 
bash GNU bash 4.3.30
csvfix - ( local: ~/executable/csvfix, 2014-05-17 )
dwdiff 2.0.9

-----
 Input data files:
==> data1 <==
col_1,col_2,col_3
1,2,4
1,3,6

==> data2 <==
col_1,col_3,col_2,col_5,col_6
1,2,3,4,5
1,6,3,,,

-----
 Results, csvfix to generate named column files in order:
==> new1 <==
"col_1","col_2","col_3"
"1","2","4"
"1","3","6"

==> new2 <==
"col_1","col_2","col_3"
"1","3","2"
"1","3","6"

-----
 Results, delimited-word diff:
======================================================================
[-"2"-]{+"3"+}
======================================================================
[-"4"-]{+"2"+}
======================================================================

The csvfix selects the named columns, placing them in order, and dwdiff compares the fields (words) separately.

So "2" is replaced by "3", and "4" is replaced by "2". There are a number of options on dwdiff, see man page.

Here are some details on those utilities:
Code:
csvfix  Manipulate csv files (doc)
Path    : ~/executable/csvfix
Version : - ( local: ~/executable/csvfix, 2014-05-17 )
Type    : ELF 64-bit LSB executable, x86-64, version 1 (SYSV ...)
Home    : https://neilb.bitbucket.io/csvfix/

dwdiff  a delimited word diff program (man)
Path    : /usr/bin/dwdiff
Version : 2.0.9
Type    : ELF 64-bit LSB executable, x86-64, version 1 (SYSV ...)
Help    : probably available with --help
Repo    : Debian 8.7 (jessie)

and here are some other csv-specific tools:
Code:
CSV, comma separated variable file (comma typical, it can be almost anything)

        1) csvfix
           https://neilb.bitbucket.io/csvfix/ (checked  2017.03)

        2) csvtool

        3) Text::CSV -- perl module, DIY

        4) crush, Google collection
           http://crush-tools.googlecode.com/files/crush-tools-2013-04.tar.gz

        5) csvkit
           https://github.com/wireservice/csvkit

Best wishes ... cheers, drl

Last edited by drl; 03-11-2017 at 09:55 AM.. Reason: Edit 1: minor typo in list number
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Need awk or Shell script to compare Column-1 of two different CSV files and print if column-1 matche

Example: I have files in below format file 1: zxc,133,joe@example.com cst,222,xyz@example1.com File 2 Contains: hxd hcd jws zxc cst File 1 has 50000 lines and file 2 has around 30000 lines : Expected Output has to be : hxd hcd jws (5 Replies)
Discussion started by: TestPractice
5 Replies

2. Shell Programming and Scripting

Get maximum per column from CSV file, based on date column

Hello everyone, I am using ksh on Solaris 10 and I'm gathering data in a CSV file that looks like this: 20170628-23:25:01,1,0,0,1,1,1,1,55,55,1 20170628-23:30:01,1,0,0,1,1,1,1,56,56,1 20170628-23:35:00,1,0,0,1,1,2,1,57,57,2 20170628-23:40:00,1,0,0,1,1,1,1,58,58,2... (6 Replies)
Discussion started by: ejianu
6 Replies

3. UNIX for Beginners Questions & Answers

Compare first column from two csv files with greater than or equal, and less than

I have two csv files of different sizes. The output file needs to have file1 contents on top of file2 contents where file2 col1 is >= to file1 col1, and file2 col1(same value) is < file1 col1 (next value). So basically, some file2 rows will be matched to the same file1 row because it is the closet... (7 Replies)
Discussion started by: aachave1
7 Replies

4. Shell Programming and Scripting

Compare 2 files of csv file and match column data and create a new csv file of them

Hi, I am newbie in shell script. I need your help to solve my problem. Firstly, I have 2 files of csv and i want to compare of the contents then the output will be written in a new csv file. File1: SourceFile,DateTimeOriginal /home/intannf/foto/IMG_0713.JPG,2015:02:17 11:14:07... (8 Replies)
Discussion started by: refrain
8 Replies

5. Shell Programming and Scripting

Compare two files based on column

Hi, I have two files roughly 1200 fields in length for each row, sorted on the 2nd field. I need to compare based on that 2nd column between file1 and file2 and print lines that exist in both files into separate files (I can't guarantee that every line in file1 is in file2). Example: File1: ... (1 Reply)
Discussion started by: origon
1 Replies

6. Shell Programming and Scripting

Compare based on column value

Hi Experts, I want to compare 2 text files based on their column values text1 is like prd-1234 yes no yes yes prd-2345 no no no yes prd-6475 yes yes yes no and test 2 is prd-1234 no no no yes prd-2345 yes no no no desired out put as follows prd-1234 1 3 prd-235 1 4 basically it shows... (5 Replies)
Discussion started by: tijomonmathew
5 Replies

7. Shell Programming and Scripting

Pick the column value based on another column from .csv file

My scenario is that I need to pick value from third column based on fourth column value, if fourth column value is 1 then first value of third column.Third column (2|3|4|6|1) values are cancatenated. Main imp point, in my .csv file, third column is having price value with comma (1,20,300), it has... (2 Replies)
Discussion started by: Ganesh L
2 Replies

8. Shell Programming and Scripting

Compare files column to column based on keys

Here is my situation. I need to compare two tab separated files (diff is not useful since there could be known difference between files). I have found similar posts , but not fully matching.I was thinking of writing a shell script using cut and grep and while loop but after going thru posts it... (2 Replies)
Discussion started by: blackjack101
2 Replies

9. Shell Programming and Scripting

Read CSV column value based on column name

Hi All, I am newbie to Unix I ve got assignment to work in unix can you please help me in this regard There is a sample CSV file "Username", "Password" "John1", "Scot1" "John2", "Scot2" "John3", "Scot3" "John4", "Scot4" If i give the column name as Password and row number as 4 the... (3 Replies)
Discussion started by: JohnGG
3 Replies

10. Shell Programming and Scripting

extract csv based on column value

Hi I have a csv file which is below A,5 B,6 C,10 D,7 I want the values who's second column is greater than 7 say C,10 D,7 Help me please... Thanks, Maruth (3 Replies)
Discussion started by: maruthavanan
3 Replies
Login or Register to Ask a Question