Merge 2 CSV files using sed


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Merge 2 CSV files using sed
# 8  
Old 04-20-2011
One row from file1 is missing using this logic

output got:
Code:
"20","9900000007","1","A","2"
"3","9900000001","1","A","2"
"13","9900000004","0","A","2"
"17","9900000005","1","A","2"
"4","9900000003"

output expected:
Code:
"20","9900000007","1","A","2"
"3","9900000001","1","A","2"
"13","9900000004","0","A","2"
"17","9900000005","1","A","2"
"4","9900000003"
"3","9900000002"

# 9  
Old 04-20-2011
Based on your example in post 4 and assuming that you really mean the second column and only the second column
Code:
$ join -a 1 -a 2 -t, -j 2 file1 file2 | awk -F, '{ if (NF==6) print $2,$1,$4,$5,$6; else print $2,$1,$3,$4,$5}'
"20" "9900000007" "1" "A" "2"
"3" "9900000001" "1" "A" "2"
"3" "9900000002"
"4" "9900000003"
"13" "9900000004" "0" "A" "2"
"17" "9900000005" "1" "A" "2"

# 10  
Old 04-24-2011
Smilie
Quote:
Originally Posted by NewToSed
Thanks,

But in my example i need to compare only the second column, I am posting the original contents of my files

file1:
Code:
"20","9900000007"
"3","9900000001"
"3","9900000002"
"4","9900000003"

file2
Code:
"20","9900000007","1","A","2"
"3","9900000001","1","A","2"
"13","9900000004","0","A","2"
"17","9900000005","1","A","2"

outPutFile:
Code:
"20","9900000007","1","A","2"
"3","9900000001","1","A","2"
"3","9900000002"
"4","9900000003"
"13","9900000004","0","A","2"
"17","9900000005","1","A","2"


I have written this script, which has 2 problem
Code:
#!/bin/bash
while read updateline
do
   updateField=`echo $updateline | awk -F "," '{print $2}'`
   while read initialline
   do
      initialField=`echo $initialline | awk -F "," '{print $2}'`
      if [ "$initialField" == "$updateField" ]; then
         sed -e "s/$initialline/$updateline/" <file1 
      fi
   done < file1 
done < file2 >tmpUpdateFile

output of this script:
Code:
"20","9900000007","1","A","2"
"3","9900000001"
"3","9900000002"
"4","9900000003"
"20","9900000007"
"3","9900000001","1","A","2"
"3","9900000002"
"4","9900000003"

This is not the expected result, problem are,
1. During each loop, the content is getting appended
2. In my script, in the else part how to append the variable $updateline.

Thanks....

Moderator's Comments:
Mod Comment Please use [code] and [/code] tags when posting code, data or logs etc. to preserve formatting and enhance readability, thanks.

I try to write a script for compare columns issue Smilie
if wanted more can customized more further (for exa i can add "-F" (field separator so we can change other chars instead of comma) parameter. in this script field separator (column separator) is comma for your examples)

lets we look your examples ..

lets run script for your files.
Code:
# cat file1
"20","9900000007"
"3","9900000001"
"3","9900000002"
"4","9900000003"

Code:
# cat file2
"20","9900000007","1","A","2"
"3","9900000001","1","A","2"
"13","9900000004","0","A","2"
"17","9900000005","1","A","2"


as far i mean , you want second column compare on files then run the script for compare 2 column.our compare column parameter is 'c'
our command is like below and results..
Code:
# ./justdoit c=2
['file1'] contains non-digit chars or columns!
['file2'] contains non-digit chars or columns!
--> Showing compare results..
==================================
"20","9900000007","1","A","2"
"3","9900000001","1","A","2"
"3","9900000002"
"4","9900000003"
"13","9900000004","0","A","2"
"17","9900000005","1","A","2"
=================================


and we can use the script for another compare examples..


below file has same columns then the script will ask add or not.
and if we use as 2 column for compare process then script

in file1 has only one column with "9" line and script fails for compare non-first columns choice..also file1 has second column with non-number (uncomparable)!! .. and also it has multiple lines then the script ask this issue for add or not these..

Code:
# cat file1
"20","9900000007"
"3","9900000001"
"3","9900000002"
"4","9900000003"
"4","9900000003","AAAAAAA"
"9"
"4","9900000003","AAAAAAA"
"11","NOTNUMBER","BBBBBB","CCCCCC"

Code:
# cat file2
"20","9900000007","1","A","2"
"3","9900000001","1","A","2"
"13","9900000004","0","A","2"
"17","9900000005","1","A","2"
"17","9900000005","1","A","2"
"17","9900000005","1","A","2","CCCCCC"

Code:
# ./justdoit c=2
['file1'] contains non-digit chars or columns!
['file2'] contains non-digit chars or columns!
['file1'] file, ['"11","NOTNUMBER","BBBBBB","CCCCCC"'] line, ['2'] numbered ['NOTNUMBER'] column, has contain non-digit chars!!
sed: -e expression #1, char 31: invalid reference \1 on `s' command's RHS
Check your values.. file ['file1'], line ['"9"'], compare column ['2']
"SED will give error!! And the result is partially false!!!"
sed: -e expression #1, char 31: invalid reference \1 on `s' command's RHS
sed: -e expression #1, char 31: invalid reference \1 on `s' command's RHS
Multiple lines have same number of columns for compare column ['9900000003'] !!
"4","9900000003","AAAAAAA"
"4","9900000003","AAAAAAA"
Multiple lines add to results [y/n]? y
--> Showing compare results..
==================================
"4","9900000003","AAAAAAA"
"4","9900000003","AAAAAAA"
"20","9900000007","1","A","2"
"3","9900000001","1","A","2"
"3","9900000002"
"17","9900000005","1","A","2","CCCCCC"
"13","9900000004","0","A","2"
=================================

of course examples are reproduced.
and maybe happens false results for unforeseeable issues that i forgot variations..

Code:
#!/bin/bash
## justdoit##
## SED column comparison ##
 
maxcolfind()
{
local col='"[^ ]*"'
for file in file1 file2 ; do
maxcol=0 ; z=0 ; newcol=$col
while [[ $(sed -n "/^$newcol/p" $file) ]] ; do
((maxcol++))
newcol="$newcol,$col" ; newcol=$(echo $newcol|sed 's/ ,/,/g')
done
filec[z]=$maxcol ; ((z++))
done
if [[ ${filec[z-1]} -ge ${filec[z]} ]] ; then maxcol=${filec[z-1]} ; else maxcol=${filec[z]} ; fi
}
maxcolfind ""
 
nondigitsfind()
{
for file in file1 file2 ; do
if [[ $(sed -n '/[^0-9]/p' $file) ]] ; then echo "['$file'] contains non-digit chars or columns!" ; fi
done
}
nondigitsfind ""
 
maxcoltmp=$maxcol; countcompare=$maxcol; countcomparereset=$maxcol; listresults=$maxcol
compcol=$( echo "$1" | sed "s/c=\([1-$maxcol]\)/\1/" )
col='"[^ ]*"' ;maxfmt=() ; charforrhs='"[A-Za-z0-9][A-Za-z0-9]*"' ; colsed='\"\[A-Za-z0-9]\[A-Za-z0-9]\*\",'
 
if [ ! $(echo "$compcol" | sed -n "/^[1-$maxcol]$/p") ] ; then
echo "You entered an invalid value for compare column!!
You have max $maxcol --> $col <-- string!!
Usage $0 c=[1-$maxcol] (c--> Compare Which Column)"
exit 1
fi
 
while [ $(( maxcol -= 1 )) -ge 0 ]
 do
  maxfmt=(${maxfmt[@]} $charforrhs)
 done
fmtarr=( $(echo ${maxfmt[@]} | sed 's/\" /",/g') )
x=0 ; compcolarr[x]=${fmtarr[@]} ; colsedtmp=$colsed
while [ $(( maxcoltmp -= 1 )) -ge 1 ]
do
 ((x++)) ; compcolarr[x]=$(echo ${fmtarr[@]}|sed "s/$colsed//")
 colsed=$colsed$colsedtmp
done
 
>newout
x=0;count=0 ; first=ok
for j in file1 file2
do
while [ $(( countcompare -= 1 )) -ge 0 ] ;
 do
 if [[ $(sed -n "/^${compcolarr[x]}$/p" $j) ]] ; then
 for i in $(sed "" $j)
  do
   if [[ $(echo $i|sed -n "/^${compcolarr[x]}$/p") ]] ; then
      findcompcol=$(echo ${compcolarr[x]}|sed 's/\[A-Za-z0-9]\[A-Za-z0-9]\*/\\(&\\)/'$compcol' ' )
    (echo $i|sed "s/$findcompcol/\1/") 2>&1 >/dev/null ## test columns for compare values according to file content
      if [ $? != 0 ] ; then
         echo "Check your values.. file ['$j'], line ['$i'], compare column ['$compcol'] "
         echo "\"SED will give error!! And the result is partially false!!!\" "
      fi
    if [[ ! $(echo $i|sed "s/$findcompcol/\1/"|sed -n '/[^0-9]/p') ]] ; then
      okline="$i" ; isstring=no
    else
      falsecompcol=$(echo $i|sed "s/$findcompcol/\1/")
      echo "['$j'] file, ['$i'] line, ['$((compcol))'] numbered ['$falsecompcol'] column, has contain non-digit chars!!"
      isstring=yes ;
    fi
     if [ ! $isstring = "yes" ] ; then
     comparr=$(echo $okline|sed "s/$findcompcol/\1/" )
     isadded=no ;
     fi
   fi
 
if [ ! $isadded = "yes" ] && [ ! $isstring = "yes" ] ; then
    if [ "$first" = "ok" ] ; then
      addedarr=(${addedarr[@]} $comparr ) ; first=notok ; isadded=yes
    else
     for k in ${addedarr[@]}
      do
        if [[ $comparr = $k ]] ; then
         isadded=yes ;break
        fi
      done
        if [ ! "$isadded" = "yes" ] ; then
          addedarr=(${addedarr[@]} $comparr )
        fi
    fi
fi
 done
 fi
 ((x++)) ; ((count++))
 done
countcompare=$countcomparereset ;x=0; count=0
done
 
l=0 ; listreset=$listresults
for i in ${addedarr[@]}
 do
   for x in "$(sed -n "/$i/p" file1 file2)"
    do
     while [ $(( listresults -= 1 )) -ge 0 ] ;
     do
     if [[ $(echo "$x"|sed -n "/^${compcolarr[l]}$/p") ]] ; then
       addthis=$(echo "$x"|sed -n "/^${compcolarr[l]}$/p")
       if [[ $(echo "$addthis"|sed -n "/$i/p" | sed = | sed -e '$!N;s/\n/ /g' -ne '$s/^\([0-9]\).*/\1/p' ) != 1 ]] ; then
         echo "Multiple lines have same number of columns for compare column ['$i'] !!"
         echo "$addthis"
         read -p "Multiple lines add to results [y/n]? " ch
         if [ $ch = 'y' ] ; then
          echo "$addthis" >>newout
         fi
       else
          echo "$addthis" >>newout
       fi
         break
     fi
     ((l++))
     done
    done
     listresults=$listreset ; l=0
 done
 
echo "--> Showing compare results..
==================================
$(more newout)
================================="


regards
ygemici

Last edited by ygemici; 04-24-2011 at 01:30 PM..
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Merge the three csv files as one according to first coloumn.

I have three files with similar pattern i need to merge all the coloumns side by side from all three files according to the first coloumn example as shown below I mentioned 5 coloumns only in example but i have around 15 coloumns in each file. file1: Name,Samples,Error,95RT,90RT... (4 Replies)
Discussion started by: Raghuram717
4 Replies

2. Shell Programming and Scripting

Compare and merge two big CSV files

Hi all, i need help. I have two csv files with a huge amount of data. I need the first column of the first file, to be compared with the data of the second, to have at the end a file with the data not present in the second file. Example File1: (only one column) profile_id 57036226... (11 Replies)
Discussion started by: SirMannu
11 Replies

3. Shell Programming and Scripting

I am trying to merge all csv files from source path into 1 file

I am trying to merge all csv files from source path into one single csv file in target. but getting error message: hadoop fs -cat /user/hive/warehouse/stage.db/PK_CLOUD_CHARGE/TCH-charge_*.csv > /user/hive/warehouse/stage.db/PK_CLOUD_CHARGE/final/TCH_pb_charge.csv getting error message:... (0 Replies)
Discussion started by: cplusplus1
0 Replies

4. UNIX for Dummies Questions & Answers

Merge two csv files using column name

Hi all, I have two separate csv files(comma delimited) file 1 and file 2. File 1 contains PAN,NAME,Salary AAAAA5467D,Raj,50000 AAFAC5467D,Ram,60000 BDCFA5677D,Kumar,90000 File 2 contains PAN,NAME,Dept,Salary ASDFG6756T,Karthik,ABC,450000 QWERT8765Y,JAX,CDR,780000... (5 Replies)
Discussion started by: Nivas
5 Replies

5. Shell Programming and Scripting

Merge CSV files

I have lot of csv file collected from script like below : Name of files (some examples) there are thousands of it: 192.168.0.123_251_18796_1433144473.csv 192.168.0.123_251_18796_1433144772.csv 192.168.0.123_251_18796_1433145073.csv 192.168.0.123_251_18796_1433145372.csvContent of each... (5 Replies)
Discussion started by: rk4k
5 Replies

6. Shell Programming and Scripting

Merge 2 last columns in a csv file using sed or perl

Hello everyone, want to merge 2 last columns 3rd and 4rd (if it exists). I have this 1;2;1;1 2;3;1;1 1;1;2 1;2;3;4 1;1;2 1;2;3;1 Desired output: 1;2;1 1 2;3;1 1 1;1;2 1;2;3 4 1;1;2 1;2;3 1 (3 Replies)
Discussion started by: satir
3 Replies

7. UNIX for Dummies Questions & Answers

Need help combining txt files w/ multiple lines into csv single cell - also need data merge

:confused:Hello -- i just joined the forums. I am a complete noob -- only about 1 week into learning how to program anything... and starting with linux. I am working in Linux terminal. I have a folder with a bunch of txt files. Each file has several lines of html code. I want to combine... (2 Replies)
Discussion started by: jetsetter
2 Replies

8. Shell Programming and Scripting

Merge *.csv files, each in separate sheets

Does anyone know how to Merge *.csv files, each in seperate sheets? (7 Replies)
Discussion started by: frhling
7 Replies

9. Shell Programming and Scripting

Merge CSV files and create a column with the filename from the original file

Hello everyone!! I am not completely new to shell script but I havent been able to find the answer to my problem and I'm sure there are some smart brains here up for the challenge :D. I have several CSV files that I need to combine into one, but I also need to know where each row came from.... (7 Replies)
Discussion started by: fransanchezoria
7 Replies

10. Shell Programming and Scripting

Merge 2 csv files with awk

I have 2 files pipe delimted and want to merge them based on a key e.g file 1 123$aaa$yyy$zzz 345$xab$yzy$zyz 456$sss$ttt$foo 799$aaa$ggg$dee file 2 123$hhh 345$ddd 456$xxx 888$zzz so if the key is the first field, and the result should be the common key between file 1 and 2 (6 Replies)
Discussion started by: loloAix
6 Replies
Login or Register to Ask a Question