Duplicate rows in CSV files based on values


 
# 1  
Old 04-15-2011
Duplicate rows in CSV files based on values

I am new to this forum and this is my first post.

I am looking at an old post with exactly the same name. I cannot paste the URL because I do not have 5 posts yet.

My requirement is exactly the opposite.

I want to get rid of the duplicate rows and append the values of the last column from those rows.

Input
Code:
abc, first line, value1
def, second line, value2
def, second line, value3
ghi, third line, value4

Output
Code:
abc, first line, value1
def, second line, "value2,value3"
ghi, third line, value4


# 2  
Old 04-15-2011
Sort the file, then use sed, with two lines in the pattern space, to fold them together.
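
For example, a minimal sketch along those lines (this assumes the input is in a file called Inp_File, that the first two comma-separated fields form the key, and GNU sed; merged values are simply joined with a comma rather than quoted as in your desired output):

Code:
# sort on the first two comma-separated fields, then fold adjacent
# rows that share them into a single row
sort -t, -k1,2 Inp_File |
sed -e ':a' -e '$!N' \
    -e 's/^\(\([^,]*,[^,]*\),.*\)\n\2,/\1,/' \
    -e 'ta' -e 'P' -e 'D'

On your sample input that should print "def, second line, value2, value3"; putting quotes around merged value lists would need one more substitution.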
# 3  
Old 04-15-2011
Thanks a lot for your quick reply. I appreciate your help, but I am new to scripting; could you please add some sample code? I can modify it as per my requirement.
# 4  
Old 04-15-2011
See if this works for you:

Code:
#!/usr/bin/ksh
# Collapse consecutive rows that share the first two fields, collecting the
# third field into one list.  'COMMA' stands in for the output delimiter so
# that IFS=',' cannot interfere; a sed pass afterwards turns it back into ','.
typeset -i mCnt=0
mPFld1="First_Time"   # sentinel: no previous row seen yet
IFS=','
while read mFld1 mFld2 mValue
do
  # Key changed (field 1 or field 2 differs from the previous row):
  # print the group collected so far, then start a new one.
  if [[ "${mFld1}" != "${mPFld1}" || "${mFld2}" != "${mPFld2}" ]]; then
    if [[ "${mPFld1}" != "First_Time" ]]; then
      if [[ ${mCnt} -gt 1 ]]; then
        # More than one row in the group: quote the merged value list.
        echo ${mPFld1}'COMMA'${mPFld2}'COMMA"'${mOutValue}'"'
      else
        echo ${mPFld1}'COMMA'${mPFld2}'COMMA'${mOutValue}
      fi
    fi
    mOutValue=''
    mCnt=0
  fi
  # Append this row's value to the current group.
  if [[ "${mOutValue}" = "" ]]; then
    mOutValue=${mValue}
  else
    mOutValue=${mOutValue}'COMMA'${mValue}
  fi
  mPFld1=${mFld1}
  mPFld2=${mFld2}
  mCnt=${mCnt}+1
done < Inp_File
# Print the last group after the loop ends.
if [[ "${mPFld1}" != "First_Time" ]]; then
  if [[ ${mCnt} -gt 1 ]]; then
    echo ${mPFld1}'COMMA'${mPFld2}'COMMA"'${mOutValue}'"'
  else
    echo ${mPFld1}'COMMA'${mPFld2}'COMMA'${mOutValue}
  fi
fi

Run it as follows:
Code:
>my_script > Out_File
>sed 's/COMMA/,/g' Out_File
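
If you prefer, you can skip the intermediate file and pipe the script straight into sed (this assumes my_script is executable and sits in the current directory):

Code:
./my_script | sed 's/COMMA/,/g' > Out_File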

# 5  
Old 04-15-2011
Thanks a lot for your reply, I really appreciate your quick help.
The output is as follows:

Code:
 
abc, first line, value1
def, second line," value2, value3"
ghi, third line, value4
,,

I am sorry, but one more question: what if the file has three more columns?

Input
Code:
 
abc, first line, value1, col1,col2,col3
def, second line, value2, col4,col5,col6
def, second line, value3, col4,col5,col6
ghi, third line, value4, col7,col8,col9

Output

Code:
 
abc, first line, value1, col1,col2,col3
def, second line," value2, value3",col4,col5,col6
ghi, third line, value4,  col7,col8,col9

Will there be a major change in this code? I am trying it now. Also, I am trying to get rid of the extra commas on the last line of the output file.
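
One likely cause of the trailing ",," line is a blank line at the end of Inp_File: the flush after the loop then prints an empty record. A small guard at the top of the read loop in the ksh script above (just a sketch, assuming blank lines carry no data you care about) would avoid that:

Code:
# first statement inside the "while read ... do" loop:
# skip completely empty input lines so they never form a group of their own
[[ -z "${mFld1}" && -z "${mFld2}" && -z "${mValue}" ]] && continue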
# 6  
Old 04-15-2011
Code:
echo "abc, first line, value1, col1,col2,col3
def, second line, value2, col4,col5,col6
def, second line, value3, col4,col5,col6
ghi, third line, value4, col7,col8,col9" |sed -n -r '1h;{2,$H;x;s/(.*), "?(.*), ([^\n]*)\n\1, (.*), \3/\1 "\2, \4",\3/;h};${s/", /, /g;p}'
abc, first line, value1, col1,col2,col3
def, second line "value2, value3",col4,col5,col6
ghi, third line, value4, col7,col8,col9

---------- Post updated at 06:02 PM ---------- Previous update was at 05:52 PM ----------

awk would be much more controllable
Code:
echo "abc, first line, value1, col1,col2,col3
def, second line, value2, col4,col5,col6
def, second line, value3, col4,col5,col6
ghi, third line, value4, col7,col8,col9" |awk '{sub(",","",$4);x=$1 FS $2 FS $3 FS $5;a[x]=a[x]?a[x] FS $4:$4}END{for(i in a) {split(i,b,FS);print b[1],b[2],b[3],a[i]~FS?"\""a[i]"\",":a[i]",",b[4]}}'
abc, first line, value1, col1,col2,col3
def, second line, "value2 value3", col4,col5,col6
ghi, third line, value4, col7,col8,col9
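
Spelled out as a stand-alone awk program it is easier to adjust. This is only a sketch, not a drop-in for the one-liner above: it assumes comma-separated input as in your examples, treats fields 1 and 2 as the key and field 3 as the value, takes the trailing columns from the first row of each group, and keeps the input order (it also does not need the input to be sorted):

Code:
# merge rows that share fields 1 and 2, collecting field 3 into one list
BEGIN { FS = OFS = "," }
{
    key = $1 FS $2
    if (!(key in vals)) {
        order[++n] = key              # remember the order keys first appear in
        vals[key]  = $3
        rest = ""
        for (i = 4; i <= NF; i++)     # keep any columns after the value column
            rest = rest OFS $i
        tail[key] = rest
    } else {
        vals[key] = vals[key] "," $3  # append this row's value to the list
    }
}
END {
    for (j = 1; j <= n; j++) {
        k = order[j]
        v = (vals[k] ~ /,/) ? "\"" vals[k] "\"" : vals[k]   # quote merged lists
        print k, v tail[k]
    }
}

Saved as, say, merge_dups.awk (the name is just an example), it can be run with "awk -f merge_dups.awk Inp_File".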

# 7  
Old 04-18-2011
Is this CSV code robust against quoted commas?