Remove the values from a certain column without deleting the Column name in a .CSV file


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Remove the values from a certain column without deleting the Column name in a .CSV file
# 8  
Old 12-18-2013
As a first step you can for example make the difference between comma that are field separator and comma that are inside fields' values :

If there are multiple comma everywhere in you file like :
Code:
# cat txt
"a,b,c","a,b,c","a,b,c","a,b,c","a,b,c"
"a,b,c","a,b,c","a,b,c","a,b,c",5,"a,b,c",7
#

You can change the field separator to a semicolon instead of the comma... for example this way :
Code:
# sed -e ': a;s/"\([^",][^",]*\),\([^"]*\)"/"\1%\2"/' -e 't a' -e 's/,/;/g;s/%/,/g' txt
"a,b,c";"a,b,c";"a,b,c";"a,b,c";"a,b,c"
"a,b,c";"a,b,c";"a,b,c";"a,b,c";5;"a,b,c";7
#

or this way :
Code:
# sed -e 's/,/;/g' -e ': a;s/"\([^";][^";]*\);\([^"]*\)"/"\1,\2"/' -e 't a' txt
"a,b,c";"a,b,c";"a,b,c";"a,b,c";"a,b,c"
"a,b,c";"a,b,c";"a,b,c";"a,b,c";5;"a,b,c";7
#

The file can then be parsed easy using an awk -F\; ..... statement

Last edited by ctsgnb; 12-18-2013 at 02:35 PM..
This User Gave Thanks to ctsgnb For This Post:
# 9  
Old 12-18-2013
This might come close to what you want to do (tested only on mawk 1.3.3):
Code:
awk     'NR>1   {for (i=2; i<=NF; i+=2) gsub (/,/,"\001", $i)
                 FS=OFS=","; $0=$0; $5=" "; FS=OFS="\"";
                 gsub ("\001", ",")
                }
         1
        ' FS=\" OFS=\" file
column 1,column 2,column 3,column 4,column 5,column 6,column 7,column 8,column 9,column 10
12310,42324564756,"a simple string with a , comma","string with or, without commas", ,USD,12,70%,08/01/2013,
23455,12312255564,"string, with, multiple, commas,string with or, without commas,string 2,USD,433,70%,07/15/2013,,, 
23525,74535243123,"string , with commas, and - hypens and: semicolans","string with or, without commas", ,CAND,744,70%,05/06/2013,
46476,15467534544,"lengthy string, with commas, multiple: colans","string with or, without commas", ,CAND,388,70%,09/21/2013,

Unfortunately, your second line has unpaired double quotes which cannot be handled by this proposal.
This User Gave Thanks to RudiC For This Post:
# 10  
Old 12-18-2013
How has your file been generated ?

It looks it is corrupted since the line
Code:
23455,12312255564,"string,, multiple, commas,string with or, without commas,string 2,USD,433,70%,07/15/2013,

... contains only one doublequote which would need to be closed ...

Code:
# cat fin
column 1,column 2,column 3,column 4,column 5,column 6,column 7,column 8,column 9,column 10
12310,42324564756,"a simple string with a , comma","string with or, without commas",string 1,USD,12,70%,08/01/2013,
23455,12312255564,"string, with, multiple, commas,string with or, without commas",string 2,USD,433,70%,07/15/2013,
23525,74535243123,"string , with commas, and - hypens and: semicolans","string with or, without commas",string 1,CAND,744,70%,05/06/2013,
46476,15467534544,"lengthy string, with commas, multiple: colans","string with or, without commas",string 2,CAND,388,70%,09/21/2013,
#

Code:
# sed -e 's/,/;/g' -e ': a;s/"\([^";][^";]*\);\([^"]*\)"/"\1,\2"/' -e 't a' fin | awk -F\; '{$(NF-5)=z}1' OFS=\;
column 1;column 2;column 3;column 4;;column 6;column 7;column 8;column 9;column 10
12310;42324564756;"a simple string with a , comma";"string with or, without commas";;USD;12;70%;08/01/2013;
23455;12312255564;"string, with, multiple, commas,string with or, without commas";;USD;433;70%;07/15/2013;
23525;74535243123;"string , with commas, and - hypens and: semicolans";"string with or, without commas";;CAND;744;70%;05/06/2013;
46476;15467534544;"lengthy string, with commas, multiple: colans";"string with or, without commas";;CAND;388;70%;09/21/2013;
#

You could change the Output Field Separator to a comma with OFS=, but in my opinion, it would be a bad idea : that would make further parsing more complicated.

Last edited by ctsgnb; 12-18-2013 at 02:56 PM..
This User Gave Thanks to ctsgnb For This Post:
# 11  
Old 12-18-2013
This might be lengthy you may try, second line has unpaired double quotes

Code:
$ cat file
column 1,column 2,column 3,column 4,column 5,column 6,column 7,column 8,column 9,column 10
12310,42324564756,"a simple string with a , comma","string with or, without commas",string 1,USD,12,70%,08/01/2013,
23455,12312255564,"string, with, multiple, commas,string with or, without commas,string 2,USD,433,70%,07/15/2013,
23525,74535243123,"string , with commas, and - hypens and: semicolans","string with or, without commas",string 1,CAND,744,70%,05/06/2013,
46476,15467534544,"lengthy string, with commas, multiple: colans","string with or, without commas",string 2,CAND,388,70%,09/21/2013,

Code:
awk '
     NR==1
     NR > 1{
             flag = 0
             $0   = $0","                                 
while($0)  {
             match($0,/ *"[^"]*" *,|[^,]*,/)
             temp=substr($0,RSTART,RLENGTH)            
             gsub(/,$/,"",temp) 
             LFS = length($0) == 1 ? RS : OFS             
             printf "%s%s",temp = ++flag == column_mask ? "" : temp,LFS
             $0=substr($0,RLENGTH+1)                 
           }
        
           }
     ' OFS=, column_mask=5 file

Code:
$ sh tester.sh 
column 1,column 2,column 3,column 4,column 5,column 6,column 7,column 8,column 9,column 10
12310,42324564756,"a simple string with a , comma","string with or, without commas",,USD,12,70%,08/01/2013,
23455,12312255564,"string, with,, commas,string with or, without commas,string 2,USD,433,70%,07/15/2013,
23525,74535243123,"string , with commas, and - hypens and: semicolans","string with or, without commas",,CAND,744,70%,05/06/2013,
46476,15467534544,"lengthy string, with commas, multiple: colans","string with or, without commas",,CAND,388,70%,09/21/2013,

This User Gave Thanks to Akshay Hegde For This Post:
# 12  
Old 12-18-2013
All - I am sorry for the confusion. i have edited the sample file. Please take a look now...
# 13  
Old 12-18-2013
Quote:
Originally Posted by dhruuv369
All - I am sorry for the confusion. i have edited the sample file. Please take a look now...
I meanwhile also updated my post, please read #8 and #10 .

By the way in fact this should also work (add NR>1 if you need to skip the first line containing the header):
Code:
awk -F, '{$(NF-5)=z}1' OFS=, yourfile

indeed, since we count the 5th field from the last one , and since those ending fields (N-5,N-4,N-3,N-2,N-1,N) does not contain multiple comma it should work and we would not be bothered by fields containing multiple commas
or in sed :
Code:
sed '1!{s;\([^,]*\)\(\(,[^,]*\)\{5\}\)$;\2;}' yourfile.csv

Code:
# awk -F, '{$(NF-5)=z}1' OFS=, fin
column 1,column 2,column 3,column 4,,column 6,column 7,column 8,column 9,column 10
12310,42324564756,"a simple string with a , comma","string with or, without commas",,USD,12,70%,08/01/2013,
23455,12312255564,"string, with, multiple, commas,string with or, without commas",,USD,433,70%,07/15/2013,
23525,74535243123,"string , with commas, and - hypens and: semicolans","string with or, without commas",,CAND,744,70%,05/06/2013,
46476,15467534544,"lengthy string, with commas, multiple: colans","string with or, without commas",,CAND,388,70%,09/21/2013,


Last edited by ctsgnb; 12-19-2013 at 04:50 AM..
This User Gave Thanks to ctsgnb For This Post:
# 14  
Old 12-18-2013
Hi, in sed:
Code:
$ cat fich.csv
column 1,column 2,column 3,column 4,column 5,column 6,column 7,column 8,column 9,column 10
12310,42324564756,"a simple string with a , comma","string with or, without commas",string 1,USD,12,70%,08/01/2013,
23455,12312255564,"string, with, multiple, commas,string with or, without commas",string 2,USD,433,70%,07/15/2013,
23525,74535243123,"string , with commas, and - hypens and: semicolans","string with or, without commas",string 1,CAND,744,70%,05/06/2013,
46476,15467534544,"lengthy string, with commas, multiple: colans","string with or, without commas",string 2,CAND,388,70%,09/21/2013,

In red, it's the 5th field and this line have only 9 fields.
Code:
$ sed '1!{:xx;s/"/;>ICI<;/;s/"/;>ICI<;/;:yy;s/\(;>ICI<;[^,]*\),\(.*;>ICI<;\)/\1;>LA<;\2/;tyy;s/;>ICI<;/>ICI</g;/"/bxx;s/\(\([^,]*,\)\{4\}\)[^,]*/\1/;s/>ICI</"/g;s/;>LA<;/,/g;}' fich.csv
column 1,column 2,column 3,column 4,column 5,column 6,column 7,column 8,column 9,column 10
12310,42324564756,"a simple string with a , comma","string with or, without commas",,USD,12,70%,08/01/2013,
23455,12312255564,"string, with, multiple, commas,string with or, without commas",string 2,,433,70%,07/15/2013,
23525,74535243123,"string , with commas, and - hypens and: semicolans","string with or, without commas",,CAND,744,70%,05/06/2013,
46476,15467534544,"lengthy string, with commas, multiple: colans","string with or, without commas",,CAND,388,70%,09/21/2013,

Regards.
This User Gave Thanks to disedorgue For This Post:
Login or Register to Ask a Question

Previous Thread | Next Thread

9 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Remove duplicate values in a column(not in the file)

Hi Gurus, I have a file(weblog) as below abc|xyz|123|agentcode=sample code abcdeeess,agentcode=sample code abcdeeess,agentcode=sample code abcdeeess|agentadd=abcd stereet 23343,agentadd=abcd stereet 23343 sss|wwq|999|agentcode=sample1 code wqwdeeess,gentcode=sample1 code... (4 Replies)
Discussion started by: ratheeshjulk
4 Replies

2. Shell Programming and Scripting

Identify duplicate values at first column in csv file

Input 1,ABCD,no 2,system,yes 3,ABCD,yes 4,XYZ,no 5,XYZ,yes 6,pc,noCode used to find duplicate with regard to 2nd column awk 'NR == 1 {p=$2; next} p == $2 { print "Line" NR "$2 is duplicated"} {p=$2}' FS="," ./input.csv Now is there a wise way to de-duplicate the entire line (remove... (4 Replies)
Discussion started by: deadyetagain
4 Replies

3. Linux

To get all the columns in a CSV file based on unique values of particular column

cat sample.csv ID,Name,no 1,AAA,1 2,BBB,1 3,AAA,1 4,BBB,1 cut -d',' -f2 sample.csv | sort | uniq this gives only the 2nd column values Name AAA BBB How to I get all the columns of CSV along with this? (1 Reply)
Discussion started by: sanvel
1 Replies

4. Shell Programming and Scripting

Fetching values in CSV file based on column name

input.csv: Field1,Field2,Field3,Field4,Field4 abc ,123 ,xyz ,000 ,pqr mno ,123 ,dfr ,111 ,bbb output: Field2,Field4 123 ,000 123 ,111 how to fetch the values of Field4 where Field2='123' I don't want to fetch the values based on column position. Instead want to... (10 Replies)
Discussion started by: bharathbangalor
10 Replies

5. Linux

Filter a .CSV file based on the 5th column values

I have a .CSV file with the below format: "column 1","column 2","column 3","column 4","column 5","column 6","column 7","column 8","column 9","column 10 "12310","42324564756","a simple string with a , comma","string with or, without commas","string 1","USD","12","70%","08/01/2013",""... (2 Replies)
Discussion started by: dhruuv369
2 Replies

6. Shell Programming and Scripting

Check to identify duplicate values at first column in csv file

Hello experts, I have a requirement where I have to implement two checks on a csv file: 1. Check to see if the value in first column is duplicate, if any value is duplicate script should exit. 2. Check to verify if the value at second column is between "yes" or "no", if it is anything else... (4 Replies)
Discussion started by: avikaljain
4 Replies

7. Shell Programming and Scripting

How to avoid Delimiter occuring in column values in .csv file

Hello Gurus, I need to create a file from a .csv file extracting specific columns only. File structure is Column1,Column2,Column3,Column4 abcd,1234,"asdf, tew,123",123456 efgh,234,asdf,654321 My output file should have abcd,123456 efgh,654321 Can you pls help me with the code. ... (10 Replies)
Discussion started by: ritesh.bhawsar
10 Replies

8. Shell Programming and Scripting

Remove line feed from csv file column

Hi All, i have a csv file . In the 7th column i have data that has line feed in it. Requirement is to remove the line feed from the 7th column whenever it appears There are 11 columns in the file C1,C2,C3,C4,C5,C6,C7,C8,C9,C10,C11 The value in C7 contains line feed ( Alt + Enter ),... (2 Replies)
Discussion started by: r_t_1601
2 Replies

9. Shell Programming and Scripting

Remove line feed from csv file column

Hi All, My requirement is to remove line (3 Replies)
Discussion started by: r_t_1601
3 Replies
Login or Register to Ask a Question