Replace columns in .csv using other .csv columns


 
Thread Tools Search this Thread
Top Forums UNIX for Advanced & Expert Users Replace columns in .csv using other .csv columns
# 1  
Old 08-13-2015
Replace columns in .csv using other .csv columns

Hi,

I have 2 csv files with 15 000 lines, which looks like this:

Daily.csv
Code:
"CODE","BRAND","DESIGNER","SIZE","TYPE","GENDER","SET","DESCRIPTION","IMAGE","COST","WEIGHT","MSRP","UPC"
"M-1001","212","Caroline Her","1.7 oz","EDT Spray","Men","","text.","http://www.domain.com/products/Men/Final/M-1001large.jpg","31.08","1","57.00","8411061341704"
"M-1003","1881","Nino Cer","1.7 oz","EDT Spray","Men","","text.","http://www.domain.com/products/Men/Final/M-1003large.jpg","24.13","1","36.00","688575003642"
"M-1004","1881","Nino Cer","3.4 oz","EDT Spray","Men",""," text.","http://www.domain.com/products/Men/Final/M-1004large.jpg","37.51","1","50.00","688575003659"

GoodImages.csv

Code:
CODE,Images URL
M-1001,http://www.domain.com/overstock-images/M-1001arger.jpg
M-1004,http://www.domain.com/overstock-images/M-1004larger.jpg
W-C-4948,http://www.domain.com/overstock-images/W-C-4948larger.jpg

I need to replace field 9 "IMAGE" from Daily.csv with field 2 "Images URL" from GoodImages.csv only if both field 1 match.

Output is desired as:
Code:
"CODE","BRAND","DESIGNER","SIZE","TYPE","GENDER","SET","DESCRIPTION","IMAGE","COST","WEIGHT","MSRP","UPC"
"M-1001","212","Caroline Her","1.7 oz","EDT Spray","Men","","text.","http://www.domain.com/overstock-images/M-1001larger.jpg","31.08","1","57.00","8411061341704"
"M-1003","1881","Nino Cer","1.7 oz","EDT Spray","Men","","text.","http://www.domain.com/overstock-images/M-1004larger.jpg","24.13","1","36.00","688575003642"
"M-1004","1881","Nino Cer","3.4 oz","EDT Spray","Men",""," text.","http://www.domain.com/products/Men/Final/M-1004large.jpg","37.51","1","50.00","688575003659"

I would like to run it from a cron job if possible.

Thank You
# 2  
Old 08-13-2015
You have tags for bash, ksh, awk, and grep on this thread. What operating system and shell are you using?

What have you tried?

Where are your input files located? Where do you want the output to go (what filename and in what directory)?
# 3  
Old 08-13-2015
  • CENTOS 6.7 x86_64 standard - server
  • bash
Both csv files are in the main directory
Code:
/home/oaperf/public_html

the output file could go to
Code:
/home/oaperf/public_html/output

My knowledge is extremely limited Smilie when it come to that it is all new, so I did not try much. I used excel to get this done but I replace even non matching items plus it is not automated, I wrote a script in php that download the csv.
# 4  
Old 08-13-2015
Try:
Code:
awk 'NR==FNR{A[q $1 q]=q $2 q; next} $1 in A{$9=A[$1]}1' q=\" FS=, OFS=, GoodImages.csv Daily.csv > output

The trouble with csv format like that is that it allows al kinds of exceptions. A complication for example would be if there can be commas within the quotes, particular field #3, which would mess up the field numbers. This could be mitigated by counting from back to front (since that last values are numerical with a dot a decimal separator and no thousands separators), so:
Code:
awk 'NR==FNR{A[q $1 q]=q $2 q; next} $1 in A{$(NF-4)=A[$1]}1' q=\" FS=, OFS=, GoodImages.csv Daily.csv > output

Technically the URL's could also contain commas. If that may be the case, then it will be more complex to make this reliable. Another problem could be if there are double quotes within the double quotes. If that will not be the case then this might be enough..

Last edited by Scrutinizer; 08-13-2015 at 03:34 AM..
# 5  
Old 08-13-2015
I started off with a fairly simple awk script, but (except for the header line) it didn't produce any output that matched what you said you wanted. So I added some debugging code to see where I was going wrong. The output I see seems to indicate that I do not understand your requirements at all. I think the code I had (without the debugging statements shown in red produces the output that your description said you wanted, but it doesn't match any of the lines in the sample output that you said should be produced:
Code:
#!/bin/ksh
cd /home/oaperf/public_html
awk -F, -v OFS=, '
FNR == 1 {f++}
FNR == NR {
	if(NR > 1)
		i["\""$1"\""] = "\""$2"\""
	next
}
f == 2 {e[$1] = $9
	next
}
{	print ""
	 print "$0:" $0
	printf(" Input $9:%s\n", $9)
	printf("Expect $9:%s\n", e[$1])
	printf("  Good $2:%s\n", ($1 in i) ? i[$1] : "")
}
$1 in i {
	$9 = i[$1]
}
{	printf("Output $9:%s\n", $9)}
1' GoodImages.csv expected.csv Daily.csv

produces the output:
Code:
 
$0:"CODE","BRAND","DESIGNER","SIZE","TYPE","GENDER","SET","DESCRIPTION","IMAGE","COST","WEIGHT","MSRP","UPC"
 Input $9:"IMAGE"
Expect $9:"IMAGE"
  Good $2:
Output $9:"IMAGE"
"CODE","BRAND","DESIGNER","SIZE","TYPE","GENDER","SET","DESCRIPTION","IMAGE","COST","WEIGHT","MSRP","UPC"

$0:"M-1001","212","Caroline Her","1.7 oz","EDT Spray","Men","","text.","http://www.domain.com/products/Men/Final/M-1001large.jpg","31.08","1","57.00","8411061341704"
 Input $9:"http://www.domain.com/products/Men/Final/M-1001large.jpg"
Expect $9:"http://www.domain.com/overstock-images/M-1001larger.jpg"
  Good $2:"http://www.domain.com/overstock-images/M-1001arger.jpg"
Output $9:"http://www.domain.com/overstock-images/M-1001arger.jpg"
"M-1001","212","Caroline Her","1.7 oz","EDT Spray","Men","","text.","http://www.domain.com/overstock-images/M-1001arger.jpg","31.08","1","57.00","8411061341704"

$0:"M-1003","1881","Nino Cer","1.7 oz","EDT Spray","Men","","text.","http://www.domain.com/products/Men/Final/M-1003large.jpg","24.13","1","36.00","688575003642"
 Input $9:"http://www.domain.com/products/Men/Final/M-1003large.jpg"
Expect $9:"http://www.domain.com/overstock-images/M-1004larger.jpg"
  Good $2:
Output $9:"http://www.domain.com/products/Men/Final/M-1003large.jpg"
"M-1003","1881","Nino Cer","1.7 oz","EDT Spray","Men","","text.","http://www.domain.com/products/Men/Final/M-1003large.jpg","24.13","1","36.00","688575003642"

$0:"M-1004","1881","Nino Cer","3.4 oz","EDT Spray","Men",""," text.","http://www.domain.com/products/Men/Final/M-1004large.jpg","37.51","1","50.00","688575003659"
 Input $9:"http://www.domain.com/products/Men/Final/M-1004large.jpg"
Expect $9:"http://www.domain.com/products/Men/Final/M-1004large.jpg"
  Good $2:"http://www.domain.com/overstock-images/M-1004larger.jpg"
Output $9:"http://www.domain.com/overstock-images/M-1004larger.jpg"
"M-1004","1881","Nino Cer","3.4 oz","EDT Spray","Men",""," text.","http://www.domain.com/overstock-images/M-1004larger.jpg","37.51","1","50.00","688575003659"

Note that in each set of output lines, the line starting with "$0" is the line from your Daily.cv file, the line starting with "Input $9" is the 9th input field from Daily.csv, the line starting with "Expect $9" is the 9th field in the output you said you expected, the line starting with "Good $2" is empty if there is no corresponding entry in the GoodImages.csv file corresponding to this input line or the corresponding 2nd field from that file (with added quotes) if there was a match, the line starting with "Output $9" is the output this script produces in field 9, and the final line is the complete output line this script thinks should be printed to match your stated requirements.

If the last line in each output set of lines does match what you were trying to produce, remove all of the stuff in red in the above script and redirect the output to your desired destination pathname. Otherwise, please explain what I misunderstood about your requirements so we might be able to help you diagnose the problem and fix it.
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Import 2 columns from 8 .csv files into pandas df (side by side) and write a new csv

I have 8 .csv files with 16 columns and "n" rows with no Header. I want to parse each of these .csv and get column and put the data into a new.csv. Once this is done, the new.csv should have 16 columns (2 from each input.csv) and "n" rows. Now, I want to just take the average of Column from... (3 Replies)
Discussion started by: Zam_1234
3 Replies

2. Shell Programming and Scripting

Match columns from two csv files and update field in one of the csv file

Hi, I have a file of csv data, which looks like this: file1: 1AA,LGV_PONCEY_LES_ATHEE,1,\N,1,00020460E1,0,\N,\N,\N,\N,2,00.22335321,0.00466628 2BB,LES_POUGES_ASF,\N,200,200,00006298G1,0,\N,\N,\N,\N,1,00.30887539,0.00050312... (10 Replies)
Discussion started by: djoseph
10 Replies

3. Shell Programming and Scripting

Compare 2 csv files by columns, then extract certain columns of matcing rows

Hi all, I'm pretty much a newbie to UNIX. I would appreciate any help with UNIX coding on comparing two large csv files (greater than 10 GB in size), and output a file with matching columns. I want to compare file1 and file2 by 'id' and 'chain' columns, then extract exact matching rows'... (5 Replies)
Discussion started by: bkane3
5 Replies

4. Shell Programming and Scripting

Deleting all the fields(columns) from a .csv file if all rows in that columns are blanks

Hi Friends, I have come across some files where some of the columns don not have data. Key, Data1,Data2,Data3,Data4,Data5 A,5,6,,10,, A,3,4,,3,, B,1,,4,5,, B,2,,3,4,, If we see the above data on Data5 column do not have any row got filled. So remove only that column(Here Data5) and... (4 Replies)
Discussion started by: ks_reddy
4 Replies

5. Shell Programming and Scripting

CSV Sorting on only particular columns

Hello! So ive been presented with this comma-delimited file: I need a print to look as below " lastname, phone_number, zip for every person with a last name starting with the letter H, I only with a 650-area code phone number. output should be sorted by reverse ZIP code " I only have... (5 Replies)
Discussion started by: strangemachine
5 Replies

6. UNIX for Advanced & Expert Users

Help in Deleting columns and Renaming Mutliple columns in a .Csv File

Hi All, i have a .Csv file in the below format startTime, endTime, delta, gName, rName, rNumber, m2239max, m2239min, m2239avg, m100016509avg, m100019240max, metric3min, m100019240avg, propValues 11-Mar-2012 00:00:00, 11-Mar-2012 00:05:00, 300.0, vma3550a, a-1_CPU Index<1>, 200237463, 0.0,... (9 Replies)
Discussion started by: mahi_mayu069
9 Replies

7. Shell Programming and Scripting

Keep only certain columns in a csv file

I have a file that has 500+ columns but only need 15 of those and all the other ones can be removed. What is the best way to do this? Can I get some examples please? (3 Replies)
Discussion started by: GroveTuckey
3 Replies

8. UNIX for Advanced & Expert Users

Unix Bash: substitute columns in .csv using other .csv columns

Hi All, I have two .csv's input.csv having values as (7 columns) ABC,A19907103,ABC DEV YUNG,2.17,1000,2157,07/07/2006 XYZ,H00213850,MM TRUP HILL,38.38,580,23308,31/08/2010 output.csv having (25 columns) A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y... (4 Replies)
Discussion started by: abhivyas
4 Replies

9. Shell Programming and Scripting

no of columns in csv file

Hi, I would like to know how can I find the number of columns in each line in a csv file. I have a csv file wiht comma separated an dhas 10000 line in it. I want to verify the number of columns in each line. is there any way to find it out using awk or any other commands? Thanks.... (2 Replies)
Discussion started by: McLan
2 Replies

10. UNIX for Advanced & Expert Users

Adding columns in csv

I have the following data in FILE1.CSV: code: Amount1: Amount2: xxxxx ,, 200 ,,400 yyxxa ,,200 bbcgu ,,2500 ,,300 i want to be able to produce the following FILE2.CSV: code: Amount xxxxx ,, 600... (7 Replies)
Discussion started by: chachabronson
7 Replies
Login or Register to Ask a Question