How to align/sort the column pairs of a CSV file based on keywords specified in another file?


 
# 1  
Old 05-01-2019

I have a CSV file, as shown below:
Code:
xop_thy		80	avr_njk		50	str_nyu		60
avr_irt		70	str_nhj		60	avr_ngt		50
str_tgt		80	xop_nmg		50	xop_nth		40
cyv_gty		40	cop_thl		40	vir_tyk		80
vir_plo		20	vir_thk		40	ijk_yuc		70		
cop_thy		70	ijk_yuc		80	irt_hgt		80

I need to align/sort the CSV file based on the order given in another file (the keyword file), shown below:
Code:
xop
avr
str
cyv
vir
cop
ijk
irt

The desired output is shown below,
Code:
xop_thy		80	xop_nmg		50	xop_nth		40
avr_irt		70	avr_njk		50	avr_ngt		50
str_tgt		80	str_nhj		60	str_nyu		60
cyv_gty		40		
vir_plo		20	vir_thk		40	vir_tyk		80		
cop_thy		70	cop_thl		40	
			ijk_yuc		80	ijk_yuc		70
						irt_hgt		80

The main condition is that the columns must be rearranged pairwise: when a string column moves, the value column next to it must move with it. The column pairs are to be rearranged according to the keywords. A further problem is that the keyword file contains only the leading part of each string: the strings share a common keyword, but the part after the underscore varies. Because of this, I do not know how to write code for it. If this were plain numeric or alphabetical sorting I could use sort, but the condition is too complex for that here. I am not sure whether this is possible at all. If it is, please help me.
Thanks in advance.
# 2  
Old 05-01-2019
I don't think you'll get very far with sort. Try this instead:
Code:
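awk -F"\t" '
# First file (file.csv): each column pair is name, empty field, value
NR==FNR {for (i=1; i<=3; i++)   {IX = (i-1)*3+1                     # field number of the name in pair i
                                 split ($IX, T, "_")                # T[1] is the keyword before "_"
                                 O[T[1] FS i] = $IX FS FS $(IX+2)   # store the pair by keyword and column
                                }
         next
        }
# Second file (file.key): print the stored pair for each column, or an empty pair
        {for (i=1; i<=3; i++)  printf "%s%s", O[$1 FS i] (O[$1 FS i]?_:FS FS) , i==3?ORS:FS
        }
' file.csv file.key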
xop_thy        80    xop_nmg        50    xop_nth        40
avr_irt        70    avr_njk        50    avr_ngt        50
str_tgt        80    str_nhj        60    str_nyu        60
cyv_gty        40                        
vir_plo        20    vir_thk        40    vir_tyk        80
cop_thy        70    cop_thl        40            
                     ijk_yuc        80    ijk_yuc        70
                                          irt_hgt        80

# 3  
Old 05-01-2019
Sorry RudiC,
When I tried your code, it gave output like this:
Code:
xop_thy     80     avr_njk     50     str_nyu     60								
avr_irt     70     str_nhj     60     avr_ngt     50								
str_tgt     80     xop_nmg     50     xop_nth     40								
cyv_gty     40     cop_thl     40     vir_tyk     80								
vir_plo     20     vir_thk     40     ijk_yuc     70								
cop_thy     70     ijk_yuc     80     irt_hgt     80


# 4  
Old 05-02-2019
Quote:
Originally Posted by dineshkumarsrk
Sorry RudiC,
When I tried your code, it gave output like this:
Code:
xop_thy     80     avr_njk     50     str_nyu     60								
avr_irt     70     str_nhj     60     avr_ngt     50								
str_tgt     80     xop_nmg     50     xop_nth     40								
cyv_gty     40     cop_thl     40     vir_tyk     80								
vir_plo     20     vir_thk     40     ijk_yuc     70								
cop_thy     70     ijk_yuc     80     irt_hgt     80

That is not surprising. In post #1 in this thread you said you had CSV input files and used an example that used <tab> characters as the character that separates values. The code RudiC provided explicitly specified the <tab> character as the field separator.

In this post, however, there are no <tab> characters; only sequences of <space>s. And, since the number of spaces between fields is not a constant, we can't say that your field separator is a sequence of 8 <space>s or of 9 <space>s.
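
One way to check what awk actually sees is to count the fields it finds on each line when the separator is a single <tab>. The following is only a minimal sketch: sample.txt is a hypothetical two-line file (not one from this thread) whose first line uses <tab>s as in post #1 and whose second line uses runs of <space>s as in post #3.

```shell
# Hypothetical sample file: line 1 is tab-separated (double tab before each
# value, as in post #1); line 2 uses runs of spaces (as in post #3).
dir=$(mktemp -d)
printf 'xop_thy\t\t80\tavr_njk\t\t50\n' > "$dir/sample.txt"
printf 'xop_thy     80     avr_njk     50\n' >> "$dir/sample.txt"

# Number of fields awk finds on each line when FS is a single tab.
fields=$(awk -F'\t' '{print NF}' "$dir/sample.txt")
printf '%s\n' "$fields"

# Show the raw characters; tabs appear as \t in od -c output.
od -c "$dir/sample.txt" | head -n 3
```

The tab-separated line is split into 6 fields, while the space-separated line is seen as a single field, so code written for <tab>-separated input has nothing to rearrange on it.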

If you don't accurately describe your input file format, it is hard to guess at what might work with whatever random data format you decide to use when you run code that was designed to use the input format you originally specified.
# 5  
Old 05-02-2019
Dear Don,
That is why I changed my data set to a comma-separated file and modified the awk invocation to
Code:
awk -F","

and
Code:
awk -F,

respectively. But this modification also failed to generate the desired output that I described.
# 6  
Old 05-02-2019
Quote:
Originally Posted by dineshkumarsrk
Dear Don,
That is why I changed my data set to a comma-separated file and modified the awk invocation to
Code:
awk -F","

and
Code:
awk -F,

respectively. But this modification also failed to generate the desired output that I described.
I haven't seen where you mentioned anything at all about changing your sample input file format nor about changing the code RudiC suggested.

Please show us your new sample input file and the complete code that you are using to process that input file to produce the output you showed us in post #3. Note that the code RudiC suggested with the changes you showed us in post #5 would not produce output that looks at all like the output you showed us in post #3!
# 7  
Old 05-02-2019
I have changed my input data set as shown below:
Code:
xop_thy,80,avr_njk,50,str_nyu,60
avr_irt,70,str_nhj,60,avr_ngt,50
str_tgt,80,xop_nmg,50,xop_nth,40
cyv_gty,40,cop_thl,40,vir_tyk,80
vir_plo,20,vir_thk,40,ijk_yuc,70
cop_thy,70,ijk_yuc,80,irt_hgt,80

Then I modified the code suggested by RudiC as follows:
Code:
awk -F, '
NR==FNR {for (i=1; i<=3; i++)   {IX = (i-1)*3+1
                                 split ($IX, T, "_")
                                 O[T[1] FS i] = $IX FS FS $(IX+2)
                                }
         next
        }
        {for (i=1; i<=3; i++)  printf "%s%s", O[$1 FS i] (O[$1 FS i]?_:FS FS) , i==3?ORS:FS
        }
' org1.csv key.txt > test.csv

and
Code:
awk -F "," '
NR==FNR {for (i=1; i<=3; i++)   {IX = (i-1)*3+1
                                 split ($IX, T, "_")
                                 O[T[1] FS i] = $IX FS FS $(IX+2)
                                }
         next
        }
        {for (i=1; i<=3; i++)  printf "%s%s", O[$1 FS i] (O[$1 FS i]?_:FS FS) , i==3?ORS:FS
        }
' org1.csv key.txt > test.csv

Both modifications generated the following output.
For awk -F,
Code:
xop_thy,,avr_njk,,,,,,
avr_irt,,str_nhj,,,,,,
str_tgt,,xop_nmg,,,,,,
cyv_gty,,cop_thl,,,,,,
vir_plo,,vir_thk,,,,,,
cop_thy,,ijk_yuc,,,,,,
,,,,,,,,
,,,,,,,,
,,,,,,,,

For awk -F","
Code:
xop_thy,,avr_njk,,,,,,
avr_irt,,str_nhj,,,,,,
str_tgt,,xop_nmg,,,,,,
cyv_gty,,cop_thl,,,,,,
vir_plo,,vir_thk,,,,,,
cop_thy,,ijk_yuc,,,,,,
,,,,,,,,
,,,,,,,,
,,,,,,,,

I also tried the modified code (awk -F, and awk -F",") on the tab-separated data set and got the output shown below:
Code:
xop_thy	80	avr_njk	50	str_nyu	60,,,,,,,,
avr_irt	70	str_nhj	60	avr_ngt	50,,,,,,,,
str_tgt	80	xop_nmg	50	xop_nth	40,,,,,,,,
cyv_gty	40	cop_thl	40	vir_tyk	80,,,,,,,,
vir_plo	20	vir_thk	40	ijk_yuc	70,,,,,,,,
cop_thy	70	ijk_yuc	80	irt_hgt	80,,,,,,,,
,,,,,,,,
,,,,,,,,
,,,,,,,,

Note: The output shown here was viewed in a text editor.
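
A likely cause of the comma-separated result above: the original code expects three fields per column pair (name, empty field, value), so it reads names from fields 1, 4 and 7 and values from IX+2. In the comma-separated file each pair has only two fields, so names sit in fields 1, 3 and 5 and values in 2, 4 and 6. With the unchanged offsets, field 3 of each line (the second name) is stored in place of the first value. The following is only a sketch of one possible adjustment, not a confirmed fix from this thread; the temporary directory and the result variable are illustrative, and it still assumes exactly three pairs per line and at most one entry per keyword in each column pair.

```shell
# Recreate the comma-separated sample and the keyword file from the thread
# in a temporary directory (the directory itself is illustrative).
dir=$(mktemp -d)
cat > "$dir/org1.csv" <<'EOF'
xop_thy,80,avr_njk,50,str_nyu,60
avr_irt,70,str_nhj,60,avr_ngt,50
str_tgt,80,xop_nmg,50,xop_nth,40
cyv_gty,40,cop_thl,40,vir_tyk,80
vir_plo,20,vir_thk,40,ijk_yuc,70
cop_thy,70,ijk_yuc,80,irt_hgt,80
EOF
printf '%s\n' xop avr str cyv vir cop ijk irt > "$dir/key.txt"

result=$(awk -F, '
# First file: remember each name,value pair by keyword and column-pair number.
NR==FNR { for (i=1; i<=3; i++) {
              IX = (i-1)*2+1                  # name field of pair i: 1, 3, 5
              split($IX, T, "_")              # T[1] is the keyword before "_"
              O[T[1] FS i] = $IX FS $(IX+1)   # the value is the very next field
          }
          next
        }
# Second file: for each keyword print the stored pair, or an empty pair.
        { for (i=1; i<=3; i++)
              printf "%s%s", (O[$1 FS i] != "" ? O[$1 FS i] : FS), (i==3 ? ORS : FS)
        }
' "$dir/org1.csv" "$dir/key.txt")
printf '%s\n' "$result"
```

With the sample data this prints one line per keyword, for example xop_thy,80,xop_nmg,50,xop_nth,40 for xop and ,,,,irt_hgt,80 for irt, with empty pairs keeping the remaining entries in their original columns.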