Sort csv file by duplicated column value


 
# 1  
Old 04-24-2013

Hello, I have a large file (about 1 GB) in a format similar to the following:

Quote:
"Timmy","??","Age 26","1","0"
"Jack","??","Age 21","1","0"
"Troy","??","Age 21","1","0"
"Kim","?","Age 26","1","0"
"Mark","???","Age 24","1","0"
"John","??","Age 27","1","0"
I want to move every line whose column 3 value (columns are delimited by commas) occurs more than once to the top, meaning all people who share an age with someone else are listed first:

Quote:
"Timmy","??","Age 26","1","0"
"Kim","?","Age 26","1","0"
"Jack","??","Age 21","1","0"
"Troy","??","Age 21","1","0"
"Mark","???","Age 24","1","0"
"John","??","Age 27","1","0"
The command I used was:
sort -t, +2 input.csv > output.csv

I assumed that -t would make ',' the delimiter and that "+2" would look at the second column. What am I doing wrong?
# 2  
Old 04-24-2013
Not sure I understand what you are aiming for. Would four lines of, e.g., "Age 10" go on top, three of "Age 56" below that, and then the pairs you show above? Among the pairs, which should come first: "Age 26" or "Age 21"?
Or do you just want the file to be sorted by age in descending order?
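If a plain sort on the age column is what is wanted, here is a minimal sketch using the -k key syntax. The old "+2" form counts fields from zero, so it means "skip two fields" (start at column 3, not column 2), and many current sort implementations no longer accept it by default. The scratch directory and the `sorted` variable are only there for illustration, and `LC_ALL=C` plus the reverse flag are assumptions about the ordering you want.

```shell
# Sample data from the first post, written to a scratch file.
tmpdir=$(mktemp -d)
cat > "$tmpdir/input.csv" <<'EOF'
"Timmy","??","Age 26","1","0"
"Jack","??","Age 21","1","0"
"Troy","??","Age 21","1","0"
"Kim","?","Age 26","1","0"
"Mark","???","Age 24","1","0"
"John","??","Age 27","1","0"
EOF

# -t,       use a comma as the field separator
# -k3,3     the sort key starts and ends at field 3 (fields count from 1)
# r         reverse this key, so higher ages come first
# LC_ALL=C  compare bytes, which keeps the result locale-independent
sorted=$(LC_ALL=C sort -t, -k3,3r "$tmpdir/input.csv")
printf '%s\n' "$sorted"
```

The key is compared as text here, which is only equivalent to a numeric comparison while every age has the same number of digits.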
# 3  
Old 04-25-2013
Basically, if I could get one list of all the lines with duplicated ages and another list of all the lines with unique ages, that would work. Then I could just concatenate them back together. For the duplicated ages, descending order would be perfect.
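A minimal sketch of that split-and-concatenate idea, assuming no field contains an embedded comma and a sort that supports -s (GNU and BSD sort do). The file names `dups.csv` and `uniq.csv` and the variable names are made up for the example. The file is read twice, so only the per-age counts are held in memory, which should matter for a 1 GB input.

```shell
# Sample data from the first post, written to a scratch file.
tmpdir=$(mktemp -d)
cat > "$tmpdir/input.csv" <<'EOF'
"Timmy","??","Age 26","1","0"
"Jack","??","Age 21","1","0"
"Troy","??","Age 21","1","0"
"Kim","?","Age 26","1","0"
"Mark","???","Age 24","1","0"
"John","??","Age 27","1","0"
EOF
input="$tmpdir/input.csv"

# Make sure both output files exist even if one list turns out empty.
: > "$tmpdir/dups.csv"
: > "$tmpdir/uniq.csv"

# Pass 1 (NR == FNR) counts each column-3 value.
# Pass 2 routes every line to dups.csv or uniq.csv based on that count.
awk -F, -v dups="$tmpdir/dups.csv" -v uniq="$tmpdir/uniq.csv" '
    NR == FNR { cnt[$3]++; next }
    { print > (cnt[$3] > 1 ? dups : uniq) }
' "$input" "$input"

# Duplicated ages first, highest age on top; -s is a stable sort, so lines
# with the same age keep their input order. Unique ages follow unsorted.
result=$(LC_ALL=C sort -s -t, -k3,3r "$tmpdir/dups.csv"; cat "$tmpdir/uniq.csv")
printf '%s\n' "$result"
```

On the six sample lines this should put the two "Age 26" lines first, then the two "Age 21" lines, then Mark and John.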
# 4  
Old 04-26-2013
This may be neither the most elegant nor the most efficient way, but you could try:
Code:
$ awk -F, '       {TMP[$0];CNT[$3]++}
           END    {for (i in TMP) {split (i,X); print i, CNT[X[3]]}}
          ' file | sort -k3rn -k2nr | awk '{$NF=""; $1=$1; $0=$0}1'
"Kim","?","Age 26","1","0" 
"Timmy","??","Age 26","1","0" 
"Jack","??","Age 21","1","0" 
"Troy","??","Age 21","1","0" 
"John","??","Age 27","1","0" 
"Mark","???","Age 24","1","0"
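Two things to watch with the pipeline above: sort splits its keys on whitespace there, so a name containing a space would shift the fields, and using whole lines as array indexes collapses lines that are exactly identical. A hedged variant of the same idea (decorate each line with its count, sort, strip the count again) that keeps the comma as separator throughout might look like this. It assumes every age field has the exact form "Age NN" and that no field contains an embedded comma; the variable names are only for illustration.

```shell
# Sample data from the first post, written to a scratch file.
tmpdir=$(mktemp -d)
cat > "$tmpdir/input.csv" <<'EOF'
"Timmy","??","Age 26","1","0"
"Jack","??","Age 21","1","0"
"Troy","??","Age 21","1","0"
"Kim","?","Age 26","1","0"
"Mark","???","Age 24","1","0"
"John","??","Age 27","1","0"
EOF
input="$tmpdir/input.csv"

# Decorate:   pass 1 counts column 3, pass 2 prefixes each line with its count.
# Sort:       count descending (field 1), then age descending. After the
#             prefix is added the age sits in field 4, and its digits start
#             at character 6 of that field (past the opening quote and "Age ").
# Undecorate: cut drops the count field again.
ranked=$(
    awk -F, 'NR == FNR { cnt[$3]++; next } { print cnt[$3] "," $0 }' "$input" "$input" |
    LC_ALL=C sort -t, -k1,1nr -k4.6,4nr |
    cut -d, -f2-
)
printf '%s\n' "$ranked"
```

Because the count is the first sort key, ages shared by more people come before ages shared by fewer, and unique ages end up last in descending order.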

 