Sorting a .csv using awk or other


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Sorting a .csv using awk or other
# 1  
Old 04-18-2011
Sorting a .csv using awk or other

Hello all, I am new here and *relatively* new to Unix. I have a bit of an emergency. I have a three column file that I need to sort:

sample name, miRNA, reads per million (RPM)

There are multiple samples, and for each sample name there are multiple miRNAs and associated RPMs. Some of these miRNAs overlap between samples, but not all, so here is what the input file looks like

Sample 1, mirna1,RPM
Sample 1, mirna3,RPM
Sample 1, mirna4, RPM
Sample 2, mirna2,RPM
Sample2, mirna3, RPM
Sample2, mirna4, RPM

I need an output file that looks like this
miRNA1 miRNA2 .... miRNAx
Sample 1 RPM RPM
Sample 2 RPM RPM
Sample 3 RPM RPM
etc


Basically, each row should be a sample and each column should have the RPM corresponding the the miRNA for that sample. I also need to put a zero if there is no value for an miRNA in a specific sample. The output should be csv as well.

Any ideas or help would be greatly appreciated. I am trying to use awk but I'm lost and very much in a rush. MOST SINCERE THANKS IN ADVANCE!
# 2  
Old 04-19-2011
Hello dunnyboctor

You could try with this shell script, but it may be rather slow if you have very large files. It might be rather awkward if the format is a bit flexible. I will have a try with a simple one and we can vary it if the input really is as you describe:-
Code:
#!/bin/ksh
inputfile=/tmp/myfile.input
outputfile=/tmp/myfile.output

for sample in `sort -n +1 $inputfile|cut -f1 -d","`
do
   echo "\n${sample} \c"              # Start new output record
   for mirna in `grep "^${sample}," $inputfile | cut -f2 -d "," | sort -n`
   do
      line=`grep "^${sample}, ${mirna}," $inputfile"`
      rpm="${line##*,}"                  # Get last field based on comma separator
      echo "${mirna} \c"                  # Append but do not end output record
   done
done >> $outputfile

The other option, if you know the number of "sample"s and the number of "mirna"s then you write a loop similar to the following:-

Code:
#!/bin/ksh
inputfile=/tmp/myfile.input
outputfile=/tmp/myfile.output
typeset -i max_sample max_mirna RPM
read max_sample?"How many samples? "
read max_mirna?"How many mirna? "

{
echo "Sample, \c"
mirna=1
while [ $mirna -le $max_mirna ]
do
   echo "miRNA${mirna}, \c"           # Build up title line
   ((mirna=$mirna+1))
done

# Now process the data records
sample=1
while [ $sample -le $max_sample ]
do
   echo "\n${sample},\c"              # Start new line
   mirna=1
   while [ $mirna -le $max_mirna ]
   do
      line=`grep "^Sample${sample}, mirna${mirna}, " $inputfile`
      RPM="${line##*,}"               # Get last field only
      echo "${RPM},\c"
      ((mirna=$mirna+1))
   done
   ((sample=$sample+1))
done ; echo                           # Complete last record
} > $outputfile


Give those a try and see if that helps. They may be a bit flakey if the input format varies, so do let us know if I need to handle that a bit better.





Robin
Liverpool/Blackburn
UK
# 3  
Old 04-19-2011
Give this awk script a go...
Code:
awk -F, '{
   v = $2
   gsub(/[aA-zZ ]/, "", v)
   if (!x[$1])
      x[$1] = $1
   n = split(x[$1], a, " ") 
   a[v+2] = $NF
   x[$1] = ""
   for (i=1; i<=v+2; i++)
      x[$1] = sprintf("%s%s ", x[$1] ? x[$1] : "", a[i] ? a[i] : 0)
} END {for (i in x) print x[i]}' file

# 4  
Old 04-19-2011
Thank you so much for your replies! I found temporary route using an alternate program (pedsys and then some additional work in excel) but I will definitely start with these ideas and see where I get because in the future we will definitely need a quicker solution than what I resorted to yesterday! Besides that, I need to practice and play to learn Smilie
Again, THANK YOU!!
# 5  
Old 04-19-2011
Quote:
Originally Posted by dunnybocter
Thank you so much for your replies! I found temporary route using an alternate program (pedsys and then some additional work in excel) but I will definitely start with these ideas and see where I get because in the future we will definitely need a quicker solution than what I resorted to yesterday! Besides that, I need to practice and play to learn Smilie
Again, THANK YOU!!
Did you try the awk script yet...it does what you are looking for.
# 6  
Old 04-19-2011
Not yet, but likely tomorrow. Since I am most comfortable with awk I will probably start there. Thx again!
# 7  
Old 04-20-2011
Quote:
Originally Posted by dunnybocter
Not yet, but likely tomorrow. Since I am most comfortable with awk I will probably start there. Thx again!
I updated the awk script also so that it now outputs in csv format instead of ssv...so give it a try.
Code:
awk -F, '{
   v = $2
   gsub(/[aA-zZ ]/, "", v)
   if (!x[$1])
      x[$1] = $1
   n = split(x[$1], a, " ")
   a[v+2] = $NF+0
   x[$1] = ""
   v = (v < n ? n : v+2)
   for (i=1; i<=v; i++)
      x[$1] = sprintf("%s%s%s", x[$1] ? x[$1] : "", a[i] ? a[i] : 0, i < v ? " " : "")
} END {
   for (i in x) {
      gsub(" ", ",", x[i])
      sub(",", " ", x[i])
      print x[i]
   }
}' file

Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Sorting a CSV file by DOB

I have absolutaly no idea how to get this script to sort the info in Birthdays.csv by date of birth. I know the sort -n command, however i wish to sort the file birthdays.csv by DOB. How would i go about doing this? The below script gets user info and date of birth and then puts these info a... (3 Replies)
Discussion started by: redshine6
3 Replies

2. Shell Programming and Scripting

Sorting the csv file in Perl

Hi All How all are doing today. Just struck in an issue in Perl I have a csv file which contain 32 column, I want to make sorting in that csv file with respect to 26th column. Is it possible to do so without any module being added? Regards Aditya ---------- Post updated at 10:02 AM... (11 Replies)
Discussion started by: adisky123
11 Replies

3. Shell Programming and Scripting

CSV Sorting on only particular columns

Hello! So ive been presented with this comma-delimited file: I need a print to look as below " lastname, phone_number, zip for every person with a last name starting with the letter H, I only with a 650-area code phone number. output should be sorted by reverse ZIP code " I only have... (5 Replies)
Discussion started by: strangemachine
5 Replies

4. Shell Programming and Scripting

awk read column csv and search in other csv

hi, someone to know how can i read a specific column of csv file and search the value in other csv columns if exist the value in the second csv copy entire row with all field in a new csv file. i suppose that its possible using awk but i m not expertise thanks in advance (8 Replies)
Discussion started by: giankan
8 Replies

5. Shell Programming and Scripting

CSV to SQL insert: Awk for strings with multiple lines in csv

Hi Fellows, I have been struggling to fix an issue in csv records to compose sql statements and have been really losing sleep over it. Here is the problem: I have csv files in the following pipe-delimited format: Column1|Column2|Column3|Column4|NEWLINE Address Type|some descriptive... (4 Replies)
Discussion started by: khayal
4 Replies

6. Shell Programming and Scripting

Need help in writing a routine for sorting a CSV file

Hi, I have a CSV file in following manner: server1,env1,patch1 server1,env1,patch2 server1,env1,patch3 server1,env2,patch1 server1,env2,patch3 server2,env3,patch1 server2,env3,patch5 server2,env4,patch1 server3,env6,patch1 server3,env7,patch2 server3,env7,patch3 I want to... (6 Replies)
Discussion started by: avikaljain
6 Replies

7. UNIX and Linux Applications

UNIX sorting - csv file

Hi, Please help me to solve sorting in CSV file. I have 25 columns in my CSV. (Delimiter is ",") Summary columns are from 10 to 13 which are should not be sorted. From 1-9 and 14-25 should be able to sort in ascending. how should i do this using Sort command in unix. If no simple commands pls... (0 Replies)
Discussion started by: rajani_p
0 Replies

8. Shell Programming and Scripting

sorting csv file based on column selected

Hi all, in my csv file it'll look like this, and of course it may have more columns US to UK;abc-hq-jcl;multimedia UK to CN;def-ny-jkl;standard DE to DM;abc-ab-klm;critical FD to YM;la-yr-tym;standard HY to MC;la-yr-ytm;multimedia GT to KJ;def-ny-jrt;critical I would like to group... (4 Replies)
Discussion started by: tententen
4 Replies

9. Shell Programming and Scripting

Help sorting .csv file

Hi, I have a .csv file which contains script names, subjects and email_addresses The first two colums are always script name and subject, the next 20 colums are email address.. What i want to do is sort the email address in alphabetical order for each row and there's around 1200 rows. So... (3 Replies)
Discussion started by: Jazmania
3 Replies

10. Shell Programming and Scripting

awk sorting

Hi, I have used the following code to sort two sets of data: awk '{printf "%10s %s\n",$1,$2}' The first column is text and the second involves numbers. I was just wondering how i would go about sorting the second number so that they ascend from the top? Thanks for any help (4 Replies)
Discussion started by: Jaken
4 Replies
Login or Register to Ask a Question