Multicolumn csv output

08-06-2008

I'm revising a quick and dirty script I wrote for my work that takes a series of data files (tab delimited) and condenses them into one. Essentially, each original file looks like these examples:

Code:

A1  data
B1 data
C1 data

Code:

A2 data
B2 data
C2 data

The script currently squashes them into the final form:

Code:

A1,data
A2,data
A3,data
B1,data
B2,data
B3,data
...

What I'm having difficulty implementing is instead changing the data format to something like:

Code:

A1,data,B1,data,C1,data
A2,data,B2,data,C2,data
A3,data,B3,data,C3,data

My script is below. I'd appreciate any feedback; I'm very new to this!

Code:

#!/bin/bash
#ATR Kinetic Analysis Script for KDE
#author: ParadoxDruid
#created March 13, 2008
#modified August 6th, 2008

#say hello
kdialog --msgbox "Signalyze ATR Data Converter\nPress OK to select working directory"
#navigate to proper directory
DIRECTORY=$(kdialog --getexistingdirectory .)
#get necessary info from user
NAME=$(kdialog --title "Trial Name" --inputbox "Name of your trial")
OUT="$DIRECTORY/$NAME.csv"
echo "$NAME Trial Set" > "$OUT"
echo "Seq,Point,Integral,sd" >> "$OUT"
#filename set
SET=$(kdialog --title "Sets" --inputbox "What is the name of your results files? (i.e.  kinetics-1_  )")
#end filename set

#probe sequence detection: drop the first 22 lines (file header) of the first file
sed '1,22d' "$DIRECTORY/${SET}1.stx" > "$DIRECTORY/$NAME.temp"
SEQNUM=$(wc -l < "$DIRECTORY/$NAME.temp")
#end probe sequence detection

#data points auto-detection: highest N among the ${SET}N.stx files
POINTS=$(ls "$DIRECTORY" | grep -F "$SET" | grep 'stx' | sed -e "s/$SET//" -e 's/\.stx//' | sort -n | tail -n1)
#end data points auto detection

#probe names routine: first field of each remaining line
for ((c=1; c<=SEQNUM; c++)); do
    sequences[$c]=$(sed -n "${c}p" "$DIRECTORY/$NAME.temp" | cut -f 1)
done
#end routine

#increment the number of data points, since the ATR doesn't use 0
TRIALS=$((POINTS + 1))

#grab Net_Integral (field 5) and its SD (field 6) for probe $1 from every time file
seqgrab() {
    local i data sd
    for ((i=1; i<TRIALS; i++)); do
        data=$(grep -w "$1" "$DIRECTORY/${SET}${i}.stx" | cut -f 5)
        sd=$(grep -w "$1" "$DIRECTORY/${SET}${i}.stx" | cut -f 6)
        #no spaces around the commas, so they don't end up inside the CSV fields
        echo "$1,$i,$data,$sd" >> "$OUT"
    done
}

#tell about our progress
dcopRef=$(kdialog --progressbar "Initialising" ${#sequences[@]})

#iterate through the probe names (seqgrab loops over the data files)
for ((n=1; n<=${#sequences[@]}; n++)); do
    seqgrab "${sequences[$n]}"
    #$dcopRef is left unquoted so dcop receives the app and object as separate words
    dcop $dcopRef setProgress $n
    dcop $dcopRef setLabel "Working..."
done

rm "$DIRECTORY/$NAME.temp"
dcop $dcopRef close
kdialog --textbox "$OUT" 440 800
exit 0

Thanks again for any help!

Paradoxdruid

08-06-2008

You might want to use the "paste" command instead; it seems like it could do all the work.

I hope this helps.
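A minimal sketch of what paste does by default: it glues files together line by line, side by side. The two input files are small made-up samples (not real .stx files); -d, swaps the default tab separator for a comma.

```shell
#!/bin/bash
# Sketch: paste joins line N of each file into output line N.
# The two input files are small made-up samples, not real .stx files.
tmp=$(mktemp -d)
printf 'A1\nB1\nC1\n' > "$tmp/names"
printf '0.81596\n0.59647\n2.54441\n' > "$tmp/integrals"

# -d, puts a comma between the columns instead of the default tab
out=$(paste -d, "$tmp/names" "$tmp/integrals")
printf '%s\n' "$out"
# A1,0.81596
# B1,0.59647
# C1,2.54441
rm -r "$tmp"
```

Because paste works column-wise, the remaining work is getting each column's values into their own file or stream first.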

bakunin

08-06-2008

The paste command does look promising, and I feel silly for not knowing it.

However, I may have oversimplified my explanation above to try and make it legible.
Here are snippets from two actual source files:

Code:

Time 6
Probe_Name    Count    Net_Signal    Net_Signal_SD    Net_Integral    Net_Integral_SD    Proc_Control    
A1    5    0.04594    0.01175    0.81596    0.23182    OK    
B1    3    0.02464    0.00381    0.59647    0.15367    OK    
C1    5    0.13487    0.02862    2.54441    0.29700    OK

Code:

Time 7
Probe_Name    Count    Net_Signal    Net_Signal_SD    Net_Integral    Net_Integral_SD    Proc_Control    
A1    5    0.04545    0.01211    0.82307    0.24171    OK    
B1    3    0.02332    0.00557    0.56161    0.10771    OK    
C1    5    0.13672    0.02963    2.54276    0.26535    OK    
D1    5    0.14061    0.07675    2.58301    1.31850    OK

All I want to preserve is the first column (the probe name, e.g. A1) and the Net_Integral and Net_Integral_SD values (fields 5 and 6, e.g. 0.81596 and 0.23182), tracked over time, so that my output shows that probe A1 had a value of 0.81596 at time 6 and 0.82307 at time 7. Here is a snippet of my current output:

Code:

Seq,Point,Integral,sd
A1 ,1,0.81596,0.23182
A1 ,2,0.81793,0.2443
A1 ,3,0.82073,0.24254
A1 ,4,0.82307,0.24171
A1 ,5,0.81935,0.23554
B1 ,1,0.59647,0.15367
B1 ,2,0.57585,0
B1 ,3,0.55278,0.11597
B1 ,4,0.56161,0.10771
B1 ,5,0.49331,0.08419
C1 ,1,2.54441,0.297

Currently, I do this by looping over the time-point files, grepping for each probe name and using cut to grab the right fields, but I write the results out with echo, one line per probe and time point, which makes it hard to line up multiple columns.
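As a sketch of a single-pass alternative to running grep and cut once per probe and per field, awk can pull fields 1, 5 and 6 out of a whole results file at once. The function name extract and its header-line count argument are hypothetical: the script above skips 22 leading lines in the real files, while the posted snippets show only two.

```shell
#!/bin/bash
# Sketch: print "name,integral,sd" for every probe in one tab-delimited file.
# extract FILE HEADER_LINES  (hypothetical helper; HEADER_LINES is how many
# leading lines to skip, e.g. 22 for the real files in the script above)
extract() {
  awk -F'\t' -v OFS=, -v skip="$2" 'NR > skip { print $1, $5, $6 }' "$1"
}

# Demo on the first two probes of the "Time 6" snippet (2 header lines).
tmp=$(mktemp -d)
{
  printf 'Time 6\n'
  printf 'Probe_Name\tCount\tNet_Signal\tNet_Signal_SD\tNet_Integral\tNet_Integral_SD\tProc_Control\t\n'
  printf 'A1\t5\t0.04594\t0.01175\t0.81596\t0.23182\tOK\t\n'
  printf 'B1\t3\t0.02464\t0.00381\t0.59647\t0.15367\tOK\t\n'
} > "$tmp/time6.stx"

out=$(extract "$tmp/time6.stx" 2)
printf '%s\n' "$out"
# A1,0.81596,0.23182
# B1,0.59647,0.15367
rm -r "$tmp"
```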

I'll look into paste more, but I'd appreciate further ideas, too. There's probably an easy way to do this with awk or paste that I just haven't seen. Thanks!

Paradoxdruid

08-06-2008

I agree with Bakunin. paste can do all that for you with a little help from tr:

Code:

paste -s file1 file2 file3 ... | tr '\t' ','
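To see what the -s (serial) flag changes, here is a small sketch on two made-up, already-trimmed files: instead of gluing files side by side, paste -s folds each whole file into one tab-separated line, so every input file becomes one output row, and tr then turns the tabs into commas. On raw files it keeps every column and any header lines too, so the fields still need to be cut down first.

```shell
#!/bin/bash
# Sketch: paste -s joins all lines of each file into one tab-separated line;
# tr then swaps the tabs for commas. Input files are made-up samples that
# already hold only "name<TAB>integral" per probe.
tmp=$(mktemp -d)
printf 'A1\t0.81596\nB1\t0.59647\n' > "$tmp/time1"
printf 'A1\t0.81793\nB1\t0.57585\n' > "$tmp/time2"

out=$(paste -s "$tmp/time1" "$tmp/time2" | tr '\t' ',')
printf '%s\n' "$out"
# A1,0.81596,B1,0.59647
# A1,0.81793,B1,0.57585
rm -r "$tmp"
```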

shamrock

08-06-2008

Quote:

Originally Posted by shamrock

I agree with Bakunin. paste can do all that for you with a little help from tr

Code:

paste -s file1 file2 file3 ... | tr '\t' ','

I'm not sure I understand how I can use paste to make multiple columns.
When I stripped the header info out of a few test files and ran a command like the one above, it gave me:

Code:

A1(time 1) and all numerical fields
B1(time 1) and all numerical fields
...

A1(time 2) and all numerical fields
B1(time 2) and all numerical fields
...

What I'm looking for is a final output like:

Code:

A1(time 1) and fields 5 and 6,B1(time 1) and fields 5 and 6, C1(time 1) and fields 5 and 6,...
A1(time 2) and fields 5 and 6,B1(time 2) and fields 5 and 6, C1(time 2) and fields 5 and 6,... 
...

where each column group holds the data for one probe name (A1, B1, etc., which are separate lines in each original file) and each row corresponds to one file, i.e. one time point.
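One way to get that layout, sketched under some assumptions: cut each time-point file down to fields 1, 5 and 6, then use paste -s (as suggested above) to fold the whole file into one row. The function name rows_by_time is hypothetical, as is the assumption that the files are numbered PREFIX1.stx, PREFIX2.stx, ... with no gaps (this matches how the original script builds its file names).

```shell
#!/bin/bash
# Sketch: print one CSV row per time-point file, holding "name,integral,sd"
# for every probe in that file, in file order.
# Hypothetical: rows_by_time PREFIX HEADER_LINES, where the files are
# PREFIX1.stx, PREFIX2.stx, ... with no gaps, each starting with
# HEADER_LINES lines to skip.
rows_by_time() {
  local prefix=$1 header=$2 n=1
  while [ -f "${prefix}${n}.stx" ]; do
    # drop the header, keep fields 1, 5 and 6 (still tab-separated),
    # fold the whole file into one tab-separated line, then tabs -> commas
    tail -n +"$((header + 1))" "${prefix}${n}.stx" | cut -f 1,5,6 | paste -s - | tr '\t' ','
    n=$((n + 1))
  done
}

# Demo: A1/B1 rows from the "Time 6" and "Time 7" snippets as points 1 and 2.
tmp=$(mktemp -d)
{
  printf 'Time 6\n'
  printf 'Probe_Name\tCount\tNet_Signal\tNet_Signal_SD\tNet_Integral\tNet_Integral_SD\tProc_Control\t\n'
  printf 'A1\t5\t0.04594\t0.01175\t0.81596\t0.23182\tOK\t\n'
  printf 'B1\t3\t0.02464\t0.00381\t0.59647\t0.15367\tOK\t\n'
} > "$tmp/kinetics-1_1.stx"
{
  printf 'Time 7\n'
  printf 'Probe_Name\tCount\tNet_Signal\tNet_Signal_SD\tNet_Integral\tNet_Integral_SD\tProc_Control\t\n'
  printf 'A1\t5\t0.04545\t0.01211\t0.82307\t0.24171\tOK\t\n'
  printf 'B1\t3\t0.02332\t0.00557\t0.56161\t0.10771\tOK\t\n'
} > "$tmp/kinetics-1_2.stx"

out=$(rows_by_time "$tmp/kinetics-1_" 2)
printf '%s\n' "$out"
# A1,0.81596,0.23182,B1,0.59647,0.15367
# A1,0.82307,0.24171,B1,0.56161,0.10771
rm -r "$tmp"
```

This assumes every file lists the same probes in the same order. The Time 7 snippet has a D1 row that Time 6 lacks, so rows can differ in length; if a probe were missing from the middle of a file, later groups would shift columns, and the values would have to be matched by probe name instead.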

Thanks again!

Paradoxdruid

08-07-2008

The command below should work, but it assumes there is no "|" character in your files.

If there is, replace "|" with another character that will never appear in your files.

Code:

paste -d"|" file1 file2 file3 | tr "|" "\n"
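For comparison, a small sketch of what this command produces on two made-up files holding one probe per line: paste -d"|" glues line 1 of every file together, then line 2, and so on, and tr splits those joined lines back apart. The result interleaves the files, grouping each probe's values across time points, rather than producing one row per time point.

```shell
#!/bin/bash
# Sketch: paste -d"|" pairs up line N of each file; tr turns every "|" into a
# newline, so the files come out interleaved line by line.
tmp=$(mktemp -d)
printf 'A1,0.81596\nB1,0.59647\n' > "$tmp/file1"
printf 'A1,0.82307\nB1,0.56161\n' > "$tmp/file2"

out=$(paste -d"|" "$tmp/file1" "$tmp/file2" | tr "|" "\n")
printf '%s\n' "$out"
# A1,0.81596
# A1,0.82307
# B1,0.59647
# B1,0.56161
rm -r "$tmp"
```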

summer_cherry

08-07-2008

Check whether this is what you're looking for:

Code:

cat file1 file2 file3 | grep -v -e '^Time' -e '^Probe_Name' | sort -s -k1,1 |
awk '$1 != prv { ct = 0 } { ct++; print $1 "," ct "," $5 "," $6; prv = $1 }'

sudhamacs
