chop a data file into rows


 
# 1  
Old 08-26-2006
chop a data file into rows

A very naive question... I have a file with many rows and many columns, and I would like to split it into one new file per row, each named after the first column of that row plus 1. The data file looks like:

Code:
# Donades de la trajectoria de la particula 60001
# 1:T 2:Massa 3:Rx 4:Ry 5:Rz 6:Vx 7:Vy 8:Vz 9:R
0.90000E+01 0.163931734859943E-01 0.149392051696777E+02 -0.115339336395264E+02 0.145814510688069E-04 -0.300543576478958E+00 0.102525159716606E+00 0.265361722995294E-05 0.188735656738281E+02
0.10000E+02 0.163931734859943E-01 0.146375942230225E+02 -0.114305801391602E+02 0.172515174199361E-04 -0.302688330411911E+00 0.104193426668644E+00 0.266770848611486E-05 0.185719509124756E+02
0.11000E+02 0.163931734859943E-01 0.143338079452515E+02 -0.113255243301392E+02 0.198788038687780E-04 -0.304894536733627E+00 0.105929985642433E+00 0.256998623626714E-05 0.182681560516357E+02
0.12000E+02 0.163931734859943E-01 0.140277833938599E+02 -0.112186956405640E+02 0.223633705900284E-04 -0.307165294885635E+00 0.107739351689816E+00 0.238643792727089E-05 0.179621219635010E+02

etc...

And the new files should look like

File Splotch4_0010_part_b.asc :

Code:
0.90000E+01 0.163931734859943E-01 0.149392051696777E+02 -0.115339336395264E+02 0.145814510688069E-04 -0.300543576478958E+00 0.102525159716606E+00 0.265361722995294E-05 0.188735656738281E+02

where the _0010_ comes from the first entry of the row (0.90000E+01, i.e. 9) plus 1.

It should be easy but I don't know how...
# 2  
Old 08-26-2006
code added

Hi,
Just tell me the new file naming convention again and I will get back to you in 5 minutes; it is a simple one. I could not find a word like Splotch ... part_b in your first column, and if I add 1 to the first column of the first row I still do not get the name you wrote.
Anyway, just tell me again.


Here is an approach you can start with.

Say your data file is named main.data:

while IFS= read -r data
do
    # Newfile_name.data is a placeholder: build the real name from the column value
    echo "$data" > Newfile_name.data
done < main.data

This reads one row at a time from your main file, creates a new file, and saves the row in it.
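
As a rough sketch of how the loop above could build the file name itself: the function name split_rows is made up for illustration, and the code assumes that columns are separated by spaces, that rows have no leading whitespace, and that the shell's printf accepts exponent notation for %f (bash, ksh and dash do; POSIX does not require it).

```shell
#!/bin/sh
# Sketch only: write each data row of a file to its own file, named after
# (first column + 1). split_rows is an illustrative name.
split_rows() {
    infile=$1
    while IFS= read -r row; do
        case $row in
            '#'*|'') continue ;;       # skip comment lines and empty lines
        esac
        first=${row%% *}               # first column, e.g. 0.90000E+01
        n=$(printf '%.0f' "$first")    # exponent notation to integer, e.g. 9
        name=$(printf 'Splotch4_%04d_part_b.asc' "$((n + 1))")
        printf '%s\n' "$row" > "$name" # one row per output file
    done < "$infile"
}
```

Calling split_rows main.data in the directory that holds the data file should then write one Splotch4_NNNN_part_b.asc file per data row.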

Last edited by jambesh; 08-26-2006 at 05:58 AM.. Reason: spelling mistake
# 3  
Old 08-26-2006
The main data file is called 60001.dat and looks like this:

Code:
# Donades de la trajectoria de la particula 60001
# 1:T 2:Massa 3:Rx 4:Ry 5:Rz 6:Vx 7:Vy 8:Vz 9:R
0.90000E+01 0.163931734859943E-01 0.149392051696777E+02 -0.115339336395264E+02 0.145814510688069E-04 -0.300543576478958E+00 0.102525159716606E+00 0.265361722995294E-05 0.188735656738281E+02
0.10000E+02 0.163931734859943E-01 0.146375942230225E+02 -0.114305801391602E+02 0.172515174199361E-04 -0.302688330411911E+00 0.104193426668644E+00 0.266770848611486E-05 0.185719509124756E+02
0.11000E+02 0.163931734859943E-01 0.143338079452515E+02 -0.113255243301392E+02 0.198788038687780E-04 -0.304894536733627E+00 0.105929985642433E+00 0.256998623626714E-05 0.182681560516357E+02
0.12000E+02 0.163931734859943E-01 0.140277833938599E+02 -0.112186956405640E+02 0.223633705900284E-04 -0.307165294885635E+00 0.107739351689816E+00 0.238643792727089E-05 0.179621219635010E+02
0.13000E+02 0.163931734859943E-01 0.137194547653198E+02 -0.111100196838379E+02 0.246281106228707E-04 -0.309503912925720E+00 0.109626539051533E+00 0.213038515539665E-05 0.176537799835205E+02
0.14000E+02 0.163931734859943E-01 0.134087514877319E+02 -0.109994144439697E+02 0.265907547145616E-04 -0.311913937330246E+00 0.111596912145615E+00 0.177728884409589E-05 0.173430595397949E+02
0.15000E+02 0.163931734859943E-01 0.130956010818481E+02 -0.108867959976196E+02 0.281609554804163E-04 -0.314399152994156E+00 0.113656342029572E+00 0.135800030420796E-05 0.170298881530762E+02
0.16000E+02 0.163931734859943E-01 0.127799272537231E+02 -0.107720699310303E+02 0.292996846837923E-04 -0.316963583230972E+00 0.115811295807362E+00 0.917207898964989E-06 0.167141857147217E+02
0.17000E+02 0.163931734859943E-01 0.124616460800171E+02 -0.106551389694214E+02 0.299970088235568E-04 -0.319611668586731E+00 0.118068918585777E+00 0.477791445518960E-06 0.163958721160889E+02
0.18000E+02 0.163931734859943E-01 0.121406736373901E+02 -0.105358953475952E+02 0.302320313494420E-04 -0.322348177433014E+00 0.120437055826187E+00 -0.355312366195903E-07 0.160748577117920E+02

It's much longer, of course; I am only posting a few rows.

And what I want is to produce one file for each row of 60001.dat, looking like

Code:
0.90000E+01 0.163931734859943E-01 0.149392051696777E+02 -0.115339336395264E+02 0.145814510688069E-04 -0.300543576478958E+00 0.102525159716606E+00 0.265361722995294E-05 0.188735656738281E+02

Code:
0.10000E+02 0.163931734859943E-01 0.146375942230225E+02 -0.114305801391602E+02 0.172515174199361E-04 -0.302688330411911E+00 0.104193426668644E+00 0.266770848611486E-05 0.185719509124756E+02

etc., with names built from the first entry of the row plus one:

--> Splotch4_BLABLABLA_part_b.asc

BLABLABLA = 0.90000E+01 + 1 = 10

Therefore the first file should be called

Splotch4_0010_part_b.asc
# 4  
Old 08-26-2006
complicated try

Code:
#!/usr/bin/ksh
ifile="/home/xrkrb/unix_forum/testdata"
tmpfile="/home/xrkrb/unix_forum/tmpfile.dat"
cfile="/home/xrkrb/unix_forum/cleanfile"
# drop the comment lines that start with "#"
awk 'substr($1,1,1)!="#"' "$ifile" >"$cfile"
# get the filenames: mantissa times 10^exponent, plus 1, zero-padded to 4 digits
awk '{num=substr($1,1,7);xp=substr($1,10,2);xp+=0;for(i=0;i<xp;i++){num*=10;}; num=num+1; printf("%s%04d%s\n","Splotch4_",num,"_part_b.asc");}' "$cfile" >"$tmpfile"
i=1
while read fname
do
# write row number i of the cleaned file to its own file
sed -n "${i}p" "$cfile" >"$fname"
((i=i+1))
done <"$tmpfile"
rm -f "$tmpfile" "$cfile"
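
A possible shorter variant of the same idea, given here only as a sketch: a single awk pass that lets awk convert the exponent notation itself ($1 + 1) instead of rebuilding the number from substrings, and that needs no temporary files. The function name split_rows_awk is made up for illustration; the file name pattern is the one requested in the thread.

```shell
#!/bin/sh
# Sketch only: one awk pass, no temporary files.
# split_rows_awk is an illustrative name, not part of the script above.
split_rows_awk() {
    awk '
        /^#/    { next }               # skip the comment header lines
        NF == 0 { next }               # skip empty lines
        {
            # awk reads 0.90000E+01 as the number 9, so $1 + 1 is 10
            name = sprintf("Splotch4_%04d_part_b.asc", $1 + 1)
            print > name               # write the whole row to that file
            close(name)                # keep the number of open files small
        }
    ' "$1"
}
```

With the data file from this thread it would be called as split_rows_awk 60001.dat. If two rows ever produced the same name, the later row would overwrite the earlier one, as in the script above.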

# 5  
Old 08-27-2006
Hi ranj@chn!

thanks! Now for each data file named like "Splotch4_0010_part.asc" I have an additional one, "Splotch4_0010_part_b.asc", containing only the row I previously looked up with gawk.

Now it turns out that I don't know how to tell the script to plot Splotch4_0010_part.asc first and then immediately "Splotch4_0010_part_b.asc".

It must be easy, but I am not good at scripting at all... The script looks like

------------------------------------------
Code:
for f in Splotch4*  ; do
  eps_file=`basename $f .asc.gz`.eps

  gnuplot <<EOF
    unset key
    set xrange [-15:15]
    set yrange [-15:15]
    set xlabel "X (pc)" font "Helvetica,12"
    set ylabel "Y (pc)" font "Helvetica,12"
    set terminal postscript eps enhanced ; set output "$eps_file"
    set pointsize 1
    plot "< zcat $f" using 3:4 with dots lt 8, plot "< zcat $f | gawk -v NAME=60001 '/AS/ {TIME=$4} $1==NAME {print TIME,$2,$3,$4,$5,$6,$7,$8,$11; exit}'" using 3:4 with points pt 6 lt -1
    unset output
EOF
done

---------------------------------
The part containing "plot "< zcat $f | gawk -v NAME=60001 '/AS/ {TIME=$4} $1==NAME {print TIME,$2,$3,$4,$5,$6,$7,$8,$11; exit}'" using 3:4 with points pt 6 lt -1"
should be replaced with "gnuplot please now plot the file Splotch4_0010_part_b.asc before you plot the following one, Splotch4_0011_part.asc"

I know this is a silly question, but since you've been in this story since the beginning and you seem to be good at scripting too, I resort to you.
# 6  
Old 08-27-2006
a doubt

I have not used gnuplot. But tell me one thing: do you mean that in this line
Code:
plot "< zcat $f" using 3:4 with dots lt 8, plot "< zcat $f | gawk -v NAME=60001

the first '$f' should be Splotch4_0010_part.asc and the next '$f' should be Splotch4_0010_part_b.asc? Or is it something else?
# 7  
Old 08-27-2006
I want to first plot Splotch4_0010_part.asc and then immediately its _b companion, Splotch4_0010_part_b.asc, which I created thanks to your script.

So that it should be something like

Code:
for f in Splotch4_*_part.asc and g in Splotch4_*_part_b.asc ; do
  eps_file=`basename $f .asc.gz`.eps

  gnuplot <<EOF
    unset key
    set xrange [-15:15]
    set yrange [-15:15]
    set xlabel "X (pc)" font "Helvetica,12"
    set ylabel "Y (pc)" font "Helvetica,12"
    set terminal postscript eps enhanced ; set output "$eps_file"
    set pointsize 1
    plot "< zcat $f" using 3:4 with dots lt 8, plot "< zcat $g" using 3:4 with points pt 6 lt -1
    unset output
EOF
done

where $f are my files and $g the companion files created with your script.

I don't know how to define a second variable in the "for" block.

Of course the "for" line above (the part originally highlighted in dark red) is wrong, but I guess you get the idea.
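
One way this could be approached, sketched under assumptions: a second loop variable is not needed if the companion name is derived from $f with a parameter expansion. The function names below are made up for illustration, and the sketch assumes uncompressed files named Splotch4_NNNN_part.asc and Splotch4_NNNN_part_b.asc (for gzipped main files the "< zcat ..." form from the script would be needed again). Note also that gnuplot expects a single plot keyword followed by comma-separated items.

```shell
#!/bin/sh
# Sketch only: derive the companion file name from the main file name,
# so one loop variable is enough. All function names are illustrative.
companion_of() {
    # Splotch4_0010_part.asc -> Splotch4_0010_part_b.asc
    printf '%s\n' "${1%.asc}_b.asc"
}

# Print the gnuplot commands for one main file and its companion.
plot_commands() {
    f=$1
    g=$(companion_of "$f")
    eps_file=$(basename "$f" .asc).eps
    cat <<EOF
unset key
set xrange [-15:15]
set yrange [-15:15]
set xlabel "X (pc)" font "Helvetica,12"
set ylabel "Y (pc)" font "Helvetica,12"
set terminal postscript eps enhanced ; set output "$eps_file"
set pointsize 1
plot "$f" using 3:4 with dots lt 8, "$g" using 3:4 with points pt 6 lt -1
unset output
EOF
}

# Loop over the main files only; the pattern does not match the _b files.
plot_all() {
    for f in Splotch4_*_part.asc; do
        [ -e "$f" ] || continue        # nothing matched the pattern
        plot_commands "$f" | gnuplot
    done
}
```

Running plot_all in the directory with the data files would then produce one .eps file per main file, with the companion row drawn on top of it.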