Copying and pasting columns from different files


 
# 1  
Old 04-10-2010

hi all, first time posting, hoping i can get some help on this.

I have about 80 text files containing text in this format:

Code:
# Rg         Mass         Density     Rcm-x         Rcm-y         Area         Rsph/sqrt(2)
# ==         ====         =======     =====         =====         ====         ============
3.757608     16651.000000     168.191925     4.556663     238.047684     98.9999965813     3.96942547344     
3.021153     9158.000000     136.686569     77.605812     268.392334     66.9999991001     3.26548325465     
2.822630     6997.000000     162.720932     152.255829     256.117035     42.9999995329     2.61603944495     
3.416244     14350.000000     166.860458     161.680557     252.263550     86.0000036677     3.69963856173     
2.299375     5884.000000     140.095245     164.127808     242.561691     41.99999793     2.5854413901     
2.384314     4527.000000     129.342850     182.847580     246.922028     35.0000019328     2.36017440744     
1.943436     3887.000000     117.787880     269.891937     255.048630     32.9999996604     2.2917488934     
3.308430     14034.000000     169.084335     270.984741     269.790710     83.0000011533     3.63453714591     
3.476575     15121.000000     164.358688     283.803314     246.089340     92.0000042833     3.82651998948     
3.821949     16808.000000     158.566040     288.413605     107.780937     105.999998486     4.10736210694

What I need my script to do is the following:

Ignore the column headers and copy only the numbers under the Rcm-x and Rcm-y columns from every file, stack them on top of each other in a new file, and add a column with a label describing which file each row came from.

So if I have n files, the end result would look like this (the actual output should not include the Rcm-x, Rcm-y and z header line):

Rcm-x Rcm-y z
### ### 1
### ### 1
. . .
. . .
. . .
### ### 2
### ### 2
. . .
. . .
. . .
### ### n

thanks for the help
# 2  
Old 04-10-2010
Hi, Arlamos:

Welcome to the forum.

Code:
awk 'FNR>2{print $4, $5, FILENAME}' filenames
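
For readers unfamiliar with awk, here is a minimal annotated sketch of how the one-liner above behaves. The scratch directory, the file names 1.dat and 2.dat, and the shortened sample rows are assumptions made for illustration only.

```shell
# Hypothetical sample data: two small files in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' '# Rg Mass Density Rcm-x Rcm-y Area Rsph' \
              '# == ==== ======= ===== ===== ==== ====' \
              '3.757608 16651.000000 168.191925 4.556663 238.047684 98.99 3.96' \
              '3.021153 9158.000000 136.686569 77.605812 268.392334 66.99 3.26' > "$tmp/1.dat"
printf '%s\n' '# Rg Mass Density Rcm-x Rcm-y Area Rsph' \
              '# == ==== ======= ===== ===== ==== ====' \
              '3.416244 14350.000000 166.860458 161.680557 252.263550 86.00 3.69' > "$tmp/2.dat"

# FNR is the line number within the current file (it restarts at 1 for each file),
# so FNR>2 skips the two header lines of every file.
# $4 and $5 are the 4th and 5th whitespace-separated fields (Rcm-x and Rcm-y).
# FILENAME is the name of the file currently being read.
result=$(cd "$tmp" && awk 'FNR>2 {print $4, $5, FILENAME}' 1.dat 2.dat)
printf '%s\n' "$result"
rm -rf "$tmp"
```

To save the combined output, redirect it to a file of your choice, e.g. `> combined.txt`.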

Regards,
Alister
# 3  
Old 04-10-2010
And here's the corresponding Perl one-liner -

Code:
$ 
$ 
$ cat file1.txt
# Rg          Mass            Density       Rcm-x           Rcm-y           Area             Rsph/sqrt(2)
# ==          ====            =======       =====           =====           ====             ============
3.757608      16651.000000    168.191925      4.556663      238.047684       98.9999965813   3.96942547344
3.021153       9158.000000    136.686569     77.605812      268.392334       66.9999991001   3.26548325465
2.822630       6997.000000    162.720932    152.255829      256.117035       42.9999995329   2.61603944495
$ 
$ cat file2.txt
# Rg          Mass            Density       Rcm-x           Rcm-y           Area             Rsph/sqrt(2)
# ==          ====            =======       =====           =====           ====             ============
3.416244      14350.000000    166.860458    161.680557      252.263550       86.0000036677   3.69963856173
2.299375       5884.000000    140.095245    164.127808      242.561691       41.99999793     2.5854413901
2.384314       4527.000000    129.342850    182.847580      246.922028       35.0000019328   2.36017440744
$ 
$ cat file3.txt
# Rg          Mass            Density       Rcm-x           Rcm-y           Area             Rsph/sqrt(2)
# ==          ====            =======       =====           =====           ====             ============
1.943436       3887.000000    117.787880    269.891937      255.048630       32.9999996604   2.2917488934
3.308430      14034.000000    169.084335    270.984741      269.790710       83.0000011533   3.63453714591
3.476575      15121.000000    164.358688    283.803314      246.089340       92.0000042833   3.82651998948
3.821949      16808.000000    158.566040    288.413605      107.780937      105.999998486    4.10736210694
$ 
$ # Perl one-liner
$ perl -lane '$.>2 && print "$F[3] $F[4] $ARGV"; close ARGV if eof' file?.txt
4.556663 238.047684 file1.txt
77.605812 268.392334 file1.txt
152.255829 256.117035 file1.txt
161.680557 252.263550 file2.txt
164.127808 242.561691 file2.txt
182.847580 246.922028 file2.txt
269.891937 255.048630 file3.txt
270.984741 269.790710 file3.txt
283.803314 246.089340 file3.txt
288.413605 107.780937 file3.txt
$ 
$

Use printf instead of print if a formatted output is desired.

HTH,
tyler_durden
# 4  
Old 04-11-2010
thanks to both of you.

I don't quite understand the AWK syntax so I tried the perl line and aside from a couple of details it does exactly what I need.

I have only few followup questions.

the perl command is including an extra line with the filename at the bottom of the output from each file:

Code:
500.423065 3.526756 3.dat
503.648834 82.129471 3.dat
506.528046 21.388363 3.dat
510.004608 12.926329 3.dat
  3.dat
21.214518 273.515930 4.dat
33.278835 282.828949 4.dat
29.159582 270.977661 4.dat

how do I get rid of the "extra" 3.dat at the bottom?

How can I specify a range of input files to the perl line, so that instead of
Code:
perl -lane '$.>2 && print "$F[3] $F[4] $ARGV"; close ARGV if eof' 1.dat
perl -lane '$.>2 && print "$F[3] $F[4] $ARGV"; close ARGV if eof' 2.dat

i can perhaps do something like
Code:
perl -lane '$.>2 && print "$F[3] $F[4] $ARGV"; close ARGV if eof' 0.dat .. 82.dat

lastly, I'd like to get rid of the ".dat" file handle from the output if that is at all possible without renaming all the files.

thanks again guys, your knowledge and willingness to help has already saved me hours.
# 5  
Old 04-11-2010
With a little modification of alister's code:

Code:
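awk 'FNR==1{f=FILENAME; sub(/\.dat$/,"",f)} FNR>2{print $4, $5, f}' {0..82}.dat

Explanation of the code:

Code:
FNR==1{f=FILENAME; sub(/\.dat$/,"",f)}

On the 1st line of each file, copy the filename into f and remove the trailing .dat from the copy. The regex /\.dat$/ matches only a literal ".dat" at the end of the name, and working on a copy leaves FILENAME itself untouched.

Code:
FNR>2

If the line number within the current file is greater than 2 (i.e. past the two header lines)

Code:
{print $4, $5, f}

print the desired fields followed by the label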
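
The original question asked for a numeric label (1 for the first file, 2 for the second, and so on) rather than the file name. A possible variation, sketched under the assumption that the label should simply follow the order of the files on the command line, is to count files in a variable. The name z and the sample files first.dat and second.dat are made up for this example; the NF test additionally skips blank lines.

```shell
# Hypothetical sample data; first.dat ends with an empty line on purpose.
tmp=$(mktemp -d)
printf '%s\n' '# h1' '# h2' \
  '3.757608 16651.000000 168.191925 4.556663 238.047684 98.99 3.96' \
  '3.021153 9158.000000 136.686569 77.605812 268.392334 66.99 3.26' \
  '' > "$tmp/first.dat"
printf '%s\n' '# h1' '# h2' \
  '3.416244 14350.000000 166.860458 161.680557 252.263550 86.00 3.69' > "$tmp/second.dat"

# FNR==1 is true once per (non-empty) file, so z counts the files in command-line order.
# NF is the number of fields on the line; it is 0 for blank lines, which are skipped.
labelled=$(cd "$tmp" && awk 'FNR==1 {z++} FNR>2 && NF {print $4, $5, z}' first.dat second.dat)
printf '%s\n' "$labelled"
rm -rf "$tmp"
```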

Last edited by Franklin52; 04-11-2010 at 12:55 PM.. Reason: Correcting code
# 6  
Old 04-11-2010
Quote:
Originally Posted by Arlamos
...
the perl command is including an extra line with the filename at the bottom of the output from each file:

Code:
500.423065 3.526756 3.dat
503.648834 82.129471 3.dat
506.528046 21.388363 3.dat
510.004608 12.926329 3.dat
  3.dat
21.214518 273.515930 4.dat
33.278835 282.828949 4.dat
29.159582 270.977661 4.dat

how do I get rid of the "extra" 3.dat at the bottom?
First of all, you need to know why the extra filename is added at the bottom. Because of the "-a" option, Perl splits each input line on whitespace and stores the tokens in a predefined array called "@F". Array indexes start at 0, so $F[3] is the 4th element of the array and $F[4] is the 5th.

Now, my hunch is that you have an extra blank line at the end of your file "3.dat". You can check this by using "cat -n 3.dat". In the output below, I see line number 7 but no data in there which means it is a blank line.

Code:
$ 
$ 
$ cat -n 3.dat
     1    # Rg          Mass            Density       Rcm-x            Rcm-y           Area             Rsph/sqrt(2)
     2    # ==          ====            =======       =====            =====           ====             ============
     3    0.086692       2850.681081    345.545735    607.726550       255.130096      138.0069921579   1.35210017578
     4    0.834407      22025.882038     40.423520    986.900503       548.502799      863.9181962071   5.10301645927
     5    0.731624       6960.747103    645.505466    457.952424       691.618854      820.1901098519   5.88622435063
     6    0.511055      17352.058392    426.119742    800.869647       671.749259      752.8314025622   3.07408757877
     7    
$ 
$
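
With 80-odd files, checking each one by eye with "cat -n" gets tedious. One possible way to list every blank line in one pass is sketched below. It uses awk rather than Perl, and the two sample files are invented for the demonstration.

```shell
# Hypothetical sample data: 3.dat has a trailing empty line, 4.dat does not.
tmp=$(mktemp -d)
printf '%s\n' '# h1' '# h2' '1 2 3 4 5 6 7' '' > "$tmp/3.dat"
printf '%s\n' '# h1' '# h2' '1 2 3 4 5 6 7' > "$tmp/4.dat"

# NF==0 is true for lines with no fields, i.e. empty or whitespace-only lines.
# FNR gives the line number within the file being read.
blank=$(cd "$tmp" && awk 'NF==0 {print FILENAME ": line " FNR " is blank"}' 3.dat 4.dat)
printf '%s\n' "$blank"
rm -rf "$tmp"
```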

For this blank line the "-a" split produces an empty @F, so $F[3] and $F[4] are undefined and interpolate as empty strings. The print command therefore prints two empty fields, the separating spaces, and the filename from $ARGV. See below -

Code:
$ 
$ perl -lane '$.>2 && print "$F[3] $F[4] $ARGV"; close ARGV  if eof' 3.dat
607.726550 255.130096 3.dat
986.900503 548.502799 3.dat
457.952424 691.618854 3.dat
800.869647 671.749259 3.dat
  3.dat
$

I've added one more condition before printing - check that the line is not blank (i.e. not empty and not made up of whitespace only). The updated Perl one-liner is as follows -

Code:
$ 
$ 
$ perl -lane 'print "$F[3] $F[4] $ARGV" if ($.>2 and !/^\s*$/); close  ARGV if eof' 3.dat
607.726550 255.130096 3.dat
986.900503 548.502799 3.dat
457.952424 691.618854 3.dat
800.869647 671.749259 3.dat
$ 
$

Quote:
...
How can I specify a range of input files to the perl line, so that instead of
Code:
perl -lane '$.>2 && print "$F[3] $F[4] $ARGV"; close ARGV if  eof' 1.dat
perl -lane '$.>2 && print "$F[3] $F[4] $ARGV"; close ARGV if  eof' 2.dat

i can perhaps do something like
Code:
perl -lane '$.>2 && print "$F[3] $F[4] $ARGV"; close ARGV if  eof' 0.dat .. 82.dat

In my earlier post, I used the shell wildcard pattern "file?.txt" for this (a glob, not a regular expression). That was because the test files (for your problem) in my system were named "file1.txt", "file2.txt" and "file3.txt".

If you can come up with a wildcard pattern that your shell understands and that matches all your "dat" files, then you can put that instead.

Note that your shell won't expand "0.dat .. 82.dat" to a list of 83 file names - 0.dat, 1.dat, 2.dat, ... 82.dat; it passes the three words "0.dat", ".." and "82.dat" to Perl as they are. If all your data files have the extension ".dat" then you can use "*.dat", as shown below -

Code:
$
$
$ # display all ".dat" test files in my system
$ ls -1 *.dat
0.dat
1.dat
2.dat
3.dat
$
$
$ # now feed all of these files to the Perl one-liner
$ perl -lane 'print "$F[3] $F[4] $ARGV" if ($.>2 and !/^\s*$/); close ARGV if eof' *.dat
4.556663 238.047684 0.dat
77.605812 268.392334 0.dat
152.255829 256.117035 0.dat
161.680557 252.263550 1.dat
164.127808 242.561691 1.dat
182.847580 246.922028 1.dat
269.891937 255.048630 2.dat
270.984741 269.790710 2.dat
283.803314 246.089340 2.dat
288.413605 107.780937 2.dat
607.726550 255.130096 3.dat
986.900503 548.502799 3.dat
457.952424 691.618854 3.dat
800.869647 671.749259 3.dat
$
$

Maybe you want more control than that. Say, for example, you have data in files "0.dat" through "82.dat". But you also have files like so - "abc.dat", "zzz.dat", "myfile.dat" which you do not want to process.

The one-liner above will also process "abc.dat", "zzz.dat" and "myfile.dat", because the wildcard character "*" matches any base file name to the left of the dot. Note also that "*.dat" is expanded in sorted (lexical) order, so "10.dat" comes before "2.dat".
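
If the data files are all named with one or two digits, a narrower wildcard may already be enough. The sketch below assumes exactly that naming scheme and uses empty placeholder files to show which names each pattern picks up.

```shell
# Hypothetical directory with numeric and non-numeric ".dat" files (all empty).
tmp=$(mktemp -d)
for name in 0 1 2 10 11 abc myfile; do
  : > "$tmp/$name.dat"
done

# "*.dat" matches every file, including abc.dat and myfile.dat.
all=$(cd "$tmp" && echo *.dat)
# "[0-9].dat" matches one-digit names and "[0-9][0-9].dat" two-digit names.
# Each pattern is expanded separately, so the one-digit names come first.
numeric=$(cd "$tmp" && echo [0-9].dat [0-9][0-9].dat)
echo "all:     $all"
echo "numeric: $numeric"
rm -rf "$tmp"
```

The same two patterns can be given to the Perl one-liner in place of "*.dat".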

One way out is "brace expansion". See if your shell supports this - if it does, it will expand numbers within braces like so -

Code:
$ 
$ echo {1..25}
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
$ 
$

And you can use something similar for your files -

Code:
perl -lane 'print "$F[3] $F[4] $ARGV" if ($.>2 and !/^\s*$/); close  ARGV if eof' {0..82}.dat

Of course, I don't have all 83 test files in my system, so I can show you the test for 0.dat, 1.dat, 2.dat, 3.dat -

Code:
$ 
$ perl -lane 'print "$F[3] $F[4] $ARGV" if ($.>2 and !/^\s*$/); close  ARGV if eof' {0..3}.dat
4.556663 238.047684 0.dat
77.605812 268.392334 0.dat
152.255829 256.117035 0.dat
161.680557 252.263550 1.dat
164.127808 242.561691 1.dat
182.847580 246.922028 1.dat
269.891937 255.048630 2.dat
270.984741 269.790710 2.dat
283.803314 246.089340 2.dat
288.413605 107.780937 2.dat
607.726550 255.130096 3.dat
986.900503 548.502799 3.dat
457.952424 691.618854 3.dat
800.869647 671.749259 3.dat
$
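
One caveat: brace expansion generates the names whether or not the files exist, so a gap in the numbering makes Perl warn that it cannot open the missing file. A possible workaround, sketched here with awk and a plain counting loop (the range 0 to 3 and the sample files are assumptions), is to build the list from the files that are actually present.

```shell
# Hypothetical data: 0.dat and 2.dat exist, 1.dat and 3.dat are missing.
tmp=$(mktemp -d)
for n in 0 2; do
  printf '%s\n' '# h1' '# h2' "a b c ${n}.5 ${n}.7 f g" > "$tmp/$n.dat"
done

# Collect only the names that exist, in numeric order.
files=""
i=0
while [ "$i" -le 3 ]; do
  [ -f "$tmp/$i.dat" ] && files="$files $i.dat"
  i=$((i + 1))
done

# $files is left unquoted on purpose so that it splits into separate names
# (safe here because the names contain no spaces or wildcard characters).
present=$(cd "$tmp" && awk 'FNR>2 && NF {print $4, $5, FILENAME}' $files)
printf '%s\n' "$present"
rm -rf "$tmp"
```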

Quote:
lastly, I'd like to get rid of the ".dat" file handle from the output if that is at all possible without renaming all the files.
Yes, it is possible in Perl without renaming the files. The scalar variable "$ARGV" simply holds the current file name, i.e. the name of the file that is being processed right now. Its value can be modified to remove ".dat" from the end. Here's the updated one-liner -

Code:
$ 
$ 
$ # Remove ".dat" from the end of $ARGV before printing the line
$ perl -lane '$ARGV=~s/\.dat$//; print "$F[3] $F[4] $ARGV" if ($.>2  and !/^\s*$/); close ARGV if eof' *.dat
4.556663 238.047684 0
77.605812 268.392334 0
152.255829 256.117035 0
161.680557 252.263550 1
164.127808 242.561691 1
182.847580 246.922028 1
269.891937 255.048630 2
270.984741 269.790710 2
283.803314 246.089340 2
288.413605 107.780937 2
607.726550 255.130096 3
986.900503 548.502799 3
457.952424 691.618854 3
800.869647 671.749259 3
$ 
$ # Same one-liner as above, except that the file names are fed via the  "brace expansion" feature of the shell
$ perl -lane '$ARGV=~s/\.dat$//; print "$F[3] $F[4] $ARGV" if ($.>2  and !/^\s*$/); close ARGV if eof' {0..3}.dat
4.556663 238.047684 0
77.605812 268.392334 0
152.255829 256.117035 0
161.680557 252.263550 1
164.127808 242.561691 1
182.847580 246.922028 1
269.891937 255.048630 2
270.984741 269.790710 2
283.803314 246.089340 2
288.413605 107.780937 2
607.726550 255.130096 3
986.900503 548.502799 3
457.952424 691.618854 3
800.869647 671.749259 3
$ 
$

HTH,
tyler_durden

Last edited by durden_tyler; 04-11-2010 at 01:54 PM..
# 7  
Old 04-12-2010
thanks for the help and explanation, it is much appreciated.

---------- Post updated 04-12-10 at 01:42 PM ---------- Previous update was 04-11-10 at 04:24 PM ----------

next step hah....is there a way to multiply the numbers in a specific column by some factor?
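
The thread does not include an answer to this follow-up. One way it might be done with awk is sketched below, assuming the Rcm-x column (field 4) is the one to scale. The variable name factor, its value and the sample file are placeholders.

```shell
# Hypothetical sample file with two header lines and one data row.
tmp=$(mktemp -d)
printf '%s\n' '# h1' '# h2' 'a b c 4.556663 238.047684 f g' > "$tmp/0.dat"

# -v factor=2 passes the scale factor into awk as a variable.
# printf with %.6f keeps six decimals; a computed number passed to a plain
# print would be formatted with awk's default output format of %.6g instead.
scaled=$(cd "$tmp" && awk -v factor=2 'FNR>2 && NF {printf "%.6f %.6f %s\n", $4 * factor, $5, FILENAME}' 0.dat)
printf '%s\n' "$scaled"
rm -rf "$tmp"
```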