I am trying to join a few hundred files using the join command. Is there a way to automate this with a while read loop or something else? My problem is the following:
Day 1
Day 2
Day 3
And so on for a few hundred days.
I am trying to make a file that looks like this:
My command looks like this:
where the common identifier is in column 2 (-j2) and the value I want to amend my master file with is always in column 3 (-o 1.3 2.3). I'm trying to avoid running this command over and over again, because the -o list keeps growing (-o 1.3 1.4 2.3 --> -o 1.3 1.4 1.5 2.3 --> -o 1.3 1.4 1.5 1.6 2.3 --> -o 1.3 1.4 1.5...1.n 2.3, where n is about 200).
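A minimal Python sketch of how the growing -o list could be generated in a loop instead of being typed for each run. It assumes, as the sequence above implies, that the master file's accumulated values start at column 3 and that the new day's value is column 3 of the second file; o_spec is a hypothetical helper name. It separates the fields with commas, which POSIX join accepts as a single -o argument.

```python
def o_spec(step):
    """Return the join -o field list for merge step `step` (1-based).

    Assumption: the master file holds `step` value columns starting at
    column 3, and the day file being merged has its value in column 3.
    """
    master_fields = [f"1.{col}" for col in range(3, step + 3)]
    return ",".join(master_fields + ["2.3"])


# The first three steps of the sequence described above.
for step in range(1, 4):
    print("-o", o_spec(step))
```

Keep in mind that join also expects both inputs to be sorted on the join field, and by default it drops lines whose key appears in only one of the two files, so cities that come and go between days would need extra handling (for example with the -a and -e options).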
Is there a way to use while read or something else to make the file?
Your help is much appreciated.
Thanks.
Last edited by zaxxon; 03-24-2010 at 09:33 AM.
Here's an idea using Perl, assuming that the file contents are in the same order as the file names, i.e. file1.csv has data for day 1, file2.csv has data for day 2, and so on...
Not sure if this is what you wanted. Note that putting all those commas after "HIJ" means that:
(a) either the list of cities is hardcoded/known/present in a separate file, or
(b) a separate pass is first made through all files to get such a list, and only then is the hash built.
Consider what happens if "HIJ" is absent from files 1 through 199 and present in file 200. You'd need a hash entry with 199 commas on the left.
The city list could also be determined by a single parse of all files, but the program for that would be much more elaborate, I think.
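For comparison, a minimal Python sketch (not the Perl script posted in this thread) of the single-parse approach: each file is read once, values are stored per city and per day, and the city list falls out of the same pass. It assumes comma-separated input with the city in column 2 and the value in column 3, as in the original question; build_table and rows are hypothetical helper names and the sample data is made up.

```python
from collections import defaultdict


def build_table(day_lines):
    """Read every day's lines once and return {city: {day: value}}.

    Assumes comma-separated lines with the city in column 2 and the
    value in column 3; column 1 is ignored.
    """
    table = defaultdict(dict)
    for day, lines in day_lines.items():
        for line in lines:
            fields = line.rstrip("\r\n").split(",")
            if len(fields) < 3:
                continue  # skip blank or short lines
            city, value = fields[1], fields[2]
            table[city][day] = value
    return table


def rows(table, days):
    """Yield [city, value_day1, value_day2, ...] with "" for missing days."""
    for city in sorted(table):
        yield [city] + [table[city].get(day, "") for day in days]


# Made-up sample: KLM is only reported on days 2 and 3.
sample = {
    1: ["x,ABC,10", "x,HIJ,3"],
    2: ["x,KLM,5", "x,ABC,11"],
    3: ["x,ABC,12", "x,KLM,8"],
}
for row in rows(build_table(sample), sorted(sample)):
    print(",".join(row))
# ABC,10,11,12
# HIJ,3,,
# KLM,,5,8
```

Because every value is keyed by its day, a city seen only on days 2 and 3 gets an empty first cell instead of being shifted left. Sorting the cities alphabetically is an arbitrary choice here.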
Thanks, but the problem is that the list of cities changes from file to file, so they are not in the same order every time. Also, this method makes it seem like KLM had temperatures of 5 and 8 on days 1 and 2, when the temps were actually observed on days 2 and 3, which is why I have to have commas in there. The file is a csv, so when I read it into a spreadsheet there will be empty cells at times when a temp was not updated for a particular city.
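Keeping each value in its own day's column also depends on mapping every file to the right day. A plain lexical sort of a few hundred names puts file10.csv before file2.csv, so this minimal Python sketch, assuming the fileN.csv naming from the earlier reply, takes the day number from the file name instead; day_of is a hypothetical helper name.

```python
import re


def day_of(filename):
    """Extract the day number from a name such as 'file12.csv'.

    Assumes the fileN.csv naming scheme (fileN.csv holds day N).
    """
    match = re.search(r"(\d+)\.csv$", filename)
    if match is None:
        raise ValueError(f"no day number in {filename!r}")
    return int(match.group(1))


names = ["file10.csv", "file2.csv", "file1.csv"]
print(sorted(names))              # ['file1.csv', 'file10.csv', 'file2.csv']
print(sorted(names, key=day_of))  # ['file1.csv', 'file2.csv', 'file10.csv']
```

With real files, sorted(glob.glob("file*.csv"), key=day_of) would give the processing order, and day_of(name) the column each file's values belong in.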
I thought so.
Here's a more elaborate program that should take care of those issues. The script comments should be self-explanatory.
HTH,
tyler_durden
NB - You may want to change the tab "\t" at line 28 to a comma "," for a proper csv file.
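In Python, the same tab-versus-comma choice is the delimiter argument of csv.writer. A minimal sketch with a made-up row (to_text is a hypothetical helper name); the csv module also quotes any field that contains the delimiter, which joining strings by hand would not do.

```python
import csv
import io


def to_text(rows, delimiter=","):
    """Render rows with the given delimiter; empty strings stay empty cells."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


sample = [["KLM", "", "5", "8"]]  # made-up row: no value on day 1
print(to_text(sample, "\t"), end="")  # tab-separated
print(to_text(sample), end="")        # comma-separated: KLM,,5,8
```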
Last edited by durden_tyler; 03-24-2010 at 03:05 PM.