merging files and adding special columns


# 1  
08-19-2011

Hi everyone,

I have a problem merging files and hope one of you has an idea how to approach it. I tried awk but didn't get far. This is what I have:

I have 40 files like the ones below. All have three columns, but the number of rows differs (20,000 to 50,000).

e.g. file1

chromosome position_on_chromosome file1
chr1 62138 x
chr1 631246 x
chr1 1238847 x
chr1 1238854 x
....

e.g. file2

chromosome position_on_chromosome file2
chr1 238398 x
chr1 533005 x
chr1 631246 x
chr1 657484 x
chr1 1281185 x
chr1 1448761 x
....

Now I need to merge them by genome coordinate (i.e. 'chromosome' and 'position_on_chromosome'; the two fields together give the coordinate). Every coordinate (columns 1 and 2) should be listed, whether it is present in one file or in all of them, so that the result is a complete list. The third columns of the original files should be placed side by side, one column per file.


This is how it should look:

chromosome position_on_chromosome file1 file2 (and all other files: file3, file4, etc.)
chr1 62138 x e
chr1 238398 e x
chr1 533005 e x
chr1 631246 x x
chr1 657484 e x
chr1 1238847 x e
chr1 1238854 x e
chr1 1281185 e x
chr1 1448761 e x
.....


It's a bit complicated to explain, but I hope you see what I mean.

Any help would be greatly appreciated!

Edit note: I just noticed that the forum doesn't keep the blank cells in the output table where an 'x' is missing, so I replaced each empty cell with an 'e' for clarity.

Last edited by TuAd; 08-19-2011 at 05:08 PM. Reason: fixed some pasting issues
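A note on the ordering in the desired output: rows are sorted by chromosome and then numerically by position (62138, 238398, ..., 1448761), whereas a plain string sort would put 1238847 ahead of 238398. A minimal Perl sketch of that ordering, assuming keys of the form "chromosome position" separated by a space and comparing chromosome names as plain strings (so chr10 would sort before chr2); the key list is a hypothetical sample taken from the data above:

```perl
#!/usr/bin/perl
# Sketch: order "chromosome position" keys by chromosome name (string
# comparison), then by position as a number.
use strict;
use warnings;

# hypothetical sample keys, deliberately out of order
my @keys = ('chr1 1238847', 'chr1 238398', 'chr1 62138', 'chr1 1448761');

# Schwartzian transform: split each key once, sort on the parts,
# then keep only the original key string.
my @sorted = map  { $_->[0] }
             sort { $a->[1] cmp $b->[1] or $a->[2] <=> $b->[2] }
             map  { [ $_, split ' ' ] } @keys;

print "$_\n" for @sorted;
# A plain "sort @keys" compares positions as text and would put
# "chr1 1238847" before "chr1 238398".
```

Splitting each key once up front, rather than inside the comparison, matters with 40 files of up to 50,000 rows, since the comparison runs many times per key.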
# 2  
08-19-2011
Why are these lines missing from your output?
Code:
chr5 173197909 x
chr5 173418499 x
chr5 172800339 x
chr5 172800347 x
chr5 172805352 x
chr5 172805379 x
# 3  
08-19-2011
Sorry, you're right, of course. I didn't post the complete list because it was only meant as an illustration. Corrected now. Thanks.
# 4  
08-19-2011
One more thing: shouldn't entries that are identical in both files produce a single line like this:
Code:
chr1 631246 x x

instead of two lines like these?
Code:
chr1 631246 x e
chr1 631246 e x
This User Gave Thanks to bartus11 For This Post:
# 5  
08-19-2011
Of course, you're right. That is exactly my problem: merging the files so that I get only one entry per position, together with the third-column values from every file.
Sorry, I accidentally pasted the wrong output (the one where I don't get that). Thanks for noticing! Corrected now.
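The point raised in posts #4 and #5 comes down to keying on both coordinate fields at once, so that a position present in several files collapses into one row. A minimal Perl sketch of that full outer join, with the two sample files from the question inlined as arrays and their header lines left out (the names @files and %merged are illustrative, not from the thread): each distinct "chromosome position" string gets one row pre-filled with 'e' for every file, and each file then overwrites its own slot.

```perl
#!/usr/bin/perl
# Sketch: full outer join on the combined "chromosome position" key.
# Sample data inlined from the question; 'e' marks a coordinate that a
# file does not contain, as in the desired output.
use strict;
use warnings;

my @files = (
  [ 'chr1 62138 x', 'chr1 631246 x', 'chr1 1238847 x', 'chr1 1238854 x' ],
  [ 'chr1 238398 x', 'chr1 533005 x', 'chr1 631246 x', 'chr1 657484 x',
    'chr1 1281185 x', 'chr1 1448761 x' ],
);

my %merged;                                  # "chr pos" => [col3 of each file]
for my $i (0 .. $#files) {
  for my $line (@{ $files[$i] }) {
    my ($chr, $pos, $val) = split ' ', $line;
    my $key = "$chr $pos";                   # both fields together form the key
    $merged{$key} //= [ ('e') x @files ];    # start with 'e' for every file
    $merged{$key}[$i] = $val;                # fill in this file's column
  }
}

# All sample rows are on chr1, so sorting numerically on position is enough here.
for my $key (sort { (split ' ', $a)[1] <=> (split ' ', $b)[1] } keys %merged) {
  print "$key @{ $merged{$key} }\n";
}
```

Run as is, this should print the nine rows of the desired output from the question, with chr1 631246 appearing once as "x x". Writing each value into its file's slot, rather than appending and padding afterwards, keeps the columns aligned even if third-column values are longer than one character.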
# 6  
08-19-2011
Try this script:
Code:
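#!/usr/bin/perl
# Merge files on "chromosome position"; each file's third column becomes
# one output column, with "e" where a file lacks that coordinate.
use strict;
use warnings;

die "usage: $0 file1 file2 ...\n" unless @ARGV;
my (%merged, $h);
my $n = 0;                                   # files read so far
for my $f (@ARGV){
  open my $in, '<', $f or die "Cannot open $f: $!\n";
  while (<$in>){
    chomp;
    my @F = split ' ';                       # any run of spaces/tabs
    next if @F < 3;                          # skip blank or short lines
    if ($. == 1){                            # header line
      $h = "$F[0] $F[1]" unless defined $h;  # coordinate headers from first file
      $h .= " $F[2]";                        # plus this file's column name
      next;
    }
    my $k = "$F[0] $F[1]";
    $merged{$k} //= [ ('e') x $n ];          # new coordinate: 'e' for earlier files
    next if @{ $merged{$k} } > $n;           # repeated in this file: keep the first
    push @{ $merged{$k} }, $F[2];
  }
  close $in;
  $n++;
  # coordinates missing from this file get an 'e' in its column
  for my $v (values %merged){ push @$v, 'e' while @$v < $n }
}
print "$h\n";
# sort by chromosome (as text), then by position (as a number)
for my $k (map  { $_->[0] }
           sort { $a->[1] cmp $b->[1] or $a->[2] <=> $b->[2] }
           map  { [ $_, split ' ' ] } keys %merged){
  print "$k @{ $merged{$k} }\n";
}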

Run it like this: ./script.pl file1 file2 file3 ...
This User Gave Thanks to bartus11 For This Post:
# 7  
08-22-2011
The script works fantastically!

Many thanks for this!