Joining files in a complex way (Shell Programming and Scripting): Post 302404194 by durden_tyler on Monday 15th of March 2010 11:12:14 PM
03-16-2010
Quote:
Originally Posted by stateperl
Conditions:
Code:
1. if both letters are the same, it has to be 1 or 3 [ A/A or T/T or G/G or C/C ]
2. if the same ID has two different same-letter pairs, the first one has to be 1 and the other has to be 3 [ e.g. ID S1 has T/T as 1 and G/G as 3 ]
3. if the letters are different, it has to be 2 [ A/G or T/A or T/C or others ]

modified-input1

Code:
"aphab"    "S1"    "S2"    "S3"
"a"    "T/T"    "A/A"    "A/A"
"b"    "A/G"    "A/G"    "A/A"
"c"    "T/T"    "G/G"    "A/A"
"d"    "G/G"    "A/G"    "A/G"
"e"    "A/G"    "G/G"    "A/G"
"f"     "T/T"    "G/G"    "A/G"
"g"    "T/T"    "G/G"    "G/G"
"h"    "T/T"    "G/G"    "G/G"
"I"     "T/T"    "G/G"    "G/G"

same old-input2
Code:
"ID"    "Label"    "log"
"S1"    "xxx"    2.8
"S1"    "xxx"    3
"S1"    "xxx"    4
"S2"    "yyy"    6.8
"S2"    "yyy"    7
"S2"    "yyy"    7.4
"S2"    "yyy"    8
"S3"    "zzz"    12
"S3"    "zzz"    14
"S3"    "zzz"    16
"S3"    "zzz"    18
"S3"    "zzz"    20

newoutput
Code:
"ID"        "Label"     "StYPE"     "Ntype"     "Stype_No"  "log"
"S1"        "xxx"       "T/T"       1           6           2.8
"S1"        "xxx"       "A/G"       2           2           3
"S1"        "xxx"       "G/G"       3           1           4
"S2"        "yyy"       "A/A"       1           1           6.8
"S2"        "yyy"       "A/G"       2           2           7
"S2"        "yyy"       "G/G"       3           6           7.4
"S2"        "yyy"       "NULL"      "null"      "null"      8
"S3"        "zzz"       "A/A"       1           3           12
"S3"        "zzz"       "A/G"       2           3           14
"S3"        "zzz"       "G/G"       3           3           16
"S3"        "zzz"       "NULL"      "null"      "null"      18
"S3"        "zzz"       "NULL"      "null"      "null"      20

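Well, in this case, you'll have to generate the key-value pairs in the three hashes (%chartonum, %numtochar and %mainhash) as you iterate through "input1", based on the 3 conditions mentioned. %chartonum maps "ID,letters" to its number (1, 2 or 3), %numtochar maps "ID,number" back to the letters, and %mainhash counts how often each "ID,number" occurs.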
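
As a rough sketch of how the 3 conditions turn into a number, here is the same rule pulled out into a small subroutine. The name ntype_for and the way it is called are only illustrative assumptions; the script in this post does the equivalent inline with if/elsif.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %chartonum;   # "ID,letters" -> number (1, 2 or 3)
my %numtochar;   # "ID,number"  -> letters

# Hypothetical helper (not part of combine_2.pl): return the number for a
# given ID and letter pair, assigning a free slot the first time it is seen.
sub ntype_for {
    my ($id, $geno) = @_;
    return $chartonum{"$id,$geno"} if exists $chartonum{"$id,$geno"};

    my ($left, $right) = split m{/}, $geno;
    # same letters may take slot 1, then slot 3; different letters only slot 2
    my @slots = $left eq $right ? (1, 3) : (2);

    for my $n (@slots) {
        next if exists $numtochar{"$id,$n"};   # slot already taken
        $numtochar{"$id,$n"} = $geno;
        return $chartonum{"$id,$geno"} = $n;
    }
    return;   # no free slot left for this ID
}

# The S1 column of "input1", top to bottom
my @s1 = qw(T/T A/G T/T G/G A/G T/T T/T T/T T/T);
print join(" ", map { ntype_for("S1", $_) } @s1), "\n";   # prints "1 2 1 3 2 1 1 1 1"
```

A third same-letter pair for one ID (say C/C after T/T and G/G) finds no free slot and gets undef here; the conditions as posted don't say what should happen in that case.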

Code:
$ 
$ 
$ cat input1
"aphab"    "S1"    "S2"    "S3"
"a"    "T/T"    "A/A"    "A/A"
"b"    "A/G"    "A/G"    "A/A"
"c"    "T/T"    "G/G"    "A/A"
"d"    "G/G"    "A/G"    "A/G"
"e"    "A/G"    "G/G"    "A/G"
"f"     "T/T"    "G/G"    "A/G"
"g"    "T/T"    "G/G"    "G/G"
"h"    "T/T"    "G/G"    "G/G"
"I"     "T/T"    "G/G"    "G/G"
$ 
$ cat input2
"ID"    "Label"    "log"
"S1"    "xxx"    2.8
"S1"    "xxx"    3
"S1"    "xxx"    4
"S2"    "yyy"    6.8
"S2"    "yyy"    7
"S2"    "yyy"    7.4
"S2"    "yyy"    8
"S3"    "zzz"    12
"S3"    "zzz"    14
"S3"    "zzz"    16
"S3"    "zzz"    18
"S3"    "zzz"    20
$ 
$ cat -n combine_2.pl
     1  #!/usr/bin/perl -w
     2
     3  my %chartonum;
     4  my %numtochar;
     5  my %mainhash;
     6
     7  # first process the "input1" file and build the three lookup hashes
     8  $file1 = "input1";
     9
    10  open(INFILE, $file1) or die "Can't open $file1: $!";
    11  while (<INFILE>) {
    12    chomp;
    13    s/"//g;
    14    s/[ ]+/ /g;
    15    if ($. == 1) {
    16      @x = split/ /;
    17    } else {
    18      @y = split/ /;
    19      foreach $i (1..$#y) {
    20        @t = split (/\//, $y[$i]);
    21        if ($t[0] eq $t[1]) {
    22          if (not defined $chartonum{$x[$i].",".$y[$i]} and
    23              not defined $numtochar{$x[$i].",1"}       and
    24              not defined $numtochar{$x[$i].",3"}) {
    25            $numtochar{$x[$i].",1"} = $y[$i];
    26            $chartonum{$x[$i].",".$y[$i]} = 1;
    27          }
    28          elsif (not defined $chartonum{$x[$i].",".$y[$i]} and
    29                 not defined $numtochar{$x[$i].",3"}) {
    30            $numtochar{$x[$i].",3"} = $y[$i];
    31            $chartonum{$x[$i].",".$y[$i]} = 3;
    32          }
    33        } else {
    34          if (not defined $chartonum{$x[$i].",".$y[$i]} and
    35              not defined $numtochar{$x[$i].",2"}) {
    36            $numtochar{$x[$i].",2"} = $y[$i];
    37            $chartonum{$x[$i].",".$y[$i]} = 2;
    38          } # end of if not defined
    39        } # end of else i.e. t[0] ne t[1]
    40        $mainhash{$x[$i].",".$chartonum{$x[$i].",".$y[$i]}}++ if defined $chartonum{$x[$i].",".$y[$i]};
    41      } # end of foreach
    42    } # end of $. > 1
    43  }
    44  close(INFILE) or die "Can't close $file1: $!";
    45
    46  # print the header
    47  printf("%-12s%-12s%-12s%-12s%-12s%-s\n","\"ID\"","\"Label\"","\"StYPE\"","\"Ntype\"","\"Stype_No\"","\"log\"");
    48  # now start processing the "input2" file
    49  $infile2 = "input2";
    50  open(INFILE, $infile2) or die "Can't open $infile2: $!";
    51  while (<INFILE>) {
    52    if ($. > 1) {
    53      chomp;
    54      s/"//g;
    55      s/[ ]+/ /g;
    56      # print $_,"\n";
    57      @z = split/ /;
    58      if (!defined $prev or $z[0] ne $prev) {$num = 1} else {$num++};
    59      $prev = $z[0];
    60      printf("%-12s%-12s%-12s%-12s%-12s%-s\n",
    61             "\"$z[0]\"",
    62             "\"$z[1]\"",
    63             defined $numtochar{$z[0].",".$num} ? "\"".$numtochar{$z[0].",".$num}."\"" : "\"NULL\"",
    64             exists $numtochar{$z[0].",".$num} ? $num : "\"null\"",
    65             defined $mainhash{$z[0].",".$num} ? $mainhash{$z[0].",".$num} : "\"null\"", 
    66             $z[2]
    67            );
    68    }
    69  }
    70  close(INFILE) or die "Can't close $infile2: $!";
    71
$ 
$ perl combine_2.pl
"ID"        "Label"     "StYPE"     "Ntype"     "Stype_No"  "log"
"S1"        "xxx"       "T/T"       1           6           2.8
"S1"        "xxx"       "A/G"       2           2           3
"S1"        "xxx"       "G/G"       3           1           4
"S2"        "yyy"       "A/A"       1           1           6.8
"S2"        "yyy"       "A/G"       2           2           7
"S2"        "yyy"       "G/G"       3           6           7.4
"S2"        "yyy"       "NULL"      "null"      "null"      8
"S3"        "zzz"       "A/A"       1           3           12
"S3"        "zzz"       "A/G"       2           3           14
"S3"        "zzz"       "G/G"       3           3           16
"S3"        "zzz"       "NULL"      "null"      "null"      18
"S3"        "zzz"       "NULL"      "null"      "null"      20
$ 
$

Use the Data::Dumper module to check the values of the 3 hashes right after "input1" has been processed, i.e. at line 45 of the script.
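
A minimal, standalone sketch of such a dump is shown here. The two hashes are filled by hand with only the "S1" entries that the run above should produce; they are stand-in data, not read from the files.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;   # sorted keys make the dump easier to compare
$Data::Dumper::Indent   = 1;   # compact, fixed indentation

# Stand-in data (assumption): the "S1" entries only, typed in by hand
my %numtochar = ("S1,1" => "T/T", "S1,2" => "A/G", "S1,3" => "G/G");
my %mainhash  = ("S1,1" => 6,     "S1,2" => 2,     "S1,3" => 1);

# The "*name" form makes Data::Dumper print each hash as %name = ( ... );
my $dump = Data::Dumper->Dump([\%numtochar, \%mainhash],
                              [qw(*numtochar *mainhash)]);
print $dump;
```

In combine_2.pl itself, a "use Data::Dumper;" near the top plus something like print Dumper(\%chartonum, \%numtochar, \%mainhash); at line 45 should be enough; that plain form labels the three hashes $VAR1, $VAR2 and $VAR3.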

But frankly, given the complexity of calculations involved, I'd rather look for some Perl Bioinformatics modules that have subroutines to do this.
Or check BioPerl, or books like "Beginning/Mastering Perl for Bioinformatics" at amazon.com.

HTH,
tyler_durden

PS - I'm assuming those A, C, G, T are the nucleotide bases of a DNA strand, and these files are related to Bioinformatics.

Last edited by durden_tyler; 03-16-2010 at 02:50 PM..
 
