Getting non unique lines from concatenated files
Post 302510249 by bartus11 on Saturday 2nd of April 2011 11:19:55 AM
The main idea behind this code is to add another level to the hash, keyed by the starting number of each range of consecutive numbers.
Code:
#!/usr/bin/perl
open I, "<", $ARGV[0] or die "Cannot open $ARGV[0]: $!";   # three-argument open, with an error check
while (<I>){
  @F=split;
  $start=$F[1] if !defined $prev || $F[1]-1!=$prev;    # a new range starts on the first line, or when the 2nd column is not the previous line's value plus one
  $r{$F[0]}{$start}{$F[1]}=($F[4]=="-1")?$F[2]:$F[4];  # if the 5th column equals "-1", take the Ref value from the 3rd column, otherwise from the 5th
  $g{$F[0]}{$start}{$F[1]}=($F[4]=="-1")?$F[3]:$F[5];  # same for Gen, using the 4th and 6th columns
  $prev=$F[1];                                         # save the 2nd column value for comparison with the next line
}
END{                                                   # essentially the same as the old code, with one extra "for" loop (over $j) that goes through the starting numbers of the ranges
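  for $i (sort keys %r){                               # sorted, so the output order is repeatable
    for $j (sort {$a <=> $b} keys %{$r{$i}}){          # range starting numbers in ascending order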
      @x=sort{$a <=> $b} keys %{$r{$i}{$j}};
      print "$i\n";
      print "$x[0]-$x[$#x]\n";
      print "Ref:\n";
      for $k (@x){
        print "$r{$i}{$j}{$k}";
      }
      print "\n\n";
      print "Gen:\n";
      for $k (@x){
        print "$g{$i}{$j}{$k}";
      }
      print "\n\n";
    }
  }
}
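
The range detection can also be looked at on its own. Below is a minimal sketch, not part of the script above: "group_ranges" is a hypothetical helper name, and it assumes the positions arrive in file order, as they do in the "while" loop.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): group positions into
# runs of consecutive numbers, keyed by the first number of each run.
sub group_ranges {
    my @positions = @_;
    my (%ranges, $start, $prev);
    for my $pos (@positions) {
        # a new range begins on the first value, or after a gap
        $start = $pos if !defined $prev || $pos - 1 != $prev;
        push @{ $ranges{$start} }, $pos;
        $prev = $pos;
    }
    return %ranges;
}

my %ranges = group_ranges(3181, 3182, 3183, 5503, 5504);
for my $s (sort { $a <=> $b } keys %ranges) {
    print "$s-$ranges{$s}[-1]\n";    # prints 3181-3183, then 5503-5504
}
```

The starting number works as a hash key because every position in a run shares it, so all positions of one range end up in the same bucket.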

Below you can see what the "%r" hash looks like after reading all lines:
Code:
%r = {
          'SK1.chr10' => {
                           '3181' => {
                                       '3181' => 'C',
                                       '3193' => 'G',
                                       '3182' => 'C',
                                       '3189' => 'T',
                                       '3188' => 'T',
                                       '3194' => 'C',
                                       '3185' => 'A',
                                       '3183' => 'T',
                                       '3190' => 'G',
                                       '3184' => 'G',
                                       '3191' => 'A',
                                       '3187' => 'G',
                                       '3192' => 'T',
                                       '3186' => 'C'
                                     },
                           '5503' => {
                                       '5503' => 'C',
                                       '5504' => 'A',
                                       '5506' => 'A',
                                       '5505' => 'A'
                                     }
                         }
        };
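
To read one range back out of such a structure, something along these lines should work. It is only a sketch: "range_summary" is a made-up helper name, and the hash literal simply retypes the dump above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same shape as the dump above: name -> range start -> position -> base.
my %r = (
    'SK1.chr10' => {
        3181 => {
            3181 => 'C', 3182 => 'C', 3183 => 'T', 3184 => 'G',
            3185 => 'A', 3186 => 'C', 3187 => 'G', 3188 => 'T',
            3189 => 'T', 3190 => 'G', 3191 => 'A', 3192 => 'T',
            3193 => 'G', 3194 => 'C',
        },
        5503 => { 5503 => 'C', 5504 => 'A', 5505 => 'A', 5506 => 'A' },
    },
);

# Hypothetical helper: return "first-last" and the joined bases of one range.
sub range_summary {
    my ($ranges, $name, $start) = @_;
    my $bases = $ranges->{$name}{$start};
    my @pos   = sort { $a <=> $b } keys %$bases;   # hash keys are unordered
    return ("$pos[0]-$pos[-1]", join '', @{$bases}{@pos});
}

for my $name (sort keys %r) {
    for my $start (sort { $a <=> $b } keys %{ $r{$name} }) {
        my ($span, $seq) = range_summary(\%r, $name, $start);
        print "$name $span $seq\n";
    }
}
# prints:
# SK1.chr10 3181-3194 CCTGACGTTGATGC
# SK1.chr10 5503-5506 CAAA
```

The numeric sort is needed because the keys come back in arbitrary order, which is also why the dump above lists the positions out of sequence.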


Last edited by bartus11; 04-02-2011 at 12:28 PM.