Getting non unique lines from concatenated files


 
Forum: UNIX for Dummies Questions & Answers
# 36  
03-24-2011
@Bartus11
Thank you very much, as always. I was expecting part of the previous code to repeat to get to the value of the genotype. I need to understand loops better. Just one question: why did you use a hash and not a simple array to hold the values?

Cheers, have a nice day.
# 37  
03-24-2011
Because arrays can only be indexed by numbers, which is not very useful when the genotypes are letters.
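The same idea carries over to awk, where the distinction is even simpler: awk has no separate numerically indexed array type. Every awk array is associative with string subscripts, so a letter can be used directly as a key, just like a Perl hash key. A minimal sketch, assuming one letter per input line; the helper name `count_letters` and the fixed A, T, G, C print order are illustrative choices, not part of the original answer:

```shell
#!/bin/sh
# count_letters (hypothetical helper): tally one letter per input line.
# awk arrays are associative, so the letter itself is the subscript,
# much like a key in a Perl hash.
count_letters() {
    awk '
        { count[$1]++ }                     # "A", "T", ... used directly as keys
        END {
            # Print in a fixed order so the output is predictable;
            # letters that never occurred are skipped.
            n = split("A T G C", bases, " ")
            for (i = 1; i <= n; i++)
                if (bases[i] in count)
                    print bases[i] "=" count[bases[i]]
        }
    '
}
```

For example, `printf 'A\nT\nA\nG\n' | count_letters` prints A=2, T=1 and G=1 on separate lines.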
# 38  
03-24-2011
@Bartus:

I have the same task but the file format is different now.

Code:
levure5_SNP_Consensus_Calls.txt:SK1.scplasm1    6153    22    22    T    C    0.000000    h4,h10,h9,     5752    5588    7    5588    3    20    0    -1    
levure6_SNP_Consensus_Calls.txt:SK1.scplasm1    6153    22    22    T    C    0.000000    h4,h10,h9,     4046    3862    10    3862    5    18    0    -1    
levure7_SNP_Consensus_Calls.txt:SK1.scplasm1    6153    22    22    T    C    0.000000    h4,h10,h9,     2264    2184    2    2184    3    21    0    -1    
levure8_SNP_Consensus_Calls.txt:SK1.scplasm1    6153    22    22    T    C    0.000000    h4,h10,h9,     2606    2537    2    2537    4    21    0    -1    
levure5_SNP_Consensus_Calls.txt:SK1.chr10    213804    13    13    G    A    0.999999    h3,     296    232    48    232    15    15    0    -1    
levure6_SNP_Consensus_Calls.txt:SK1.chr10    213804    13    13    G    A    1.000000    h3,     240    183    46    183    15    15    0    -1    
levure7_SNP_Consensus_Calls.txt:SK1.chr10    213804    13    13    G    A    0.000000    h3,     96    77    14    77    12    17    0    -1    
levure8_SNP_Consensus_Calls.txt:SK1.chr10    213804    13    13    G    A    1.000000    h3,     133    106    23    106    19    20    0    -1    
levure5_SNP_Consensus_Calls.txt:SK1.chrm    58283    10    10    C    T    0.000000    h4,h10,h9,     232    219    0    219    0    14    0    -1    
levure6_SNP_Consensus_Calls.txt:SK1.chrm    58283    10    10    C    T    0.000000    h4,h10,h9,     298    267    0    267    0    12    0    -1

I need to count the number of A, T, G and C in the 5th field and report the totals. For the example above, my expected output is:
Code:
T=4
G=4
C=2

Can you please provide both awk and perl versions, so that I can understand the difference?

---------- Post updated at 11:15 AM ---------- Previous update was at 11:05 AM ----------

I have been using a primitive approach, running a command like the following once for each base:

Code:
awk '{print $5}' file | grep -x "A" | wc -l

But I'm sure there is a better way.

Cheers
# 39  
03-24-2011
Code:
perl -lane '$a{$F[4]}++;END{for $i (keys %a){print "$i=$a{$i}"}}' file

Code:
awk '{a[$5]++}END{for (i in a){print i"="a[i]}}' file
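
In both one-liners the `for ... in` (awk) and `keys` (Perl) loops visit the keys in an unspecified order, so the results may not come out as T, G, C the way the expected output lists them. If the order matters, a minimal sketch like the following keeps a second, numerically indexed array that records each base the first time it is seen. It is POSIX awk wrapped in a shell function; the name `count_field5` and the rule that skips lines with fewer than 5 fields are assumptions added for illustration:

```shell
#!/bin/sh
# count_field5 (hypothetical helper): count the values of field 5 across all
# input lines and print them as VALUE=COUNT in first-seen order.
# Reads the files named as arguments, or standard input if none are given.
count_field5() {
    awk '
        NF < 5 { next }                     # skip blank or short lines
        !($5 in count) { order[++n] = $5 }  # first time this base appears
        { count[$5]++ }                     # tally every remaining line
        END {
            for (i = 1; i <= n; i++)
                print order[i] "=" count[order[i]]
        }
    ' "$@"
}
```

Called as `count_field5 file` on the sample data above, it should print T=4, G=4 and C=2, one per line, matching the expected output.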

# 40  
03-24-2011
@Bartus11

Hi, I have a slightly different question related to the code you provided. I tried to make it executable with chmod +x executable_file.pl, but when I run

Code:
perl -p -i 'executable_file.pl' file

I get the following error:


Code:
String found where operator expected at executable_file.pl line 1, near "nle '$h{((split "=",(split ";",(split "[\t ]+",$_)[8])[0])[1])}++;END{for $i (keys %h){print "$i=$h{$i}"}}'"
        (Do you need to predeclare nle?)
syntax error at gff_genotype_base_counter.pl line 1, near "nle '$h{((split "=",(split ";",(split "[\t ]+",$_)[8])[0])[1])}++;END{for $i (keys %h){print "$i=$h{$i}"}}'"
Execution of executable_file.pl aborted due to compilation errors.


Can you suggest how I can make your code executable? I need to run it over several files.

Thanks for your input.
# 41  
03-24-2011
The simplest way is to put that line in a shell script:
Code:
[root@linux ~]# cat script.sh
#!/bin/sh
perl -lane '$a{$F[4]}++;END{for $i (keys %a){print "$i=$a{$i}"}}' "$1"

Make it executable with chmod +x script.sh, then run it as:
Code:
[root@linux ~]# ./script.sh file
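
Since the goal is to process several files, one possible extension is to loop over every file name passed on the command line and print a separate tally for each. A minimal sketch, using the awk one-liner instead of the Perl one; the function name `count_per_file`, the `== file ==` header and the usage message are illustrative assumptions, and `sort` is there only to make each file's output order predictable:

```shell
#!/bin/sh
# count_per_file (hypothetical helper): print base counts from field 5
# separately for each file given as an argument.
count_per_file() {
    if [ "$#" -eq 0 ]; then
        # With no file operands awk would silently wait for standard input.
        echo "usage: count_per_file FILE..." >&2
        return 1
    fi
    for f in "$@"; do
        echo "== $f =="
        awk 'NF >= 5 { count[$5]++ }
             END { for (b in count) print b "=" count[b] }' "$f" | sort
    done
}
```

Called as `count_per_file file1 file2`, it prints a header line for each file followed by that file's counts. The argument check guards against a script that appears to hang because it is reading the terminal instead of a file.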

# 42  
03-24-2011
Nothing happens; it just waits for new input!

---------- Post updated at 04:01 PM ---------- Previous update was at 03:58 PM ----------

No, it's working. I made a small mistake, sorry to bother you. Thanks a lot!
 