Getting non unique lines from concatenated files


 
# 71  
Old 03-28-2011
Thanks Bartus, yes, that does what I want, and it was nice and simple to understand too. Correct me if I'm wrong: there are four regex captures, the matched digits are assigned to $1-$4, and these are then used in the sed-style substitution to replace the DP4 subfields with the corresponding values stored in $1-$4, right? But this also broke up $F[7], creating fields that are not present in $F[7] of the original file, yes? What if I had to report these new fields at the end of each line without breaking the line pattern of the original file? In other words, how do I move these fields to the end of the line, or write them to a new file? Could you please enlighten me on this?
Cheers and have a nice evening.
# 72  
Old 03-28-2011
To append those fields at the end of each line:
Code:
perl -ple '/DP4=(\d+),(\d+),(\d+),(\d+)/;$_.="\tfwd_ref_allele=$1\trev_ref_allele=$2\tfwd_non_ref_allele=$3\trev_non_ref_allele=$4\t"' file

To print just those fields (so you can redirect the output to another file):
Code:
perl -nle '/DP4=(\d+),(\d+),(\d+),(\d+)/;print "fwd_ref_allele=$1\trev_ref_allele=$2\tfwd_non_ref_allele=$3\trev_non_ref_allele=$4\t"' file

# 73  
Old 03-29-2011
Thank you Bartus, that was so simple. Shame on me, sorry, there are many things I don't know yet.

Cheers, I will be back with more.

---------- Post updated 03-29-11 at 06:13 AM ---------- Previous update was 03-28-11 at 04:10 PM ----------

Hello Bartus and others,
I have another question. I now have a file with two fields. Using the two fields as keys, I want to extract the lines that match both keys from another file, which holds further details about them. For example:
Code:
chr01    223568
chr02    457944
chr02    693666
chr03    326757
chr03    327269
chrm    58283
scplasm1    6153

Taking the first line above as an example, which has chr01 and 223568, I want to extract the lines containing this pattern from other files, and the same goes for the rest of the lines. Could you please explain how I can iterate over the lines of the above file and use regex matches to extract the matching lines from other files?
I'd appreciate your feedback, and if you do provide code, could you please add comments?
Cheers and have a nice day.
# 74  
Old 03-29-2011
Try:
Code:
perl -e 'open A, "file1";open B, "file2";chomp(@A=<A>);@B=<B>;for $i (@A){print grep /$i/, @B}'

# 75  
Old 03-29-2011
This is not doing anything, unfortunately. To clarify with an example, given
Code:
chr01    223568

in file1, I want to extract the information from file2, file3, file4 and so on, in which the lines can look different, as in file2:
Code:
chr01    levure5    SNP    223568    223568    0.000000    0.447    1|1|1|1|0|1    genotype=R;reference=A;coverage=38;refAlleleCounts=18;refAlleleStarts=12;refAlleleMeanQV=17;novelAlleleCounts=17;novelAlleleStarts=12;novelAlleleMeanQV=20;diColor1=12;diColor2=30;het=1;flag=;gene;ID=YAR050W;Name=YAR050W;gene=FLO1;Alias=FLO1,FLO4,FLO2    38    17

or in file3:
Code:
levure5_SNP_Consensus_Calls.txt:SK1.chr01    223568    12    30    A    R    0.000000         38    18    18    17    17    20    1    1    bayes

or yet another line pattern in another file.

The search patterns (chr01 and 223568, marked in red in the original post) can be in different fields, depending on the file I am extracting information from.
I have not been able to find a way to do it.
Cheers.

---------- Post updated at 07:40 AM ---------- Previous update was at 07:31 AM ----------

By the way, I tested your code and it works well for extracting from files that have the same line format as file1 (the file containing the keys), but my problem is different, as described above.
Cheers.
# 76  
Old 03-29-2011
Code:
perl -nae 'BEGIN{open I, "file2";@I=<I>}{print grep {/$F[0]/&&/$F[1]/} @I}' file1

# 77  
Old 03-29-2011
Thanks a lot Bartus,
now this does what I wanted. So, if I understood correctly (and correct me if I'm wrong), you open the data file (file2) and read it into an array @I, and then grep that array for lines matching both $F[0] and $F[1] from file1, right?
Thanks for your help and feedback. Good day.
 