Getting non unique lines from concatenated files

03-28-2011

Thanks Bartus,

Yes, that does what I want, and it was nice and simple to understand too. Correct me if I'm wrong: there are four regex captures, the matching digits are assigned to $1 through $4, and these are then used in the sed-style substitution to replace the DP4 subfields with the corresponding values stored in $1 through $4, right? But this also broke up $F[7], creating fields that are not present in $F[7] of the original file, yes? What if I had to report these new fields at the end of each line without breaking the line pattern of the original file? In other words, how do I move these fields to the end of the line, or write them to a new file? Could you please enlighten me on this?
Cheers and have a nice evening

pawannoel

03-28-2011

To append those fields at the end of each line:

Code:

perl -ple '/DP4=(\d+),(\d+),(\d+),(\d+)/;$_.="\tfwd_ref_allele=$1\trev_ref_allele=$2\tfwd_non_ref_allele=$3\trev_non_ref_allele=$4\t"' file

To print just those fields (so you can redirect the output to another file):

Code:

perl -nle '/DP4=(\d+),(\d+),(\d+),(\d+)/;print "fwd_ref_allele=$1\trev_ref_allele=$2\tfwd_non_ref_allele=$3\trev_non_ref_allele=$4\t"' file

bartus11

03-29-2011

Thank you Bartus, that was so simple.

Shame on me, sorry, there are many things I don't know.

Cheers, I will be back with more.

++

---------- Post updated 03-29-11 at 06:13 AM ---------- Previous update was 03-28-11 at 04:10 PM ----------

Hello Bartus and others,
I have another question. I now have a file with two fields. Using the two fields as keys, I want to extract the lines matching both keys from another file that holds further details about them. For example:

Code:

chr01    223568
chr02    457944
chr02    693666
chr03    326757
chr03    327269
chrm    58283
scplasm1    6153

Taking the first line above as an example, which has chr01 and 223568, I want to extract the lines containing this pattern from other files, and the same goes for the rest of the lines. Can you please enlighten me on how I can iterate over the lines of the above file and possibly use regex matches to extract the lines containing the pattern from other files?
I'd appreciate your feedback, and if you do provide code, could you please include comments?
Cheers and have a nice day

pawannoel

03-29-2011

Try:

Code:

perl -e 'open A, "file1";open B, "file2";chomp(@A=<A>);@B=<B>;for $i (@A){print grep /$i/, @B}'

bartus11

03-29-2011

Unfortunately this is not doing anything. To clarify with an example, given

Code:

chr01    223568

in file1, I want to extract the information from file2, file3, file4 and so on, in which the lines can look different, for example in file2

Code:

chr01    levure5    SNP    223568    223568    0.000000    0.447    1|1|1|1|0|1    genotype=R;reference=A;coverage=38;refAlleleCounts=18;refAlleleStarts=12;refAlleleMeanQV=17;novelAlleleCounts=17;novelAlleleStarts=12;novelAlleleMeanQV=20;diColor1=12;diColor2=30;het=1;flag=;gene;ID=YAR050W;Name=YAR050W;gene=FLO1;Alias=FLO1,FLO4,FLO2    38    17

or in file3

Code:

levure5_SNP_Consensus_Calls.txt:SK1.chr01    223568    12    30    A    R    0.000000         38    18    18    17    17    20    1    1    bayes

or yet another line pattern in another file.

I marked the search patterns in red because, depending on the file the information is extracted from, they can be in different fields.
I have not been able to find a way to do it.
Cheers

...

---------- Post updated at 07:40 AM ---------- Previous update was at 07:31 AM ----------

By the way, I tested your code and it works well for extracting from files that have the same line format as file1, which contains the keys. My problem is different, as described above.
Cheers

pawannoel

03-29-2011

Code:

perl -nae 'BEGIN{open I, "file2";@I=<I>}{print grep {/$F[0]/&&/$F[1]/} @I}' file1

bartus11

03-29-2011

Thanks a lot Bartus,
now this does what I wanted. So basically, if I understood correctly (and correct me if I'm wrong), you open the data extraction file (file2) and read it into an array @I, and then grep that array with the patterns in $F[0] and $F[1] from file1, right?
Thanks for your help and feedback.

Good day

pawannoel
