Getting non-unique lines from concatenated files


# 15  
Old 03-19-2011
It says:
Code:
-bash: seq: command not found

Any other solution?

---------- Post updated at 07:32 PM ---------- Previous update was at 06:57 PM ----------

Hi
I found a thread on an Apple forum saying that seq is not available on 10.4.11, but it mentions using 'jot' instead ...
I tried replacing seq with jot in your code, but it only gave me the unique results for file_10. Your code did create the .tmp files, but because only the result for file_10 was printed, rm -f was only performed for file_10.tmp.

Do you think this is because of jot, or something else? If you try jot in your code, do you get the same results as me, or results for all the files?
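For reference, a quick sketch of how BSD jot corresponds to seq (jot takes the repetition count as its first argument, so the two commands order their arguments differently):
Code:
seq 1 10     # GNU coreutils: prints 1 2 ... 10
jot 10       # BSD jot: also prints 1 2 ... 10
jot - 3 7    # prints 3 4 5 6 7 (begin and end given, count inferred)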

Have a nice weekend ...

++
# 16  
Old 03-20-2011
OK.. so try this solution:
Code:
perl -0e 'BEGIN{$N=10;}for $i (1..$N){for $j (1..$i-1,$i+1..$N){open I,"<file_$j";$a.=<I>}open O,">file_${i}.tmp";print O $a;$a=""}';
perl -le 'BEGIN{$N=10;}for $i (1..$N){print "file_$i unique\n";system "bash -c \"comm -23 <(sort file_$i) <(sort file_$i.tmp);rm -f file_$i.tmp\"";print "\n##############\n"}'
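To see what the comm -23 in the second line does in isolation, here is a toy run with hypothetical file contents; comm -23 prints the lines present only in its first sorted input:
Code:
printf 'a\nb\n' > file_1
printf 'b\nc\n' > file_2
comm -23 <(sort file_1) <(sort file_2)    # prints: a   (the line only in file_1)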

# 17  
Old 03-20-2011
Hi Bartus,

Thank you very much for this powerful code ... it does exactly what I want and allows comparison of 2 or more files just by changing $N. But sorry, I always have more questions! Is there a way I can choose which files to compare? Let me explain: at the moment, $N=2 compares file_1 and file_2, $N=3 compares file_1, file_2 and file_3, $N=4 compares file_1 through file_4, and so on.
What if I wanted to compare only file_1, file_3 and file_7, or file_2 and file_10, or any other combination of files of my choice? Is that possible? I will greatly appreciate your help, and if you could comment the code to make it understandable to me, that would be just awesome.

Thanks again and have a nice Sunday

Cheers
# 18  
Old 03-20-2011
I combined those two Perl lines. You can specify the files to be compared in the @f=(...) list at the beginning of the code:
Code:
perl -l -0e 'BEGIN{@f=(file_3,file_4,file_1);$N=$#f}for $i (0..$N){for $j (0..$i-1,$i+1..$N){open I,"<$f[$j]";$a.=<I>}open O,">files${i}.tmp";print O $a;$a=""};
for $i (0..$N){print "$f[$i] unique\n";system "bash -c \"comm -23 <(sort $f[$i]) <(sort files$i.tmp);rm -f files$i.tmp\"";print "\n##############\n"}'
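So to pick any other combination from the question, only the @f list needs to change, for example:
Code:
@f=(file_1,file_3,file_7);   # compare only file_1, file_3 and file_7
@f=(file_2,file_10);         # or only file_2 and file_10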

# 19  
Old 03-20-2011
Honestly speaking, thank you very much .... I'm amazed by the power of scripting ..... it's great to have control over the desired output just by changing a few things in the code .... now I can make any combination of files and test ... hurray!
I know I asked you before, but could you please explain what the code is doing? ..... I could use elements of your code to learn and do other things .... I would really like to learn.
Thank you once again.
Have a nice day ahead.
# 20  
Old 03-20-2011
Code:
perl -l -0e 'BEGIN{
@f=(file_3,file_4,file_1);   # define array containing the list of files
$N=$#f                       # assign the last index of @f to $N (number of files minus 1)
}
for $i (0..$N){              # iterate over the indexes of @f
for $j (0..$i-1,$i+1..$N){   # iterate over the indexes of @f again, excluding the index that $i is holding
open I,"<$f[$j]";            # open the file stored under index $j for reading
$a.=<I>                      # append that file's contents to $a; the -0 option (perl -l -0e) makes a single read return the whole file
}
open O,">files${i}.tmp";     # open "files[number].tmp" for writing
print O $a;                  # write $a to that file, so files[number].tmp now holds the concatenated contents of all files except the one under index "number"
$a=""};                      # clear $a
for $i (0..$N){              # iterate over the indexes of @f
print "$f[$i] unique\n";     # print a header line
# the next statement runs the external comm command on the file stored under
# index $i and the concatenated contents of all the other files (files$i.tmp),
# then removes the temporary file; both commands are part of one system call
system "bash -c \"comm -23 <(sort $f[$i]) <(sort files$i.tmp);rm -f files$i.tmp\"";
print "\n##############\n"   # print a separator
}'
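As a standalone illustration of the -0 slurp behaviour mentioned in the comments (using file_1 from the examples above):
Code:
# with -0 the input record separator is a NUL byte, so a single read
# returns the whole file (as long as the file contains no NUL bytes)
perl -0e 'open I,"<file_1"; $a=<I>; print length($a)." characters read\n"'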

# 21  
Old 03-21-2011
Thank you very much .... I can now try and experiment ... I will have more questions for sure!
Cheers

---------- Post updated 03-21-11 at 06:08 AM ---------- Previous update was 03-20-11 at 10:55 AM ----------

Hi Bartus11,

Your previous code was helpful in finding the unique lines when comparing 2 or more files.
How can I change the code to give me the lines which are common to 2 or more files?
Can you enlighten me on this?

Thank you and have a nice day
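A minimal sketch of one possible approach, not confirmed in this thread: comm -12 prints only the lines common to both of its sorted inputs, so swapping -23 for -12 in the same script would report, for each file, the lines it shares with at least one of the other files:
Code:
perl -l -0e 'BEGIN{@f=(file_1,file_2,file_3);$N=$#f}for $i (0..$N){for $j (0..$i-1,$i+1..$N){open I,"<$f[$j]";$a.=<I>}open O,">files${i}.tmp";print O $a;$a=""};
for $i (0..$N){print "$f[$i] common\n";system "bash -c \"comm -12 <(sort $f[$i]) <(sort files$i.tmp);rm -f files$i.tmp\"";print "\n##############\n"}'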
 