Getting non-unique lines from concatenated files


# 15  
Old 03-19-2011
It says:
Code:
-bash: seq: command not found

Any other solution?

---------- Post updated at 07:32 PM ---------- Previous update was at 06:57 PM ----------

Hi
I found a thread on an Apple forum saying that seq is not available on 10.4.11, but it mentions using 'jot' instead ...
I tried replacing seq with jot in your code, but it only gave me the unique results for file_10. Your code did create the .tmp files, but because only the result for file_10 was printed, rm -f was only performed for file_10.tmp.

Do you think this is because of jot, or something else? If you try jot in your code, do you get the same results as me, or results for all the files?
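For reference, a quick sketch of how BSD jot corresponds to seq (jot takes the repetition count as its first argument, so the two commands order their arguments differently):
Code:
seq 1 10     # GNU coreutils: prints 1 2 ... 10
jot 10       # BSD jot: also prints 1 2 ... 10
jot - 3 7    # prints 3 4 5 6 7 (begin and end given, count inferred)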

Have a nice weekend ...

++
# 16  
Old 03-20-2011
OK.. so try this solution:
Code:
perl -0e 'BEGIN{$N=10;}for $i (1..$N){for $j (1..$i-1,$i+1..$N){open I,"<file_$j";$a.=<I>}open O,">file_${i}.tmp";print O $a;$a=""}';
perl -le 'BEGIN{$N=10;}for $i (1..$N){print "file_$i unique\n";system "bash -c \"comm -23 <(sort file_$i) <(sort file_$i.tmp);rm -f file_$i.tmp\"";print "\n##############\n"}'
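To see what the comm -23 in the second line does in isolation, here is a toy run with hypothetical file contents; comm -23 prints the lines present only in its first sorted input:
Code:
printf 'a\nb\n' > file_1
printf 'b\nc\n' > file_2
comm -23 <(sort file_1) <(sort file_2)    # prints: a   (the line only in file_1)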

# 17  
Old 03-20-2011
Hi Bartus,

Thank you very much for this powerful code ... it does exactly what I want and allows comparison of 2 or more files just by changing $N. But sorry, I always have more questions! Is there a way I can choose which files to compare? Let me explain: at the moment, $N=2 compares file_1 and file_2, $N=3 compares file_1, file_2 and file_3, $N=4 compares file_1 through file_4, and so on.
What if I wanted to compare only file_1, file_3 and file_7, or file_2 and file_10, or any other combination of files of my choice? Is that possible? I will greatly appreciate your help, and if you could comment the code to make it understandable to me, that would be just awesome.

Thanks again and have a nice Sunday

Cheers
# 18  
Old 03-20-2011
I combined those two Perl lines. You can specify the files to be compared in the @f=(...) list at the beginning of the code:
Code:
perl -l -0e 'BEGIN{@f=(file_3,file_4,file_1);$N=$#f}for $i (0..$N){for $j (0..$i-1,$i+1..$N){open I,"<$f[$j]";$a.=<I>}open O,">files${i}.tmp";print O $a;$a=""};
for $i (0..$N){print "$f[$i] unique\n";system "bash -c \"comm -23 <(sort $f[$i]) <(sort files$i.tmp);rm -f files$i.tmp\"";print "\n##############\n"}'
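So to pick any other combination from the question, only the @f list needs to change, for example:
Code:
@f=(file_1,file_3,file_7);   # compare only file_1, file_3 and file_7
@f=(file_2,file_10);         # or only file_2 and file_10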

# 19  
Old 03-20-2011
Honestly speaking, thank you very much .... I'm amazed by the power of scripting ..... it's great to have control over the desired output just by changing a few things in the code .... now I can make any combination of files and test ... hurray!
I know I asked you before, but could you please explain what the code is doing? ..... I could use elements of your code to learn and do other things .... I would really like to learn.
Thank you once again.
Have a nice day ahead.
# 20  
Old 03-20-2011
Code:
perl -l -0e 'BEGIN{
@f=(file_3,file_4,file_1);   # define array containing the list of files
$N=$#f                       # assign the last index of @f to $N (number of files minus 1)
}
for $i (0..$N){              # iterate over the indexes of @f
for $j (0..$i-1,$i+1..$N){   # iterate over the indexes of @f again, excluding the index that $i is holding
open I,"<$f[$j]";            # open the file stored under index $j for reading
$a.=<I>                      # append that file's contents to $a; the -0 option (perl -l -0e) makes a single read return the whole file
}
open O,">files${i}.tmp";     # open "files[number].tmp" for writing
print O $a;                  # write $a to that file, so files[number].tmp now holds the concatenated contents of all files except the one under index "number"
$a=""};                      # clear $a
for $i (0..$N){              # iterate over the indexes of @f
print "$f[$i] unique\n";     # print a header line
# the next statement runs the external comm command on the file stored under
# index $i and the concatenated contents of all the other files (files$i.tmp),
# then removes the temporary file; both commands are part of one system call
system "bash -c \"comm -23 <(sort $f[$i]) <(sort files$i.tmp);rm -f files$i.tmp\"";
print "\n##############\n"   # print a separator
}'
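As a standalone illustration of the -0 slurp behaviour mentioned in the comments (using file_1 from the examples above):
Code:
# with -0 the input record separator is a NUL byte, so a single read
# returns the whole file (as long as the file contains no NUL bytes)
perl -0e 'open I,"<file_1"; $a=<I>; print length($a)." characters read\n"'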

# 21  
Old 03-21-2011
Thank you very much .... I can now try and experiment ... I will have more questions for sure!
Cheers

---------- Post updated 03-21-11 at 06:08 AM ---------- Previous update was 03-20-11 at 10:55 AM ----------

Hi Bartus11,

Your previous code was helpful in finding the unique lines when comparing 2 or more files.
How can I change the code to give me the lines which are common to 2 or more files?
Can you enlighten me on this?

Thank you and have a nice day
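A minimal sketch of one possible approach, not confirmed in this thread: comm -12 prints only the lines common to both of its sorted inputs, so swapping -23 for -12 in the same script would report, for each file, the lines it shares with at least one of the other files:
Code:
perl -l -0e 'BEGIN{@f=(file_1,file_2,file_3);$N=$#f}for $i (0..$N){for $j (0..$i-1,$i+1..$N){open I,"<$f[$j]";$a.=<I>}open O,">files${i}.tmp";print O $a;$a=""};
for $i (0..$N){print "$f[$i] common\n";system "bash -c \"comm -12 <(sort $f[$i]) <(sort files$i.tmp);rm -f files$i.tmp\"";print "\n##############\n"}'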
 