How to extract the items common to two adjacent lines in a file


 
# 1  
11-16-2007

I use sed and awk. I am not a big expert, but I know them to some extent. I have a file like this:

PFA0165c ctg_6843
PFA0335w ctg_6843 ctg_6871 ctg_6977 ctg_6654 ctg_7052 ctg_6899 ctg_6840 ctg_7202 ctg_6638
PFA0155c ctg_6877 ctg_7169 ctg_7179 ctg_6843 ctg_6871

Now I want output like this:

PFA0165c PFA0335w ctg_6843
PFA0165c PFA0335w PFA0155c ctg_6843

That is, the first column of each line should be appended to the first column of the next line, followed by the items common to those two lines. In each line the first whitespace is a tab and the rest are single spaces. A common item can appear anywhere in the line; for example, ctg_6843 is in the 5th column of the 3rd line.
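To reproduce the layout described above (a tab after the first column, single spaces after that), the sample can be written with `printf`, which expands `\t` in its format string and reuses the format until all arguments are consumed. This is an editorial sketch, not code from the thread: the helper names `make_sample` and `check_layout` are hypothetical, and the file name `testfile.dat` is borrowed from the awk answer further down.

```shell
#!/bin/sh
# Hypothetical helper: write the three sample lines as
# ID, one literal tab, then space-separated contig names.
# printf reuses its format, so six arguments give three lines.
make_sample() {
  printf '%s\t%s\n' \
    'PFA0165c' 'ctg_6843' \
    'PFA0335w' 'ctg_6843 ctg_6871 ctg_6977 ctg_6654 ctg_7052 ctg_6899 ctg_6840 ctg_7202 ctg_6638' \
    'PFA0155c' 'ctg_6877 ctg_7169 ctg_7179 ctg_6843 ctg_6871'
}

# Hypothetical helper: report every line that does not split into exactly
# two tab-separated fields; exit status is non-zero if any line is bad.
check_layout() {
  awk -F'\t' '
    NF != 2 { print "line " NR ": expected 1 tab, found " (NF - 1); bad = 1 }
    END { exit bad }' "$1"
}

make_sample > testfile.dat
check_layout testfile.dat && echo "layout OK"
```

If a file was saved with spaces instead of the tab, `check_layout` names the offending lines instead of printing "layout OK".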
# 2  
11-18-2007

Sorry, I just can't understand what you are wanting to do.

I think you are looking for elements that appear in more than one line, but I get confused after that.

Could you try explaining again? Perhaps a few more examples would help me see what you mean...
# 3  
11-21-2007
Thanks, smiling dragon

Thank you for taking an interest in this problem.

The input file looks like this; the first whitespace is a tab and the subsequent ones are single spaces.
Here are 3 lines of the file:

PFA0165c ctg_6843
PFA0335w ctg_6843 ctg_6871 ctg_6977 ctg_6654 ctg_7052 ctg_6899 ctg_6840 ctg_7202 ctg_6638
PFA0155c ctg_6877 ctg_7169 ctg_7179 ctg_6843 ctg_6871

I want the comparison to go like this:

Compare line 1 with line 2 and take out the common items
Compare line 2 with line 3 and take out the common items
Compare line 3 with line 4 and take out the common items
...
Compare line (n-1) with line n and take out the common items


The first field of every line is unique and is separated from the rest of the line by a tab, so in awk you can declare an array a[$1]=$2 with FS="\t". The only remaining problem is comparing $2 of two adjacent lines.

Now I want to print out:
the first field of line 1 and line 2, then the common items
the first field of line 2 and line 3, then the common items
...
the first field of line (n-1) and line n, then the common items


Hence the output will look like this:
PFA0165c PFA0335w ctg_6843
PFA0335w PFA0155c ctg_6843 ctg_6871
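The approach suggested in this post (split on the tab with FS="\t", then compare the item lists of adjacent lines) can be sketched in awk as below. This is an editorial sketch, not code from the thread: the function name `common_adjacent` is hypothetical, and it assumes exactly one tab after the ID and single spaces between items. Instead of storing a[$1]=$2 for the whole file, it keeps only the previous line's ID and uses that line's items as keys of an associative array, so each membership test is one lookup rather than a scan of the other line.

```shell
#!/bin/sh
# Hypothetical helper: for each pair of adjacent lines, print
# "previous-ID current-ID common-items..." (items in current-line order).
# Assumes one tab between the ID and the items, and spaces between items.
common_adjacent() {
  awk -F'\t' '
    {
      n = split($2, items, " ")        # items of the current line
      if (NR > 1) {
        out = prev_id " " $1
        for (i = 1; i <= n; i++)
          if (items[i] in seen)        # one hash lookup per item
            out = out " " items[i]
        print out
      }
      # the current line becomes the previous line for the next record
      prev_id = $1
      split("", seen)                  # portable way to empty an array
      for (i = 1; i <= n; i++)
        seen[items[i]] = 1
    }' "$@"
}
```

Usage: `common_adjacent testfile.dat` (or pipe the data in on stdin). For the three sample lines it should print the two lines of expected output shown above. `split("", seen)` is used to clear the array because every awk supports it, whereas `delete seen` without a subscript was historically an extension.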
# 4  
11-21-2007
I think I understand now: for any given line, you want to print its first element, followed by the first element of the line below, followed by any items common to both lines. Right?
# 5  
11-21-2007
Because this requires keeping a few things in memory, it looks like it would lend itself well to awk or Perl. As my awk is rather weak, I'll try Perl:
Code:
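use strict;
use warnings;

my $prevleader;        # first field of the previous line
my %previtems;         # items of the previous line, used as a set
while (<>) {
  chomp;
  next unless /^(\S+)\s+(.*)$/;   # first field, then the rest of the line
  my ($leader, $items) = ($1, $2);
  my @items = split ' ', $items;
  if (defined $prevleader) {
    # both leaders, then every item of this line also found on the previous one
    print "$prevleader $leader";
    foreach my $item (@items) {
      print " $item" if $previtems{$item};
    }
    print "\n";
  }
  # remember this line for the next comparison (done for every line,
  # including the first, so it must sit outside the if-block)
  $prevleader = $leader;
  %previtems  = map { $_ => 1 } @items;
}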

Not tested, but it should do the trick or get you close.
I suspect awk can do it better though :/
# 6  
11-22-2007
Thanks a lot, smiling dragon
# 7  
11-23-2007
awk version

Code:
awk -f test.awk testfile.dat

where test.awk contains:
Code:
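# Line 1: just remember its fields (field 1 is the ID, the rest are items)
NR == 1 { old_cnt = split($0, old_arr, "[ \t]"); next }

# Every later line: compare its items with those of the previous line
{
  new_cnt = split($0, new_arr, "[ \t]")
  cmn = ""
  for (i = 2; i <= old_cnt; i++)
    for (j = 2; j <= new_cnt; j++)
      if (old_arr[i] == new_arr[j])
        cmn = cmn " " old_arr[i]

  # both IDs, then the common items (in the order of the previous line)
  print old_arr[1], new_arr[1] cmn

  # the current line becomes the "previous" line for the next record
  old_cnt = new_cnt
  for (i = 1; i <= new_cnt; i++) old_arr[i] = new_arr[i]
}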
