Print the overlapping entries in 2 files to separate file


 
# 8  
Old 01-22-2014
Yes, the overlap happens only between neighboring lines.
# 9  
Old 01-22-2014
Your overlap definition is incorrect, IMHO.
I have implemented this shell function:
Code:
# $1=lo[file1]
# $2=hi[file1]
# $3=lo[file2]
# $4=hi[file2]
overlap(){
[ $1 -le $4 -a $4 -le $2 -o $3 -le $2 -a $2 -le $4 ]
}

Please run it and test it with, e.g.:
Code:
overlap  22 25  24 25  && echo overlap
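
As a side note, the four comparisons can probably be reduced to two: closed intervals overlap exactly when each one starts no later than the other ends. The sketch below is not from the original post. `overlap2` is a hypothetical name, the posted test is rewritten without `-a`/`-o` but with the same logic, and integer bounds with lo <= hi are assumed. It compares both forms over a small range of bounds.

```shell
#!/bin/sh
# Same logic as the posted overlap(), written without -a/-o.
overlap() {
  { [ "$1" -le "$4" ] && [ "$4" -le "$2" ]; } ||
  { [ "$3" -le "$2" ] && [ "$2" -le "$4" ]; }
}

# Hypothetical shorter form: each interval starts no later than the other ends.
overlap2() {
  [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
}

# Compare both forms for every pair of valid intervals with bounds in 1..5.
mismatch=0
for lo1 in 1 2 3 4 5; do for hi1 in 1 2 3 4 5; do
  [ "$lo1" -le "$hi1" ] || continue
  for lo2 in 1 2 3 4 5; do for hi2 in 1 2 3 4 5; do
    [ "$lo2" -le "$hi2" ] || continue
    if overlap  "$lo1" "$hi1" "$lo2" "$hi2"; then a=0; else a=1; fi
    if overlap2 "$lo1" "$hi1" "$lo2" "$hi2"; then b=0; else b=1; fi
    [ "$a" -eq "$b" ] || mismatch=$((mismatch + 1))
  done; done
done; done
echo "mismatches: $mismatch"
```

If the two forms ever disagreed, the mismatch counter would be non-zero. The shorter form relies on lo <= hi holding for both intervals.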

--
I have never implemented a merge before and am still struggling with it; I need another time slot to continue.
It is certainly doable in shell.
Opening two files and reading from either one is done like this:
Code:
{
read line1
echo "$line1"
read line2 <&3
echo "$line2"
} <input1 3<input2
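
To try the two-descriptor technique without existing input files, here is a minimal self-contained sketch. The sample file contents and the temporary directory are assumptions made up for the demonstration, and `mktemp -d` is assumed to be available. It reads one line from each file per iteration until either file runs out.

```shell
#!/bin/sh
# Hypothetical sample files, created only for this demonstration.
tmpdir=$(mktemp -d) || exit 1
printf '%s\n' 'a1' 'a2' > "$tmpdir/input1"
printf '%s\n' 'b1' 'b2' > "$tmpdir/input2"

# stdin (fd 0) is input1 and fd 3 is input2;
# each read advances only the file behind its own descriptor.
result=$(
  {
    while read -r line1 && read -r line2 <&3
    do
      echo "$line1 $line2"
    done
  } <"$tmpdir/input1" 3<"$tmpdir/input2"
)
echo "$result"
rm -rf "$tmpdir"
```

Because the redirections are attached to the whole `{ ...; }` group, both files stay open across all reads, so each `read` continues where the previous one on that descriptor stopped.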


Last edited by MadeInGermany; 01-22-2014 at 06:58 PM..
# 10  
Old 01-23-2014
Hi, this is what I am calling an overlap:
Code:
F1:   |-----|
F2:      |-----|

F1:   |-----|
F2:     |--|

F1:   |-----|
F2: |-----|

F1:   |-----|
F2:  |---------|
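
A quick way to check these four cases against the overlap function from post #9 is sketched below. The coordinates are made-up values chosen to match the pictures, not data from the thread. `check` is a hypothetical helper, and a fifth, non-overlapping pair is added for contrast.

```shell
#!/bin/sh
# overlap() as posted in #9, with the arguments quoted.
# $1,$2 = lo,hi of F1   $3,$4 = lo,hi of F2
overlap() {
  [ "$1" -le "$4" -a "$4" -le "$2" -o "$3" -le "$2" -a "$2" -le "$4" ]
}

# Hypothetical helper: print the verdict for one pair of intervals.
check() {
  if overlap "$@"
  then echo "$1-$2 vs $3-$4: overlap"
  else echo "$1-$2 vs $3-$4: no overlap"
  fi
}

check 10 20 15 25   # F2 starts inside F1 and ends after it
check 10 20 12 16   # F2 lies inside F1
check 10 20 5 15    # F2 starts before F1 and ends inside it
check 10 20 5 30    # F2 contains F1
check 10 20 21 30   # disjoint, added for contrast
```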

# 11  
Old 01-27-2014
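The following seems to perform a correct merge.
The overlap function uses -le (less than or equal), which means "equal boundaries count as an overlap". To treat intervals that merely touch as non-overlapping, change the lo-versus-hi comparisons ($1 -le $4 and $3 -le $2) to -lt (less than); the other two should stay -le, otherwise intervals that share an upper boundary would no longer match.
I left some debug code in. If you think it behaves incorrectly, enable the debug=echo line and comment out the debug=: line.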
Code:
#!/bin/sh
#
set -f # no globbing
#
#debug=echo
debug=:
#
# $1=lo[file1]
# $2=hi[file1]
# $3=lo[file2]
# $4=hi[file2]
overlap(){
[ $1 -le $4 -a $4 -le $2 -o $3 -le $2 -a $2 -le $4 ]
}
#
{
readfrom=1
while :
do
  if [ $readfrom -eq 1 ]
  then
    read line || {
      readfrom=2
      read line <&3 || break
    }
  else
    read line <&3 || {
      readfrom=1
      read line || break
    }
  fi
  set -- $line
  if [ -n "$hip" ]
  then
    if overlap $lop $hip $2 $3
    then
      echo "$line"
      lop=$2; hip=$3
    else
$debug no overlap $lop,$hip $2,$3
      if [ -n "$saved" ]
      then
$debug restore $save
        saved=""
        set -- $save
        overlap $lop $hip $2 $3 || echo ""
        echo "$save"
        lop=$2; hip=$3
      else
        if [ $readfrom -eq 1 ]
        then
          readfrom=2
        else
          readfrom=1
        fi
$debug change to input$readfrom
      fi
$debug save $line
      save=$line
      saved=1
    fi
  else
    echo "$line"
    lop=$2; hip=$3
  fi
done
if [ -n "$saved" ]
then
$debug restore $save
  set -- $save
  overlap $lop $hip $2 $3 || echo ""
  echo "$save"
fi
} <input1 3<input2
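
On the -le versus -lt point, the following sketch contrasts the inclusive test with a strict variant. `overlap_le`, `overlap_lt` and `report` are hypothetical names and the coordinates are made up. The assumption is that only the lo-versus-hi comparisons become strict, so intervals sharing an upper boundary still count as overlapping.

```shell
#!/bin/sh
# Inclusive test as posted: touching boundaries count as an overlap.
overlap_le() {
  [ "$1" -le "$4" -a "$4" -le "$2" -o "$3" -le "$2" -a "$2" -le "$4" ]
}

# Hypothetical strict variant: only the lo-versus-hi comparisons use -lt.
overlap_lt() {
  [ "$1" -lt "$4" -a "$4" -le "$2" -o "$3" -lt "$2" -a "$2" -le "$4" ]
}

# Hypothetical helper: report FUNCTION LO1 HI1 LO2 HI2
report() {
  if "$@"
  then echo "$1 $2-$3 $4-$5: overlap"
  else echo "$1 $2-$3 $4-$5: no overlap"
  fi
}

report overlap_le 10 20 20 30   # intervals touch at 20
report overlap_lt 10 20 20 30   # same pair, strict test
report overlap_lt 10 20 12 20   # shared upper boundary
```

To use the strict behaviour in the merge script, only the body of overlap() would need to change; the rest of the script calls it the same way.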
