Common lines from files


 
# 1  
Old 08-28-2010

Hello guys,

I need a script that gets the common lines from two files, where two lines count as common if their first two columns match. For each match I want to keep the row with the maximum value in the 5th column, so the 3rd and 4th columns in the output come from the row that has the highest 5th-column value. The columns are tab-separated.

Sample input:

file1:
Code:
111 222 ABC PQR 0.1
333 444 xxx yyy 0.5
555 666 PQR DEF 0.4

file2:

Code:
111 222 abc xyz 0.7
555 666 def pqr 0.3
777 888 rst mno 0.4

Sample output:

Code:
111 222 abc xyz 0.7
555 666 PQR DEF 0.4

This needs to be done for all files of the same format in a directory. I have a script, but it does not handle the requirement for the 3rd and 4th columns.

Code:
awk 'NR==FNR{a[$1" "$2]=$3;next;}($1" "$2 in a){if(a[$1" "$2] > $3) print $1, $2,a[$1" "$2]; else print;}'

Please help. Thanks in advance.
# 2  
Old 08-28-2010
Code:
awk 'NR==FNR{a[$1" "$2]=$3" "$4;b[$1" "$2]=$5;next}$1" "$2 in a{c[$1" "$2];if(b[$1" "$2]<$5){b[$1" "$2]=$5;a[$1" "$2]=$3" "$4}}
END{for (i in c)print i,a[i],b[i]}' file1 file2
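
For anyone who wants the same idea spelled out, here is a minimal sketch. The one-liner above prints space-separated output, and awk does not guarantee the order of `for (i in c)`. This variant assumes the files really are tab-separated, that each key (columns 1 and 2) appears at most once per file, and that output in file2 order is acceptable. The function name `common_max` is hypothetical, not something from this thread.

```shell
# Sketch only: common_max is a hypothetical helper name.
# Assumes tab-separated input with the value to compare in column 5.
common_max() {
  awk -F'\t' -v OFS='\t' '
    NR == FNR {                        # first file: remember each row by its key
      rest[$1 FS $2] = $3 OFS $4       # columns 3 and 4
      val[$1 FS $2]  = $5              # column 5
      next
    }
    ($1 FS $2) in val {                # second file: key also present in the first
      k = $1 FS $2
      if ($5 + 0 > val[k] + 0)         # force a numeric comparison
        print                          # the file2 row has the larger value
      else
        print $1, $2, rest[k], val[k]  # the file1 row wins (also on ties)
    }
  ' "$1" "$2"
}
```

Call it as `common_max file1 file2`. Ties keep the file1 row, which matches the strict `<` test in the one-liner.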

# 3  
Old 08-28-2010

Code:
# cat file1
111     222     ABC     PQR     0.1
333     444     xxx     yyy     0.5
555     666     PQR     DEF     0.4
 
# cat file2
111     222     abc     xyz     0.7
555     666     def     pqr     0.3
777     888     rst     mno     0.4

Code:
# ./justdoit
111     222     abc     xyz     0.7
555     666     PQR     DEF     0.4

Code:
#!/bin/bash
## justdoit ##
# Requires GNU sed (for \t inside patterns), GNU seq and bc.
cnt=$(sed -n '$=' file1) ; cntx=$(sed -n '$=' file2)
: > justtmp                         # start with an empty result file
for x in $(seq 1 "${cnt:-0}")       # every line of file1
 do
  a=$(sed -n "$x s/^\([^\t]*\)\t\([^\t]*\).*/\1 \2/p" file1)  # key: first and second columns
  for i in $(seq 1 "${cntx:-0}")    # every line of file2, not just the first 3
   do
    a1=$(sed -n "$i s/^\([^\t]*\)\t\([^\t]*\).*/\1 \2/p" file2)
    if [[ -n $a && $a == "$a1" ]] ; then
      # last column times 1000, truncated, so the shell can compare integers
      line1=$(echo " `sed -n "$x s/.*\t\(.*\)$/\1/p" file1` * 1000" | bc | sed 's/\..*//')
      line2=$(echo " `sed -n "$i s/.*\t\(.*\)$/\1/p" file2` * 1000" | bc | sed 's/\..*//')
       if [[ $line1 -lt $line2 ]] ; then
        # file2 has the larger value: keep its columns 3-5
        sed -n "$x s/^\([^\t]*\)\t\([^\t]*\).*/\1\t\2/p" file1 > tmpx
        sed -n "$i s/^[^\t]*\t[^\t]*\t//p" file2 > tmpX
        paste -d"\t" tmpx tmpX >> justtmp
       else
        # file1 has the larger (or equal) value: keep its columns 3-5
        sed -n "$i s/^\([^\t]*\)\t\([^\t]*\).*/\1\t\2/p" file2 > tmpx
        sed -n "$x s/^[^\t]*\t[^\t]*\t//p" file1 > tmpX
        paste -d"\t" tmpx tmpX >> justtmp
       fi
    fi
   done
 done
rm -f tmpx tmpX
cat justtmp
