Find common numbers and print yes or no

# 1  

I have 2 files with the following data.

First file:

    sp|Q676U5|A16L1_HUMAN Autophagy-related protein 16-1 OS=Homo sapiens GN=ATG16L1 PE=1 SV=2,
  Maximum coiled-coil residue probability: 0.657 in position 163.
  Maximum dimeric residue probability:     0.288 in position 163.
  Maximum trimeric residue probability:    0.369 in position 163.
Coil    0.63@  91- 118:c,3    0.60@ 154- 190:c,3

Second file:

AC   Q676U5; A3EXK9; A3EXL0; B6ZDH0; Q6IPN1; Q6UXW4; Q6ZVZ5; Q8NCY2;
AC   Q96JV5; Q9H619;
FT   COILED       78    230       Potential.
FT   VARIANT     300    300       T -> A (associated with susceptibility to
FT   VARIANT     307    307       E -> K (in dbSNP:rs1866878).

If the ID after sp in the first file ("Q676U5") matches the first ID after AC in the second file ("Q676U5"),

the script should then look in the second file

for "VARIANT" lines and check whether the number after VARIANT lies within
one of the numeric ranges given after @ in the first file:

@ 91- 118 @ 154- 190

The expected output for this example is

Q676U5 : No

because the numbers after VARIANT in the second file, 300 and 307, do not lie in either range (91-118 or 154-190).

So the expected output is No after the matched ID.

In the same way, the other entries should be matched by ID and marked Yes or No, depending on whether the number after VARIANT in the second file lies within a range after @ in the first file.
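
The range check described above can be sketched in Python as follows. This is only a minimal sketch under stated assumptions, not a definitive solution: the names `parse_ranges` and `in_any_range` are hypothetical, ranges are assumed to always appear as `@ start- end` on the Coil line, the ranges are treated as inclusive, and a single VARIANT number inside any range is assumed to be enough for Yes.

```python
import re

# Assumption: every range follows an "@" and looks like "91- 118"
# (spaces around the numbers are optional).
RANGE_RE = re.compile(r"@\s*(\d+)\s*-\s*(\d+)")

def parse_ranges(coil_line):
    """Return a list of (start, end) tuples, one per '@' range on the line."""
    return [(int(a), int(b)) for a, b in RANGE_RE.findall(coil_line)]

def in_any_range(position, ranges):
    """True if position lies inside at least one range (bounds inclusive)."""
    return any(start <= position <= end for start, end in ranges)

# Data copied from the two sample files above.
coil_line = "Coil    0.63@  91- 118:c,3    0.60@ 154- 190:c,3"
ranges = parse_ranges(coil_line)      # [(91, 118), (154, 190)]
variants = [300, 307]                 # numbers after VARIANT in the second file

answer = "Yes" if any(in_any_range(v, ranges) for v in variants) else "No"
print("Q676U5 :", answer)             # prints "Q676U5 : No"
```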
# 2  
With over 150 posts, I would expect that you have some idea of how to do this. What have you tried so far?

Your specification of how to determine whether an entry in file1 matches an entry in file2 is very weak. Please clarify the requirements by answering all of the following questions:
  1. Does the first line of an entry in file1 always start with <space>sp?
    1. If not, what else can come before the sp besides a <space> character?
  2. Does the first line in an entry in file1 always use | as the field separator?
  3. Do any other lines in an entry in file1 contain an | character?
  4. Is [sp] in file1 always lowercase letters?
  5. How do we find the ranges to be checked?
    1. How do we recognize that a range is present?
    2. Are lines with ranges the only lines in file1 that contain the @ character?
    3. Does the line that contains the ranges always start with Coil in column 1 (uppercase C and lowercase oil)?
    4. Are there always exactly two ranges to be matched against?
    5. Do ranges always immediately follow an @ character?
  6. What constitutes a successful match on the ranges?
    1. Does just one variant have to match any of the given ranges, or does each variant have to match one of the ranges?
    2. Does the 1st variant have to fall within the 1st range, the 2nd variant have to fall within the 2nd range, etc.?
    3. Will there always be the same number of variants as there are ranges in matched records in file1 and file2?
  7. In file2, is a 2nd contiguous line starting with AC a continuation of the previous line, or is it a separate AC instance? (I.e., if the 1st line in file1 had been: sp|Q96JV5|A16L1_HUMAN, instead of: sp|Q676U5|A16L1_HUMAN, should it have still matched the same entry in file2?)
  8. In file2, is VARIANT case sensitive?
  9. In file2, will VARIANT only appear on a line starting with FT?
  10. In file1 is there any separator between entries?
  11. In file2 is there any separator between entries?
  12. Approximately how large are file1 and file2?
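
To show how the answers would shape a solution, here is a minimal Python sketch that simply picks one answer for several of the questions. Every choice is an assumption, not something the original poster confirmed: an entry header in file1 contains `sp|ID|` (questions 1-4), ranges sit on lines starting with `Coil` (question 5), one variant inside any range means Yes (question 6), contiguous AC lines belong to the same entry and any ID on them may match (question 7), and VARIANT is uppercase on FT lines (questions 8-9). The function names `parse_file1`, `parse_file2` and `report` are hypothetical.

```python
import re

RANGE_RE = re.compile(r"@\s*(\d+)\s*-\s*(\d+)")

def parse_file1(lines):
    """Map ID -> list of (start, end) ranges taken from the Coil lines."""
    entries, current = {}, None
    for line in lines:
        m = re.search(r"\bsp\|([^|]+)\|", line)   # assumed header form
        if m:
            current = m.group(1)
            entries[current] = []
        elif line.startswith("Coil") and current is not None:
            entries[current] += [(int(a), int(b)) for a, b in RANGE_RE.findall(line)]
    return entries

def parse_file2(lines):
    """Map every ID on an entry's AC lines -> list of VARIANT start positions."""
    variants, positions, prev_was_ac = {}, None, False
    for line in lines:
        fields = line.split()
        if fields and fields[0] == "AC":
            if not prev_was_ac:
                positions = []                    # first AC line of a new entry
            for acc in fields[1:]:
                variants[acc.rstrip(";")] = positions   # IDs share one list
            prev_was_ac = True
            continue
        prev_was_ac = False
        if (len(fields) >= 3 and fields[0] == "FT"
                and fields[1] == "VARIANT" and positions is not None):
            positions.append(int(fields[2]))
    return variants

def report(file1_lines, file2_lines):
    """Return 'ID : Yes' or 'ID : No' for each file1 entry found in file2."""
    ranges = parse_file1(file1_lines)
    variants = parse_file2(file2_lines)
    out = []
    for acc, rs in ranges.items():
        if acc in variants:
            hit = any(s <= v <= e for v in variants[acc] for s, e in rs)
            out.append(f"{acc} : {'Yes' if hit else 'No'}")
    return out

# Sample data from post #1 (header line as quoted in question 7).
file1 = [
    " sp|Q676U5|A16L1_HUMAN Autophagy-related protein 16-1 OS=Homo sapiens GN=ATG16L1 PE=1 SV=2,",
    "  Maximum coiled-coil residue probability: 0.657 in position 163.",
    "Coil    0.63@  91- 118:c,3    0.60@ 154- 190:c,3",
]
file2 = [
    "AC   Q676U5; A3EXK9; A3EXL0; B6ZDH0; Q6IPN1; Q6UXW4; Q6ZVZ5; Q8NCY2;",
    "AC   Q96JV5; Q9H619;",
    "FT   COILED       78    230       Potential.",
    "FT   VARIANT     300    300       T -> A (associated with susceptibility to",
    "FT   VARIANT     307    307       E -> K (in dbSNP:rs1866878).",
]
print(report(file1, file2))   # prints ['Q676U5 : No']
```

A different answer to question 6 (for example, every variant must fall in a range) would change only the `hit` expression in `report`; a different answer to question 7 would change how `parse_file2` treats the second AC line.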

Last edited by Don Cragun; 09-23-2012 at 04:18 PM.. Reason: trying to fix a list formatting problem