Printing into two files under different situations


 
# 1  
Old 10-08-2013

I want to print lines into one of two files, depending on a condition.
For example, file 1 is name.txt:
Code:
>gma-miR172a    Glyma02g28845
>gma-miR1513a-3p        Glyma02g15840
>gma-miR166a-5p Glyma02g15840
>gma-miR1530    Glyma02g15130
>gma-miR1507a   Glyma02g01841

File 2 is a.gff:
Code:
Glyma01g07930   Glyma02g13330
Glyma01g07390   Glyma02g13120

I want to check each line of a.gff: if both elements on the line (for example "Glyma01g07930" and "Glyma02g13330") exist in file 1, print the line into a_yes.gff; otherwise print the line into a_no.gff.

However, I have a bunch of files like file 2 to check: a.gff, b.gff, c.gff, and so on.
How can I do that?
Either a Linux command line or a Perl script would be appreciated. Thank you.

Last edited by Scott; 10-09-2013 at 01:13 AM.. Reason: Code tags
# 2  
Old 10-08-2013
I find myself doing this kind of task fairly often. It's a bit brain-bending to think about at first, but it is actually relatively straightforward:
Code:
For each gff file:
  For each line in the gff file:
    If all entries in the line are in name.txt someplace, add line to "yes" file
    Otherwise add the line to the "no" file

So:
Code:
for file in *.gff
do
  fileprefix=${file%.gff}
  while read -r line
  do
    include="yes"
    for entry in $line
    do
      # -q: quiet, -w: whole words only, -F: plain text rather than a regular expression
      grep -qwF -- "$entry" name.txt || include=""
    done
    if [ -n "$include" ]
    then
      echo "${line}" >> "${fileprefix}_yes.gff"
    else
      echo "${line}" >> "${fileprefix}_no.gff"
    fi
  done < "$file"
done

Not tested but should be at least pretty close.
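
One thing worth checking in the grep test is how the ID is matched. A plain grep pattern is a regular expression and also matches inside longer words, so a truncated ID can look like a hit; -w (whole words) together with -F (plain text) is stricter. Here is a rough sketch of the difference. The scratch directory, the one-line name.txt and the check helper are all made up for the illustration:

```shell
#!/bin/sh
# Made-up one-line name.txt in a scratch directory, only to show the matching behaviour.
tmpdir=$(mktemp -d)
printf '>gma-miR172a\tGlyma02g28845\n' > "$tmpdir/name.txt"

# check is a hypothetical helper: it reports what each style of grep says about an ID.
check() {
  if grep -q "$1" "$tmpdir/name.txt"; then plain=found; else plain=missing; fi
  if grep -qwF -- "$1" "$tmpdir/name.txt"; then word=found; else word=missing; fi
  echo "$1 plain=$plain word=$word"
}

full=$(check Glyma02g28845)    # the complete ID
short=$(check Glyma02g2884)    # a truncated ID that is not really in the list
echo "$full"
echo "$short"
rm -rf "$tmpdir"
```

The complete ID is found both ways. The truncated one is still "found" by the plain match, but not by the whole-word match.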
# 3  
Old 10-08-2013
Thank you so much! But is it a Perl or a shell script? Sorry for the easy questions. I usually work in Perl, but this doesn't look like regular Perl. How can I run it? From the Linux command line?

---------- Post updated at 07:55 PM ---------- Previous update was at 07:37 PM ----------

Thank you so much for the reply, but I am really a newbie at coding. Could you please add a comment on each line so that it is easy for me to follow and adjust? Thank you so much; I really appreciate all the help.
# 4  
Old 10-08-2013
It's Unix command-line (shell) code, but it also works in a shell script.
If you make a new file, put:
Code:
#!/bin/sh

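as the first line, followed by the code below. Make the file executable with chmod +x (or run it with sh followed by the file name) and you'll be able to run it as a script.
Otherwise, you can just copy and paste the entire thing into your shell on the command line and it'll work.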
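
To make the "new file" route concrete, here is a small sketch. The file name split_gff.sh is made up, and the script body is only a placeholder where the real loop would go:

```shell
# "split_gff.sh" is a made-up file name; use whatever you like.
cat > split_gff.sh <<'EOF'
#!/bin/sh
# Placeholder body: paste the for-loop from this thread here instead.
echo "script is running"
EOF

chmod +x split_gff.sh          # mark the file as executable (needed only once)
result=$(sh ./split_gff.sh)    # run it through sh and capture what it prints
echo "$result"
```

After the chmod you can normally also start it as ./split_gff.sh. Run it from the directory that holds name.txt and the .gff files.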

Version with comments:
Code:
# Run the following for every file in the current directory that ends with ".gff".
# Output is appended, and the output files also end in ".gff", so delete any old
# *_yes.gff and *_no.gff files before running this a second time.
for file in *.gff
do
  # Find the filename prefix by removing the ".gff" part of the filename
  fileprefix=${file%.gff}
  # Go through the .gff file line by line (the file is fed in at the matching "done" below)
  while read -r line
  do
    # By default, assume we want to put the line in the "yes" file
    include="yes"
    # For each word on the given line, see if it's present in "name.txt"
    for entry in $line
    do
      # -q: print nothing, -w: match whole words only, -F: treat the entry as plain text.
      # If it is not present, then we know we need to put this line in the "no" file instead
      grep -qwF -- "$entry" name.txt || include=""
    done
    # If we get to the end and $include still has something in it ("yes"), we found all the entries on this line,
    # so we add it to the "yes" file, otherwise to the "no" file
    if [ -n "$include" ]
    then
      echo "${line}" >> "${fileprefix}_yes.gff"
    else
      echo "${line}" >> "${fileprefix}_no.gff"
    fi
    # move on to the next line in the current .gff file
  done < "$file"
  # move on to the next .gff file
done

I suspect that Perl would be able to do this a little more efficiently by loading the name.txt file into memory, allowing faster checks. But I usually only reach for Perl if the shell can't do it within a reasonable amount of time/effort.
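
The same in-memory idea can also be tried with awk instead of Perl, which keeps everything on the command line. This is only a sketch under some assumptions: the IDs are in the second column of name.txt, each .gff line has exactly two whitespace-separated columns, and the scratch directory and the sample pairings are made up for the demonstration.

```shell
#!/bin/sh
# Work in a scratch directory with made-up sample data (IDs borrowed from this thread).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '>gma-miR172a\tGlyma02g28845\n>gma-miR1513a-3p\tGlyma02g15840\n' > name.txt
printf 'Glyma02g28845 Glyma02g15840\nGlyma01g07930 Glyma02g13330\n' > a.gff
printf 'Glyma02g15840 Glyma02g15840\nGlyma01g07390 Glyma02g28845\n' > b.gff

awk '
  # name.txt comes first on the command line: remember every ID in column 2.
  FILENAME == "name.txt" { known[$2] = 1; next }
  # First line of each .gff file: close the previous outputs and build the new names.
  FNR == 1 {
    if (yes != "") { close(yes); close(no) }
    prefix = FILENAME
    sub(/\.gff$/, "", prefix)
    yes = prefix "_yes.gff"
    no  = prefix "_no.gff"
  }
  # A line goes to the "yes" file only if both columns are known IDs.
  {
    if (($1 in known) && ($2 in known)) { print > yes }
    else                                { print > no }
  }
' name.txt *.gff

echo "results are in $workdir"
```

name.txt is read once and each lookup is an array check, so nothing is re-scanned per line. With this sample data the first line of a.gff and of b.gff end up in the _yes files and the second lines in the _no files.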
# 5  
Old 10-10-2013
You may also want to try this:
Code:
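EXT=("yes" "no")
for FN in *.gff
  do while read -r line
       do set -- $line
          grep -qiwF -- "$1" name.txt && grep -qiwF -- "$2" name.txt
          echo "$line" >> "${FN%.gff}_${EXT[$?]}.gff"
       done <"$FN"
  done

This needs bash (for the array). Each of the two columns is looked up separately, so a line goes to the "yes" file only when both are found; a single grep -E with the columns joined by | would also accept lines where just one of them is present. The columns may be separated by any number of blanks or tabs.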

Last edited by RudiC; 10-10-2013 at 03:33 PM.. Reason: pretty print
 