Printing into two files under difference situation
Post 302861363 by Smiling Dragon on Tuesday 8th of October 2013 06:05:04 PM
I find myself doing this sort of task fairly often. It's a bit brain-bending to think about at first, but it's actually relatively straightforward:
Code:
For each gff file:
  For each line in the gff file:
    If all entries in the line are in name.txt someplace, add line to "yes" file
    Otherwise add the line to the "no" file

So:
Code:
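for file in *.gff
do
  fileprefix="${file%.gff}"          # strip the .gff suffix without calling sed
  while IFS= read -r line            # read each line verbatim (keeps backslashes and leading blanks)
  do
    include="yes"
    set -f                           # no filename expansion while word-splitting $line
    for entry in $line
    do
      # -q: quiet, -F: treat entry as a fixed string, --: entry may start with a dash
      grep -qF -- "$entry" name.txt || include=""
    done
    set +f
    if [ -n "$include" ]
    then
      printf '%s\n' "$line" >> "${fileprefix}_yes.gff"
    else
      printf '%s\n' "$line" >> "${fileprefix}_no.gff"
    fi
  done < "$file"
done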

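Not tested, but it should be at least pretty close. One thing to watch: the output files also end in .gff and are appended to, so move or delete them before running the loop again, otherwise the glob picks them up and the results get duplicated.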
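If the files are large, calling grep once per field gets slow. One possible alternative is a single awk pass per file. This is only a sketch and it changes the matching rule: it assumes name.txt holds whitespace-separated names and that every entry on a line must equal one of them exactly, whereas the grep loop accepts a match anywhere in name.txt. The sample data and the temporary `demo` directory are made up so the example runs on its own.

```shell
#!/bin/sh
# Sketch only: exact-match variant of the yes/no split, one awk pass per file.
# Assumption: name.txt contains whitespace-separated names and is not empty.

# Hypothetical sample data so the example is self-contained.
demo=$(mktemp -d) || exit 1
cd "$demo" || exit 1
printf '%s\n' chr1 geneA geneB > name.txt
printf '%s\n' 'chr1 geneA' 'chr1 geneC' 'chr1 geneB' > sample.gff

for file in *.gff
do
  case "$file" in
    *_yes.gff|*_no.gff) continue ;;   # skip output left over from an earlier run
  esac
  prefix="${file%.gff}"
  awk -v yes="${prefix}_yes.gff" -v no="${prefix}_no.gff" '
    # First input file (name.txt): remember every name.
    NR == FNR { for (i = 1; i <= NF; i++) known[$i] = 1; next }
    # Second input file (the gff file): every field must be a known name.
    {
      ok = 1
      for (i = 1; i <= NF; i++) if (!($i in known)) { ok = 0; break }
      print > (ok ? yes : no)
    }
  ' name.txt "$file"
done
```

Because awk opens each output file once per run and truncates it at that point, re-running this version overwrites the previous yes/no files rather than appending to them.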
 

tabix(1)                     Bioinformatics tools                     tabix(1)

NAME
       bgzip - Block compression/decompression utility
       tabix - Generic indexer for TAB-delimited genome position files

SYNOPSIS
       bgzip [-cdhB] [-b virtualOffset] [-s size] [file]

       tabix [-0lf] [-p gff|bed|sam|vcf] [-s seqCol] [-b begCol] [-e endCol]
             [-S lineSkip] [-c metaChar] in.tab.bgz [region1 [region2 [...]]]

DESCRIPTION
       Tabix indexes a TAB-delimited genome position file in.tab.bgz and
       creates an index file in.tab.bgz.tbi when region is absent from the
       command-line. The input data file must be position sorted and
       compressed by bgzip which has a gzip(1) like interface. After indexing,
       tabix is able to quickly retrieve data lines overlapping regions
       specified in the format "chr:beginPos-endPos". Fast data retrieval also
       works over network if URI is given as a file name and in this case the
       index file will be downloaded if it is not present locally.

OPTIONS OF TABIX
       -p STR    Input format for indexing. Valid values are: gff, bed, sam,
                 vcf and psltab. This option should not be applied together
                 with any of -s, -b, -e, -c and -0; it is not used for data
                 retrieval because this setting is stored in the index file.
                 [gff]

       -s INT    Column of sequence name. Option -s, -b, -e, -S, -c and -0
                 are all stored in the index file and thus not used in data
                 retrieval. [1]

       -b INT    Column of start chromosomal position. [4]

       -e INT    Column of end chromosomal position. The end column can be
                 the same as the start column. [5]

       -S INT    Skip first INT lines in the data file. [0]

       -c CHAR   Skip lines started with character CHAR. [#]

       -0        Specify that the position in the data file is 0-based (e.g.
                 UCSC files) rather than 1-based.

       -h        Print the header/meta lines.

       -B        The second argument is a BED file. When this option is in
                 use, the input file may not be sorted or indexed. The entire
                 input will be read sequentially. Nonetheless, with this
                 option, the format of the input must be specified correctly
                 on the command line.

       -f        Force to overwrite the index file if it is present.

       -l        List the sequence names stored in the index file.

EXAMPLE
       (grep ^"#" in.gff; grep -v ^"#" in.gff | sort -k1,1 -k4,4n) | bgzip > sorted.gff.gz;

       tabix -p gff sorted.gff.gz;

       tabix sorted.gff.gz chr1:10,000,000-20,000,000;

NOTES
       It is straightforward to achieve overlap queries using the standard
       B-tree index (with or without binning) implemented in all SQL
       databases, or the R-tree index in PostgreSQL and Oracle. But there are
       still many reasons to use tabix. Firstly, tabix directly works with a
       lot of widely used TAB-delimited formats such as GFF/GTF and BED. We do
       not need to design database schema or specialized binary formats. Data
       do not need to be duplicated in different formats, either. Secondly,
       tabix works on compressed data files while most SQL databases do not.
       The GenCode annotation GTF can be compressed down to 4%. Thirdly, tabix
       is fast. The same indexing algorithm is known to work efficiently for
       an alignment with a few billion short reads. SQL databases probably
       cannot easily handle data at this scale. Last but not the least, tabix
       supports remote data retrieval. One can put the data file and the index
       at an FTP or HTTP server, and other users or even web services will be
       able to get a slice without downloading the entire file.

AUTHOR
       Tabix was written by Heng Li. The BGZF library was originally
       implemented by Bob Handsaker and modified by Heng Li for remote file
       access and in-memory caching.

SEE ALSO
       samtools(1)

tabix-0.2.0                        11 May 2010                        tabix(1)
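
The first command in the EXAMPLE does two jobs at once: it keeps the `#` header lines at the top, and it sorts the remaining records by sequence name and then numerically by start position, which is the position-sorted input that tabix expects. Below is a small sketch of that step on made-up data. The records, file names and the temporary `demo` directory are hypothetical, and the bgzip and tabix commands are only attempted if both tools happen to be installed.

```shell
#!/bin/sh
# Sketch only: prepare a tiny GFF-like file the way the EXAMPLE does.
demo=$(mktemp -d) || exit 1
cd "$demo" || exit 1

# Hypothetical, deliberately unsorted TAB-delimited input.
{
  printf '##gff-version 3\n'
  printf 'chr2\tsrc\tgene\t500\t900\t.\t+\t.\tID=g3\n'
  printf 'chr1\tsrc\tgene\t2000\t2500\t.\t+\t.\tID=g2\n'
  printf 'chr1\tsrc\tgene\t100\t400\t.\t-\t.\tID=g1\n'
} > in.gff

# Header lines first, then records sorted by column 1 and numerically by column 4.
(grep '^#' in.gff; grep -v '^#' in.gff | sort -k1,1 -k4,4n) > sorted.gff

cat sorted.gff

# Compress, index and query only if the tools are available on this machine.
if command -v bgzip >/dev/null 2>&1 && command -v tabix >/dev/null 2>&1
then
  bgzip -c sorted.gff > sorted.gff.gz
  tabix -p gff sorted.gff.gz
  tabix sorted.gff.gz chr1:1-1000
fi
```

Writing the sorted text to a plain file first makes it easy to eyeball the order before compressing; the man page's one-liner pipes straight into bgzip instead.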
All times are GMT -4.