Get common lines from multiple files Post: 302437733

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

To find all common lines from 'n' no. of files

Hi, I have one situation. I have some 6-7 files in one directory and I have to extract all the lines which exist in all of these files. That is, I need to extract all common lines from all these files and put them in a separate file. Please help. I know it could be done with the help of...

2. Shell Programming and Scripting

Common lines from files

Hello guys, I need a script to get the common lines from two files, with the criterion that if the first two columns match, I keep the maximum value of the 3rd column (tab-separated columns). Sample input: file1: 111 222 0.1 333 444 0.5 555 666 0.4 file 2: 111 222 0.7 555 666...

3. Shell Programming and Scripting

Common lines from files

Hello guys, I need a script to get the common lines from two files, with the criterion that if the first two columns match, I keep the maximum value of the 5th column (tab-separated columns). The 3rd and 4th columns correspond to the row which has the highest value for the 5th column. Sample...

4. Shell Programming and Scripting

Merge multiple lines in same file with common key using awk

I've been a Unix admin for nearly 30 years and never learned AWK. I've seen several similar posts here, but haven't been able to adapt the answers to my situation. AWK is so damn cryptic! ;) I have a single file with ~900 lines (CSV list). Each line starts with an ID, but with different stuff...

5. Shell Programming and Scripting

Find common lines between multiple files

Hello everyone. A few years ago the user radoulov posted a fancy solution for a problem, which was about finding common lines (gene variation names) between multiple samples (files). The code was: awk 'END { for (R in rec) { n = split(rec[R], t, "/") if (n > 1) dup = dup ?...

6. Shell Programming and Scripting

Compare multiple files, and extract items that are common to ALL files only

I have this code awk 'NR==FNR{a[$1];next} $1 in a' file1 file2 which does what I need it to do, but for only two files. I want to make it so that I can have multiple files (for example 30) and the code will return only the items that are in every single one of those files and ignore the ones...

7. UNIX for Dummies Questions & Answers

Filter lines common in two files

Thanks everyone. I got that problem solved. I require one more help here. (Yes, UNIX definitely seems to be fun and useful, and I WILL eventually learn it for myself. But I am now on a different project and don't really have time to go through all the basics. So, I will really appreciate some...

8. Shell Programming and Scripting

Join common patterns in multiple lines into one line

Hi I have a file like 1 2 1 2 3 1 5 6 11 12 10 2 7 5 17 12 I would like to have an output as 1 2 3 5 6 10 7 11 12 17 any help would be highly appreciated Thanks

9. Shell Programming and Scripting

Join columns across multiple lines in a Text based on common column using BASH

10. Shell Programming and Scripting

Find common lines between all of the files in one folder

Could it be possible to find common lines between all of the files in one folder? Just like comm -12, so all of the files two at a time. I would like all of the outcomes to be written to different files, and the file names could be simply numbers - 1, 2, 3 etc. All of the file names contain...
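The question running through these threads (which lines occur in every one of N files?) can be sketched with a single awk pass. This is only an illustration under stated assumptions, not the solution posted in any of the threads above: the function name common_lines is hypothetical, lines are compared as exact strings, and every argument is assumed to be a readable file name.

```shell
# common_lines FILE...  (hypothetical helper name)
# Print, in sorted order, every line that occurs in all of the given files.
common_lines() {
    awk '
        FNR == 1 { fileno++ }                  # a new non-empty input file starts
        !seen[fileno, $0]++ { count[$0]++ }    # count each distinct line once per file
        END {
            # ARGC - 1 is the number of file arguments, so an empty file
            # (which never triggers FNR == 1) correctly yields no output.
            for (line in count)
                if (count[line] == ARGC - 1)
                    print line
        }
    ' "$@" | sort                              # awk iteration order is unspecified
}
```

A call such as common_lines file1 file2 file3 > common.txt would write the shared lines to a separate file. Counting each line once per file matters: without the seen[] guard, a line repeated inside one file could reach the threshold without being present in the others.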

LEARN ABOUT DEBIAN

tabix

tabix(1)						       Bioinformatics tools							  tabix(1)

NAME

       bgzip - Block compression/decompression utility

       tabix - Generic indexer for TAB-delimited genome position files

SYNOPSIS

       bgzip [-cdhB] [-b virtualOffset] [-s size] [file]

       tabix [-0lf] [-p gff|bed|sam|vcf] [-s seqCol] [-b begCol] [-e endCol] [-S lineSkip] [-c metaChar] in.tab.bgz [region1 [region2 [...]]]

DESCRIPTION

       Tabix indexes a TAB-delimited genome position file in.tab.bgz and creates an index file in.tab.bgz.tbi when region is absent from the
       command line. The input data file must be position sorted and compressed by bgzip, which has a gzip(1)-like interface. After indexing, tabix
       is able to quickly retrieve data lines overlapping regions specified in the format "chr:beginPos-endPos". Fast data retrieval also works
       over a network if a URI is given as the file name; in this case the index file will be downloaded if it is not present locally.

OPTIONS OF TABIX

       -p STR	 Input format for indexing. Valid values are: gff, bed, sam, vcf and psltab. This option should not be applied together  with  any
		 of -s, -b, -e, -c and -0; it is not used for data retrieval because this setting is stored in the index file. [gff]

       -s INT	 Column of sequence name. Options -s, -b, -e, -S, -c and -0 are all stored in the index file and thus not used in data retrieval.
		 [1]

       -b INT	 Column of start chromosomal position. [4]

       -e INT	 Column of end chromosomal position. The end column can be the same as the start column. [5]

       -S INT	 Skip first INT lines in the data file. [0]

       -c CHAR	 Skip lines starting with character CHAR. [#]

       -0	 Specify that the position in the data file is 0-based (e.g. UCSC files) rather than 1-based.

       -h	 Print the header/meta lines.

       -B	 The second argument is a BED file. When this option is in use, the input file need not be sorted or indexed. The entire input will
		 be read sequentially. Nonetheless, with this option, the format of the input must be specified correctly on the command line.

       -f	 Force overwriting of the index file if it is present.

       -l	 List the sequence names stored in the index file.

EXAMPLE

       (grep ^"#" in.gff; grep -v ^"#" in.gff | sort -k1,1 -k4,4n) | bgzip > sorted.gff.gz;

       tabix -p gff sorted.gff.gz;

       tabix sorted.gff.gz chr1:10,000,000-20,000,000;
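What "data lines overlapping a region" means in the last command can be illustrated without tabix itself. The sketch below is a plain linear scan over an uncompressed TAB-delimited file, not tabix's indexed lookup. The function name region_overlap is hypothetical; the column defaults (1, 4, 5) and the skipped "#" lines mirror the defaults listed under OPTIONS OF TABIX, and coordinates are assumed to be 1-based and inclusive, as in GFF.

```shell
# region_overlap FILE REGION [SEQCOL BEGCOL ENDCOL]  (hypothetical helper name)
# Print lines of an uncompressed TAB-delimited FILE that overlap REGION,
# given as "chr:beginPos-endPos" (commas in the numbers are ignored).
region_overlap() {
    awk -F '\t' -v region="$2" -v s="${3:-1}" -v b="${4:-4}" -v e="${5:-5}" '
        BEGIN {
            gsub(/,/, "", region)              # 10,000,000 -> 10000000
            split(region, part, ":")           # part[1] = chr, part[2] = begin-end
            split(part[2], range, "-")
            chr = part[1]; lo = range[1] + 0; hi = range[2] + 0
        }
        /^#/ { next }                          # skip header/meta lines
        # Two closed intervals overlap when each one starts before the other ends.
        $s == chr && $b + 0 <= hi && $e + 0 >= lo
    ' "$1"
}
```

For example, region_overlap in.gff chr1:10,000,000-20,000,000 would print the same kind of lines as the tabix query above, only by reading the whole file instead of consulting an index.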

NOTES

       It is straightforward to achieve overlap queries using the standard B-tree index (with or without binning) implemented in all SQL
       databases, or the R-tree index in PostgreSQL and Oracle. But there are still many reasons to use tabix. Firstly, tabix works directly with
       many widely used TAB-delimited formats such as GFF/GTF and BED. We do not need to design a database schema or specialized binary formats.
       Data do not need to be duplicated in different formats, either. Secondly, tabix works on compressed data files while most SQL databases do
       not. The GenCode annotation GTF can be compressed down to 4%. Thirdly, tabix is fast. The same indexing algorithm is known to work
       efficiently for an alignment with a few billion short reads. SQL databases probably cannot easily handle data at this scale. Last but not
       least, tabix supports remote data retrieval. One can put the data file and the index on an FTP or HTTP server, and other users or even web
       services will be able to get a slice without downloading the entire file.

AUTHOR

       Tabix was written by Heng Li. The BGZF library was originally implemented by Bob Handsaker and modified by Heng Li for remote  file  access
       and in-memory caching.

SEE ALSO

       samtools(1)

tabix-0.2.0							    11 May 2010 							  tabix(1)
