Sponsored Content
Top Forums UNIX for Dummies Questions & Answers Adding a column to a text file based on mathematical manipulation Post 302690099 by in2nix4life on Wednesday 22nd of August 2012 12:42:00 PM
Old 08-22-2012
If the file is fixed in this format, then here's an easy way:

Code:
awk '{print $1"\t"$2-1"\t"$2"\t"$3}' file 

chr1	788821	            788822	           rs11240777
chr1	1008566	1008567	rs9442372
chr1	1148139	1148140	rs3813199
chr1	1148493	1148494	rs6603781

 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Adding a column to a text based on file name

Dear all, Does anyone know how I could to add a column of numbers (1s, or 2s, or..., or 6s) to two-column text files (tab-delimited), where the specific number to be added varies as a function of the file naming? Currently, each of my text files has two columns, so the column with the... (12 Replies)
Discussion started by: rlapate
12 Replies

2. UNIX for Dummies Questions & Answers

Extracting rows from a text file based on the first column

I have a tab delimited text file where the first column can take on three different values : 100, 150, 250. I want to extract all the rows where the first column is 100 and put them into a separate text file and so on. This is what my text file looks like now: 100 rs3794811 0.01 0.3434... (1 Reply)
Discussion started by: evelibertine
1 Replies

3. Shell Programming and Scripting

Help with File processing - Adding predefined text to particular record based on condition

I am generating a output: Name Count_1 Count_2 abc 12 12 def 15 14 ghi 16 16 jkl 18 18 mno 7 5 I am sending the output in html email, I want to add the code: <font color="red"> NAME COLUMN record </font> for the Name... (8 Replies)
Discussion started by: karumudi7
8 Replies

4. Shell Programming and Scripting

sorting based on a specified column in a text file

I have a tab delimited file with 5 columns 79 A B 20.2340 6.1488 8.5086 1.3838 87 A B 0.1310 0.0382 0.0054 0.1413 88 A B 46.1651 99.0000 21.8107 0.2203 89 A B 0.1400 0.1132 0.0151 0.1334 114 A B 0.1088 0.0522 0.0057 0.1083 115 A B... (2 Replies)
Discussion started by: Lucky Ali
2 Replies

5. UNIX for Dummies Questions & Answers

Adding tags to a specific column of a space delimited text file

I have a space delimited text file with two columns. I would like to add NA to the first column of the text file. Input: 19625 10.4791768259 19700 10.8146489183 19701 10.9084026759 19702 10.9861346978 19703 10.9304364984 Output: NA19625 10.4791768259 NA19700 10.8146489183... (1 Reply)
Discussion started by: evelibertine
1 Replies

6. UNIX for Dummies Questions & Answers

How to cut from a text file based on value of a specific column?

Hi, I have a tab delimited text file from which I want to cut out specific columns. If the second column equals one, I want to cut out columns 1 and 5 and 6. If the second column equals two, I want to cut out columns 1 and 5 and 7. How do I go about doing that? Thanks! (4 Replies)
Discussion started by: evelibertine
4 Replies

7. UNIX for Dummies Questions & Answers

Adding a column to a text file with row numbers

Hi, I would like to add a new column containing the row numbers to a text file. How do I go about doing that? Thanks! Example input: A X B Y C D Output: A X 1 B Y 2 C D 3 (5 Replies)
Discussion started by: evelibertine
5 Replies

8. UNIX for Dummies Questions & Answers

Mathematical manipulation of a text file

I have a tab delimited file with 4 columns. If the value in the first column, equals the value in the second column, I'd like to have the 4th column multiplied by 2 then add 1. If the value in the first column differs from the value in the second, I'd like to have the 4th column multiplied by 2... (5 Replies)
Discussion started by: evelibertine
5 Replies

9. UNIX for Dummies Questions & Answers

Solaris - Filter columns in text file and adding new column

Hello, I am very now to this, hope you can help, I am looking into editing a file in Solaris, with dinamic collums (lenght varies) and I need 2 things to be made, the fist is to filter the first column and third column from the file bellow file.txt, and create a new file with the 2 filtered... (8 Replies)
Discussion started by: jpbastos
8 Replies

10. Shell Programming and Scripting

Adding values of a column based on another column

Hello, I have a data such as this: ENSGALG00000000189 329 G A 4 2 0 ENSGALG00000000189 518 T C 5 1 0 ENSGALG00000000189 1104 G A 5 1 0 ENSGALG00000000187 3687 G T 5 1 0 ENSGALG00000000187 4533 A T 4 2 0 ENSGALG00000000233 5811 T C 4 2 0 ENSGALG00000000233 5998 C A 5 1 0 I want to... (3 Replies)
Discussion started by: Homa
3 Replies
tabix(1)						       Bioinformatics tools							  tabix(1)

NAME
bgzip - Block compression/decompression utility tabix - Generic indexer for TAB-delimited genome position files SYNOPSIS
bgzip [-cdhB] [-b virtualOffset] [-s size] [file] tabix [-0lf] [-p gff|bed|sam|vcf] [-s seqCol] [-b begCol] [-e endCol] [-S lineSkip] [-c metaChar] in.tab.bgz [region1 [region2 [...]]] DESCRIPTION
Tabix indexes a TAB-delimited genome position file in.tab.bgz and creates an index file in.tab.bgz.tbi when region is absent from the com- mand-line. The input data file must be position sorted and compressed by bgzip which has a gzip(1) like interface. After indexing, tabix is able to quickly retrieve data lines overlapping regions specified in the format "chr:beginPos-endPos". Fast data retrieval also works over network if URI is given as a file name and in this case the index file will be downloaded if it is not present locally. OPTIONS OF TABIX
-p STR Input format for indexing. Valid values are: gff, bed, sam, vcf and psltab. This option should not be applied together with any of -s, -b, -e, -c and -0; it is not used for data retrieval because this setting is stored in the index file. [gff] -s INT Column of sequence name. Option -s, -b, -e, -S, -c and -0 are all stored in the index file and thus not used in data retrieval. [1] -b INT Column of start chromosomal position. [4] -e INT Column of end chromosomal position. The end column can be the same as the start column. [5] -S INT Skip first INT lines in the data file. [0] -c CHAR Skip lines started with character CHAR. [#] -0 Specify that the position in the data file is 0-based (e.g. UCSC files) rather than 1-based. -h Print the header/meta lines. -B The second argument is a BED file. When this option is in use, the input file may not be sorted or indexed. The entire input will be read sequentially. Nonetheless, with this option, the format of the input must be specificed correctly on the command line. -f Force to overwrite the index file if it is present. -l List the sequence names stored in the index file. EXAMPLE
(grep ^"#" in.gff; grep -v ^"#" in.gff | sort -k1,1 -k4,4n) | bgzip > sorted.gff.gz; tabix -p gff sorted.gff.gz; tabix sorted.gff.gz chr1:10,000,000-20,000,000; NOTES
It is straightforward to achieve overlap queries using the standard B-tree index (with or without binning) implemented in all SQL data- bases, or the R-tree index in PostgreSQL and Oracle. But there are still many reasons to use tabix. Firstly, tabix directly works with a lot of widely used TAB-delimited formats such as GFF/GTF and BED. We do not need to design database schema or specialized binary formats. Data do not need to be duplicated in different formats, either. Secondly, tabix works on compressed data files while most SQL databases do not. The GenCode annotation GTF can be compressed down to 4%. Thirdly, tabix is fast. The same indexing algorithm is known to work effi- ciently for an alignment with a few billion short reads. SQL databases probably cannot easily handle data at this scale. Last but not the least, tabix supports remote data retrieval. One can put the data file and the index at an FTP or HTTP server, and other users or even web services will be able to get a slice without downloading the entire file. AUTHOR
Tabix was written by Heng Li. The BGZF library was originally implemented by Bob Handsaker and modified by Heng Li for remote file access and in-memory caching. SEE ALSO
samtools(1) tabix-0.2.0 11 May 2010 tabix(1)
All times are GMT -4. The time now is 06:18 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy