Sponsored Content
Top Forums Shell Programming and Scripting Print every 5 4th column values as separate row with different first column Post 302771410 by Don Cragun on Wednesday 20th of February 2013 02:47:45 PM
Old 02-20-2013
This seems to do what you want:
Code:
awk 'NR % 5 == 1 {
        # 1st line in a set of 5 lines.
        # Set the saved output string to the 4th field in this line.
        o = $4
        next
}       
NR % 5 {# 2nd through 4th line in a set of 5 lines.
        # Add the 4th field in this line to the saved output string.
        o = o " " $4
        next
}       
{       # 5th line in a set of 5 lines.
        # Print the result of processing this set of 5 lines.
        printf("peak%d %s %s\n", ++oc, o, $4)
}       
END {   # If the input file had less than 5 lines in the last set, print the
        # partial set.
        if(NR % 5) printf("peak%d %s\n", ++oc, o)
}' file

If you are using a Solaris/SusOS system, use /usr/xpg4/bin/awk or nawk instead of awk.
This User Gave Thanks to Don Cragun For This Post:
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

how to read the column and print the values under that column

hi all:b:, how to read the column and print the values under that column ...?? file1 have something like this cat file1 ======= column1, column2,date,column3,column4..... 1, 23 , 12/02/2008,...... 2, 45, 14/05/2008,..... 3, 56, 16/03/2008,..... cat file2 =======... (6 Replies)
Discussion started by: gemini106
6 Replies

2. Shell Programming and Scripting

print unique values of a column and sum up the corresponding values in next column

Hi All, I have a file which is having 3 columns as (string string integer) a b 1 x y 2 p k 5 y y 4 ..... ..... Question: I want get the unique value of column 2 in a sorted way(on column 2) and the sum of the 3rd column of the corresponding rows. e.g the above file should return the... (6 Replies)
Discussion started by: amigarus
6 Replies

3. Shell Programming and Scripting

Remove duplicates within row and separate column

Hi all I have following kind of input file ESR1 PA156 leflunomide PA450192 leflunomide CHST3 PA26503 docetaxel Pa4586; thalidomide Pa34958; decetaxel docetaxel docetaxel I want to remove duplicates and I want to separate anything before and after PAxxxx entry into columns or... (1 Reply)
Discussion started by: manigrover
1 Replies

4. UNIX for Dummies Questions & Answers

awk to print first row with forth column and last row with fifth column in each file

file with this content awk 'NR==1 {print $4} && NR==2 {print $5}' file The error is shown with syntax error; what can be done (4 Replies)
Discussion started by: cdfd123
4 Replies

5. Shell Programming and Scripting

awk Print New Column For Every Two Lines and Match On Multiple Column Values to print another column

Hi, My input files is like this axis1 0 1 10 axis2 0 1 5 axis1 1 2 -4 axis2 2 3 -3 axis1 3 4 5 axis2 3 4 -1 axis1 4 5 -6 axis2 4 5 1 Now, these are my following tasks 1. Print a first column for every two rows that has the same value followed by a string. 2. Match on the... (3 Replies)
Discussion started by: jacobs.smith
3 Replies

6. Shell Programming and Scripting

Print row on 4th column to all row

Dear All, I have input : SEG901 5173 9005 5740 SEG902 5227 5284 SEG903 5284 5346 SEG904 5346 9010 SEG905 5400 5456 SEG906 5456 5511 SEG907 5511 9011 SEG908 5572 9015 SEG909 5622 9020 SEG910 5678 5739 SEG911 5739 5796 SEG912 5796 9025 ... (3 Replies)
Discussion started by: attila
3 Replies

7. Shell Programming and Scripting

Read 4th column and print those many rows

Hi, My input file chr1 3217769 3217789 2952725-5 255 + chr1 3260455 3260475 2434087-6 255 - My desired output chr1 3217769 3217789 2952725-1 255 + chr1 3217769 3217789 2952725-2 255 + chr1 3217769 3217789 2952725-3 255 + chr1 3217769 3217789 2952725-4 255 +... (7 Replies)
Discussion started by: jacobs.smith
7 Replies

8. Shell Programming and Scripting

Changing values only in 3rd column and 4th column

#cat file testing test! nipw asdkjasjdk ok! what !ok host server1 check_ssh_disk!102.56.1.101!30!50!/ other host server 2 des check_ssh_disk!192.6.1.10!40!30!/ #grep check file| awk -F! '{print $3,$4}'|awk '{gsub($1,"",$1)}1' 50 30 # Output: (6 Replies)
Discussion started by: kenshinhimura
6 Replies

9. Shell Programming and Scripting

Print first row of column a, last row of column b if column a has the same value

I have a table with this structure: cola colb colc 1 19 lemon 20 31 lemon 32 100 lemon 159 205 cherries 210 500 cherries and need to parse it into this format: cola colb colc 1 100 lemon 159 500 cherries So I need the first row of cola and the last row of colb if colc has the... (3 Replies)
Discussion started by: coppuca
3 Replies

10. UNIX for Beginners Questions & Answers

If pattern in column 3 matches pattern in column 2 (any row), print value in column 1

Hi all, I have searched and searched, but I have not found a solution that quite fits what I am trying to do. I have a long list of data in three columns. Below is a sample: 1,10,8 2,12,10 3,13,12 4,14,14 5,15,16 6,16,18 Please use code tags What I need to do is as follows: If a... (4 Replies)
Discussion started by: bleedingturnip
4 Replies
tabix(1)						       Bioinformatics tools							  tabix(1)

NAME
bgzip - Block compression/decompression utility tabix - Generic indexer for TAB-delimited genome position files SYNOPSIS
bgzip [-cdhB] [-b virtualOffset] [-s size] [file] tabix [-0lf] [-p gff|bed|sam|vcf] [-s seqCol] [-b begCol] [-e endCol] [-S lineSkip] [-c metaChar] in.tab.bgz [region1 [region2 [...]]] DESCRIPTION
Tabix indexes a TAB-delimited genome position file in.tab.bgz and creates an index file in.tab.bgz.tbi when region is absent from the com- mand-line. The input data file must be position sorted and compressed by bgzip which has a gzip(1) like interface. After indexing, tabix is able to quickly retrieve data lines overlapping regions specified in the format "chr:beginPos-endPos". Fast data retrieval also works over network if URI is given as a file name and in this case the index file will be downloaded if it is not present locally. OPTIONS OF TABIX
-p STR Input format for indexing. Valid values are: gff, bed, sam, vcf and psltab. This option should not be applied together with any of -s, -b, -e, -c and -0; it is not used for data retrieval because this setting is stored in the index file. [gff] -s INT Column of sequence name. Option -s, -b, -e, -S, -c and -0 are all stored in the index file and thus not used in data retrieval. [1] -b INT Column of start chromosomal position. [4] -e INT Column of end chromosomal position. The end column can be the same as the start column. [5] -S INT Skip first INT lines in the data file. [0] -c CHAR Skip lines started with character CHAR. [#] -0 Specify that the position in the data file is 0-based (e.g. UCSC files) rather than 1-based. -h Print the header/meta lines. -B The second argument is a BED file. When this option is in use, the input file may not be sorted or indexed. The entire input will be read sequentially. Nonetheless, with this option, the format of the input must be specificed correctly on the command line. -f Force to overwrite the index file if it is present. -l List the sequence names stored in the index file. EXAMPLE
(grep ^"#" in.gff; grep -v ^"#" in.gff | sort -k1,1 -k4,4n) | bgzip > sorted.gff.gz; tabix -p gff sorted.gff.gz; tabix sorted.gff.gz chr1:10,000,000-20,000,000; NOTES
It is straightforward to achieve overlap queries using the standard B-tree index (with or without binning) implemented in all SQL data- bases, or the R-tree index in PostgreSQL and Oracle. But there are still many reasons to use tabix. Firstly, tabix directly works with a lot of widely used TAB-delimited formats such as GFF/GTF and BED. We do not need to design database schema or specialized binary formats. Data do not need to be duplicated in different formats, either. Secondly, tabix works on compressed data files while most SQL databases do not. The GenCode annotation GTF can be compressed down to 4%. Thirdly, tabix is fast. The same indexing algorithm is known to work effi- ciently for an alignment with a few billion short reads. SQL databases probably cannot easily handle data at this scale. Last but not the least, tabix supports remote data retrieval. One can put the data file and the index at an FTP or HTTP server, and other users or even web services will be able to get a slice without downloading the entire file. AUTHOR
Tabix was written by Heng Li. The BGZF library was originally implemented by Bob Handsaker and modified by Heng Li for remote file access and in-memory caching. SEE ALSO
samtools(1) tabix-0.2.0 11 May 2010 tabix(1)
All times are GMT -4. The time now is 07:08 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy