Parse and display specific columns


 
# 1  
Old 03-01-2012
Parse and display specific columns

Hi, I need help with this specific case. I have multiple files that all need to be reformatted the same way.

Input file

 
##INFO1
##INFO2
##INFO3
#CHROM POS INFO 57.sorted.bam 58.sorted.bam 59.sorted.bam 34.sorted.bam 55.sorted.bam
12 59 DP=157;VDB=0.0005;AF1=0.203;AC1=19;DP4=72,57,9,9; 0/0:0,42,253:14:0:45 0/0:0,0,0:0:0:4 0/0:0,0,0:0:0:4 0/0:0,0,0:0:0:4 0/0:0,0,0:0:0:4
12 68 DP=156;VDB=0.0002;AF1=0.4985;G3=8.399e-06,1,1.307e-05; 0/1:36,0,168:13:0:39 0/1:0,0,0:0:0:3 0/1:0,0,0:0:0:3 0/1:0,0,0:0:0:3 0/1:0,0,0:0:0:3
12 72 DP=160;VDB=0.0003;AF1=0.5132;G3=3.712e-06,1,1.79e-05; 0/1:63,0,202:14:0:66 0/1:0,0,0:0:0:3 0/1:0,0,0:0:0:3 0/1:0,0,0:0:0:3 0/1:0,0,0:0:0:3
12 99 DP=164;VDB=0.0002;AF1=0.5142;G3=3.45e-06,1,1.806e-05; 0/1:61,0,183:14:0:64 0/1:0,0,0:0:0:3 0/1:0,0,0:0:0:3 0/1:0,0,0:0:0:3 0/1:0,0,0:0:0:3
12 100 DP=164;VDB=0.0002;AF1=0.5124;G3=3.882e-06,1,1.781e-05; 0/1:65,0,182:14:0:68 0/1:0,0,0:0:0:3 0/1:0,0,0:0:0:3 0/1:0,0,0:0:0:3 0/1:0,0,0:0:0:3

First, I have to remove the ##INFO lines and display only the header line and the data columns.
Second, I need to parse all the columns whose header ends with .bam; there are hundreds of such columns. Each field in these columns holds 5 values separated by ":", and I want to display only the first value. For example, if a field looks like "0/0:0,42,253:14:0:45", I want to display only "0/0".

Output

 
#CHROM POS INFO 57.sorted.bam 58.sorted.bam 59.sorted.bam 34.sorted.bam 55.sorted.bam
12 59 DP=157;VDB=0.0005;AF1=0.203;AC1=19;DP4=72,57,9,9; 0/0 0/0 0/0 0/0 0/0
12 68 DP=156;VDB=0.0002;AF1=0.4985;G3=8.399e-06,1,1.307e-05; 0/1 0/1 0/1 0/1 0/1
12 72 DP=160;VDB=0.0003;AF1=0.5132;G3=3.712e-06,1,1.79e-05; 0/1 0/1 0/1 0/1 0/1
12 99 DP=164;VDB=0.0002;AF1=0.5142;G3=3.45e-06,1,1.806e-05; 0/1 0/1 0/1 0/1 0/1
12 100 DP=164;VDB=0.0002;AF1=0.5124;G3=3.882e-06,1,1.781e-05; 0/1 0/1 0/1 0/1 0/1
# 2  
Old 03-01-2012
Code:
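#!/usr/bin/perl
use strict;
use warnings;

while (<>) {
    next if /^##/;              # skip the ##INFO meta lines
    chomp;
    my @fields = split ' ';     # split the line on whitespace
    # in fields like 0/0:0,42,253:14:0:45 keep only the part before the first colon
    s/^(\d+\/\d+):.*/$1/ for @fields;
    print join(' ', @fields), "\n";
}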

Usage: ./script filename

Last edited by vbe; 03-01-2012 at 02:17 PM.. Reason: rm donation stuff
tip78
# 3  
Old 03-01-2012
Code:
# /^##/ {next}            # Skip lines starting with ##
# !/^#/ { ... } 1         # Process all lines not starting with # and print EVERY line
# gsub(/:[^ :]*/, "");    # Delete all instances of :..., stopping at spaces or colons

$ awk '/^##/{next}; !/^#/ { gsub(/:[^ :]*/, "") } 1' data

#CHROM POS INFO 57.sorted.bam 58.sorted.bam 59.sorted.bam 34.sorted.bam 55.sorted.bam
12 59 DP=157;VDB=0.0005;AF1=0.203;AC1=19;DP4=72,57,9,9; 0/0 0/0 0/0 0/0 0/0
12 68 DP=156;VDB=0.0002;AF1=0.4985;G3=8.399e-06,1,1.307e-05; 0/1 0/1 0/1 0/1 0/1
12 72 DP=160;VDB=0.0003;AF1=0.5132;G3=3.712e-06,1,1.79e-05; 0/1 0/1 0/1 0/1 0/1
12 99 DP=164;VDB=0.0002;AF1=0.5142;G3=3.45e-06,1,1.806e-05; 0/1 0/1 0/1 0/1 0/1
12 100 DP=164;VDB=0.0002;AF1=0.5124;G3=3.882e-06,1,1.781e-05; 0/1 0/1 0/1 0/1 0/1

$
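The one-liner above strips colon suffixes anywhere on a data line, which is fine as long as no other column ever contains a colon. A more conservative sketch, assuming the #CHROM header line names every sample column with a .bam suffix as in the posted input, only touches those columns. The temporary sample file and the names `sample`, `bam` and `result` are made up for illustration.

```shell
# Build a small sample file mirroring the posted input (hypothetical data subset).
sample=$(mktemp)
cat > "$sample" <<'EOF'
##INFO1
##INFO2
#CHROM POS INFO 57.sorted.bam 58.sorted.bam
12 59 DP=157;VDB=0.0005; 0/0:0,42,253:14:0:45 0/0:0,0,0:0:0:4
12 68 DP=156;VDB=0.0002; 0/1:36,0,168:13:0:39 0/1:0,0,0:0:0:3
EOF

result=$(awk '
    /^##/ { next }                 # drop the ##INFO meta lines
    /^#/  {                        # header: remember which columns end in .bam
        for (i = 1; i <= NF; i++) if ($i ~ /\.bam$/) bam[i] = 1
        print; next
    }
    {                              # data: in .bam columns keep only the text before the first colon
        for (i = 1; i <= NF; i++) if (i in bam) sub(/:.*/, "", $i)
        print
    }
' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

Assigning to a field makes awk rebuild the line with single spaces between fields, so a tab-separated input would come out space-separated unless the field separators are set explicitly.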

# 4  
Old 03-01-2012
The Perl script worked great, thank you.
For some reason the awk command is giving errors.
# 5  
Old 03-01-2012
Since you didn't say what the errors were, I'm left guessing, but maybe your version of awk doesn't support gsub. Try 'nawk'; on systems whose default awk is an old one, that's usually where the newer version lives.
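If you'd rather not guess, something like the following probes for a usable awk first. It is only a sketch: the candidate list and the `AWK` and `check` variable names are assumptions, not anything from your system.

```shell
# Pick the first awk implementation found on PATH; nawk and gawk are tried
# before plain awk because an old default awk may lack gsub().
AWK=
for candidate in nawk gawk awk; do
    if command -v "$candidate" >/dev/null 2>&1; then
        AWK=$candidate
        break
    fi
done
[ -n "$AWK" ] || { echo "no awk found on PATH" >&2; exit 1; }

# Self-check: run the same gsub() call on one sample field from the thread.
check=$(printf '0/1:36,0,168:13:0:39\n' | "$AWK" '{ gsub(/:[^ :]*/, "") } 1')
echo "using $AWK, self-check output: $check"
```

If the self-check prints 0/1, the same interpreter should be able to run the full command by putting "$AWK" in place of awk.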
# 6  
Old 03-01-2012
Quote:
Originally Posted by empyrean
The Perl script worked great, thank you.
For some reason the awk command is giving errors.
This can actually be done in a single command line.
I just hadn't noticed that there are no other colons anywhere in the data lines, so a global substitution is enough.
Code:
perl -ne 'next if /^##/; s/:\d\S*//g; print' datafile
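The same idea can probably be written with sed if neither Perl nor a modern awk is at hand. This is a sketch under the same assumption as above (colons occur only inside the sample fields); the temporary file and the `sed_out` variable are made up for illustration.

```shell
# Small sample in the same shape as the posted input (hypothetical data subset).
sample=$(mktemp)
cat > "$sample" <<'EOF'
##INFO1
#CHROM POS INFO 57.sorted.bam 58.sorted.bam
12 72 DP=160;VDB=0.0003; 0/1:63,0,202:14:0:66 0/1:0,0,0:0:0:3
EOF

# First expression: delete lines starting with ##.
# Second expression: on lines NOT starting with #, remove every ":..." run
# up to the next space or colon.
sed_out=$(sed -e '/^##/d' -e '/^#/!s/:[^ :]*//g' "$sample")

printf '%s\n' "$sed_out"
rm -f "$sample"
```

Unlike a substitution that replaces each match with a space, this one leaves no trailing blank at the end of each line.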


Last edited by tip78; 03-01-2012 at 05:45 PM..
tip78
# 7  
Old 03-20-2012
Also Need help in filtering a file

Good day geeks,
I also want to filter out some details from the file shown below:

==================================================================================================
Month: Mar Failure Ratio Key Performance Indicators (KPIs) NodeType: sgsn_g

Day Time Attach PDP Activation Intra SGSN RAU ISRAU Paging Cut-off
==================================================================================================
20 12.00 0.1% 0.0% 5.5% 0.0% 9.7% 0.1%
--------------------------------------------------------------------------------------------------
20 11.00 0.2% 0.0% 5.4% 0.3% 9.5% 0.2%
--------------------------------------------------------------------------------------------------
20 10.00 0.3% 0.0% 5.2% 0.2% 8.5% 0.1%
--------------------------------------------------------------------------------------------------
20 09.00 0.2% 0.0% 4.8% 1.0% 6.6% 0.3%
--------------------------------------------------------------------------------------------------
20 08.00 0.3% 0.0% 4.2% 0.5% 4.2% 0.6%
--------------------------------------------------------------------------------------------------
Average: 0.2% 0.0% 5.0% 0.4% 7.7% 0.3%


What I want to do is display only the headers (Month: Mar Failure Ratio Key Performance Indicators (KPIs) NodeType: sgsn_g
Day Time Attach PDP Activation Intra SGSN RAU ISRAU Paging Cut-off) and the last part (Average), so that the output looks like this:

Month: Mar Failure Ratio Key Performance Indicators (KPIs) NodeType: sgsn_g

Day Time Attach PDP Activation Intra SGSN RAU ISRAU Paging Cut-off


Average: 0.2% 0.0% 5.0% 0.4% 7.7% 0.3%



Geeks, I am not lazy; I have been trying to use grep and cut, but to no avail. I have just started shell programming and I need to get this done fast. Any help will be appreciated, and if you know a good guide to shell programming, please give me a hint.

Thanks
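
One possible approach, sketched under the assumption that the two header lines always start with "Month:" and "Day " and the summary line with "Average:", is a single anchored grep -E. The temporary file and the `report` and `filtered` variable names are made up for illustration, and the sample is a shortened copy of the report above.

```shell
# Shortened copy of the posted report (hypothetical file, separators abbreviated).
report=$(mktemp)
cat > "$report" <<'EOF'
====================
Month: Mar Failure Ratio Key Performance Indicators (KPIs) NodeType: sgsn_g

Day Time Attach PDP Activation Intra SGSN RAU ISRAU Paging Cut-off
====================
20 12.00 0.1% 0.0% 5.5% 0.0% 9.7% 0.1%
--------------------
20 11.00 0.2% 0.0% 5.4% 0.3% 9.5% 0.2%
--------------------
Average: 0.2% 0.0% 5.0% 0.4% 7.7% 0.3%
EOF

# Keep only lines that begin with one of the three known prefixes.
filtered=$(grep -E '^(Month:|Day |Average:)' "$report")

printf '%s\n' "$filtered"
rm -f "$report"
```

The hourly rows begin with a day number rather than the word "Day", so the anchored pattern skips them along with the separator lines.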