Grep and print only certain columns from a row
# 1  
Old 08-22-2013

Hi Friends,

This is my input

Code:
chr1 100 200 + gene_name "alpha"; protein_name "alpha"; level 2; tag "basic"; info "known";
chr1 245 290 + gene_name "alpha-1"; protein_name "alpha-2"; level 9; tag "basic"; info "uknown";
chr1 310 320 + gene_name "alpha"; protein_name "alpha-4"; level 2; info "known";
chr1 355 490 + gene_name "alpha-1"; protein_name "alpha-120"; tag "basic"; info "valid";

The input file above mixes field separators (spaces and semicolons) and has more than 1 million rows. I know I can use awk to print specific columns, but the number of columns also varies from row to row, so fixed column positions won't work.

I would like to print only certain parts of each row, matching each part up to its semicolon. I need chr, start, stop, symbol, gene_name, protein_name and info from each row.

My desired output is

Code:
chr1 100 200 + gene_name "alpha"; protein_name "alpha"; info "known";
chr1 245 290 + gene_name "alpha-1"; protein_name "alpha-2"; info "uknown";
chr1 310 320 + gene_name "alpha"; protein_name "alpha-4"; info "known";
chr1 355 490 + gene_name "alpha-1"; protein_name "alpha-120"; info "valid";

How do I print only the matched contents up to the semicolon?

Thanks for any suggestions.
# 2  
Old 08-22-2013
Code:
awk -F ";" '{for(i=1;i<=NF;i++){if($i ~ /chr|start|stop|symbol|gene_name|protein_name|info/){s=s?s";"$i:$i}}print s;s=""}' file
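For anyone unpacking the one-liner: it splits each row on ";", keeps the fields whose text matches one of the wanted names, and rejoins them with ";". Below is a commented sketch of the same idea, not a replacement for the post above. The file names sample.txt and filtered.txt are made up for illustration, and the pattern is shortened to the names that actually occur as text in this input (start, stop and symbol never appear literally in these rows).

```shell
# Hypothetical sample file holding two rows from the question.
cat > sample.txt <<'EOF'
chr1 100 200 + gene_name "alpha"; protein_name "alpha"; level 2; tag "basic"; info "known";
chr1 310 320 + gene_name "alpha"; protein_name "alpha-4"; level 2; info "known";
EOF

awk -F ';' '
{
    s = ""                                   # output accumulated for this row
    for (i = 1; i <= NF; i++) {
        # keep a field only if it mentions one of the wanted names
        if ($i ~ /chr|gene_name|protein_name|info/)
            s = (s == "" ? $i : s ";" $i)    # rejoin kept fields with ";"
    }
    print s
}' sample.txt > filtered.txt

cat filtered.txt
```

Because the last ";" on each row leaves an empty final field that is never kept, the output has no trailing semicolon. Printing s ";" instead of s would restore it.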

# 3  
Old 08-22-2013
Quote:
Originally Posted by pamu
Code:
awk -F ";" '{for(i=1;i<=NF;i++){if($i ~ /chr|start|stop|symbol|gene_name|protein_name|info/){s=s?s";"$i:$i}}print s;s=""}' file

Dear Pamu,

Thanks a lot for your time.

I apologize, I forgot to mention one more complication in the input.

Between chr and start there are two more columns.

I looked at the unique values in those two columns. The second column has

Code:
HAVANA
ENSEMBL

The third column has

Code:
exon
CDS
start_codon
stop_codon

So, my input file is like this

Code:
chr1 HAVANA exon 100 200 + gene_name "alpha"; protein_name "alpha"; level 2; tag "basic"; info "known";
chr1 ENSEMBLE start_codon 245 290 + gene_name "alpha-1"; protein_name "alpha-2"; level 9; tag "basic"; info "uknown";
chr1 HAVANA CDS 310 320 + gene_name "alpha"; protein_name "alpha-4"; level 2; info "known";
chr1 ENSEMBLE stop_codon 355 490 + gene_name "alpha-1"; protein_name "alpha-120"; tag "basic"; info "valid";

Then my desired output becomes

Code:
chr1 HAVANA exon 100 200 + gene_name "alpha"; protein_name "alpha"; info "known";
chr1 ENSEMBLE start_codon 245 290 + gene_name "alpha-1"; protein_name "alpha-2"; info "uknown";
chr1 HAVANA CDS 310 320 + gene_name "alpha"; protein_name "alpha-4"; info "known";
chr1 ENSEMBLE stop_codon 355 490 + gene_name "alpha-1"; protein_name "alpha-120"; info "valid";

I edited the code to include
Code:
[HAVANA:ENSEMBLE]

but it didn't work.

I apologize for forgetting those two columns. Thanks again.
# 4  
Old 08-22-2013
Try
Code:
awk -F";" -v OFS=";" '{print $1, $2, $5}' file
chr1 HAVANA exon 100 200 + gene_name "alpha"; protein_name "alpha"; info "known"
chr1 ENSEMBLE start_codon 245 290 + gene_name "alpha-1"; protein_name "alpha-2"; info "uknown"
chr1 HAVANA CDS 310 320 + gene_name "alpha"; protein_name "alpha-4";
chr1 ENSEMBLE stop_codon 355 490 + gene_name "alpha-1"; protein_name "alpha-120";


Last edited by RudiC; 08-22-2013 at 05:25 PM.. Reason: Forgot the OFS
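The positional print above relies on info being the fifth ";"-separated field, which is why the last two rows come out without it. A sketch that picks attributes by name instead is shown below. It assumes the coordinates and gene_name always sit in the first ";"-separated field, as they do in the sample rows, and the file names sample2.txt and selected.txt are made up for illustration.

```shell
# Hypothetical sample file: three rows from the second input, two of which lack a field.
cat > sample2.txt <<'EOF'
chr1 HAVANA exon 100 200 + gene_name "alpha"; protein_name "alpha"; level 2; tag "basic"; info "known";
chr1 HAVANA CDS 310 320 + gene_name "alpha"; protein_name "alpha-4"; level 2; info "known";
chr1 ENSEMBLE stop_codon 355 490 + gene_name "alpha-1"; protein_name "alpha-120"; tag "basic"; info "valid";
EOF

awk -F ';' '
{
    out = $1                      # assumed: coordinates and gene_name are in field 1
    for (i = 2; i <= NF; i++) {
        # keep fields whose attribute name is protein_name or info
        if ($i ~ /^ *(protein_name|info) /)
            out = out ";" $i
    }
    print out ";"                 # restore the trailing semicolon
}' sample2.txt > selected.txt

cat selected.txt
```

Anchoring the pattern to the start of each field keeps a value such as "info" inside some other attribute from matching by accident. As for the earlier attempt with [HAVANA:ENSEMBLE]: in a regular expression, square brackets match one single character from the listed set, not either word. Alternation between two words is written (HAVANA|ENSEMBLE).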
