01-26-2015
Parsing a file and pulling out specific columns
Hi,
I am having some difficulty pulling out specific columns using awk. I think what I am doing is iterating through the various columns looking for a match and asking awk to print if a match is found.
Here are a few lines from my input:
NC_015011.2 Gnomon gene 18691 26481 . + . ID=gene0;Dbxref=GeneID:100538868;Name=LOC100538868;gbkey=Gene;gene=LOC100538868;partial=true;start_range=.,18691
NC_015011.2 Gnomon mRNA 18691 26481 . + . ID=rna0;Parent=gene0;Dbxref=GeneID:100538868,Genbank:XM_010707932.1;Name=XM_010707932.1;gbkey=mRNA;gene=LOC100538868;partial=true;product=hematopoietic lineage cell-specific protein-like;start_range=.,18691;transcript_id=XM_010707932.1
NC_015011.2 Gnomon exon 18691 18743 . + . ID=id1;Parent=rna0;Dbxref=GeneID:100538868,Genbank:XM_010707932.1;gbkey=mRNA;gene=LOC100538868;partial=true;product=hematopoietic lineage cell-specific protein-like;start_range=.,18691;transcript_id=XM_010707932.1
NC_015011.2 Gnomon exon 18865 18994 . + . ID=id2;Parent=rna0;Dbxref=GeneID:100538868,Genbank:XM_010707932.1;gbkey=mRNA;gene=LOC100538868;partial=true;product=hematopoietic lineage cell-specific protein-like;transcript_id=XM_010707932.1
Here is my code. Note that I am not interested in the first 8 fields. The 9th field is an info field that does not have a fixed number of subfields (even though the ones shown do), so a matching technique is more appropriate:
awk -F "\t" '{ print $9 }' mga_ref_Turkey_5.0_NCBI_FINAL_no_GI_no_region.gff3.txt | grep product | awk -F ";" '{ gsub(";","\t",$0);print $0 }' | awk -F "\t" '{for(i=0;i<NF;i++){if($i~/gene\=/){printf $i};if($i~/product\=/){printf $i }};printf "\n"}' | head
Now the output:
ID=rna0 Parent=gene0 Dbxref=GeneID:100538868,Genbank:XM_010707932.1 Name=XM_010707932.1 gbkey=mRNA gene=LOC100538868 partial=true product=hematopoietic lineage cell-specific protein-like start_range=.,18691 transcript_id=XM_010707932.1ID=rna0 Parent=gene0 Dbxref=GeneID:100538868,Genbank:XM_010707932.1 Name=XM_010707932.1 gbkey=mRNA gene=LOC100538868 partial=true product=hematopoietic lineage cell-specific protein-like start_range=.,18691 transcript_id=XM_010707932.1gene=LOC100538868product=hematopoietic lineage cell-specific protein-like
ID=id1 Parent=rna0 Dbxref=GeneID:100538868,Genbank:XM_010707932.1 gbkey=mRNA gene=LOC100538868 partial=true product=hematopoietic lineage cell-specific protein-like start_range=.,18691 transcript_id=XM_010707932.1ID=id1 Parent=rna0 Dbxref=GeneID:100538868,Genbank:XM_010707932.1 gbkey=mRNA gene=LOC100538868 partial=true product=hematopoietic lineage cell-specific protein-like start_range=.,18691 transcript_id=XM_010707932.1gene=LOC100538868product=hematopoietic lineage cell-specific protein-like
ID=id2 Parent=rna0 Dbxref=GeneID:100538868,Genbank:XM_010707932.1 gbkey=mRNA gene=LOC100538868 partial=true product=hematopoietic lineage cell-specific protein-like transcript_id=XM_010707932.1ID=id2 Parent=rna0 Dbxref=GeneID:100538868,Genbank:XM_010707932.1 gbkey=mRNA gene=LOC100538868 partial=true product=hematopoietic lineage cell-specific protein-like transcript_id=XM_010707932.1gene=LOC100538868product=hematopoietic lineage cell-specific protein-like
What I don't get is why my match/print commands are not printing ONLY the matching fields.
Thanks
---------- Post updated at 09:37 PM ---------- Previous update was at 09:16 PM ----------
Never mind, figured it out. I need to start my loop at 1, not 0. Grrr.
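For anyone landing here later, the reason the fix works: in awk, $0 is the whole record, so a loop that starts at i=0 tests the entire line against both patterns and prints it twice before it reaches the real fields. A loop written as i<NF also never looks at the last field. Below is a rough sketch of one way to do the same job in a single awk call. It assumes the nine GFF3 columns are tab-separated as in my command above; the function name extract_gene_product and the shortened sample record are made up for illustration.

```shell
# Hypothetical helper: for records whose 9th column has a product= attribute,
# print only the gene= and product= attributes, separated by a tab.
extract_gene_product() {
    awk -F '\t' '
        $9 ~ /product=/ {
            n = split($9, attr, ";")      # attributes are ";"-separated
            out = ""
            for (i = 1; i <= n; i++)      # start at 1 and include the last element
                if (attr[i] ~ /^(gene|product)=/)
                    out = out (out == "" ? "" : "\t") attr[i]
            print out
        }
    ' "$@"
}

# Sample record shortened from the post (assumes tabs between the nine columns).
sample=$(printf 'NC_015011.2\tGnomon\texon\t18865\t18994\t.\t+\t.\tID=id2;Parent=rna0;gene=LOC100538868;product=hematopoietic lineage cell-specific protein-like;transcript_id=XM_010707932.1')
result=$(printf '%s\n' "$sample" | extract_gene_product)
printf '%s\n' "$result"
```

Using split() on $9 avoids the extra grep and gsub stages, and building the line with print instead of printf $i means a stray % in an attribute value is not treated as a format string. On the real file it would be called as extract_gene_product followed by the file name.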