Sponsored Content
Top Forums Shell Programming and Scripting Parse file for fields and specific text Post 302949168 by cmccabe on Tuesday 7th of July 2015 04:24:34 PM
Old 07-07-2015
Parse file for fields and specific text

I have a file of ~500,000 entries in the following:

file.txt
Code:
chr1	11868	12227	ENSG00000223972.5	.	+	HAVANA	exon	.	gene_id "ENSG00000223972.5"; transcript_id "ENST00000456328.2"; gene_type "transcribed_unprocessed_pseudogene"; gene_status "KNOWN"; gene_name "DDX11L1"; transcript_type "processed_transcript"; transcript_status "KNOWN"; transcript_name "DDX11L1-002"; exon_number 1; exon_id "ENSE00002234944.1"; level 2; tag "basic"; transcript_support_level "1"; havana_gene "OTTHUMG00000000961.2"; havana_transcript "OTTHUMT00000362751.1";
chr1	12009	12057	ENSG00000223972.5	.	+	HAVANA	exon	.	gene_id "ENSG00000223972.5"; transcript_id "ENST00000450305.2"; gene_type "transcribed_unprocessed_pseudogene"; gene_status "KNOWN"; gene_name "DDX11L1"; transcript_type "transcribed_unprocessed_pseudogene"; transcript_status "KNOWN"; transcript_name "DDX11L1-001"; exon_number 1; exon_id "ENSE00001948541.1"; level 2; ont "PGO:0000005"; ont "PGO:0000019"; tag "basic"; transcript_support_level "NA"; havana_gene "OTTHUMG00000000961.2"; havana_transcript "OTTHUMT00000002844.2";
chr1	12178	12227	ENSG00000223972.5	.	+	HAVANA	exon	.	gene_id "ENSG00000223972.5"; transcript_id "ENST00000450305.2"; gene_type "transcribed_unprocessed_pseudogene"; gene_status "KNOWN"; gene_name "DDX11L1"; transcript_type "transcribed_unprocessed_pseudogene"; transcript_status "KNOWN"; transcript_name "DDX11L1-001"; exon_number 2; exon_id "ENSE00001671638.2"; level 2; ont "PGO:0000005"; ont "PGO:0000019"; tag "basic"; transcript_support_level "NA"; havana_gene "OTTHUMG00000000961.2"; havana_transcript "OTTHUMT00000002844.2";

I am using cygwin on a windows 7 OS trying to parse each line into 5 rows:

In the below
row 1 = $1 of the text
row 2 = $2 of the text
row 3 = $3 of the text
row 4 = gene_name= "..." - quotes removed
row 5 = exon_number "...." - quotes removed

Example of desired output:

Code:
chr1	11868	12227     DDX11L1     1 
chr1	12009	12057     DDX11L1     1  
chr1	12178	12227     DDX11L1     2

I was able to generate the file.txt but can not seem to parse it correctly.

Thank you Smilie.
 

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

How to parse the specific data from the file

Hi, I need to parse this data FastEthernet0/9,|FastEthernet0/10,|FastEthernet0/11,FastEthernet0/13|, FastEthernet0/12,FastEthernet0/24 . and get only the value like e.g 0/24,0/11. how to do this in shell script. Thanks in Advance. (2 Replies)
Discussion started by: MuthuAlagappan
2 Replies

2. Shell Programming and Scripting

How to read and parse the content of csv file containing # as delimeter into fields using Bash?

#!/bin/bash i=0 cat 1.csv | while read fileline do echo "$fileline" IFS="#" flds=( $fileline ) nrofflds=${#flds} echo "noof fields$nrofflds" fld=0 while do echo "noof counter$fld" echo "$nrofflds" #fld1="${flds}" trying to store the content of line to fields but i... (4 Replies)
Discussion started by: barani75
4 Replies

3. Shell Programming and Scripting

Assigning a specific format to a specific column in a text file using awk and printf

Hi, I have the following text file: 8 T1mapping_flip02 ok 128 108 30 1 665000-000008-000001.dcm 9 T1mapping_flip05 ok 128 108 30 1 665000-000009-000001.dcm 10 T1mapping_flip10 ok 128 108 30 1 665000-000010-000001.dcm 11 T1mapping_flip15 ok 128 108 30... (2 Replies)
Discussion started by: goodbenito
2 Replies

4. Shell Programming and Scripting

Perl: Parse Hex file into fields

Hi, I want to split/parse certain bits of the hex data into another field. Example: Input data is Word1: 4f72abfd Output: Parse bits (5 to 0) into field word1data1=0x00cd=205 decimal Parse bits (7 to 6) into field word1data2=0x000c=12 decimal etc. Word2: efff3d02 Parse bits (13 to... (1 Reply)
Discussion started by: morrbie
1 Replies

5. Shell Programming and Scripting

Capture specific fields in file

Dear Friends, I have a file a.txt 1|3478.12|487|4578.04|4505.5478|rhfj|rehtire|rhj I want to get the field numbers which have decimal values output: Fields: 2,4,5 Plz help (6 Replies)
Discussion started by: i150371485
6 Replies

6. Shell Programming and Scripting

Extract specific line in an html file starting and ending with specific pattern to a text file

Hi This is my first post and I'm just a beginner. So please be nice to me. I have a couple of html files where a pattern beginning with "http://www.site.com" and ending with "/resource.dat" is present on every 241st line. How do I extract this to a new text file? I have tried sed -n 241,241p... (13 Replies)
Discussion started by: dejavo
13 Replies

7. Shell Programming and Scripting

Parse text file using specific tags

awk -F "" '/<href=>|<href=>|<top>|<top>/ {print $3, OFS=\t}' source.txt > output.txt I'm not quite sure how to parse the attached file, but what I am trying to do is in a output file have the link (href=), name (after the <), and count (<top>) in 3 separate columns. My attempt is the above... (2 Replies)
Discussion started by: cmccabe
2 Replies

8. Shell Programming and Scripting

awk script to parse case with information in two fields of file

The below awk parser works for most data inputs, but I am having trouble with the last one. The problem is in the below rules steps 1 and 2 come from $2 (NC_000013.10:g.20763686_20763687delinsA) and steps 3 and 4 come from $1 (NM_004004.5:c.34_35delGGinsT). Parse Rules: The header is... (0 Replies)
Discussion started by: cmccabe
0 Replies

9. Shell Programming and Scripting

Replacing entire fields with specific text at end or beginning of field

Greetings. I've got a csv file with data along these lines: Spumoni's Pizza Place, Placemats n Things, Just Lamps Counterfeit Dollars by Vinnie, Just Shades, Dollar StoreI want to replace the entire comma-delimited field if it matches something ending in "Place" or beginning with "Dollar",... (2 Replies)
Discussion started by: palmfrond
2 Replies

10. UNIX for Advanced & Expert Users

Script to parse and compare information in two fields of file

Hello, I am working parsing a large input file1(field CFA) I have to compare the the file1 field(CFA byte 88-96) with the content of the file2(It contains only one field) and and insert rows equal in another file. Here is my code and sample input file: ... (7 Replies)
Discussion started by: GERMANOS
7 Replies
PAPS(1) 						      General Commands Manual							   PAPS(1)

NAME
paps - UTF-8 to PostScript converter using Pango SYNOPSIS
paps [options] files... DESCRIPTION
paps reads a UTF-8 encoded file and generates a PostScript language rendering of the file. The rendering is done by creating outline curves through the pango ft2 backend. OPTIONS
These programs follow the usual GNU command line syntax, with long options starting with two dashes (`-'). A summary of options is included below. --landscape Landscape output. Default is portrait. --columns=cl Number of columns output. Default is 1. --font=desc Set the font description. Default is Monospace 12. --rtl Do rtl layout. --paper ps Choose paper size. Known paper sizes are legal, letter, a4. Default is A4. --bottom-margin=bm Set bottom margin in postscript points (1/72 inch). Default is 36. --top-margin=tm Set top margin. Default is 36. --left-margin=lm Set left margin. Default is 36. --right-margin=rm Set right margin. Default is 36. --help Show summary of options. --header Draw page header for each page. --markup Interpret the text as pango markup. --encoding=ENCODING Assume the documentation encoding is ENCODING. --lpi Set the lines per inch. This determines the line spacing. --cpi Set the characters per inch. This is an alternative method of specifying the font size. --stretch-chars Indicates that characters should be stretched in the y-direction to fill up their vertical space. This is similar to the texttops behaviour. AUTHOR
paps was written by Dov Grobgeld <dov.grobgeld@gmail.com>. This manual page was written by Lior Kaplan <kaplan@debian.org>, for the Debian project (but may be used by others). April 17, 2006 PAPS(1)
All times are GMT -4. The time now is 11:45 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy