Sponsored Content
Top Forums Shell Programming and Scripting Parse file for fields and specific text Post 302949168 by cmccabe on Tuesday 7th of July 2015 04:24:34 PM
Old 07-07-2015
Parse file for fields and specific text

I have a file of ~500,000 entries in the following:

file.txt
Code:
chr1	11868	12227	ENSG00000223972.5	.	+	HAVANA	exon	.	gene_id "ENSG00000223972.5"; transcript_id "ENST00000456328.2"; gene_type "transcribed_unprocessed_pseudogene"; gene_status "KNOWN"; gene_name "DDX11L1"; transcript_type "processed_transcript"; transcript_status "KNOWN"; transcript_name "DDX11L1-002"; exon_number 1; exon_id "ENSE00002234944.1"; level 2; tag "basic"; transcript_support_level "1"; havana_gene "OTTHUMG00000000961.2"; havana_transcript "OTTHUMT00000362751.1";
chr1	12009	12057	ENSG00000223972.5	.	+	HAVANA	exon	.	gene_id "ENSG00000223972.5"; transcript_id "ENST00000450305.2"; gene_type "transcribed_unprocessed_pseudogene"; gene_status "KNOWN"; gene_name "DDX11L1"; transcript_type "transcribed_unprocessed_pseudogene"; transcript_status "KNOWN"; transcript_name "DDX11L1-001"; exon_number 1; exon_id "ENSE00001948541.1"; level 2; ont "PGO:0000005"; ont "PGO:0000019"; tag "basic"; transcript_support_level "NA"; havana_gene "OTTHUMG00000000961.2"; havana_transcript "OTTHUMT00000002844.2";
chr1	12178	12227	ENSG00000223972.5	.	+	HAVANA	exon	.	gene_id "ENSG00000223972.5"; transcript_id "ENST00000450305.2"; gene_type "transcribed_unprocessed_pseudogene"; gene_status "KNOWN"; gene_name "DDX11L1"; transcript_type "transcribed_unprocessed_pseudogene"; transcript_status "KNOWN"; transcript_name "DDX11L1-001"; exon_number 2; exon_id "ENSE00001671638.2"; level 2; ont "PGO:0000005"; ont "PGO:0000019"; tag "basic"; transcript_support_level "NA"; havana_gene "OTTHUMG00000000961.2"; havana_transcript "OTTHUMT00000002844.2";

I am using cygwin on a windows 7 OS trying to parse each line into 5 rows:

In the below
row 1 = $1 of the text
row 2 = $2 of the text
row 3 = $3 of the text
row 4 = gene_name= "..." - quotes removed
row 5 = exon_number "...." - quotes removed

Example of desired output:

Code:
chr1	11868	12227     DDX11L1     1 
chr1	12009	12057     DDX11L1     1  
chr1	12178	12227     DDX11L1     2

I was able to generate the file.txt but can not seem to parse it correctly.

Thank you Smilie.
 

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

How to parse the specific data from the file

Hi, I need to parse this data FastEthernet0/9,|FastEthernet0/10,|FastEthernet0/11,FastEthernet0/13|, FastEthernet0/12,FastEthernet0/24 . and get only the value like e.g 0/24,0/11. how to do this in shell script. Thanks in Advance. (2 Replies)
Discussion started by: MuthuAlagappan
2 Replies

2. Shell Programming and Scripting

How to read and parse the content of csv file containing # as delimeter into fields using Bash?

#!/bin/bash i=0 cat 1.csv | while read fileline do echo "$fileline" IFS="#" flds=( $fileline ) nrofflds=${#flds} echo "noof fields$nrofflds" fld=0 while do echo "noof counter$fld" echo "$nrofflds" #fld1="${flds}" trying to store the content of line to fields but i... (4 Replies)
Discussion started by: barani75
4 Replies

3. Shell Programming and Scripting

Assigning a specific format to a specific column in a text file using awk and printf

Hi, I have the following text file: 8 T1mapping_flip02 ok 128 108 30 1 665000-000008-000001.dcm 9 T1mapping_flip05 ok 128 108 30 1 665000-000009-000001.dcm 10 T1mapping_flip10 ok 128 108 30 1 665000-000010-000001.dcm 11 T1mapping_flip15 ok 128 108 30... (2 Replies)
Discussion started by: goodbenito
2 Replies

4. Shell Programming and Scripting

Perl: Parse Hex file into fields

Hi, I want to split/parse certain bits of the hex data into another field. Example: Input data is Word1: 4f72abfd Output: Parse bits (5 to 0) into field word1data1=0x00cd=205 decimal Parse bits (7 to 6) into field word1data2=0x000c=12 decimal etc. Word2: efff3d02 Parse bits (13 to... (1 Reply)
Discussion started by: morrbie
1 Replies

5. Shell Programming and Scripting

Capture specific fields in file

Dear Friends, I have a file a.txt 1|3478.12|487|4578.04|4505.5478|rhfj|rehtire|rhj I want to get the field numbers which have decimal values output: Fields: 2,4,5 Plz help (6 Replies)
Discussion started by: i150371485
6 Replies

6. Shell Programming and Scripting

Extract specific line in an html file starting and ending with specific pattern to a text file

Hi This is my first post and I'm just a beginner. So please be nice to me. I have a couple of html files where a pattern beginning with "http://www.site.com" and ending with "/resource.dat" is present on every 241st line. How do I extract this to a new text file? I have tried sed -n 241,241p... (13 Replies)
Discussion started by: dejavo
13 Replies

7. Shell Programming and Scripting

Parse text file using specific tags

awk -F "" '/<href=>|<href=>|<top>|<top>/ {print $3, OFS=\t}' source.txt > output.txt I'm not quite sure how to parse the attached file, but what I am trying to do is in a output file have the link (href=), name (after the <), and count (<top>) in 3 separate columns. My attempt is the above... (2 Replies)
Discussion started by: cmccabe
2 Replies

8. Shell Programming and Scripting

awk script to parse case with information in two fields of file

The below awk parser works for most data inputs, but I am having trouble with the last one. The problem is in the below rules steps 1 and 2 come from $2 (NC_000013.10:g.20763686_20763687delinsA) and steps 3 and 4 come from $1 (NM_004004.5:c.34_35delGGinsT). Parse Rules: The header is... (0 Replies)
Discussion started by: cmccabe
0 Replies

9. Shell Programming and Scripting

Replacing entire fields with specific text at end or beginning of field

Greetings. I've got a csv file with data along these lines: Spumoni's Pizza Place, Placemats n Things, Just Lamps Counterfeit Dollars by Vinnie, Just Shades, Dollar StoreI want to replace the entire comma-delimited field if it matches something ending in "Place" or beginning with "Dollar",... (2 Replies)
Discussion started by: palmfrond
2 Replies

10. UNIX for Advanced & Expert Users

Script to parse and compare information in two fields of file

Hello, I am working parsing a large input file1(field CFA) I have to compare the the file1 field(CFA byte 88-96) with the content of the file2(It contains only one field) and and insert rows equal in another file. Here is my code and sample input file: ... (7 Replies)
Discussion started by: GERMANOS
7 Replies
XkbAddGeomKey(3)						   XKB FUNCTIONS						  XkbAddGeomKey(3)

NAME
XkbAddGeomKey - Add one key at the end of an existing row of keys SYNOPSIS
XkbKeyPtr XkbAddGeomKey (XkbRowPtr row); ARGUMENTS
- row row to be updated DESCRIPTION
Xkb provides functions to add a single new element to the top-level keyboard geometry. In each case the num_ * fields of the corresponding structure is incremented by 1. These functions do not change sz_* unless there is no more room in the array. Some of these functions fill in the values of the element's structure from the arguments. For other functions, you must explicitly write code to fill the structure's elements. The top-level geometry description includes a list of geometry properties. A geometry property associates an arbitrary string with an equally arbitrary name. Programs that display images of keyboards can use geometry properties as hints, but they are not interpreted by Xkb. No other geometry structures refer to geometry properties. Keys are grouped into rows. XkbAddGeomKey adds one key to the end of the specified row. The key is allocated and zeroed. XkbAddGeomKey returns NULL if row is empty or if it was not able to allocate space for the key. To allocate space for an arbitrary number of keys to a row, use XkbAllocGeomKeys. STRUCTURES
typedef struct _XkbKey { /* key in a row */ XkbKeyNameRec name; /* key name */ short gap; /* gap in mm/10 from previous key in row */ unsigned char shape_ndx; /* index of shape for key */ unsigned char color_ndx; /* index of color for key body */ } XkbKeyRec, *XkbKeyPtr; SEE ALSO
XkbAllocGeomKeys(3) X Version 11 libX11 1.5.0 XkbAddGeomKey(3)
All times are GMT -4. The time now is 10:41 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy