Using awk to parse a file with mixed formats in columns


 
# 1  
Old 11-21-2013
[Solved] Using awk to parse a file with mixed formats in columns

Greetings

I have a file formatted like this:
Code:
rhino	grey	weight=1003;height=231;class=heaviest;histology=9,0,0,8
bird	white	weight=23;height=88;class=light;histology=7,5,1,0,0
turtle	green	weight=40;height=9;class=light;histology=6,0,2,0
rhino	grey	weight=1100;height=221;class=heaviest;histology=9,0,0,8
bird	white	weight=23;height=88;class=light;histology=7,5,1,0,0
bird	grey	weight=9;height=8;class=light;histology=7,5,1,0,0
turtle	green	weight=40;height=23;class=light;histology=6,0,2,0
turtle	purple	weight=6;height=2;class=light;histology=6,0,2,0

The file has three tab-delimited columns; the third combines several key=value attributes (e.g. "weight", "height"), separated by semicolons. This format must be preserved for downstream analysis, but I would like to filter rows on a single attribute in the third column, e.g. all rows for which height ≥ 200. Until now I have been converting the file into a standard 6-column tab-delimited file:
Code:
rhino	grey	1003	231	heaviest	9,0,0,8
bird	white	23	88	light	7,5,1,0,0
turtle	green	40	9	light	6,0,2,0
rhino	grey	1100	221	heaviest	9,0,0,8
bird	white	23	88	light	7,5,1,0,0
bird	grey	9	8	light	7,5,1,0,0
turtle	green	40	23	light	6,0,2,0
turtle	purple	6	2	light	6,0,2,0

But converting it back into the original format afterwards is a pain.

Is it possible to parse this information the way I wish using awk?

I apologize if this should really be in the UNIX for Dummies section.
# 2  
Old 11-21-2013
As long as all of your input lines are in the same format and the data in the 3rd field is always in the same order, the easy way to do this is to use tab, equal sign, and semicolon as field separators. The height value then becomes field 6, and a pattern with no action prints each matching line unchanged, so you just select lines where field 6 is >= 200:
Code:
awk -F '[\t=;]' '$6 >= 200' file

which (if file contains your sample data) produces:
Code:
rhino	grey	weight=1003;height=231;class=heaviest;histology=9,0,0,8
rhino	grey	weight=1100;height=221;class=heaviest;histology=9,0,0,8
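
To see why the height ends up in field 6, it can help to print every field awk produces for one row. The following is only an illustrative sketch using the first sample row from the question; the variable names `line` and `fields` are made up for this example:

```shell
#!/bin/sh
# First sample row from the question; printf turns \t into literal tabs.
line=$(printf 'rhino\tgrey\tweight=1003;height=231;class=heaviest;histology=9,0,0,8')

# Split on tab, "=" and ";" exactly as in the one-liner above,
# then print each field together with its field number.
fields=$(printf '%s\n' "$line" |
    awk -F '[\t=;]' '{ for (i = 1; i <= NF; i++) printf "$%d=%s\n", i, $i }')

printf '%s\n' "$fields"
```

The keys and values alternate ($3 is "weight", $4 is its value, $5 is "height", $6 is its value), which is why the test is on `$6`. The comma-separated histology list stays together as a single field because the comma is not one of the separators.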

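If the attributes in the 3rd column might not always appear in the same order, a variant that looks the attribute up by name could be used instead. This is a minimal sketch, not part of the original answer: the function name `filter_ge` and its argument order are hypothetical, and it assumes every attribute has the form key=value with semicolons between attributes.

```shell
#!/bin/sh
# Sample data from the question, written to a temporary tab-delimited file.
tmp=$(mktemp)
{
    printf 'rhino\tgrey\tweight=1003;height=231;class=heaviest;histology=9,0,0,8\n'
    printf 'bird\twhite\tweight=23;height=88;class=light;histology=7,5,1,0,0\n'
    printf 'turtle\tgreen\tweight=40;height=9;class=light;histology=6,0,2,0\n'
    printf 'rhino\tgrey\tweight=1100;height=221;class=heaviest;histology=9,0,0,8\n'
    printf 'bird\twhite\tweight=23;height=88;class=light;histology=7,5,1,0,0\n'
    printf 'bird\tgrey\tweight=9;height=8;class=light;histology=7,5,1,0,0\n'
    printf 'turtle\tgreen\tweight=40;height=23;class=light;histology=6,0,2,0\n'
    printf 'turtle\tpurple\tweight=6;height=2;class=light;histology=6,0,2,0\n'
} > "$tmp"

# Hypothetical helper: filter_ge KEY MIN FILE
# Prints, unchanged, every row whose KEY attribute in column 3 is >= MIN.
filter_ge() {
    awk -F '\t' -v key="$1" -v min="$2" '
    {
        # Break column 3 into key=value pairs.
        n = split($3, pair, ";")
        for (i = 1; i <= n; i++) {
            eq = index(pair[i], "=")
            # Compare numerically (+ 0) when the key name matches.
            if (eq && substr(pair[i], 1, eq - 1) == key &&
                substr(pair[i], eq + 1) + 0 >= min + 0) {
                print
                break
            }
        }
    }' "$3"
}

result=$(filter_ge height 200 "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

Because the row is printed without modifying any field, the original three-column format is preserved, just as with the shorter one-liner. The one-liner remains the simpler choice when the attribute order is fixed.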
# 3  
Old 12-09-2013
Thanks, works like a charm. Your time is very much appreciated!