Complex text parsing with speed/performance problem (awk solution?)

Shell Programming and Scripting forum, post 302781085 by Michael Stora on Friday 15th of March 2013 01:29:07 PM

I have 1.6 GB (and growing) of comma-delimited files. The data I need is in the second column, on lines 11 through 34 (inclusive). The files also contain a lot of stray whitespace that needs to be trimmed, and they have DOS-style (CRLF) line endings.
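
A minimal awk sketch of the selection and cleanup step. It assumes the layout shown in the sample input below; the generated sample file and the temporary directory are hypothetical test data. FNR restricts each file to lines 11 through 34, and gsub removes blanks, tabs and the DOS carriage return from column 2, so a separate dos2unix pass should not be needed for this column.

```shell
#!/bin/sh
# Hypothetical test data: 40 CRLF-terminated lines of the form " xxx , N ".
tmp=$(mktemp -d)
awk 'BEGIN { for (i = 1; i <= 40; i++) printf " xxx , %d \r\n", i }' > "$tmp/sample.csv"

awk -F',' '
    FNR < 11 || FNR > 34 { next }    # keep only lines 11..34 of each input file
    {
        v = $2
        gsub(/[ \t\r]/, "", v)       # strip blanks, tabs and the trailing carriage return
        print v
    }
' "$tmp/sample.csv" > "$tmp/col2.txt"
```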

For each data file, I need to transpose lines 11 through 34 of column 2 into a single row and append that row to an existing file. I also need to add several variables to the front and back of each output line; these will be parsed or calculated from the data file names and file metadata.
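
Transposing can be done inside awk by concatenating the cleaned values into one string and printing it once line 34 is reached. In this sketch, `pre` and `post` are hypothetical placeholders for the values computed from the file name and metadata, passed in with awk's `-v` option; the sample file is invented test data.

```shell
#!/bin/sh
tmp=$(mktemp -d)
awk 'BEGIN { for (i = 1; i <= 40; i++) printf " xxx , %d \r\n", i }' > "$tmp/sample.csv"

# pre/post are placeholder values standing in for the computed variables.
awk -F',' -v pre='var1,var2' -v post='var9' '
    FNR == 11 { row = "" }                  # start a fresh row for each file
    FNR >= 11 && FNR <= 34 {
        v = $2
        gsub(/[ \t\r]/, "", v)              # trim whitespace and the CR
        row = row "," v                     # transpose: join the values with commas
    }
    FNR == 34 { print pre row "," post }    # row already begins with a comma
' "$tmp/sample.csv" > "$tmp/row.csv"
```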

Input:
...,...
xxx, 9
xxx, 10
xxx, 11 <--need 11th through 34th row in col2.
...,...
xxx, 34
xxx, 35
xxx, 36
...,...

Output:
var1,var2,var3,var4,var5,var6,11,12,13,...,32,33,34,/original/directory/path/of/data/file/,original_data_file_name
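
The last two fields of that output line (the directory with its trailing slash, then the file name) can be derived inside awk from the built-in FILENAME variable instead of calling dirname and basename per file. This sketch assumes each data file is passed to awk with its full path; `run1.csv` is a made-up file name.

```shell
#!/bin/sh
tmp=$(mktemp -d)
mkdir -p "$tmp/data"
printf 'x,1\r\n' > "$tmp/data/run1.csv"

awk 'FNR == 1 {
    dir = FILENAME;  sub("[^/]*$", "", dir)   # keep the path up to and including the last "/"
    name = FILENAME; sub(".*/", "", name)     # drop the directory part, keep the bare file name
    print dir "," name
}' "$tmp/data/run1.csv" > "$tmp/pathname.txt"
```

String regexes are used in `sub` so that the slash does not need escaping inside a regex literal.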

Then the entire file, including the rows already in it, needs to be sorted by several of the columns, with duplicate lines removed (some columns are ignored when deciding whether two lines are duplicates).
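
One way to drop duplicates while ignoring some columns is to build a key from only the columns that matter, keep the first line seen for each key, and then sort. For illustration only, this sketch assumes the ignored columns are the last two (directory and file name) and that the sort order is column 1, then column 2 numerically; the sample rows are invented.

```shell
#!/bin/sh
tmp=$(mktemp -d)
cat > "$tmp/all.csv" <<'EOF'
b,2,10,/data/x/,f2.csv
a,1,20,/data/x/,f1.csv
a,1,20,/data/y/,f1_copy.csv
c,3,5,/data/z/,f3.csv
EOF

# Key = every column except the last two; the first line with a given key wins.
awk -F',' '{
    key = $1
    for (i = 2; i <= NF - 2; i++) key = key FS $i
    if (!(key in seen)) { seen[key] = 1; print }
}' "$tmp/all.csv" | sort -t',' -k1,1 -k2,2n > "$tmp/all.sorted"
```

If the sort keys and the duplicate-defining columns happen to be the same, `sort -t',' -u` with the matching `-k` options does both in one step, because `-u` compares only the keys. To update the master file, write the result to a temporary file and `mv` it over the original.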

My dos2unix | head | tail | cut | tr (remove whitespace) | tr (change EOL to comma) | echo (vars, stdin, vars) pipeline works, but it is far too slow!

I suspect the selection, whitespace removal and transposition, plus padding the output line with variables on both ends, could all be done in a single awk command, which should speed things up considerably, but I am not very good at awk.
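
Part of the cost of the original pipeline is likely the seven process startups per file. A sketch of the single-awk approach, assuming the data files can be located with find: `-exec ... {} +` hands many files to one awk process, FNR restarts at 1 for each file, and the finished rows are appended to the master file. The directory layout, the file names and the six `pre` values are hypothetical placeholders; in practice those variables would be computed from `name` or passed in with `-v`.

```shell
#!/bin/sh
tmp=$(mktemp -d)
mkdir -p "$tmp/data/a" "$tmp/data/b"
for f in "$tmp/data/a/run1.csv" "$tmp/data/b/run2.csv"; do
    awk 'BEGIN { for (i = 1; i <= 40; i++) printf " xxx , %d \r\n", i }' > "$f"
done
printf 'old,row\n' > "$tmp/master.csv"    # stands in for the existing file

# One awk process per batch of files instead of a pipeline per file.
find "$tmp/data" -type f -name '*.csv' -exec awk -F',' -v pre='var1,var2,var3,var4,var5,var6' '
    FNR == 1 {
        dir = FILENAME;  sub("[^/]*$", "", dir)   # directory with trailing slash
        name = FILENAME; sub(".*/", "", name)     # bare file name
        row = ""
    }
    FNR >= 11 && FNR <= 34 {
        v = $2; gsub(/[ \t\r]/, "", v)            # trim whitespace and the CR
        row = row "," v                           # transpose into one row
    }
    FNR == 34 { print pre row "," dir "," name }
' {} + >> "$tmp/master.csv"
```

Files shorter than 34 lines produce no row. If your awk supports `nextfile` (gawk and recent versions of mawk and BWK awk do), adding it after the `print` stops reading each file once line 34 has been handled, which may help if the files are long. The sort and duplicate-removal step then runs once over the whole master file.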

Mike
