cat t2 :
Comma is the field separator; however, string fields are enclosed in double quotes, and a comma inside double quotes should not be treated as a field separator.
I am trying to replace this field separator with a distinct character, such as a pipe or \001, and then perform some analysis.
I have been using a Perl command that works correctly but performs poorly: my file has about 7 million rows, and the command takes about 45 minutes.
I would appreciate advice on making this script run faster, or an alternative approach using Unix tools such as awk or sed.
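One possible approach, sketched in Perl under stated assumptions (it is not the poster's command, which is not shown): a single regex substitution per line that copies each double-quoted run unchanged and replaces every other comma. It assumes quoted fields never contain embedded newlines; doubled quotes ("") inside a field still come through intact, because each quoted run is matched separately. The resep name and the sample record are hypothetical, and no timing on a 7-million-row file is claimed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: rewrite one CSV line, replacing every comma that is
# outside double quotes with $out_sep. A quoted run ("...") matches the
# first alternative and is put back unchanged; a bare comma matches the
# second alternative, leaves $1 undefined, and is replaced.
sub resep {
    my ($line, $out_sep) = @_;
    $line =~ s/("[^"]*")|,/defined $1 ? $1 : $out_sep/ge;
    return $line;
}

# Hypothetical sample record.
print resep(q{101,"Smith, John",42}, '|'), "\n";     # 101|"Smith, John"|42
print resep(q{101,"Smith, John",42}, "\001"), "\n";  # same, with Ctrl-A separators
```

To convert a whole file, the same substitution can run as a streaming one-liner, for example perl -pe 's/("[^"]*")|,/defined $1 ? $1 : "|"/ge' t2 > t2.out; whether it is faster than the original command would need to be measured.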
Hi Guys,
I'm trying to split a line similar to this: YO6-2000-30.htm: (3 properties found)....... into separate columns, so effectively I need to split on -, ., :, tab and space.
Any help would be appreciated
Thanks!
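A Perl sketch: split with a character class listing the five separators. The trailing + makes a run such as ": " count as one separator, so no empty column appears between "htm" and "(3"; drop the + if every single separator should delimit a field. The split_cols name is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: split on any run of '-', '.', ':', tab or space.
# '-' is first in the class so it is literal; '.' is literal inside [].
sub split_cols {
    my ($line) = @_;
    return split /[-.:\t ]+/, $line;
}

my @cols = split_cols('YO6-2000-30.htm: (3 properties found)');
print join('|', @cols), "\n";   # YO6|2000|30|htm|(3|properties|found)
```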
I need help counting the fields and field separators using nawk.
I have a file with multiple lines, and I need to read it one line at a time, count the fields and field separators, and then store those numbers in variables. I then need to delete the first 5 fields and the blank...
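The post does not say what the separator is, so this Perl sketch assumes a single comma (hypothetical; change $sep as needed). It reads the input one line at a time, counts separators and fields (for a non-empty line with a one-character separator, the field count is the separator count plus one), and then drops the first five fields. An in-memory file with hypothetical data stands in for the real input; count_and_trim is also a hypothetical name.

```perl
use strict;
use warnings;

# ASSUMPTION: a single comma separates fields; the post does not say.
my $sep = ',';

# Hypothetical helper: returns (field count, separator count, remainder
# of the line after the first 5 fields have been removed).
sub count_and_trim {
    my ($line) = @_;
    my $nseps   = () = $line =~ /\Q$sep\E/g;   # count separator matches
    my @fields  = split /\Q$sep\E/, $line, -1; # -1 keeps trailing empty fields
    my $nfields = @fields;
    splice @fields, 0, 5;                      # delete the first 5 fields
    return ($nfields, $nseps, join($sep, @fields));
}

# Hypothetical input, read one line at a time from an in-memory file.
my $data = "a,b,c,d,e,f,g\n1,2,3,4,5,6\n";
open my $fh, '<', \$data or die "open: $!";
while (my $line = <$fh>) {
    chomp $line;
    my ($nf, $ns, $rest) = count_and_trim($line);
    print "fields=$nf separators=$ns rest=$rest\n";
}
close $fh;
```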
I saw a couple of posts here referencing how to handle more than one input field separator in awk. I figured I would share how I (just!) figured out how to turn this line in a logfile:
90000000000000000000010001 name...
How do I extract a portion of a record when multiple field separators are involved?
Let's say I have:
Mike Harrington;(555) 555-5555:250:100:175
Christian Dobbins;(555) 555-2358:155:90:201
Susan Dalsass;(555) 555-6279:250:60:50
Archie McNichol;(555) 555-1348:250:100:175
Jody...
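The post is cut off before saying which portion is wanted, so this Perl sketch only splits each record into its parts: the character class [;:] makes both the semicolon and the colon field separators, giving the name, the phone number, and the three numbers. (In awk the equivalent field separator would be -F'[;:]'.) parse_record is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: ';' ends the name and ':' separates the rest,
# so one character class covers both separators.
sub parse_record {
    my ($line) = @_;
    my ($name, $phone, @numbers) = split /[;:]/, $line;
    return ($name, $phone, @numbers);
}

for my $line ('Mike Harrington;(555) 555-5555:250:100:175',
              'Christian Dobbins;(555) 555-2358:155:90:201') {
    my ($name, $phone, @numbers) = parse_record($line);
    print "$name | $phone | @numbers\n";
}
```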
I have some huge files that are produced daily by a production system written in BASIC (really). The files are fixed-width records of 512 bytes, with newline field separators, newlines if the field is null, and trailing newlines for null fields. The data in the fields can be any ASCII...
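The post is truncated, but the record layout it describes maps onto Perl's record-mode reads: assigning a reference to an integer to $/ makes the readline operator return fixed-size chunks instead of lines. This sketch reads one 512-byte record at a time (so a huge file is never held in memory) and splits it on "\n" with a limit of -1, so empty (null) fields, including trailing ones, are kept. next_record is a hypothetical helper, and the file is opened with the :raw layer so that the length is counted in bytes.

```perl
use strict;
use warnings;

# Hypothetical helper: read one fixed-length record from $fh and return
# its fields as an array reference, or undef at end of file.
sub next_record {
    my ($fh, $reclen) = @_;
    local $/ = \$reclen;                 # record mode: read $reclen bytes
    my $rec = <$fh>;
    return unless defined $rec;
    return [ split /\n/, $rec, -1 ];     # -1 keeps empty (null) fields
}

# Usage: perl this_script.pl datafile
if (@ARGV) {
    open my $in, '<:raw', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    while (my $fields = next_record($in, 512)) {
        print join('|', @$fields), "\n";
    }
    close $in;
}
```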
Hi Guys,
I have a small dilemma that I could use a little help solving. I currently have a text HDD S.M.A.R.T. report, which I have pasted below:
smartctl 5.39 2008-10-24 22:33 (openSUSE RPM)
Copyright (C) 2002-8 by Bruce Allen, http://smartmontools.sourceforge.net
Device: COMPAQ...
I have files such as
n02-z30-dsr65-terr0.25-dc0.008-16x12drw-run1.cmd
I am wondering whether it is possible to define two field separators, "-" and ".", for these strings so that $7 is run1.
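A Perl check of that idea (the run_field name is hypothetical). Splitting on both "-" and "." also breaks "terr0.25" and "dc0.008" apart, so "run1" ends up as the 9th field ($9 with awk -F'[-.]'), not the 7th. Splitting on "-" alone makes the 7th field "run1.cmd", and stripping the extension leaves "run1".

```perl
use strict;
use warnings;

my $file = 'n02-z30-dsr65-terr0.25-dc0.008-16x12drw-run1.cmd';

# Both '-' and '.' as separators: 10 fields, 'run1' is the 9th.
my @both = split /[-.]/, $file;
print scalar(@both), " fields, 9th = $both[8]\n";   # 10 fields, 9th = run1

# Hypothetical helper: split on '-' only, then drop the extension
# from the 7th field ('run1.cmd' -> 'run1').
sub run_field {
    my ($name) = @_;
    my @f = split /-/, $name;
    (my $run = $f[6]) =~ s/\.[^.]*\z//;
    return $run;
}

print run_field($file), "\n";                       # run1
```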
How do I use multiple field separators in awk?
I know that if I use awk -F"", both a and b will be field separators. But what if I need two field separators that are both longer than one character?
If I want the field separators to be "ab" and "cd", I will not be able to use awk -F"". The ...
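In Perl, a split pattern can be any regular expression, so alternation handles separators of any length; this sketch (split_multi is a hypothetical name) splits on either "ab" or "cd". POSIX awk behaves similarly: an FS value longer than one character is treated as an extended regular expression, so awk -F'ab|cd' should split the same way, whereas a bracket expression such as [ab] only ever matches a single character.

```perl
use strict;
use warnings;

# Hypothetical helper: either two-character string acts as a separator;
# a lone 'a', 'b', 'c' or 'd' does not.
sub split_multi {
    my ($line) = @_;
    return split /ab|cd/, $line;
}

print join(' ', split_multi('1ab2cd3ab4')), "\n";   # 1 2 3 4
```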
I have a file with two ID columns followed by five columns of counts in fraction form. I'd like to print lines that have a count of at least 4 (so at least 4 in the numerator, e.g. 4/17) in at least one of the five columns.
Input file:
comp51820_c1_seq1 693 0/29 0/50 0/69 0/36 0/31...
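A Perl sketch, assuming whitespace-separated columns as in the sample (the two lines in the code are hypothetical): for each of the five count columns it takes the numerator before the "/" and keeps the line if any numerator is at least 4. has_count_at_least is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: true if any of columns 3-7 (the two ID columns
# come first) has a numerator >= $min.
sub has_count_at_least {
    my ($line, $min) = @_;
    my @f = split ' ', $line;                # awk-style whitespace split
    for my $frac (@f[2 .. 6]) {
        next unless defined $frac;           # tolerate short lines
        my ($num) = split m{/}, $frac;       # "4/17" -> 4
        return 1 if $num >= $min;
    }
    return 0;
}

for my $line ("compA_seq1 693 0/29 0/50 0/69 0/36 0/31",
              "compB_seq1 512 0/29 4/17 0/69 0/36 0/31") {
    print "$line\n" if has_count_at_least($line, 4);   # prints compB only
}
```

The same test can be written as a one-liner over a file: perl -ane 'print if grep { (split m{/})[0] >= 4 } @F[2..6]' file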
I have a large file that I need to print certain sections out of.
file.txt
/alpha/beta/delta/gamma/425/590/USC00015420.blah.lt.0.01.str:USC00015420Y2017M10BLALT.01 12 13 14 -9 1 -9 -9 -9 -9 -9 1 2 3 4 5 -9 -9
I need to print the "USC00015420" and...
Discussion started by: ncwxpanther
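The request is cut off after the first item, so this Perl sketch only shows how "USC00015420" can be pulled out of such a line: take the text before the first ":", strip the directories, and keep what precedes the first ".". station_id is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: the station ID is the file name up to its first '.'.
sub station_id {
    my ($line) = @_;
    my ($path) = split /:/, $line, 2;   # path part before the first ':'
    (my $base = $path) =~ s{.*/}{};     # drop leading directories
    my ($id) = split /\./, $base;       # text before the first '.'
    return $id;
}

my $line = '/alpha/beta/delta/gamma/425/590/USC00015420.blah.lt.0.01.str:'
         . 'USC00015420Y2017M10BLALT.01 12 13 14 -9 1 -9 -9 -9 -9 -9 1 2 3 4 5 -9 -9';
print station_id($line), "\n";          # USC00015420
```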
Text::ParseWords(3pm)    Perl Programmers Reference Guide    Text::ParseWords(3pm)
NAME
Text::ParseWords - parse text into an array of tokens or array of arrays
SYNOPSIS
use Text::ParseWords;
@lists = nested_quotewords($delim, $keep, @lines);
@words = quotewords($delim, $keep, @lines);
@words = shellwords(@lines);
@words = parse_line($delim, $keep, $line);
@words = old_shellwords(@lines); # DEPRECATED!
DESCRIPTION
The &nested_quotewords() and &quotewords() functions accept a delimiter (which can be a regular expression) and a list of lines and then
break those lines up into a list of words, ignoring delimiters that appear inside quotes. &quotewords() returns all of the tokens in a
single long list, while &nested_quotewords() returns a list of token lists corresponding to the elements of @lines. &parse_line() does
tokenizing on a single string. The &*quotewords() functions simply call &parse_line(), so if you're only splitting one line you can call
&parse_line() directly and save a function call.
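A short Perl illustration of the difference described above, using two hypothetical input lines and a whitespace delimiter: &quotewords() flattens the tokens from both lines into one list, while &nested_quotewords() returns one array reference per line.

```perl
use strict;
use warnings;
use Text::ParseWords;

my @lines = ('a b', 'c "d e"');

# One flat list of tokens from all lines.
my @flat = quotewords('\s+', 0, @lines);
print join('|', @flat), "\n";                 # a|b|c|d e

# One array reference of tokens per input line.
my @nested = nested_quotewords('\s+', 0, @lines);
print join('|', @$_), "\n" for @nested;       # a|b  then  c|d e
```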
The $keep argument is a boolean flag. If true, then the tokens are split on the specified delimiter, but all other characters (quotes,
backslashes, etc.) are kept in the tokens. If $keep is false then the &*quotewords() functions remove all quotes and backslashes that are
not themselves backslash-escaped or inside of single quotes (i.e., &quotewords() tries to interpret these characters just like the Bourne
shell). NB: these semantics are significantly different from the original version of this module shipped with Perl 5.000 through 5.004.
As an additional feature, $keep may be the keyword "delimiters" which causes the functions to preserve the delimiters in each string as
tokens in the token lists, in addition to preserving quote and backslash characters.
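To see what $keep changes, this Perl sketch runs &parse_line() on one hypothetical CSV-style line with each of the three settings. In keeping with the Bourne-shell-like rules above, single quotes are quote characters too, so a field containing an unbalanced apostrophe (such as O'Brien) can make &parse_line() return an empty list; that is worth keeping in mind before using it on arbitrary CSV data.

```perl
use strict;
use warnings;
use Text::ParseWords;

my $line = q{a,"b,c",d};

# $keep false: quotes are removed from the tokens.
print join(' | ', parse_line(',', 0, $line)), "\n";             # a | b,c | d

# $keep true: quotes are kept in the tokens.
print join(' | ', parse_line(',', 1, $line)), "\n";             # a | "b,c" | d

# $keep = 'delimiters': the delimiters come back as tokens as well.
print join(' | ', parse_line(',', 'delimiters', $line)), "\n";  # a | , | "b,c" | , | d
```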
&shellwords() is written as a special case of &quotewords(), and it does token parsing with whitespace as a delimiter, similar to most
Unix shells.
EXAMPLES
The sample program:
use Text::ParseWords;
@words = quotewords('\s+', 0, q{this   is "a test" of\ quotewords \"for you});
$i = 0;
foreach (@words) {
    print "$i: <$_>\n";
    $i++;
}
produces:
0: <this>
1: <is>
2: <a test>
3: <of quotewords>
4: <"for>
5: <you>
demonstrating:
0 a simple word
1 multiple spaces are skipped because of our $delim
2 use of quotes to include a space in a word
3 use of a backslash to include a space in a word
4 use of a backslash to remove the special meaning of a double-quote
5 another simple word (note the lack of effect of the backslashed double-quote)
Replacing "quotewords('\s+', 0, q{this is...})" with "shellwords(q{this is...})" is a simpler way to accomplish the same thing.
AUTHORS
Maintainer: Alexandr Ciornii <alexchornyATgmail.com>.
Previous maintainer: Hal Pomeranz <pomeranz@netcom.com>, 1994-1997 (Original author unknown). Much of the code for &parse_line()
(including the primary regexp) from Joerk Behrends <jbehrends@multimediaproduzenten.de>.
Examples section and other documentation provided by John Heidemann <johnh@ISI.EDU>
Bug reports, patches, and nagging provided by lots of folks-- thanks everybody! Special thanks to Michael Schwern <schwern@envirolink.org>
for assuring me that a &nested_quotewords() would be useful, and to Jeff Friedl <jfriedl@yahoo-inc.com> for telling me not to worry about
error-checking (sort of-- you had to be there).
perl v5.12.1 2010-04-26 Text::ParseWords(3pm)