Search Results

Search: Posts Made By: gimley
3,635
Posted By Peasant
Is 700 KB a mistake ? Doesn't sound like a...
Is 700 KB a mistake ?
Doesn't sound like a large file to me....

Can you show input and required output (a small portion, of course)?

What 'DOS' are you referring to, what awk are you using...
3,635
Posted By stomp
Hi, I just checked your script on a linux...
Hi,

I just checked your script on a linux system without any output with a file with 1/3 million words in it (filesize 2700 KB: I used this file: wordlist.xz (http://www.megabert.de/wordlist.xz))....
1,590
Posted By RudiC
Well, I have to go now. So - one for the road: ...
Well, I have to go now. So - one for the road:
awk -F= 'split($1,T,"|") != split($2,T,"|")' file
zu|ba|i|dA=ज़ु|बै|दा
zu|ba|i|r=ज़ु|बै|र
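To see what the one-liner above selects, here is a minimal sketch on two made-up ASCII records (the sample data is an assumption, not from the thread). It prints only those lines where the number of `|`-separated pieces on the left of `=` differs from the number on the right.

```shell
# Hypothetical records: the first is balanced (3 vs 3), the second is not (2 vs 3).
mismatched=$(printf '%s\n' 'a|b|c=x|y|z' 'a|b=x|y|z' |
  awk -F= 'split($1,T,"|") != split($2,T,"|")')

# Only the unbalanced record survives.
printf '%s\n' "$mismatched"
```

split() returns the number of pieces it produced, so comparing the two return values is enough; the contents of T are never used.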
3,430
Posted By MadeInGermany
Exactly, X[$0]++ holds a number value; i.e. each...
Exactly, X[$0]++ holds a number value; i.e. each new line consumes a number's space.
3,430
Posted By MadeInGermany
In case there is a RAM shortage, the following...
In case there is a RAM shortage, the following variant helps (saves some bytes per line).
awk '!($0 in X) { print; X[$0] }' file > file.dedup
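A quick way to convince yourself that the two forms produce the same output is to run both on a small sample. This is only a sketch; the sample lines are made up.

```shell
# Made-up input with repeated lines.
sample=$(printf 'b\na\nb\nc\na\n')

# Counter form: X[$0]++ keeps a count for every distinct line.
dedup_count=$(printf '%s\n' "$sample" | awk '!X[$0]++')

# Membership form: referencing X[$0] only creates the key, no count is stored.
dedup_member=$(printf '%s\n' "$sample" | awk '!($0 in X) { print; X[$0] }')

printf '%s\n' "$dedup_member"
```

Both keep the first occurrence of each line in input order; the difference is only in what is stored per array element.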
3,430
Posted By Scrutinizer
Hi, I presume you mean you want to dedupe...
Hi,

I presume you mean you want to dedupe the file (because that is what your script does and that is in the title), not necessarily sort it.

You can try the difference between
awk '!X[$0]++'...
1,807
Posted By RudiC
How about awk -F= 'FNR == NR {if (NR > 1)...
How about
awk -F= 'FNR == NR {if (NR > 1) TA[$1] = $2; next} {TMP = $0; for (t in TA) {$0 = TMP; sub ("\|", t); sub ("#", TA[t]); print}}' file1 file2?
For "go", try awk -F= 'FNR == NR {if (NR > 1)...
1,807
Posted By RudiC
You forgot one essential thing: setting the field...
You forgot one essential thing: setting the field separator to = .
999
Posted By Don Cragun
Note that although your printf happens to work...
Note that although your printf happens to work with the data you're using, it is dangerous to assume that no characters in data you're printing will ever be interpreted as format string control...
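A small sketch of the hazard, shown here with the shell's printf (the same concern applies to printf in awk). The value of data is made up; the point is that data should be passed as an argument to a fixed format, never used as the format itself.

```shell
# Made-up value containing a percent sign and a backslash sequence.
data='100% done\n'

# Unsafe would be: printf "$data"   (the value is parsed as a format string)
# Safe: fixed format, data supplied as an argument and printed verbatim.
safe=$(printf '%s' "$data")
printf '%s\n' "$safe"
```

With the fixed '%s' format the percent sign and the backslash-n are printed literally instead of being interpreted.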
999
Posted By RudiC
You weren't too far off. Try FS="[;=]".
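For illustration, a minimal sketch of that field separator on a made-up record (the record is an assumption, not from the thread). A bracket expression as FS makes awk split on either character.

```shell
# Hypothetical record mixing "=" and ";" as delimiters.
line='key=one;two;three'

# Print the field count, the first field and the last field.
fields=$(printf '%s\n' "$line" |
  awk 'BEGIN { FS = "[;=]" } { print NF; print $1; print $NF }')
printf '%s\n' "$fields"
```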
3,219
Posted By jim mcnamara
That may also be why your perl has issues as...
That may also be why your perl has issues as well. UTF-8 can encode all 1,112,064 valid Unicode code points, so a single UTF-8 character may occupy 8, 16, 24, or 32 bits.

To fix perl will require the...
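The variable width is easy to check by counting bytes. This sketch writes one character from each length class as octal escapes, so it does not depend on the terminal or locale; the variable names are only for illustration.

```shell
# wc -c counts bytes, not characters; tr strips the padding some wc versions add.
ascii_bytes=$(printf 'A' | wc -c | tr -d ' ')                 # U+0041
two_bytes=$(printf '\303\251' | wc -c | tr -d ' ')            # U+00E9
three_bytes=$(printf '\340\244\234' | wc -c | tr -d ' ')      # U+091C (Devanagari)
four_bytes=$(printf '\360\237\230\200' | wc -c | tr -d ' ')   # U+1F600

printf '%s %s %s %s\n' "$ascii_bytes" "$two_bytes" "$three_bytes" "$four_bytes"
```

Any tool that counts or cuts by byte can therefore split a multi-byte character in the middle.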
3,219
Posted By jim mcnamara
As an aside, there is a split command that does...
As an aside, there is a split command that does exactly what you ask.

split -b [size in bytes ] infile [option control outfile naming]

Linux man page:

split(1) - Linux manual page...
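A minimal sketch of split -b in use. The file names, the part_ prefix and the sizes are assumptions chosen for the example; head -c is used to create filler data and is widely available, though not required by POSIX.

```shell
workdir=$(mktemp -d)

# 2500 bytes of filler as a stand-in for the real input file.
head -c 2500 /dev/zero > "$workdir/infile"

# Cut into pieces of at most 1000 bytes, named part_aa, part_ab, part_ac.
split -b 1000 "$workdir/infile" "$workdir/part_"

pieces=$(ls "$workdir"/part_* | wc -l | tr -d ' ')
last_size=$(wc -c < "$workdir/part_ac" | tr -d ' ')
printf '%s pieces, last one %s bytes\n' "$pieces" "$last_size"
```

Note that split -b cuts at byte positions, so a piece can end in the middle of a line or of a multi-byte character.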
1,173
Posted By bakunin
True. Still, as a measure of safety i would rule...
True. Still, as a measure of safety I would rule out trailing or leading spaces:

sed -n 's/^[[:blank:]]*//;s/[[:blank:]]*$//;/ /!p' > /result/file

I hope this helps.

bakunin
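As a sketch of the trimming idea, here is a version written with explicit s commands and the [[:blank:]] class, run on made-up lines (sample data and variable names are assumptions). Leading and trailing blanks are stripped first, then only lines without an inner space are printed.

```shell
# Made-up input: padded single word, two words, tab-indented single word.
sample=$(printf '  alpha  \nbeta gamma\n\tdelta\n')

single_words=$(printf '%s\n' "$sample" |
  sed -n 's/^[[:blank:]]*//;s/[[:blank:]]*$//;/ /!p')
printf '%s\n' "$single_words"
```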
1,173
Posted By rovf
For instance using grep: grep -v '[^ ] [^ ]'...
For instance using grep:

grep -v '[^ ] [^ ]' your_file
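A short sketch of what that pattern keeps, on made-up lines. A line is dropped only when a space sits directly between two non-space characters, so padding at the ends does not count.

```shell
# Made-up input: padded single word, two words, plain single word.
kept=$(printf ' alpha \nbeta gamma\ndelta\n' | grep -v '[^ ] [^ ]')
printf '%s\n' "$kept"
```

Unlike a trimming approach, the kept lines retain any leading or trailing spaces they had.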
1,110
Posted By RudiC
How about sed 's/[[:punct:]]/ &/g' file s...
How about
sed 's/[[:punct:]]/ &/g' file
s 'est
l 'air
d 'homme
l 'issue
bleu -blanc -rouge
(SDF )
a -t -il ?
1,110
Posted By RudiC
Any attempts / ideas / thoughts from your side? ...
Any attempts / ideas / thoughts from your side?

Is the list given complete, or does your request apply to ALL punctuation chars?
2,780
Posted By Don Cragun
To bring what MadeInGermany said directly into...
To bring what MadeInGermany said directly into your problem statement...

If the following characters are the only legal characters on a line written in Sindhi:...
983
Posted By Aia
Run as perl separate.pl gimley.example use...
Run as perl separate.pl gimley.example
use strict;
use warnings;

my $clean = 'clean.gmly';
my $inconsistent = 'inconsistent.gmly';

open my $clean_fh, '>', $clean or die;
open my...
983
Posted By Scrutinizer
Try: awk -F= 'split($1,F," ")!=split($2,F,"...
Try:
awk -F= 'split($1,F," ")!=split($2,F," "){print>f; next}1' f=file.bad file > file.good
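To illustrate how the records get routed, here is a sketch that runs the same command in a scratch directory on two made-up records (the data and the directory are assumptions). Records with unequal word counts on both sides of = go to the file named by f, the rest go to standard output.

```shell
workdir=$(mktemp -d)

# First record is balanced (2 vs 2 words), second is not (3 vs 2).
printf 'a b=x y\nc d e=z w\n' > "$workdir/file"

awk -F= 'split($1,F," ")!=split($2,F," "){print>f; next}1' \
  f="$workdir/file.bad" "$workdir/file" > "$workdir/file.good"

good=$(cat "$workdir/file.good")
bad=$(cat "$workdir/file.bad")
printf 'good: %s\nbad: %s\n' "$good" "$bad"
```

The f=... operand is an awk variable assignment, processed before the input file that follows it is opened.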
1,227
Posted By MadeInGermany
The for loop can be shortened, and a classic...
The for loop can be shortened, and a classic split trick clears an array.
BEGIN { FS = "[=,]"
}
{ o = $1 "=" $2
s[$2]
for(i = 3; i <= NF; i++)
if(!($i in...
1,227
Posted By RudiC
If the order of the indic glosses is...
If the order of the Indic glosses is unimportant, try also
awk -F= '
{for (MX=n=split($2, T, ","); n>0; n--) C[T[n]]
printf "%s=", $1
DL = ""
for (c in C) ...
1,227
Posted By Don Cragun
Assuming that the order of the order of the indic...
Assuming that the order of the Indic glosses has to be kept as they appear in the input (only removing duplicated Indic glosses), assuming that you're using a version of awk that...
1,379
Posted By jim mcnamara
Consolidate the fields: awk -F '[,=]' '{...
Consolidate the fields:

awk -F '[,=]' '{ for(i=1; i<NF; i++) { printf("%s=%s\n", $(i), $(NF) )}} ' filename > newfile


This will not work with older versions of awk.
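For illustration, the same loop on one made-up record (the record is an assumption). With both , and = as separators, the last field is the gloss and every earlier field is paired with it.

```shell
# Hypothetical record: three comma-separated keys sharing one value.
expanded=$(printf '%s\n' 'a,b,c=X' |
  awk -F '[,=]' '{ for (i = 1; i < NF; i++) printf("%s=%s\n", $i, $NF) }')
printf '%s\n' "$expanded"
```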
1,379
Posted By RudiC
Why not adapt / improve your own approach: awk...
Why not adapt / improve your own approach:
awk 'BEGIN{FS="="}
{n=split($1,a,",");for (i=1;i<=n;i++) print a[i]"="$2}' file
1,282
Posted By Aia
Please, try the following: perl -ne '/^\w+=.+-/...
Please, try the following:
perl -ne '/^\w+=.+-/ and print'

Or test with any regex engine that supports Perl regex.

/^\w+=.+-/
Showing results 1 to 25 of 145

 
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.