Search Results

Search: Posts Made By: gimley
4,821
Posted By stomp
Hi, I just checked your script on a linux...
Hi,

I just checked your script on a linux system without any output with a file with 1/3 million words in it (filesize 2700 KB: I used this file: wordlist.xz (http://www.megabert.de/wordlist.xz))....
4,821
Posted By Peasant
Is 700 KB a mistake ? Doesn't sound like a...
Is 700 KB a mistake ?
Doesn't sound like a large file to me....

Can you show input and required output (a small portion, of course)?

What 'DOS' are you referring to, what awk are you using...
1,898
Posted By RudiC
Well, I have to go now. So - one for the road: ...
Well, I have to go now. So - one for the road:
awk -F= 'split($1,T,"|") != split($2,T,"|")' file
zu|ba|i|dA=ज़ु|बै|दा
zu|ba|i|r=ज़ु|बै|र
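The one-liner above relies on split() returning the number of pieces it produced. A minimal sketch of that check, reusing the two entries shown in the post; the third, balanced entry and the variable names are made up for illustration:

```shell
# The two mismatched entries from the post, plus one hypothetical balanced entry.
entries='zu|ba|i|dA=ज़ु|बै|दा
zu|ba|i|r=ज़ु|बै|र
ba|ba=बा|बा'

# split() returns the number of pieces; a line is printed when the two counts differ.
mismatched=$(printf '%s\n' "$entries" | awk -F= 'split($1,T,"|") != split($2,T,"|")')
printf '%s\n' "$mismatched"
```

Only the first two entries should come back, since each has four pieces on the left and three on the right.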
3,825
Posted By MadeInGermany
Exactly, X[$0]++ holds a number value; i.e. each...
Exactly, X[$0]++ holds a number value; i.e. each new line consumes a number's space.
3,825
Posted By MadeInGermany
In case there is a RAM shortage, the following...
In case there is a RAM shortage, the following variant helps (saves some bytes per line).
awk '!($0 in X) { print; X[$0] }' file > file.dedup
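Both dedupe variants from this thread can be compared on a tiny sample. A minimal sketch, assuming a POSIX awk; the sample words and the function names dedup_count and dedup_in are made up for illustration:

```shell
# Hypothetical sample input with repeated lines.
sample='apple
pear
apple
fig
pear'

# Variant 1: X[$0]++ stores a counter for each distinct line.
dedup_count() { awk '!X[$0]++'; }

# Variant 2: referencing X[$0] only creates the key with an empty value.
dedup_in() { awk '!($0 in X) { print; X[$0] }'; }

printf '%s\n' "$sample" | dedup_count
printf '%s\n' "$sample" | dedup_in
```

Both variants keep the first occurrence of each line in input order; the difference is only in what is stored per key.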
3,825
Posted By Scrutinizer
Hi, I presume you mean you want to dedupe...
Hi,

I presume you mean you want to dedupe the file (because that is what your script does and that is in the title), not necessarily sort it.

You can try the difference between
awk '!X[$0]++'...
1,873
Posted By RudiC
You forgot one essential thing: setting the field...
You forgot one essential thing: setting the field separator to = .
1,873
Posted By RudiC
How about awk -F= 'FNR == NR {if (NR > 1)...
How about
awk -F= 'FNR == NR {if (NR > 1) TA[$1] = $2; next} {TMP = $0; for (t in TA) {$0 = TMP; sub (/\|/, t); sub ("#", TA[t]); print}}' file1 file2?
For "go", try awk -F= 'FNR == NR {if (NR > 1)...
1,088
Posted By Don Cragun
Note that although your printf happens to work...
Note that although your printf happens to work with the data you're using, it is dangerous to assume that no characters in data you're printing will ever be interpreted as format string control...
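The risk can be shown in a few lines. This is only a sketch using the shell's printf (an assumption, since the snippet does not show which printf was meant; awk's printf has the same pitfall), and the sample data is made up:

```shell
# Hypothetical data containing characters that printf treats specially.
data='100%s done\n'

# Unsafe: the data is used as the format string, so %s and \n are interpreted.
unsafe=$(printf "$data")

# Safe: a fixed format string; the data is passed as an argument and printed verbatim.
safe=$(printf '%s' "$data")

printf '%s\n' "unsafe: $unsafe"
printf '%s\n' "safe:   $safe"
```

In the unsafe call the %s is consumed as a conversion with no argument, so part of the data silently disappears.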
1,088
Posted By RudiC
You weren't too far off. Try FS="[;=]".
You weren't too far off. Try FS="[;=]".
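A field separator longer than one character is treated as a regular expression, so a bracket expression splits on either character. A small sketch with a made-up record:

```shell
# Hypothetical record mixing ';' and '=' as delimiters.
line='key=alpha;beta;gamma'

# FS="[;=]" is a regular expression, so either character ends a field.
fields=$(printf '%s\n' "$line" | awk 'BEGIN { FS = "[;=]" } { print NF; for (i = 1; i <= NF; i++) print i ": " $i }')
printf '%s\n' "$fields"
```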
3,328
Posted By jim mcnamara
That may also be why your perl has issues as...
That may also be why your perl has issues as well. UTF-8 encodes all 1,112,064 valid Unicode code points, so a UTF-8 character may be 8, 16, 24, or 32 bits.

To fix perl will require the...
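The variable width is easy to observe by counting bytes. A sketch only: the helper name utf8_len is made up, and the characters are given as octal-escaped UTF-8 bytes so the example does not depend on the terminal's encoding:

```shell
# Count the bytes of one character, given as octal-escaped UTF-8 bytes.
# The argument is deliberately used as the printf format so the escapes are expanded;
# that is acceptable here only because the arguments are fixed literals.
utf8_len() { printf "$1" | wc -c | tr -d ' '; }

utf8_len 'A'                   # U+0041, ASCII
utf8_len '\303\251'            # U+00E9, e with acute accent
utf8_len '\340\244\246'        # U+0926, Devanagari letter DA
utf8_len '\360\237\230\200'    # U+1F600, an emoji
```

Any tool that cuts or counts by byte can therefore land in the middle of a character.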
3,328
Posted By jim mcnamara
As an aside, there is a split command that does...
As an aside, there is a split command that does exactly what you ask.

split -b [size in bytes ] infile [option control outfile naming]

Linux man page:

split(1) - Linux manual page...
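A short sketch of split -b in action. The file names, the chunk_ prefix and the sizes are arbitrary choices for illustration, not taken from the thread:

```shell
# Work in a throwaway directory; all names here are hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a 2500-byte input file.
dd if=/dev/zero of=infile bs=2500 count=1 2>/dev/null

# Split into pieces of at most 1000 bytes, named chunk_aa, chunk_ab, ...
split -b 1000 infile chunk_

ls chunk_*
# Concatenating the pieces in name order reproduces the original.
cat chunk_* | cmp -s - infile && echo "pieces match original"
```

Note that -b cuts at byte boundaries, so a text file may be split in the middle of a line.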
1,256
Posted By rovf
For instance using grep: grep -v '[^ ] [^ ]'...
For instance using grep:

grep -v '[^ ] [^ ]' your_file
1,256
Posted By bakunin
True. Still, as a measure of safety i would rule...
True. Still, as a measure of safety i would rule out trailing or leading spaces:

sed -n 's/^[[:blank:]]*//;s/[[:blank:]]*$//;/ /!p' > /result/file

I hope this helps.

bakunin
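The trim-then-filter idea can be tried on a few lines. A minimal sketch; the sample input is made up, and it reads from a pipe rather than a file:

```shell
# Hypothetical input: one-word lines (one of them padded with blanks) and a two-word line.
input=$(printf 'single\n  padded  \ntwo words\nanother\n')

# Strip leading and trailing blanks, then print only lines with no space left.
result=$(printf '%s\n' "$input" | sed -n 's/^[[:blank:]]*//;s/[[:blank:]]*$//;/ /!p')
printf '%s\n' "$result"
```

Without the trimming step, the padded line would be dropped along with the two-word line.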
1,275
Posted By RudiC
How about sed 's/[[:punct:]]/ &/g' file s...
How about
sed 's/[[:punct:]]/ &/g' file
s 'est
l 'air
d 'homme
l 'issue
bleu -blanc -rouge
(SDF )
a -t -il ?
1,275
Posted By RudiC
Any attempts / ideas / thoughts from your side? ...
Any attempts / ideas / thoughts from your side?

Is the list given complete, or does your request apply to ALL punctuation chars?
3,088
Posted By Don Cragun
To bring what MadeInGermany said directly into...
To bring what MadeInGermany said directly into your problem statement...

If the following characters are the only legal characters on a line written in Sindhi:...
1,032
Posted By Scrutinizer
Try: awk -F= 'split($1,F," ")!=split($2,F,"...
Try:
awk -F= 'split($1,F," ")!=split($2,F," "){print>f; next}1' f=file.bad file > file.good
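To see how the command sorts lines into the two files, here is a sketch run on made-up entries in a temporary directory (the entries and the workdir variable are assumptions for illustration):

```shell
workdir=$(mktemp -d)

# Hypothetical dictionary lines: both sides should hold the same number of words.
printf '%s\n' 'good morning=guten Morgen' 'thank you=danke' 'hello=hallo' > "$workdir/file"

# Lines whose word counts differ go to the file named by f; the rest go to stdout.
awk -F= 'split($1,F," ")!=split($2,F," "){print>f; next}1' f="$workdir/file.bad" "$workdir/file" > "$workdir/file.good"

echo "bad:";  cat "$workdir/file.bad"
echo "good:"; cat "$workdir/file.good"
```

Passing f=... before the input file name sets the awk variable before that file is read, which is why the redirection target is known in time.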
1,032
Posted By Aia
Run as perl separate.pl gimley.example use...
Run as perl separate.pl gimley.example
use strict;
use warnings;

my $clean = 'clean.gmly';
my $inconsistent = 'inconsistent.gmly';

open my $clean_fh, '>', $clean or die;
open my...
1,298
Posted By RudiC
If the order of the indic glosses is...
If the order of the indic glosses is unimportant, try also
awk -F= '
{for (MX=n=split($2, T, ","); n>0; n--) C[T[n]]
printf "%s=", $1
DL = ""
for (c in C) ...
1,298
Posted By MadeInGermany
The for loop can be shortened, and a classic...
The for loop can be shortened, and a classic split trick clears an array.
BEGIN { FS = "[=,]"
}
{ o = $1 "=" $2
s[$2]
for(i = 3; i <= NF; i++)
if(!($i in...
1,298
Posted By Don Cragun
Assuming that the order of the indic...
Assuming that the order of the indic glosses has to be kept as it appears in the input (only removing duplicated indic glosses), assuming that you're using a version of awk that...
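One way to keep the input order while dropping repeats is sketched below. It is not any poster's code: the sample entry and variable names are made up, and it uses the split("", array) trick mentioned in this thread to reset the lookup table for each line:

```shell
# Hypothetical entry: one headword, comma-separated glosses with repeats.
line='water=pani,jal,pani,neer,jal'

deduped=$(printf '%s\n' "$line" | awk -F= '
{
    split("", seen)                 # classic trick: splitting an empty string clears the array
    n = split($2, gloss, ",")
    out = ""
    for (i = 1; i <= n; i++)        # walk in input order so the first occurrence wins
        if (!(gloss[i] in seen)) {
            seen[gloss[i]]
            out = out (out == "" ? "" : ",") gloss[i]
        }
    print $1 "=" out
}')
printf '%s\n' "$deduped"
```

Iterating by index rather than with for (x in array) is what preserves the order, since awk does not define the traversal order of for-in loops.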
1,424
Posted By RudiC
Why not adapt / improve your own approach: awk...
Why not adapt / improve your own approach:
awk 'BEGIN{FS="="}
{n=split($1,a,",");for (i=1;i<=n;i++) print a[i]"="$2}' file
1,424
Posted By jim mcnamara
Consolidate the fields: awk -F '[,=]' '{...
Consolidate the fields:

awk -F '[,=]' '{ for(i=1; i<NF; i++) { printf("%s=%s\n", $(i), $(NF) )}} ' filename > newfile


This will not work with older versions of awk.
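A quick run of the field-consolidating approach on one made-up entry; the sketch assumes that neither the headwords nor the gloss contain a comma or an equals sign:

```shell
# Hypothetical entry: several comma-separated headwords sharing one gloss.
line='car,auto,automobile=gaadi'

# Every field except the last is a headword; the last field is the gloss.
expanded=$(printf '%s\n' "$line" | awk -F '[,=]' '{ for (i = 1; i < NF; i++) printf("%s=%s\n", $i, $NF) }')
printf '%s\n' "$expanded"
```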
1,319
Posted By Aia
Please, try the following: perl -ne '/^\w+=.+-/...
Please, try the following:
perl -ne '/^\w+=.+-/ and print'

Or test with any regex engine that supports Perl regex.

/^\w+=.+-/
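For systems without perl, roughly the same filter can be written with grep -E. This is an approximation, not the poster's command: [[:alnum:]_] stands in for \w, which may differ from Perl's \w on non-ASCII text, and the sample lines are made up:

```shell
# Hypothetical lines; only the first has a single word before '=' and a '-' after it.
lines='word=multi-part
word=plain
two words=multi-part'

matches=$(printf '%s\n' "$lines" | grep -E '^[[:alnum:]_]+=.+-')
printf '%s\n' "$matches"
```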
Showing results 1 to 25 of 145

 
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.