Search Results

Search: Posts Made By: gimley
4,776
Posted By Peasant
Is 700 KB a mistake ? Doesn't sound like a...
Is 700 KB a mistake ?
Doesn't sound like a large file to me....

Can you show input and required output (a small portion, of course)?

What 'DOS' are you referring to, what awk are you using...
4,776
Posted By stomp
Hi, I just checked your script on a linux...
Hi,

I just checked your script on a linux system without any output with a file with 1/3 million words in it (filesize 2700 KB: I used this file: wordlist.xz (http://www.megabert.de/wordlist.xz))....
1,879
Posted By RudiC
Well, I have to go now. So - one for the road: ...
Well, I have to go now. So - one for the road:
awk -F= 'split($1,T,"|") != split($2,T,"|")' file
zu|ba|i|dA=ज़ु|बै|दा
zu|ba|i|r=ज़ु|बै|र
3,789
Posted By MadeInGermany
Exactly, X[$0]++ holds a number value; i.e. each...
Exactly, X[$0]++ holds a number value; i.e. each new line consumes a number's space.
3,789
Posted By MadeInGermany
In case there is a RAM shortage, the following...
In case there is a RAM shortage, the following variant helps (saves some bytes per line).
awk '!($0 in X) { print; X[$0] }' file > file.dedup
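A minimal sketch comparing the two dedupe variants quoted in this thread. The sample data and the temporary file names are assumptions made up for illustration; only the two awk programs come from the posts above.

```shell
# Hypothetical sample data; the file names are assumptions for illustration.
tmp=$(mktemp -d)
printf '%s\n' apple banana apple cherry banana > "$tmp/file"

# Variant 1: X[$0]++ stores a counter for every distinct line.
awk '!X[$0]++' "$tmp/file" > "$tmp/file.dedup1"

# Variant 2: the bare reference X[$0] creates the key with an empty value.
awk '!($0 in X) { print; X[$0] }' "$tmp/file" > "$tmp/file.dedup2"

# Both keep the first occurrence of each line, in input order.
cat "$tmp/file.dedup2"
cmp -s "$tmp/file.dedup1" "$tmp/file.dedup2" && echo "same output"
```

Both variants still hold every distinct line in memory as an array key, so the saving is only in the stored value, not in the keys.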
3,789
Posted By Scrutinizer
Hi, I presume you mean you want to dedupe...
Hi,

I presume you mean you want to dedupe the file (because that is what your script does and that is in the title), not necessarily sort it.

You can try the difference between
awk '!X[$0]++'...
1,869
Posted By RudiC
How about awk -F= 'FNR == NR {if (NR > 1)...
How about
awk -F= 'FNR == NR {if (NR > 1) TA[$1] = $2; next} {TMP = $0; for (t in TA) {$0 = TMP; sub ("\|", t); sub ("#", TA[t]); print}}' file1 file2?
For "go", try awk -F= 'FNR == NR {if (NR > 1)...
1,869
Posted By RudiC
You forgot one essential thing: setting the field...
You forgot one essential thing: setting the field separator to = .
1,081
Posted By Don Cragun
Note that although your printf happens to work...
Note that although your printf happens to work with the data you're using, it is dangerous to assume that no characters in data you're printing will ever be interpreted as format string control...
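A small sketch of the risk described here, assuming the shell's printf utility (awk's printf takes a format string in the same way). The variable name and sample value are hypothetical.

```shell
# Hypothetical value that happens to contain a conversion specifier.
value='50%s off'

# Risky: the data is used as the format string, so "%s" is interpreted.
unsafe=$(printf "$value\n")

# Safer: a fixed format string, with the data passed as an argument.
safe=$(printf '%s\n' "$value")

echo "unsafe: $unsafe"
echo "safe:   $safe"
```

With the fixed format, the value is printed unchanged no matter which characters it contains.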
1,081
Posted By RudiC
You weren't too far off. Try FS="[;=]".
You weren't too far off. Try FS="[;=]".
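A quick sketch of what the bracket expression does as a field separator. The input record is a made-up example, not data from the thread.

```shell
# Hypothetical record that mixes ";" and "=" as delimiters.
fields=$(printf 'name=zubaida;lang=ur\n' |
  awk 'BEGIN { FS = "[;=]" } { print NF ": " $1 "|" $2 "|" $3 "|" $4 }')
echo "$fields"
```

Either character ends a field, so the record splits into four fields.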
3,322
Posted By jim mcnamara
That may also be why your perl has issues as...
That may also be why your perl has issues as well. UTF-8 can encode all 1,112,064 valid Unicode code points, so a single UTF-8 character may occupy 8, 16, 24, or 32 bits.

To fix perl will require the...
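To make the variable width concrete, here is a minimal sketch that counts the bytes of four characters. The helper name utf8_bytes is hypothetical, and the characters are written as octal escapes so the example does not depend on the encoding of the script itself.

```shell
# Count the bytes of a UTF-8 character given as octal escapes.
# The argument is deliberately used as the printf format so the
# escapes are expanded; utf8_bytes is a hypothetical helper name.
utf8_bytes() { printf "$1" | wc -c | tr -d ' '; }

ascii=$(utf8_bytes 'a')                  # U+0061
latin=$(utf8_bytes '\303\251')           # U+00E9, e with acute accent
deva=$(utf8_bytes '\340\244\234')        # U+091C, Devanagari letter ja
emoji=$(utf8_bytes '\360\237\230\200')   # U+1F600
echo "$ascii $latin $deva $emoji"
```

Any tool that cuts or counts by bytes rather than by characters can therefore split an Indic letter in the middle.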
3,322
Posted By jim mcnamara
As an aside, there is a split command that does...
As an aside, there is a split command that does exactly what you ask.

split -b [size in bytes ] infile [option control outfile naming]

Linux man page:

split(1) - Linux manual page...
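A minimal sketch of split -b on a throwaway file. The 2500-byte input, the 1000-byte chunk size, and the part_ prefix are assumptions chosen for illustration.

```shell
# Hypothetical 2500-byte input file in a temporary directory.
tmp=$(mktemp -d)
head -c 2500 /dev/zero > "$tmp/infile"

# Cut it into pieces of at most 1000 bytes named part_aa, part_ab, ...
split -b 1000 "$tmp/infile" "$tmp/part_"
ls "$tmp"
```

Because -b counts bytes, a piece boundary can fall inside a multibyte UTF-8 character, which matters for text like the data discussed in this thread.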
1,255
Posted By bakunin
True. Still, as a measure of safety i would rule...
True. Still, as a measure of safety i would rule out trailing or leading spaces:

sed -n 's/^[[:blank:]]*//;s/[[:blank:]]*$//;/ /!p' > /result/file

I hope this helps.

bakunin
1,255
Posted By rovf
For instance using grep: grep -v '[^ ] [^ ]'...
For instance using grep:

grep -v '[^ ] [^ ]' your_file
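A small sketch of how this pattern behaves, using three made-up lines. It only rejects a space that has a non-space character on both sides, so leading and trailing spaces do not disqualify a line.

```shell
# Hypothetical input: a single word, a padded word, and two words.
tmp=$(mktemp -d)
printf '%s\n' 'single' ' padded ' 'two words' > "$tmp/your_file"

# Keep only the lines that do not contain an inner space.
kept=$(grep -v '[^ ] [^ ]' "$tmp/your_file")
printf '%s\n' "$kept"
```

The first two lines are kept and 'two words' is dropped.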
1,268
Posted By RudiC
How about sed 's/[[:punct:]]/ &/g' file s...
How about
sed 's/[[:punct:]]/ &/g' file
s 'est
l 'air
d 'homme
l 'issue
bleu -blanc -rouge
(SDF )
a -t -il ?
1,268
Posted By RudiC
Any attempts / ideas / thoughts from your side? ...
Any attempts / ideas / thoughts from your side?

Is the list given complete, or does your request apply to ALL punctuation chars?
3,065
Posted By Don Cragun
To bring what MadeInGermany said directly into...
To bring what MadeInGermany said directly into your problem statement...

If the following characters are the only legal characters on a line written in Sindhi:...
1,025
Posted By Aia
Run as perl separate.pl gimley.example use...
Run as perl separate.pl gimley.example
use strict;
use warnings;

my $clean = 'clean.gmly';
my $inconsistent = 'inconsistent.gmly';

open my $clean_fh, '>', $clean or die;
open my...
1,025
Posted By Scrutinizer
Try: awk -F= 'split($1,F," ")!=split($2,F,"...
Try:
awk -F= 'split($1,F," ")!=split($2,F," "){print>f; next}1' f=file.bad file > file.good
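A minimal sketch of the command above run on made-up data, assuming the goal is to separate lines whose two sides have a different number of space-separated parts. The sample lines and the temporary directory are assumptions.

```shell
# Hypothetical input: the second line has 3 parts on the left, 2 on the right.
tmp=$(mktemp -d)
printf '%s\n' 'a b=x y' 'a b c=x y' 'one=uno' > "$tmp/file"

# Mismatched lines go to file.bad, everything else to file.good.
awk -F= 'split($1,F," ")!=split($2,F," "){print>f; next}1' \
    f="$tmp/file.bad" "$tmp/file" > "$tmp/file.good"

cat "$tmp/file.bad"
```

split() returns the number of pieces, so comparing the two return values is enough; the contents of F are never used.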
1,294
Posted By MadeInGermany
The for loop can be shortened, and a classic...
The for loop can be shortened, and a classic split trick clears an array.
BEGIN { FS = "[=,]"
}
{ o = $1 "=" $2
s[$2]
for(i = 3; i <= NF; i++)
if(!($i in...
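The post is truncated here, so the following is only a sketch of the split trick it mentions, assuming it refers to the common idiom split("", array). The array name and keys are hypothetical.

```shell
# Count the elements of s before and after clearing it with split("", s).
result=$(awk 'BEGIN {
    s["a"]; s["b"]                       # two hypothetical keys
    before = 0; for (k in s) before++
    split("", s)                         # splitting an empty string empties s
    after = 0; for (k in s) after++
    print before, after
}')
echo "$result"
```

This is handy when the same array has to be reused for every input line.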
1,294
Posted By RudiC
If the order of the indic glosses is...
If the order of the indic glosses is unimportant, try also
awk -F= '
{for (MX=n=split($2, T, ","); n>0; n--) C[T[n]]
printf "%s=", $1
DL = ""
for (c in C) ...
1,294
Posted By Don Cragun
Assuming that the order of the indic...
Assuming that the order of the indic glosses has to be kept as it appears in the input (only removing duplicated indic glosses), assuming that you're using a version of awk that...
1,421
Posted By jim mcnamara
Consolidate the fields: awk -F '[,=]' '{...
Consolidate the fields:

awk -F '[,=]' '{ for(i=1; i<NF; i++) { printf("%s=%s\n", $(i), $(NF) )}} ' filename > newfile


This will not work with older versions of awk.
1,421
Posted By RudiC
Why not adapt / improve your own approach: awk...
Why not adapt / improve your own approach:
awk 'BEGIN{FS="="}
{n=split($1,a,",");for (i=1;i<=n;i++) print a[i]"="$2}' file
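A short sketch of the approach above on made-up input, assuming each line has a comma-separated list of keys on the left of "=" and one shared value on the right. The sample lines are hypothetical.

```shell
# Hypothetical input: three keys sharing one value, then a single key.
tmp=$(mktemp -d)
printf '%s\n' 'a,b,c=X' 'd=Y' > "$tmp/file"

# Emit one key=value line per key.
expanded=$(awk 'BEGIN{FS="="}
{n=split($1,a,",");for (i=1;i<=n;i++) print a[i]"="$2}' "$tmp/file")
printf '%s\n' "$expanded"
```

Lines with a single key pass through unchanged, since split() then returns 1.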
1,315
Posted By Aia
Please, try the following: perl -ne '/^\w+=.+-/...
Please, try the following:
perl -ne '/^\w+=.+-/ and print'

Or test with any regex engine that supports Perl regex.

/^\w+=.+-/
Showing results 1 to 25 of 145

 
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.