Search Results

Search: Posts Made By: gimley
3,591
Posted By Peasant
Is 700 KB a mistake?
That doesn't sound like a large file to me.

Can you show the input and the required output (a small portion of it, of course)?

What 'DOS' are you referring to, and which awk are you using...
3,591
Posted By stomp
Hi,

I just checked your script on a Linux system, using a file with 1/3 million words in it (file size 2700 KB; I used this file: wordlist.xz (http://www.megabert.de/wordlist.xz)), and it ran without any output....
1,582
Posted By RudiC
Well, I have to go now. So - one for the road:
awk -F= 'split($1,T,"|") != split($2,T,"|")' file
zu|ba|i|dA=ज़ु|बै|दा
zu|ba|i|r=ज़ु|बै|र
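To see what this one-liner selects, here is a small sketch that wraps it in a shell function (the name `mismatched` and the first sample line are hypothetical). `split` returns the number of pieces it produced, so the pattern is true, and the line is printed, whenever the two sides of "=" contain a different number of "|"-separated parts. A single-character separator such as "|" is taken literally, not as a regular expression.

```shell
#!/bin/sh
# Print lines whose left side (before "=") and right side (after "=")
# contain a different number of "|"-separated parts.
mismatched() {
    awk -F= 'split($1, T, "|") != split($2, T, "|")' "$@"
}

# Hypothetical first line: 3 parts on each side, so it is not printed.
# Second line (from the post): 4 parts vs. 3 parts, so it is printed.
printf '%s\n' 'ka|ma|l=क|म|ल' 'zu|ba|i|r=ज़ु|बै|र' | mismatched
```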
3,418
Posted By MadeInGermany
Exactly: X[$0]++ holds a numeric value, i.e. each new distinct line also consumes the space of a number.
3,418
Posted By MadeInGermany
In case there is a RAM shortage, the following variant helps (saves some bytes per line).
awk '!($0 in X) { print; X[$0] }' file > file.dedup
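A quick way to check that the two dedup variants print the same thing is to wrap each in a function (the names `dedup_count` and `dedup_key` are hypothetical) and feed them the same input. Both keep the first occurrence of every line in its original order; the second one only creates the array element instead of storing a counter. How many bytes that actually saves depends on the awk implementation.

```shell
#!/bin/sh
# Variant 1: stores a counter (a number) for every distinct line.
dedup_count() { awk '!X[$0]++' "$@"; }

# Variant 2: only creates the array element (no number is assigned);
# the line is printed the first time it is seen.
dedup_key() { awk '!($0 in X) { print; X[$0] }' "$@"; }

printf '%s\n' b a b c a | dedup_count
printf '%s\n' b a b c a | dedup_key
```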
3,418
Posted By Scrutinizer
Hi,

I presume you mean you want to dedupe the file (because that is what your script does and that is in the title), not necessarily sort it.

You can test the difference between
awk '!X[$0]++'...
1,800
Posted By RudiC
How about this?
awk -F= 'FNR == NR {if (NR > 1) TA[$1] = $2; next} {TMP = $0; for (t in TA) {$0 = TMP; sub(/\|/, t); sub(/#/, TA[t]); print}}' file1 file2
For "go", try awk -F= 'FNR == NR {if (NR > 1)...
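Here is a runnable sketch of the two-file approach, with hypothetical sample data: file1 holds key=value pairs after a header line (skipped by `NR > 1`), and file2 holds template lines in which "|" marks where the key goes and "#" where its value goes. The `sub` patterns are written as regex literals (`/\|/`), so the "|" is matched literally in every awk; a string such as "\|" is handled differently by different awks. The function name `fill_templates` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample data.
tmpdir=$(mktemp -d)
printf '%s\n' 'key=value' 'go=ja' > "$tmpdir/file1"
printf '%s\n' '| means #' > "$tmpdir/file2"

fill_templates() {
    awk -F= '
        FNR == NR { if (NR > 1) TA[$1] = $2; next }   # first file: load the pairs
        {
            TMP = $0
            for (t in TA) {                           # one output line per pair
                $0 = TMP
                sub(/\|/, t)                          # put the key where "|" was
                sub(/#/, TA[t])                       # put the value where "#" was
                print
            }
        }' "$1" "$2"
}

fill_templates "$tmpdir/file1" "$tmpdir/file2"
```

With more than one pair, the order of the output lines follows `for (t in TA)`, which awk leaves unspecified. A "&" in a key or value would also be treated specially by `sub`.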
1,800
Posted By RudiC
You forgot one essential thing: setting the field separator to "=".
992
Posted By Don Cragun
Note that although your printf happens to work with the data you're using, it is dangerous to assume that no characters in data you're printing will ever be interpreted as format string control...
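A minimal illustration of the safe pattern, assuming the printf in question is awk's (the same rule applies to the shell's printf): keep the format string fixed and pass the data as an argument, so a "%" inside the data is printed literally. The function name `print_safe` is hypothetical.

```shell
#!/bin/sh
# Safe: fixed format string, data passed as an argument.
print_safe() { awk '{ printf "%s\n", $0 }' "$@"; }

# Risky (not run here): awk '{ printf $0 "\n" }' uses the data itself as
# the format string, so "%" sequences in the input are parsed as conversions.

printf '%s\n' '100% sure' 'a %d b' | print_safe
```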
992
Posted By RudiC
You weren't too far off. Try FS="[;=]".
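With a bracket expression in FS, either ";" or "=" ends a field. Here is a short check on a hypothetical input line. The function name `fields` is made up.

```shell
#!/bin/sh
# Show the field count and the first three fields when FS is "[;=]".
fields() { awk 'BEGIN { FS = "[;=]" } { print NF ": " $1 " | " $2 " | " $3 }' "$@"; }

printf '%s\n' 'word;tag=gloss' | fields    # 3: word | tag | gloss
```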
3,211
Posted By jim mcnamara
That may also be why your perl has issues. UTF-8 can encode all 1,112,064 valid Unicode code points, so a single UTF-8 character may take 8, 16, 24, or 32 bits (1 to 4 bytes).

Fixing the perl script will require the...
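The variable width is easy to see by counting bytes with `wc -c`, which counts bytes rather than characters. This sketch assumes the script file itself is saved as UTF-8. The function name `bytes` is hypothetical, and `tr` removes the padding some `wc` implementations print.

```shell
#!/bin/sh
# Count the bytes (not characters) in a string.
bytes() { printf '%s' "$1" | wc -c | tr -d ' '; }

for ch in 'a' 'é' 'ज' '𝄞'; do        # U+0061, U+00E9, U+091C, U+1D11E
    printf '%s: %s byte(s)\n' "$ch" "$(bytes "$ch")"
done
```

The four characters take 1, 2, 3 and 4 bytes respectively. Any tool that treats one byte as one character will miscount or mis-split text such as Devanagari.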
3,211
Posted By jim mcnamara
As an aside, there is a split command that does exactly what you ask.

split -b [size in bytes] infile [options controlling output file naming]

Linux man page:

split(1) - Linux manual page...
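A small hypothetical demonstration: a 2500-byte file split into 1000-byte pieces. The operand after the input file is the prefix for the output names. POSIX also allows a `k` or `m` suffix on the size. Concatenating the pieces in name order gives back the original.

```shell
#!/bin/sh
# Hypothetical demo file of 2500 bytes.
tmpdir=$(mktemp -d)
dd if=/dev/zero of="$tmpdir/infile" bs=2500 count=1 2>/dev/null

# Pieces are named part_aa, part_ab, part_ac, ...
split -b 1000 "$tmpdir/infile" "$tmpdir/part_"

ls "$tmpdir"     # infile, part_aa, part_ab, part_ac
cat "$tmpdir"/part_* | cmp - "$tmpdir/infile" && echo "pieces rejoin to the original"
```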
1,169
Posted By bakunin
True. Still, as a safety measure I would strip leading and trailing spaces first:

sed -n 's/^[[:blank:]]*//;s/[[:blank:]]*$//;/ /!p' > /result/file

I hope this helps.

bakunin
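Here is a runnable sketch of this sed filter, written with `s` commands and the POSIX class `[[:blank:]]` spelled out in full, and wrapped in a hypothetical function `single_words`. It trims blanks at both ends and then prints only lines that no longer contain a space, i.e. single words.

```shell
#!/bin/sh
# Trim leading/trailing blanks, then print only lines without a space.
single_words() {
    sed -n 's/^[[:blank:]]*//;s/[[:blank:]]*$//;/ /!p' "$@"
}

printf '%s\n' '  word  ' 'two words' ' x' | single_words    # word, x
```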
1,169
Posted By rovf
For instance using grep:

grep -v '[^ ] [^ ]' your_file
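The grep pattern matches a space with a non-space character on each side, so `-v` keeps only lines holding at most one word. Here is a quick check on hypothetical lines, using a made-up function name `one_word`. Unlike the sed variant, it does not trim surrounding spaces from the lines it keeps, and it only looks at spaces, not tabs.

```shell
#!/bin/sh
# Keep lines that have no space between two non-space characters.
one_word() { grep -v '[^ ] [^ ]' "$@"; }

printf '%s\n' 'one' 'two words' ' three' | one_word    # "one" and " three"
```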
1,104
Posted By RudiC
How about
sed 's/[[:punct:]]/ &/g' file
s 'est
l 'air
d 'homme
l 'issue
bleu -blanc -rouge
(SDF )
a -t -il ?
1,104
Posted By RudiC
Any attempts / ideas / thoughts from your side?

Is the list given complete, or does your request apply to ALL punctuation chars?
2,772
Posted By Don Cragun
To bring what MadeInGermany said directly into your problem statement...

If the following characters are the only legal characters on a line written in Sindhi:...
983
Posted By Aia
Run as perl separate.pl gimley.example
use strict;
use warnings;

my $clean = 'clean.gmly';
my $inconsistent = 'inconsistent.gmly';

open my $clean_fh, '>', $clean or die;
open my...
983
Posted By Scrutinizer
Try:
awk -F= 'split($1,F," ")!=split($2,F," "){print>f; next}1' f=file.bad file > file.good
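Here is a self-contained run of this command on hypothetical data. `split(..., " ")` splits on runs of blanks, like default field splitting. Lines whose two sides have different word counts are written to the file named by `f`, and `1` prints every other line to standard output. The `f=...` operand is placed before the input file, so the assignment takes effect before any line is read.

```shell
#!/bin/sh
# Hypothetical sample: the second line has 3 words vs. 2 words.
tmpdir=$(mktemp -d)
printf '%s\n' 'a b=x y' 'a b c=x y' > "$tmpdir/file"

awk -F= 'split($1, F, " ") != split($2, F, " ") { print > f; next } 1' \
    f="$tmpdir/file.bad" "$tmpdir/file" > "$tmpdir/file.good"

cat "$tmpdir/file.good"    # a b=x y
cat "$tmpdir/file.bad"     # a b c=x y
```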
1,225
Posted By MadeInGermany
The for loop can be shortened, and a classic split trick clears an array.
BEGIN { FS = "[=,]"
}
{ o = $1 "=" $2
s[$2]
for(i = 3; i <= NF; i++)
if(!($i in...
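The "classic split trick" is `split("", arr)`: splitting an empty string yields zero pieces, and `split` first clears the target array. That makes it a portable way to empty an array. `delete arr` without a subscript is widely supported but was historically an extension. Here is a small demonstration, using the hypothetical function name `clear_demo`.

```shell
#!/bin/sh
# Count array elements before and after clearing with split("", s).
clear_demo() {
    awk 'BEGIN {
        s["a"]; s["b"]
        n = 0; for (k in s) n++; print "before: " n
        split("", s)
        n = 0; for (k in s) n++; print "after: " n
    }'
}

clear_demo    # before: 2, after: 0
```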
1,225
Posted By RudiC
If the order of the indic glosses is unimportant, also try
awk -F= '
{for (MX=n=split($2, T, ","); n>0; n--) C[T[n]]
printf "%s=", $1
DL = ""
for (c in C) ...
1,225
Posted By Don Cragun
Assuming that the order of the indic glosses has to be kept as they appear in the input (only removing duplicated indic glosses), assuming that you're using a version of awk that...
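The reply above is cut off, so this is not Don Cragun's code. It is a minimal sketch of one order-preserving approach, assuming lines of the form `word=gloss1,gloss2,...`. A per-line "seen" array skips repeats, so the first occurrence of each gloss stays in its original position. The function name `dedup_glosses` is hypothetical.

```shell
#!/bin/sh
# Remove repeated comma-separated glosses after "=", keeping first occurrences in order.
dedup_glosses() {
    awk -F= '{
        n = split($2, T, ",")
        split("", seen)                # reset the "seen" set for each line
        out = ""; sep = ""
        for (i = 1; i <= n; i++)
            if (!(T[i] in seen)) {
                seen[T[i]]
                out = out sep T[i]
                sep = ","
            }
        print $1 "=" out
    }' "$@"
}

printf '%s\n' 'word=b,a,b,c,a' | dedup_glosses    # word=b,a,c
```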
1,379
Posted By jim mcnamara
Consolidate the fields:

awk -F '[,=]' '{ for(i=1; i<NF; i++) { printf("%s=%s\n", $(i), $(NF) )}} ' filename > newfile


This will not work with older versions of awk.
1,379
Posted By RudiC
Why not adapt / improve your own approach:
awk 'BEGIN{FS="="}
{n=split($1,a,",");for (i=1;i<=n;i++) print a[i]"="$2}' file
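The two "consolidate" one-liners in this thread (jim mcnamara's `-F '[,=]'` version and the `split($1, ...)` version above) can be compared side by side. The function names are hypothetical. For input like `a,b,c=X` both print one `key=value` line per key. They differ when the value itself contains a comma: the `[,=]` version treats that comma as a separator too and uses only the last piece.

```shell
#!/bin/sh
# Version splitting on both "," and "=": pairs every field with the last one.
by_fields() { awk -F '[,=]' '{ for (i = 1; i < NF; i++) printf("%s=%s\n", $i, $NF) }' "$@"; }

# Version splitting only the left-hand side on ",": keeps $2 intact.
by_split() { awk 'BEGIN { FS = "=" } { n = split($1, a, ","); for (i = 1; i <= n; i++) print a[i] "=" $2 }' "$@"; }

printf '%s\n' 'a,b,c=X' | by_fields    # a=X, b=X, c=X
printf '%s\n' 'a,b,c=X' | by_split     # a=X, b=X, c=X
printf '%s\n' 'a,b=X,Y' | by_split     # a=X,Y, b=X,Y
```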
1,282
Posted By Aia
Please, try the following:
perl -ne '/^\w+=.+-/ and print'

Or test with any regex engine that supports Perl regular expressions.

/^\w+=.+-/
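Without Perl, a POSIX ERE approximation of `/^\w+=.+-/` works with `grep -E`: one or more word characters, then "=", then at least one character, then "-". This assumes `[[:alnum:]_]` is close enough to `\w`. The two can differ on non-ASCII letters, because Perl's `\w` depends on its Unicode settings and `[[:alnum:]]` depends on the locale. The function name `match_lines` is hypothetical.

```shell
#!/bin/sh
# ERE approximation of the Perl regex /^\w+=.+-/
match_lines() { grep -E '^[[:alnum:]_]+=.+-' "$@"; }

# Only the first line matches: "abc=-" has nothing between "=" and "-",
# and "a b=x-y" has a space before "=".
printf '%s\n' 'abc=x-y' 'abc=-' 'a b=x-y' | match_lines
```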
Showing results 1 to 25 of 145

 
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.