So, I added a "group id" to the sort key and solved it that way. I also no longer pipe individual arrays to sort (which I suspect was the bug).
Just in case, my code now looks like this:
and on the output I do
to clean out those sort-related columns
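For anyone landing here later, the general shape of the group-id approach can be sketched as below. This is not the script from this thread, just a minimal illustration under assumptions of mine: records are separated by a blank line, the first line of each record is the name, no name contains a tab, and `sort_records` is a made-up function name. Each line is decorated with its record's name, a group id and a line number, the whole stream is sorted once, and `cut` strips the sort-related columns again.

```shell
# Sort multi-line records by their first line (the name) while keeping
# every record's lines together and in their original order.
# Assumptions (not from the thread): blank-line separated records,
# name on the first line of each record, no tab characters in names.
sort_records() {
    awk -v OFS='\t' '
        # A blank line closes the current record; keep it as a separator row.
        /^$/   { if (n) print name, gid, ++n, ""; n = 0; next }
        # The first line of a record is its name and starts a new group.
        n == 0 { name = $0; gid++ }
        # Decorate every line: name, group id, line number, original text.
        { print name, gid, ++n, $0 }
        # Close the last record if the input did not end with a blank line.
        END    { if (n) print name, gid, ++n, "" }
    ' |
    # One sort over everything: by name, then group id, then line number.
    LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2n -k3,3n |
    # Undecorate: drop the three sort-related columns.
    cut -f4-
}
```

Usage would be something like `sort_records < input.txt > sorted.txt` (file names hypothetical). The group id keeps two records with the same name from being interleaved, and `LC_ALL=C` is only there to make the ordering byte-wise and predictable; drop it if you want locale-aware collation.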
Hello,
I am currently trying to edit an ldif file. The ldif specification states that a newline followed by a space indicates the subsequent line is a continuation of the line. So, in order to search and replace properly and edit the file, I open the file in textwrangler, search for "\r " and... (14 Replies)
I know uniq exists, but am not sure how to remove repeating lines when they are groups of two different lines repeating themselves, without using sort. I need them to be sorted in the original order, just to remove repeats.
cd /media/AUDIO/WAVE/9780743518673/mp3
~/Desktop/mp3-to-m4b... (1 Reply)
Hi Guys,
First post! I've seen a few options but don't know which is the most efficient:
I have a directory with 150,000+ text files in it
I want to merge them into files that each contain 10,000 of the originals, with a carriage return in between.
Thanks
P
The following is an example but doesn't limit the... (2 Replies)
Hello,
I'm working with a file that has three columns. The first one represents a certain channel and the third one a timestamp (second one is not important). Example input is as follows:
2513 12 10.771
2513 13 10.771
2513 14 10.771
2513 15 10.771
2644 8 10.771
... (6 Replies)
G'day all,
I have tons of image files I need to process, but I don't need to process all of them and it would take a long time to process them all if I don't have to.
The images are arranged in folders like this...
folder1/RawData
folder2/RawData
folder3/RawData
...
folderN/RawData
... (2 Replies)
I have two files.
File 1 is a two-column index file, e.g.
comp11084_c0_seq6:130-468(-) comp12746_c0_seq3:140-478(+)
comp11084_c0_seq3:201-539(-) comp12746_c0_seq2:191-529(+)
File 2 is a sequence file with headers named with the same terms that populate file 1. ... (1 Reply)
Hello to all,
I'm trying to print the value corresponding to the words A, B, C, D, E. These words could appear sometimes and sometimes not inside each group of lines. Each group of lines begins with "ZYX".
My issue with the current code is that it should print values for 3 groups and only is... (6 Replies)
Hi, I have some data I have taken from the internet in the following scheme:
name
direction
webpage
phone number
open hours
menu url
book url
name
...
Of course the only line that is mandatory is the name, which is the one I want to sort by.
I have the following sed & awk script that... (3 Replies)
Discussion started by: devmsv
3 Replies
sort
sort(3pm) Perl Programmers Reference Guide sort(3pm)
NAME
sort - perl pragma to control sort() behaviour
SYNOPSIS
use sort 'stable'; # guarantee stability
use sort '_quicksort'; # use a quicksort algorithm
use sort '_mergesort'; # use a mergesort algorithm
use sort 'defaults'; # revert to default behavior
no sort 'stable'; # stability not important
use sort '_qsort'; # alias for quicksort
my $current;
BEGIN {
$current = sort::current(); # identify prevailing algorithm
}
DESCRIPTION
With the "sort" pragma you can control the behaviour of the builtin "sort()" function.
In Perl versions 5.6 and earlier the quicksort algorithm was used to implement "sort()", but in Perl 5.8 a mergesort algorithm was also
made available, mainly to guarantee worst case O(N log N) behaviour: the worst case of quicksort is O(N**2). In Perl 5.8 and later,
quicksort defends against quadratic behaviour by shuffling large arrays before sorting.
A stable sort means that for records that compare equal, the original input ordering is preserved. Mergesort is stable, quicksort is not.
Stability will matter only if elements that compare equal can be distinguished in some other way. That means that simple numerical and
lexical sorts do not profit from stability, since equal elements are indistinguishable. However, with a comparison such as
{ substr($a, 0, 3) cmp substr($b, 0, 3) }
stability might matter because elements that compare equal on the first 3 characters may be distinguished based on subsequent characters.
In Perl 5.8 and later, quicksort can be stabilized, but doing so will add overhead, so it should only be done if it matters.
The best algorithm depends on many things. On average, mergesort does fewer comparisons than quicksort, so it may be better when
complicated comparison routines are used. Mergesort also takes advantage of pre-existing order, so it would be favored for using "sort()"
to merge several sorted arrays. On the other hand, quicksort is often faster for small arrays, and on arrays of a few distinct values,
repeated many times. You can force the choice of algorithm with this pragma, but this feels heavy-handed, so the subpragmas beginning with
a "_" may not persist beyond Perl 5.8. The default algorithm is mergesort, which will be stable even if you do not explicitly demand it.
But the stability of the default sort is a side-effect that could change in later versions. If stability is important, be sure to say so
with a
use sort 'stable';
The "no sort" pragma doesn't forbid what follows, it just leaves the choice open. Thus, after
no sort qw(_mergesort stable);
a mergesort, which happens to be stable, will be employed anyway. Note that
no sort "_quicksort";
no sort "_mergesort";
have exactly the same effect, leaving the choice of sort algorithm open.
CAVEATS
As of Perl 5.10, this pragma is lexically scoped and takes effect at compile time. In earlier versions its effect was global and took
effect at run-time; the documentation suggested using "eval()" to change the behaviour:
{ eval 'use sort qw(defaults _quicksort)'; # force quicksort
eval 'no sort "stable"'; # stability not wanted
print sort::current . "\n";
@a = sort @b;
eval 'use sort "defaults"'; # clean up, for others
}
{ eval 'use sort qw(defaults stable)'; # force stability
print sort::current . "\n";
@c = sort @d;
eval 'use sort "defaults"'; # clean up, for others
}
Such code no longer has the desired effect, for two reasons. Firstly, the use of "eval()" means that the sorting algorithm is not changed
until runtime, by which time it's too late to have any effect. Secondly, "sort::current" is also called at run-time, when in fact the
compile-time value of "sort::current" is the one that matters.
So now this code would be written:
{ use sort qw(defaults _quicksort); # force quicksort
no sort "stable"; # stability not wanted
my $current;
BEGIN { $current = sort::current; }
print "$current\n";
@a = sort @b;
# Pragmas go out of scope at the end of the block
}
{ use sort qw(defaults stable); # force stability
my $current;
BEGIN { $current = sort::current; }
print "$current\n";
@c = sort @d;
}
perl v5.18.2 2013-11-04 sort(3pm)