The error seems to be located here, where I sort the utt2spk file, which is done like this:
There are several problems here, and they are not necessarily related. Let me address them one by one:
1) Input file as output file
In general, you cannot use the file you are reading from as the output file at the same time. With a redirection like sort file > file, the shell truncates the file before sort has read a single line, so you end up with an empty file. You need to write to an intermediate file and then move that file over the original. As a side effect, this also makes the whole process a little safer in case something goes wrong. Take the following as a sketch and modify the error handling according to your needs:
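A minimal sketch of that pattern for a POSIX shell. The function name sort_in_place and the ".tmp.$$" suffix are assumptions made for illustration; any extra arguments are passed straight through to sort:

```shell
#!/bin/sh
# Hypothetical helper: sort_in_place FILE [SORT_OPTIONS...]
# Sorts FILE "in place" by writing to a temporary file first.
sort_in_place() {
    file=$1
    shift
    # Temporary file in the same directory, so the final mv is a simple rename.
    # $$ (the shell's PID) makes a name collision unlikely.
    tmp="$file.tmp.$$"

    # Never redirect straight back into "$file": the shell would truncate it
    # before sort has read a single line.
    if sort "$@" "$file" > "$tmp"; then
        mv "$tmp" "$file"
    else
        echo "sort_in_place: sort failed, $file left unchanged" >&2
        rm -f "$tmp"
        return 1
    fi
}

# Example call (utt2spk is the file from the question):
# sort_in_place utt2spk
```

POSIX sort also allows -o to name one of its own input files (sort -o utt2spk utt2spk), but the temporary-file pattern works for any filter, not just sort.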
2) Note the difference between numerical and alphabetical sorting
In your request you imply that you expect the file to be (partially) sorted numerically. The difference is that alphabetically "a12bc" comes after "a123bc", because "3" (the 4th character of the second string) comes before "b" in ASCII. Numerically, however, you will want "12" before "123". You need to define a numeric sort order by using the "-n" switch of sort. I suggest reading the man page of sort for the details.
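A small illustration of the two orderings with made-up sample strings. LC_ALL=C is set only so that the alphabetical comparison is plain byte (ASCII) order regardless of the local locale settings:

```shell
#!/bin/sh
# Alphabetical (byte-wise) order: at the 4th character '3' (0x33) < 'b' (0x62),
# so "a123bc" sorts before "a12bc".
alpha=$(printf 'a12bc\na123bc\n' | LC_ALL=C sort)
printf '%s\n' "$alpha"     # a123bc, then a12bc

# Numerical order: comparing the numbers themselves puts 12 before 123.
numeric=$(printf '123\n12\n' | sort -n)
printf '%s\n' "$numeric"   # 12, then 123
```

Note that -n only reads a number at the start of the sort key (a key with no leading digits counts as zero), so for lines that begin with letters it has to be combined with a -k key that starts where the digits are.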
3) Internationalisation
According to the POSIX documentation, this is already taken care of: when it starts, sort uses the internationalisation variables (LANG, LC_*, ...) to determine the collation sequence for the sort. This mainly matters for special characters, though (like the umlauts in German ["ä", "ö", ...], the Spanish eñe ["ñ"], etc.). It won't turn an alphabetical sort of numbers into a numerical one.
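Because the collation sequence comes from the environment, the same sort command can order lines differently on two machines. A common way to get plain byte order everywhere (an assumption about what you want, not something sort requires) is to set LC_ALL=C for that one command; the sample lines below are made up:

```shell
#!/bin/sh
# LC_ALL overrides LANG and every other LC_* variable, so in the C locale
# sort compares raw bytes: '-' (0x2D) < 'a' (0x61) < 'b' (0x62).
bytewise=$(printf 'ab\na-b\naa\n' | LC_ALL=C sort)
printf '%s\n' "$bytewise"   # a-b, aa, ab

# For comparison: the same input sorted under whatever locale is currently set.
# The result may differ, e.g. some locales give '-' little or no weight.
printf 'ab\na-b\naa\n' | sort
```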
I hope this helps.
bakunin
Last edited by bakunin; 12-19-2016 at 03:33 PM.
Reason: typos
So, to sum it up, you are saying that sort should do this automatically, and my error is somewhere else?
Actually: no.
To sum it up, I said:
Quote:
Originally Posted by bakunin
You need to define a numeric sort order by using the "-n" switch of sort. I suggest reading the man page of sort for the details.
This (the missing "-n" option) is in fact what causes the wrong sort order. As I do not know the detailed layout of your input file, I cannot tell you exactly which sort options you need, but reading the man page of sort (try the command man sort) should explain that, given the pointers I gave you.
I don't think sort will automatically do what you want; you need to tell it how to sort. The problem you may run into is that, if you want to sort numerically, the number does not start at a fixed position: its offset from the start of the string varies.
If the field were numeric from (say) character 11 on every line, we could handle that with the -k flag, even if it is a bit complex. Because we can't be sure where the digits start, this will be more complex.
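For the simple case mentioned, where the digits always start at character 11 of field 1, the -k keys might look like this. The sample lines are invented, and the fixed 10-character prefix is an assumption about the data:

```shell
#!/bin/sh
# -k1.1,1.10 : primary key = characters 1-10 of field 1 (the fixed-length prefix)
# -k1.11,1n  : secondary key = character 11 to the end of field 1, compared numerically
# (LC_ALL=C only makes the prefix comparison byte-wise and reproducible.)
fixed=$(printf 'spkA-utt-x12 spkA\nspkB-utt-x1 spkB\nspkA-utt-x3 spkA\n' |
    LC_ALL=C sort -k1.1,1.10 -k1.11,1n)
printf '%s\n' "$fixed"
# spkA-utt-x3 spkA
# spkA-utt-x12 spkA
# spkB-utt-x1 spkB
```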
One way might be to process the file and insert a placeholder character (choose something that will never appear naturally in the file) so that we can use it to get the numbers into a fixed position. Then we can sort numerically on a secondary key (with the primary sort key stopping just before the number) and finally strip out the placeholder character.
Put in a more structured form:
Convert lines that start like flrp-b-an2 so that they start like flrp-b-an@2 (using @ as the placeholder).
Sort the file with the primary key starting at field 1, character 1 and ending at field 1, character 10 (inclusive)
... and with a numeric secondary key starting at field 1, character 11 and ending at the end of field 1.
Strip the @ placeholder out again.
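The steps above could be sketched as one pipeline. This variant deliberately differs slightly from the list: it uses the @ placeholder as sort's field separator (-t@) instead of fixed character positions, so prefixes of different lengths also work. It assumes @ never occurs in the data and that the first digit on each line is where the number starts; natural_sort is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: natural_sort reads lines on stdin, writes them sorted to stdout.
#   Step 1 (sed):  insert the "@" placeholder before the first digit on each line.
#   Step 2 (sort): with "@" as field separator, field 1 is the text before the
#                  number (primary key, normal comparison) and field 2 starts
#                  with the number (secondary key, numeric comparison).
#   Step 3 (sed):  remove the placeholder again.
natural_sort() {
    sed 's/[0-9]/@&/' | sort -t@ -k1,1 -k2n | sed 's/@//'
}

# Example (writes the result to a new file; see the temporary-file advice above):
# natural_sort < utt2spk > utt2spk.sorted
```

Lines without any digit pass through unchanged and are compared on the whole line as field 1.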
Hi.
In situations like this, I use msort. It is in many ways a work-alike for the standard sort, but it has a number of extra features, including the one we use here: a hybrid key, which is composed of alphabetic and numeric. Note that there is just the single command msort that does the work, after the setup and the verification of correctness:
producing:
There is a price to pay for the features -- msort is slower than the standard sort.
The command msort can be found in many repositories or, as noted in the details below, at the msort home site:
Best wishes ... cheers, drl
Thanks for the response. I guess I should have added a detail on how I check whether the sorting is done correctly.
I have a function that goes through the file, and if that function deems it OK, the file is sorted correctly. From what I can tell, it checks lexicographically.
The function is:
---------- Post updated at 02:41 PM ---------- Previous update was at 10:33 AM ----------
I am not sure I understand how I should make it sort my input text. How slow is it?
I am running a command that is part of a script and this is what I am getting when it is sorted by the command:
command:
ls /tmp/test/*NDMP*.z
/tmp/test/CARS-GOLD-NET_CHROMJOB-01-XZ-ARCHIVE-NDMP.z
/tmp/test/CARS-GOLD-NET_CHROMJOB-01-XZ-NDMP.z...
URGENT HELP IS NEEDED!!
I am looking to move matching lines (01 - 07) from File1 and 77 tab the matching string from File2, to File3.txt. I am almost done but
- Currently, the script is not printing lines to File3.txt in order.
- Also the matching lines are not moving out of File1.txt
...
Hello
I greped some lines from an xml file and generated a new file.
but some entries are missing and my table is unsorted.
e.g.
NAME="Adel" ADDRESS="Donaustr." NUMBER="2" POSTCODE="33333"
NAME="Adel" ADDRESS="Donaustr." NUMBER="2" POSTCODE="33333"
NAME="Adel" NUMBER="2" POSTCODE="33333"...
I've got a disorganized list of items and quantities for each. I've been using a combination of grep and sort to find out how much to buy of each item. I'm tired of having to constantly use these commands, so I've been trying to write a shell script to make it easier, but I can't figure out how...
I need to use bash to remove duplicates without using sort first.
I cannot use:
cat file | sort | uniq
But when I use only
cat file | uniq
some duplicates are not removed.
I am in the process of sorting an AutoHotkey script's contents so as to make it easier for me to find and view its nearly 200 buzzwords (when I forget which one corresponds with what phrase, which I do now and then).
About half to two-thirds of the script's key phrases correspond to locations...
Hi,
I have a list of values from associative array from 0,..till 1.0000.
I tried various sort options (sort -g, sort -nr), but it still doesn't work. In other words, the numbers are not sorted correctly.
Please help.
Thanks.
I did a search on this, and found lots on SORT but no answer to my question.
I have a C program that fetches all of our users from Netware, and I have it write a file that I later include in an HTML page as a select-tag drop-down menu.
Here is what 1 line looks like:
<option...