Quote: Originally Posted by hanson44
Yes, I think the sort syntax reinforces what I was trying to say. "sort -t, -k 8 fields.txt" sorts "field 8". The sort man page refers to "field separator" and "field number" and --field-separator. "column" is not even mentioned on the sort man page.
For the position within a field, both sort and cut use "character" instead of "column". In other words, cut says --characters where I would have said --columns. To me "characters" is confusing, as it could be interpreted to mean "--characters=ABC", as if looking for those "characters". The man page says "select only these characters". Why did they choose --characters instead of --columns for the option name?
Of course, nobody is going to change the option name at this point. I suppose they could at least improve the cut man page, to say "select only these character positions".
If you have a tab character, it may occupy one or more columns. If you have a backspace character, that character and the character it follows may occupy only one column. If you are looking at a Kanji character, a single character may occupy two columns. That is why we chose character rather than column for the tag associated with the -c option to cut.
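To make the difference between counting characters and counting columns concrete, here is a rough Python sketch. The helper display_width is hypothetical (it is not part of cut or sort), and it assumes one particular output device: tab stops every 8 columns, a backspace that moves back exactly one column, and East Asian wide characters that take two columns. Other devices may behave differently, which is the point.

```python
import unicodedata

def display_width(text, tabstop=8):
    """Estimate the rightmost column reached when text is displayed.

    Hypothetical helper: assumes tab stops every `tabstop` columns and a
    backspace that moves back one column (never past column 0).
    """
    col = 0
    widest = 0
    for ch in text:
        if ch == "\t":
            # Advance to the next tab stop.
            col = (col // tabstop + 1) * tabstop
        elif ch == "\b":
            # Move back one column, stopping at the left margin.
            col = max(col - 1, 0)
        elif unicodedata.east_asian_width(ch) in ("W", "F"):
            # Wide and fullwidth characters occupy two columns.
            col += 2
        else:
            col += 1
        widest = max(widest, col)
    return widest

samples = {
    "tab": "a\tb",
    "backspace": "a\b_",
    "kanji": "\u6f22\u5b57",  # two Kanji characters
}
for name, s in samples.items():
    print(name, "characters:", len(s), "columns:", display_width(s))
```

Under those assumptions, the three-character tab and backspace samples end up 9 columns and 1 column wide, and the two Kanji characters take 4 columns. The character count is the same no matter what displays the text.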
The cut utility does not perform cuts based on the columns in which characters will be displayed. It can perform cuts based on the number of bytes (-b), the number of characters (-c), or the number of fields (-f). There is no option to cut the characters or bytes that will occupy a particular range of column positions (such as recognizing that the three character sequence <a><backspace><underline> immediately following a <newline> character will all occupy output column number one on some output devices). And the way the sequence of characters <a><tab><backspace><c> translates into output columns may vary considerably based not only on the position within a line where it appears but also on the software or hardware that is interpreting that sequence. Does the <backspace> character backspace over the previous output column or over the previous character (<tab> in this case)? What does the <backspace> character do when it is the first character on a line? Again, counting characters provides a clearly defined operation. If we had used output columns instead of characters (or bytes), the behavior required would not match any known existing implementation of the cut utility.
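The three kinds of selection can be imitated in Python to show how they diverge on the same line. This is only an illustration of the idea, not how cut is implemented; the sample line and variable names are made up, and it assumes UTF-8 encoding. Some implementations have also treated -c the same as -b, so real output for multibyte text may differ from the character slice shown here.

```python
# Sample line: "naive" with a two-byte i-diaeresis, two Kanji, and "end".
line = "na\u00efve,\u6f22\u5b57,end"
data = line.encode("utf-8")

# Byte positions 1-5, similar in spirit to cut -b1-5.
# This slice ends partway through the word because one character is two bytes.
first_five_bytes = data[:5]

# Character positions 1-5, similar in spirit to cut -c1-5 in a UTF-8 locale.
first_five_chars = line[:5]

# Second comma-separated field, similar in spirit to cut -d, -f2.
second_field = line.split(",")[1]

print(len(data), "bytes,", len(line), "characters")
print(first_five_bytes)
print(first_five_chars)
print(second_field)
```

None of the three selections needs to know how wide anything will look on a screen, which is why each one is well defined for any input.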
When using a fixed-width character set, rows and columns are solid concepts on a CRT, typewriter, or printer, and also when talking about entries in a table in a spreadsheet. Columns are much less precise when talking about the contents of a text file.
Characters, on the other hand, are explicitly defined by the LC_CTYPE category of the current locale.