Should I say "field 8" or "column 8" in this case?


 
# 1  
Old 03-22-2013

I saw some recent posts where I thought the terms "field" and "column" were being misused. I work with data a lot, and have my opinions. I'm wondering if those opinions are correct.

***** Rows seem clear - I don't think there is any controversy about what a row is, either for a database or for a text file.

***** Flat file columns seem clear - For a fixed-width flat file such as the following, I don't think there is any controversy about what a column is. A column in the file shown is a single character position, like "cut -c 1". Several columns may combine to make a field, so "cut -c 1-11" cuts the field in columns 1-11 (the record ID here), such as 09011101001, 09011101002, etc.
Code:
09011101001270101192008BNB1102008000027060126720001305591
09011101002230101212008B5P1102008000053110126720001305591
090111010032501011120084XB1102008000085030126720001305591
09011101005250101232008GUW1202008000145050126720001305591
09011101006070101132008E3S1102008000157050126720001305591
09012101007060102062008GWB1102008000186030361080005352411
090111010081601011920082XW1102008000226050126720001305591
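
As a minimal sketch of the point above, here is the first sample record run through cut by character position. The variable names (line, record_id, first_char) are only for illustration and are not from the original post.

```shell
# One record from the fixed-width sample above.
line='09011101001270101192008BNB1102008000027060126720001305591'

# Character positions 1-11 hold the record ID.
record_id=$(printf '%s\n' "$line" | cut -c 1-11)
echo "$record_id"     # prints 09011101001

# A single position, as in "cut -c 1".
first_char=$(printf '%s\n' "$line" | cut -c 1)
echo "$first_char"    # prints 0
```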

***** CSV and TSV "columns" seem misused to me - Here is similar data, in CSV format. I would say "data in field 8" instead of "data in column 8". I think I'm supported by the cut command and its use of "cut -f 8 -d," (--fields) for parsing this kind of data. To me, "column 8" means "cut -c 8". By the time we get out to "field 8", the data doesn't line up vertically anymore, so it doesn't even look like a column. But it seems many, or perhaps most, say "data in column 8". But many of those can barely string together a sentence. So I thought I would ask the experts. Is it more correct to say "column 8" or "field 8" for what "cut -f 8 -d," retrieves in the example below?
Code:
9,1,1,1,1001,27,1,01192008,01,19,2008,BNB,110,2008000027
9,1,1,1,1002,23,1,01212008,01,21,2008,B5P,110,2008000053
9,1,1,1,1003,25,1,01112008,01,11,2008,4XB,110,2008000085
9,1,1,1,1005,25,1,01232008,01,23,2008,GUW,120,2008000145
9,1,1,1,1006,7,1,01132008,01,13,2008,E3S,110,2008000157,2
9,1,2,1,1007,6,1,02062008,02,06,2008,GWB,110,2008000186,2
9,1,1,1,1008,16,1,01192008,01,19,2008,2XW,110,2008000226
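
To make the difference concrete, here is a small sketch that applies both readings of "8" to the first CSV record. The variable names are illustrative only.

```shell
# First record from the CSV sample above.
line='9,1,1,1,1001,27,1,01192008,01,19,2008,BNB,110,2008000027'

# "Field 8": the eighth comma-delimited value.
field8=$(printf '%s\n' "$line" | cut -d, -f 8)
echo "$field8"    # prints 01192008

# "Column 8" in the cut -c sense: the eighth character of the line,
# which here is just one of the separators.
char8=$(printf '%s\n' "$line" | cut -c 8)
echo "$char8"     # prints ,
```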

# 2  
Old 03-22-2013
Hi.

Good topic. It may help all of us communicate better.

I tend to prefer the term fields when talking about variable-width data groups, and columns when considering fixed-width data. The article at http://en.wikipedia.org/wiki/Field_(computer_science) pretty much describes my outlook. If there is a separator character or string, I'd call it variable-width, even if all of the members of a specific group are the same width, primarily because any member could become a different width in the future.

From a historical point of view, the FORTRAN influence caused people to describe data in terms of fields even though the data were in fixed-width. Only somewhat later, perhaps in the '70s or '80s would the idea of free-form data become more prevalent.

Thanks for starting the discussion ... cheers, drl

Last edited by drl; 03-22-2013 at 08:20 AM..
# 3  
Old 03-22-2013
Good points, all.

I view fields as objects that are horizontally delimited and not in a fixed position, like drl.
I remember fields from FORTRAN and some versions of BASIC. Now the main driver seems to be portability of data files from UNIX into Excel.

UNIX uses field separators:
sort has a notion of fields delimited by -t [character].
from man sort for GNU sort
Code:
-t, --field-separator=SEP

awk has had FS from its inception.
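
A short sketch of the awk side, using the first CSV record from post #1. FS can be set with -F on the command line or assigned in a BEGIN block; the variable names here are just for illustration.

```shell
# First record from the CSV sample in post #1.
line='9,1,1,1,1001,27,1,01192008,01,19,2008,BNB,110,2008000027'

# -F sets the field separator FS; $8 is then "field 8".
awk_field8=$(printf '%s\n' "$line" | awk -F, '{ print $8 }')
echo "$awk_field8"    # prints 01192008

# Setting FS explicitly; NF is the number of fields awk found.
field_count=$(printf '%s\n' "$line" | awk 'BEGIN { FS = "," } { print NF }')
echo "$field_count"   # prints 14
```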
# 4  
Old 03-22-2013
Yes, I think the sort syntax reinforces what I was trying to say. "sort -t, -k 8 fields.txt" sorts "field 8". The sort man page refers to "field separator" and "field number" and --field-separator. "column" is not even mentioned on the sort man page.
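
Here is a rough sketch of sorting on field 8, using three of the sample records. One detail worth noting: "-k 8" on its own defines a key that runs from field 8 to the end of the line, so the sketch uses "-k8,8" to restrict the key to field 8 alone. The variable names are illustrative only.

```shell
# Three records from the CSV sample in post #1.
data='9,1,1,1,1001,27,1,01192008,01,19,2008,BNB,110,2008000027
9,1,1,1,1002,23,1,01212008,01,21,2008,B5P,110,2008000053
9,1,1,1,1003,25,1,01112008,01,11,2008,4XB,110,2008000085'

# Sort on field 8 only, then show field 5 of each record
# joined on one line to make the resulting order easy to see.
sorted_ids=$(printf '%s\n' "$data" | sort -t, -k8,8 | cut -d, -f 5 | paste -sd, -)
echo "$sorted_ids"    # prints 1003,1001,1002
```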

For the position within a field, both sort and cut use "character" instead of "column". In other words, cut says --characters where I would have said --columns. To me "characters" is confusing, as it could be interpreted to mean "--characters=ABC", as if looking for those "characters". The man page says "select only these characters". Why did they choose --characters instead of --columns for the option name?

Of course, nobody is going to change the option name at this point. I suppose they could at least improve the cut man page, to say "select only these character positions".
# 5  
Old 03-23-2013
If you have a tab character, it may occupy one or more columns. If you have a backspace character, that character and the character it follows may occupy only one column. If you are looking at a Kanji character, a single character may occupy two columns. That is why we chose character rather than column for the tag associated with the -c option to cut.

The cut utility does not perform cuts based on the columns in which characters will be displayed. It can perform cuts based on the number of bytes (-b), the number of characters (-c), or the number of fields (-f). There is no option to cut the characters or bytes that will occupy a particular range of column positions (such as recognizing that the three character sequence <a><backspace><underline> immediately following a <newline> character will all occupy output column number one on some output devices). And the way the sequence of characters <a><tab><backspace><c> translates into output columns may vary considerably based not only on the position within a line where it appears but also on the software or hardware that is interpreting that sequence. Does the <backspace> character backspace over the previous output column or over the previous character (<tab> in this case)? What does the <backspace> character do when it is the first character on a line? Again, counting characters provides a clearly defined operation. If we had used output columns instead of characters (or bytes), the behavior required would not match any known existing implementation of the cut utility.
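
A small sketch of the tab case. It assumes tab stops every 8 columns (the default for expand) and input with no backspaces or wide characters; running the line through expand first is only a rough workaround for column-based cutting, not something cut offers itself.

```shell
# A three-character line: "a", a tab, "b".
line=$(printf 'a\tb')

# cut -c counts characters: "b" is character 3.
char3=$(printf '%s\n' "$line" | cut -c 3)
echo "$char3"    # prints b

# With tab stops every 8 columns, "b" is displayed in column 9.
# expand turns the tab into spaces, so character 9 is now "b".
col9=$(printf '%s\n' "$line" | expand | cut -c 9)
echo "$col9"     # prints b
```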

When using a fixed width character set, rows and columns are solid concepts on a CRT, typewriter, or printer and also when talking about entries in a table in a spreadsheet. Columns are much less precise when talking about the contents of a text file.

Characters, on the other hand, are explicitly defined by the LC_CTYPE category of the current locale.
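
As a hedged illustration of bytes versus characters, here is a sketch using wc. The byte count is fixed, but the character count reported by wc -m depends on the LC_CTYPE setting in effect, so no single result is shown for it.

```shell
# "é" encoded in UTF-8 is the two bytes 0xC3 0xA9 (octal 303 251).
bytes=$(printf '\303\251' | wc -c | tr -d ' ')
echo "$bytes"    # prints 2

# wc -m counts characters according to LC_CTYPE: typically 1 in a
# UTF-8 locale and 2 in the C locale, where each byte is a character.
chars=$(printf '\303\251' | wc -m | tr -d ' ')
echo "$chars"
```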
# 6  
Old 03-23-2013
What you say makes sense for UTF-8 or other multi-byte characters, the case I wasn't considering. I knew all that, but it turns out I didn't really know it. I'm so used to dealing with regular ASCII printing characters taking up one column, but I need to keep locales in mind. Thanks.
# 7  
Old 03-23-2013
Even with single-byte characters, tab and backspace seldom take up one column. I know that these characters aren't in the print class, but cut and sort don't just work on characters for which isprint(char) returns true.