extracting delimiter from a file.

11-30-2010

@bakunin,

The whole point of the example was to show that a delimiter is an arbitrary field separator. Generally, if you are working with existing files such as a CSV file, a delimiter has already been chosen for you. Even in a CSV spreadsheet it's still arbitrary, but it wouldn't make much sense to use anything other than a comma. A delimiter is arbitrary in the sense that any character can be used, but not in the sense that anything goes: if the character doesn't actually mark anything, it's not a delimiter.

Webster's Dictionary definition:

a character that marks the beginning or end of a unit of data

In the case of myfile

afielD1|bfiEld2|cfIeld3

The | is the only delimiter because it is the only character that is actually marking the beginning or end of a unit of data.
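As a small illustration (a sketch in Python, not something from the original post), splitting that line on the pipe gives back the three units of data it marks off. The variable names `record` and `fields` are just placeholders chosen here.

```python
# The sample line from myfile, as quoted above.
record = "afielD1|bfiEld2|cfIeld3"

# Treat "|" as the delimiter and cut the line at every occurrence.
fields = record.split("|")

print(fields)  # ['afielD1', 'bfiEld2', 'cfIeld3']
```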

ilikecows

12-01-2010

Quote:

Originally Posted by ilikecows

Even in a csv spreadsheet, it's still arbitrary, but it wouldn't make much sense to use anything other than a comma.

In a *c*sv file (which is called "comma-separated" for a reason, perhaps) this is right. But then, in a comma-separated file there is no need to find out the delimiter, which is what the thread opener asked about.

Quote:

Originally Posted by ilikecows

A delimiter is arbitrary in the sense that any character can be used, but it's not in the sense that if the character doesn't actually mark anything, it's not a delimiter.

This is a misunderstanding: "doesn't [actually] mark anything" means you are second-guessing what exactly establishes meaning. Consider:

Quote:

a||||||b|c

Let's say the pipe character is used as the delimiter: several of the fields delimited this way are empty. Do these empty fields carry useful information or not?

Furthermore (sorry, this gets somewhat philosophical), "meaning" is not an inherent quality at all. The string "abc" might have a meaning or not, depending on what we agree establishes meaning, on context, and so on.

Your argument comes down to "plausibility", and while I agree with you that limiting your search to plausible or obvious solutions usually helps to solve real-world problems faster, it simply doesn't help if you are trying to find generalized solutions, as in "write a script to find the delimiter".

Consider the string "a||b||c": does this mean three fields, "a", "b" and "c", delimited by a double pipe character, or does it mean five fields, two of them empty? Both variants would be plausible enough, and both might be correct or wrong, depending on the intention of whoever wrote the line. But this information cannot be discerned from the file alone. You need some additional information (context) to do so.
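To make the two readings concrete, here is a minimal sketch (in Python, which is my choice for illustration and not something used in this thread). The same line is split once with "||" and once with "|" as the assumed delimiter; both calls succeed, and nothing in the line itself says which one was intended.

```python
line = "a||b||c"

# Reading 1: the delimiter is the two-character string "||".
as_double_pipe = line.split("||")

# Reading 2: the delimiter is a single "|", so empty fields appear.
as_single_pipe = line.split("|")

print(as_double_pipe)  # ['a', 'b', 'c']
print(as_single_pipe)  # ['a', '', 'b', '', 'c']
```

Three fields or five, two of them empty: the choice has to come from outside the data.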

Quote:

The | is the only delimiter because it is the only character that is actually marking the beginning or end of a unit of data.

Again, this appeals to plausibility. Everything can be considered "data": "afie" or "D1" is (or can be) as much data as "afie|D1" or whatever substring you extract from this line. Whether it is data or not depends on your ability to derive meaning from it. Again: context.

If I give you a succession of characters, say "R-O-T", is it data? In other words, does it have a meaning? As long as you don't have additional information you can't decide this question at all. For instance, if you know we are talking in English, then this would constitute a word (a verb) and have a meaning. If you know that we are talking in German, it would also have a meaning, but a different one ("rot" means "red" and is an adjective). And if you know we are talking in Italian, it would have no meaning at all, as there is no word "rot" in Italian; it would be some garbled transmission in this case. This means you need to have (or need to assume) some context (the language) to decide whether this string is data or not.

The human brain is very, very good at finding (or constructing) patterns, real ones or, in some pathological cases like the mathematician John Nash, imagined ones. Still, finding a pattern is not discovering some inherent quality of the presented data but imposing some organization on received information. This organization is imposed on the information from outside and is therefore what I said: arbitrary. None of these organizations is "better" or "more correct" than any other.
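Applied to the original question, the same reasoning suggests that a "find the delimiter" script can only work after some context has been assumed. The following is a minimal sketch in Python (not from this thread, and only one possible heuristic). It assumes two things that the file itself cannot confirm: that the delimiter is one of a fixed set of candidate characters, and that every non-empty line contains the same number of fields. The names `CANDIDATES` and `guess_delimiters` are made up for this example.

```python
# Assumption: the delimiter is one of these characters. This list is the
# "context" supplied from outside; the file does not provide it.
CANDIDATES = "|,;\t:"

def guess_delimiters(lines, candidates=CANDIDATES):
    """Return every candidate that occurs the same non-zero number of
    times on each non-empty line (assumes a fixed field count per line)."""
    rows = [ln.rstrip("\n") for ln in lines if ln.strip()]
    plausible = []
    for ch in candidates:
        counts = {row.count(ch) for row in rows}
        # Exactly one distinct count, and that count is not zero.
        if len(counts) == 1 and 0 not in counts:
            plausible.append(ch)
    return plausible

print(guess_delimiters(["afielD1|bfiEld2|cfIeld3", "x|y|z"]))  # ['|']
print(guess_delimiters(["a,b;c", "d,e;f"]))                    # [',', ';']
```

The second call returns two candidates: even with both assumptions in place the answer can remain ambiguous, and the function reports that instead of picking one.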

bakunin

01-05-2011

Hi,

So, finally, is there a way to find the delimiter in the file?

Regards
JS

jisha
