extracting delimiter from a file.


# 8  
11-30-2010
@bakunin,

The whole point of the example was to show that a delimiter is an arbitrary field separator. Generally, if you are working with already existing files such as a csv file, a delimiter has been chosen for you. Even in a csv spreadsheet it's still arbitrary, but it wouldn't make much sense to use anything other than a comma. A delimiter is arbitrary in the sense that any character can be used, but it's not arbitrary in the sense that if the character doesn't actually mark anything, it's not a delimiter.

Webster's Dictionary definition:

a character that marks the beginning or end of a unit of data

In the case of myfile

afielD1|bfiEld2|cfIeld3

The | is the only delimiter because it is the only character that is actually marking the beginning or end of a unit of data.
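To make the two readings concrete, here is a minimal Python sketch (Python is an assumption; the thread itself is about shell tools) that splits the sample line from myfile. Splitting on "|" yields the three units described above; splitting on another character that occurs in the line, such as "i", is mechanically just as possible but cuts across those units.

```python
# Sample line from myfile, as given in the post above.
line = "afielD1|bfiEld2|cfIeld3"

# Treat "|" as the delimiter: three fields, one per "unit of data".
by_pipe = line.split("|")
print(by_pipe)  # ['afielD1', 'bfiEld2', 'cfIeld3']

# Nothing stops us from treating "i" as the delimiter instead;
# the result is fragments that straddle the "|" characters.
by_i = line.split("i")
print(by_i)  # ['af', 'elD1|bf', 'Eld2|cfIeld3']
```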
# 9  
12-01-2010
Quote:
Originally Posted by ilikecows
Even in a csv spreadsheet it's still arbitrary, but it wouldn't make much sense to use anything other than a comma.
In a *c*sv file (which is called "comma-separated" for a reason, perhaps) this is right. But then, in a comma-separated file there is no need to find out the delimiter, which is what the thread opener wanted to know and asked about.


Quote:
Originally Posted by ilikecows
A delimiter is arbitrary in the sense that any character can be used, but it's not arbitrary in the sense that if the character doesn't actually mark anything, it's not a delimiter.
This is a misunderstanding: "doesn't [actually] mark anything" means you are second-guessing what exactly establishes meaning. Consider:

Quote:
a||||||b|c
Let's say the pipe character is used as the delimiter: several of the fields delimited this way are empty. Do these empty fields carry useful information or not?
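A quick Python illustration (an assumption, not something from the thread) of what a plain split on "|" does with that line: the runs of pipes become empty fields, and the split itself cannot tell you whether they matter.

```python
record = "a||||||b|c"

# Splitting on every single "|" keeps an empty field between adjacent pipes.
fields = record.split("|")
print(fields)       # ['a', '', '', '', '', '', 'b', 'c']
print(len(fields))  # 8

# Count the empty fields; whether they carry information is a question
# the data alone cannot answer.
empty = sum(1 for f in fields if f == "")
print(empty)        # 5
```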

Furthermore (sorry, this gets somewhat philosophical), "meaning" is not an inherent quality at all. The string "abc" may or may not have a meaning, depending on what we agree establishes meaning, on context, and so on.

Your argument comes down to "plausibility". While I agree with you that limiting your search to plausible or obvious solutions usually helps to solve real-world problems faster, it simply doesn't help if you are trying to find a generalized solution, as in "write a script to find the delimiter".

Consider the string "a||b||c": does this mean three fields, "a", "b" and "c", delimited by a double pipe, or does it mean five fields, two of them empty? Both variants are plausible enough, and both might be correct (or wrong), depending on the intention of whoever wrote the line. This information cannot be discerned from the file alone. You need some additional information (context) to do so.
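The ambiguity is easy to reproduce. In this Python sketch (an illustration only), both readings of "a||b||c" split cleanly and both round-trip, so nothing in the string itself decides between them:

```python
line = "a||b||c"

# Reading 1: "||" is the delimiter -> three fields.
three = line.split("||")
# Reading 2: "|" is the delimiter -> five fields, two of them empty.
five = line.split("|")

print(three)  # ['a', 'b', 'c']
print(five)   # ['a', '', 'b', '', 'c']

# Both readings are self-consistent: joining with the chosen delimiter
# reproduces the original line exactly.
assert "||".join(three) == line
assert "|".join(five) == line
```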


Quote:
The | is the only delimiter because it is the only character that is actually marking the beginning or end of a unit of data.

Again, this appeals to plausibility. Anything can be considered "data": "afie" or "D1" is (or can be) as much data as "afie|D1" or whatever substring you extract from this line. Whether it is data or not depends on your ability to derive meaning from it. Again: context.

If I give you a succession of characters, say "R-O-T", is it data? In other words, does it have a meaning? As long as you don't have additional information, you can't decide this question at all. For instance, if you know we are talking in English, this would constitute a word (a verb) and have a meaning. If you know that we are talking in German, it would also have a meaning, but a different one ("rot" means "red" and is an adjective), and if you know we are talking in Italian, it would have no meaning at all, as there is no word "rot" in Italian. It would be a garbled transmission in that case. This means you need to have (or need to assume) some context (the language) to decide whether this string is data or not.

The human brain is very, very good at finding (or constructing) patterns: real ones or, in some pathological cases like the mathematician John Nash, imagined ones. Still, finding a pattern is not discovering some inherent quality of the presented data but imposing some organization on received information. This organization is imposed on the information from outside and is therefore, as I said, arbitrary. None of these organizations is "better" or "more correct" than any other.

bakunin
# 10  
01-05-2011
Hi,

So, finally, is there a way to find the delimiter in the file?

Regards
JS
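As a partial answer, and in line with bakunin's point, a script can at best guess a plausible delimiter. Below is a minimal Python sketch of one such guess. The helper name guess_delimiters is hypothetical, and the heuristic is an assumption, not a standard algorithm: a candidate is any single non-alphanumeric character that appears the same, non-zero number of times on every line.

```python
from collections import Counter


def guess_delimiters(lines):
    """Hypothetical helper: return single characters that could plausibly
    be the delimiter.

    Heuristic (an assumption, not a rule): a candidate is any
    non-alphanumeric character that occurs the same, non-zero number of
    times on every line. It yields candidates, not certainty.
    """
    candidates = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        # Count every non-alphanumeric character on this line.
        counts = Counter(ch for ch in line if not ch.isalnum())
        pairs = set(counts.items())
        # Keep only (char, count) pairs that are identical on every line.
        candidates = pairs if candidates is None else candidates & pairs
    if not candidates:
        return []
    return sorted(ch for ch, _ in candidates)


if __name__ == "__main__":
    print(guess_delimiters(["afielD1|bfiEld2|cfIeld3"]))  # ['|']
    print(guess_delimiters(["a,b;c", "d,e;f"]))           # [',', ';']
    print(guess_delimiters(["a,b,c", "d,e"]))             # []
```

Note that guess_delimiters(["a||b||c"]) still returns ['|'], so the heuristic cannot tell a single-pipe delimiter from a double one, which is exactly the ambiguity discussed above. Python's standard library also ships csv.Sniffer, whose sniff() method makes a similar guess and raises csv.Error when it cannot decide.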