How to find the Delimiter?


 
# 1  
Old 01-05-2011

Hi All,

Is there any method we can use to find out which delimiter is used in a text file, assuming the file has a fixed number of columns?

Thanks in advance.
Js
# 2  
Old 01-05-2011
Delimiters are chosen because they have one property: they are not usually part of the data.
Spaces can delimit columns of numbers, but not columns of addresses, because addresses may contain spaces and you would get more columns than really exist in the data. The way you asked it, this is a hard problem.

So your question needs more detail. Generally you receive files and, as a programmer, you are told the file format; for example, a file.csv uses commas to separate fields, with double quotes around text that may contain a comma.
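
To illustrate why quoted text matters, here is a minimal sketch, assuming fields are quoted with plain double quotes. The function name commas_outside_quotes is made up for this example and is not from any tool; it counts only the commas that actually separate fields.

```shell
# Hypothetical helper: count the commas on one line, ignoring any comma
# that sits inside a double-quoted field.
commas_outside_quotes() {
    printf '%s\n' "$1" | awk '
    {
        inq = 0; n = 0
        for (p = 1; p <= length($0); p++) {
            ch = substr($0, p, 1)
            if (ch == "\"") inq = !inq            # toggle on each double quote
            else if (ch == "," && !inq) n++       # count only unquoted commas
        }
        print n
    }'
}
```

A line such as 1,"Main St, Apt 4",Springfield contains three commas, but only two of them separate fields, so a plain character count would suggest one column too many.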

What problem are you trying to solve? Not how you think it should be solved.

This code tells you whether certain commonly used delimiters exist in the first line. BUT that does not mean any of them really is the delimiter. It assumes your unix supports grep -q
Code:
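# Look at the first line of the file only
tmp=$(head -n 1 filename)
# [[:space:]] is the POSIX whitespace class; a bare [:space:] would match the literal characters : s p a c e
echo "$tmp" | grep -q '[[:space:]]' && echo 'has white space: tab or spaces'
echo "$tmp" | grep -q ',' && echo 'has commas'
echo "$tmp" | grep -q '|' && echo 'has pipe symbols'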
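
Going one step further than an existence check, the sketch below counts each candidate on the first 10 lines and keeps only those that occur the same non-zero number of times on every line. The function name consistent_delims, the candidate list and the 10-line sample size are all assumptions made for illustration. It is still a heuristic: quoted fields that contain the delimiter will defeat it.

```shell
# Hypothetical helper: list the candidate delimiters that occur the same
# non-zero number of times on each of the first 10 lines of a file.
# Output is "name<TAB>count"; the column count would be count + 1.
consistent_delims() {
    head -n 10 "$1" | awk '
    BEGIN {
        cand[1] = ",";  name[1] = "comma"
        cand[2] = "|";  name[2] = "pipe"
        cand[3] = ";";  name[3] = "semicolon"
        cand[4] = "\t"; name[4] = "tab"
        cand[5] = " ";  name[5] = "space"
        n = 5
    }
    {
        for (i = 1; i <= n; i++) {
            c = 0
            for (p = 1; p <= length($0); p++)
                if (substr($0, p, 1) == cand[i]) c++
            if (NR == 1) count[i] = c
            else if (count[i] != c) count[i] = -1   # differs between lines
        }
    }
    END {
        for (i = 1; i <= n; i++)
            if (count[i] > 0) printf "%s\t%d\n", name[i], count[i]
    }'
}
```

Call it as consistent_delims filename. If it prints more than one candidate, or none, you are back to needing the file format from whoever produced the file.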

# 3  
Old 01-05-2011
Try:
Code:
perl -F"//" -nlae 'for (@F) {$a{$_}++};for (keys %a){print $_ if ($a{$_}+1==5 && !($_=~/\w/))};undef %a' file

Change the number 5 (shown in red in the original post) to the number of columns in your file. For every row, it prints the non-word characters that occur exactly (columns - 1) times, which makes them possible delimiters. If the same character shows up for every line, it is probably your delimiter.
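
The same idea can be sketched in shell with awk so that the "same character in every line" check is done for you. This is only an illustration under assumptions: the function name delims_for_columns is made up, "word characters" are taken to be letters, digits and underscore, and an empty line in the file means nothing qualifies.

```shell
# Hypothetical helper: print each non-word character that occurs exactly
# (columns - 1) times on every line of the file.
# Usage: delims_for_columns FILE COLUMNS
delims_for_columns() {
    awk -v cols="$2" '
    {
        split("", seen)                          # clear the per-line counts
        for (p = 1; p <= length($0); p++) {
            ch = substr($0, p, 1)
            if (ch !~ /[A-Za-z0-9_]/) seen[ch]++
        }
        for (ch in seen)
            if (seen[ch] + 1 == cols) hits[ch]++ # right count on this line
    }
    END {
        for (ch in hits)
            if (hits[ch] == NR) print ch         # qualified on every line
    }' "$1"
}
```

Because a character must qualify on every line, a "." that happens to appear the right number of times on a few rows is filtered out.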
# 4  
Old 01-05-2011
Code:
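use strict;
use warnings;
### Open the file
open (my $fin, '<', "INPUT_FILE_PATH") or die "Cannot open the input file: $!";
my @file = <$fin>;
close ($fin);
die "The input file is empty\n" unless @file;
### Take the first line only.
### The string found the maximum number of times
### is considered the delimiter.
my (%delim, $destr);
my $max = 0;
# Ignore word characters, '.' and the newline. A while loop with /g steps
# through the matches one at a time, so $& holds the current match.
while ($file[0] =~ m{[^\w\n.]+}g)
{
 # Get the maximum occurrence (pre-increment, so a single occurrence counts)
 if ($max < ++$delim{$&})
 {
  $destr = $&;        # assign the delimiter string to a variable
  $max = $delim{$&};  # update the maximum count
 }
}
# Print the delimiter string
print "\n--", (defined $destr ? $destr : ''), "--\n";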

# 5  
Old 01-05-2011
I am trying to develop a script which takes 2 arguments and one option.
Example:

./1.pl <File1> <outputfile> -s ","

Here File1 is a text file with some data (say, extracted from a DB).
I do not know what this file contains or what delimiter it uses. But I am assuming the delimiter is a comma, a pipe (|), a single space or a tab.

-s is the option indicating that "," is the delimiter of the file <File1>
-s can be "," or "|" or " " or "\t" for comma, pipe, space, tab respectively.

While reading each line of File1, I am using the split function with the value passed to the -s option.

My problem here is:
./1.pl <File1> <outputfile> -s "," ==> Works fine if File1 is comma separated

But what if File1 is pipe separated and I specify the -s option as comma?
It still runs, but I don't get the expected result.
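
One way to catch such a mismatch before processing is sketched below in shell. This is an assumption about what would help rather than part of 1.pl, and the function name delim_fits is made up: it accepts the delimiter only if every line splits into the same number of fields and that number is greater than one.

```shell
# Hypothetical check: succeed only if splitting every line of FILE on DELIM
# gives the same field count, greater than one, on every line.
# Usage: delim_fits FILE DELIM
delim_fits() {
    awk -F "$2" '
    NR == 1 { want = NF }                 # field count of the first line
    NF != want || NF < 2 { bad = 1; exit }
    END { exit (bad || NR == 0) }         # non-zero status means no fit
    ' "$1"
}
```

With a pipe-separated file, delim_fits File1 "," fails because every line comes back as a single field, while delim_fits File1 "|" succeeds. Note that awk treats a single space as "any run of whitespace", which is looser than the single-space case described above.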

I hope the question is clear, or am I confusing things? :)
# 6  
Old 01-05-2011
Apply the above logic in your program. The '$destr' gives the delimiter string...
# 7  
Old 01-06-2011
The code satisfies my requirement. Thank you.

Could you please explain what the code below does?

if ($max<$delim{$&}++)
{
$destr=$&; # assign the delimiter string to a variable
$max=$delim{$&}; # reset the maximum number
}

I have not used $& so far. Also, the loop is given only $file[0], which means it reads only the first line of the file. Please correct me if I am wrong.

---------- Post updated at 12:24 PM ---------- Previous update was at 12:09 PM ----------

Quote:
Originally Posted by bartus11
Try:
Code:
perl -F"//" -nlae 'for (@F) {$a{$_}++};for (keys %a){print $_ if ($a{$_}+1==5 && !($_=~/\w/))};undef %a' file

Change the number 5 (shown in red in the original post) to the number of columns in your file. For every row, it prints the non-word characters that occur exactly (columns - 1) times, which makes them possible delimiters. If the same character shows up for every line, it is probably your delimiter.

Thank you so much.
This gives me a long list and also includes ".". I should tune it.