Shell script to put delimiter for a no delimiter variable length text file


 
# 8  
Old 01-28-2013
Quote:
Originally Posted by rdrtx1
Code:
 
awk '
NR==FNR {if (NF==2) field[++fc]=$2 ; next}                                             # read column widths from schema file (second field)
{ el=$0; $0=""; cc=1; for (i=1; i<=fc; i++) {$i=substr(el,cc,field[i]); cc+=field[i]}} # save line, clear record (drops stray space-split fields), split on column widths
1                                                                                      # print line (1 = true = print line)
' OFS=, schema_file data_file                                                          # use , as output field separator, specify input files

Thanks, rdrtx1, for your efforts. I am a newbie on Unix, basically a data warehouse guy, so thanks again.
By the way, do we need to create a specific schema file in this scenario?
# 9  
Old 01-28-2013
In this example solution, yes. I think the schema file is easy to maintain if it needs to be updated.

Last edited by rdrtx1; 01-28-2013 at 04:02 PM..
# 10  
Old 01-28-2013
Okay, it would be really great if you could share a sample schema file for the example given earlier. Pardon me if I'm asking too much.

---------- Post updated at 02:17 PM ---------- Previous update was at 02:10 PM ----------

One more thing needs your input: will the script you posted still put a comma (delimiter) in place of a field that has no data? For example:
Code:
Sunilbassi031012345678901234567890123456789
Sunilbassi03101234567890123456789
Sunilbassi0310123456789

That is, for the 2nd and 3rd records, will all of them still have 5 commas (delimiters)?

Last edited by Scott; 01-30-2013 at 06:12 AM.. Reason: Code tags
# 11  
Old 01-28-2013
For the example, the output will look like:
Code:
sunil,bassi,031,0123456789,0123456789,0123456789
yyyil,bassi,031,0123456789,0123456789,
xxxil,bassi,031,0123456789,,
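
For reference, a schema file consistent with this output could look like the sketch below. The field names are hypothetical, and the widths (5, 5, 3, 10, 10, 10) are only inferred from the output shown. The "name width" layout is an assumption, based on the script reading the width from the second field of each schema line. The data records are the three from the question, so the result starts with Sunil on every line.

```shell
#!/bin/sh
# Hypothetical schema file: one "name width" pair per line.
# The names are invented for illustration; only the widths matter to the script.
workdir=$(mktemp -d)
cat > "$workdir/schema_file" <<'EOF'
first_name 5
last_name 5
code 3
number1 10
number2 10
number3 10
EOF

# The three sample records from the question (full, two fields short, one field short).
cat > "$workdir/data_file" <<'EOF'
Sunilbassi031012345678901234567890123456789
Sunilbassi03101234567890123456789
Sunilbassi0310123456789
EOF

# The awk program posted earlier in the thread, wrapped in a function.
# $1 = schema file, $2 = data file
delimit() {
    awk '
    NR==FNR { if (NF==2) field[++fc]=$2; next }   # collect widths from the schema file
    { el=$0; cc=1
      for (i=1; i<=fc; i++) { $i=substr(el,cc,field[i]); cc+=field[i] } }
    1                                             # print the rebuilt record
    ' OFS=, "$1" "$2"
}

delimit "$workdir/schema_file" "$workdir/data_file"
```

Because the loop always assigns all six fields, every output line carries five commas, and fields that lie beyond the end of a short record come out empty.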

# 12  
Old 01-28-2013
printf can take maximum-length arguments, but this feels like a Perl or C/C++/Java level task. Still, bash has substring expansion, ${parameter:offset:length}, so just read the lengths into two arrays, length and offset, and go through them taking apart every line.
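
A minimal sketch of that bash approach, assuming the widths are passed as arguments instead of being read from a file. The function name split_fixed is made up for illustration, and the substring expansion needs bash rather than plain sh.

```shell
#!/usr/bin/env bash
# split_fixed WIDTH... : read fixed-width lines on stdin, write comma-separated lines.
# (Hypothetical helper; widths come from the arguments, not from a schema file.)
split_fixed() {
    local -a length=("$@") offset=()
    local pos=0 i line out

    # Precompute the starting offset of every field from the widths.
    for i in "${!length[@]}"; do
        offset[i]=$pos
        pos=$((pos + length[i]))
    done

    # The "|| [ -n ... ]" part also handles a last line without a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        out=""
        for i in "${!length[@]}"; do
            # Past the end of a short line the expansion is empty, so the field is blank.
            out+="${line:${offset[i]}:${length[i]}}"
            # Add a separator after every field except the last one.
            if (( i < ${#length[@]} - 1 )); then out+=","; fi
        done
        printf '%s\n' "$out"
    done
}

printf '%s\n' 'Sunilbassi0310123456789' | split_fixed 5 5 3 10 10 10
```

A read loop like this is convenient for small files, but it is likely to be much slower than awk on large inputs.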
# 13  
Old 01-29-2013
Quote:
Originally Posted by rdrtx1
for the example the output will be like:
Code:
sunil,bassi,031,0123456789,0123456789,0123456789
yyyil,bassi,031,0123456789,0123456789,
xxxil,bassi,031,0123456789,,

Thanks, rdrtx1. Could you please help me by providing the schema file (or any sample for reference) for the example mentioned?
# 14  
Old 01-29-2013
The inner loop of any solution is: for every line, for n = 0..max, print the substring of the line at offset[n] with length[n]; if n = max, print lsep (the line separator) and break, else print fsep (the field separator) and continue. Make sure your solution is stable for short lines: any partial field is truncated and missing fields all come out blank. Setup consists of reading the field-length file and filling offset[0..n] and length[0..n] from the lengths. It can be argued that recalculating the offset on the fly might be faster than fetching it from an array; try both and see, if speed is still an issue.

I would write it in C as a state machine, copying one byte at a time with c = getchar(); . . . putchar( c ) and inserting field separators as lengths are exhausted. That is about as fast as it gets.
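
The loop described above can be sketched in awk rather than C, with the offsets precomputed once while the field-length file is read. The function name fixed_to_delimited, the "name width" schema layout, and the optional separator argument are assumptions made for illustration. awk's print supplies the line separator, and arrays are 1-based here instead of 0-based.

```shell
#!/bin/sh
# fixed_to_delimited SCHEMA DATA [FSEP]
# Hypothetical helper: SCHEMA holds "name width" lines, FSEP defaults to a comma.
fixed_to_delimited() {
    awk -v fsep="${3:-,}" '
    # Setup: read the widths and precompute each field offset (1-based for substr).
    NR == FNR {
        if (NF == 2) { n++; len[n] = $2 + 0; off[n] = next_off + 1; next_off += len[n] }
        next
    }
    # Inner loop: one substring per field, fsep between fields, newline from print.
    {
        out = ""
        for (i = 1; i <= n; i++)
            out = out substr($0, off[i], len[i]) (i < n ? fsep : "")
        print out
    }
    ' "$1" "$2"
}

# Example call (file names as used earlier in the thread):
#   fixed_to_delimited schema_file data_file '|'
```

On a short line, substr returns whatever part of a field is present and an empty string for fields that start past the end of the line, so every record still gets the same number of separators.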

Last edited by DGPickett; 01-29-2013 at 11:47 AM..