The UNIX and Linux Forums  


#1  05-29-2007  kumarsaravana_s (Registered User)
Can we convert a '|' file into a fixed-length file?

Hi All,

I have a pipe-separated flat file, but there are often problems with the records. Is it possible to convert the '|'-separated file into a fixed-length file by means of a script?

The file has 11 columns, which means 10 pipes. Your help is appreciated.

I'm using SunOS version 5.10.

Thank you,
Kumar
#2  05-29-2007  aigles (Forum Advisor)
You can do something like this:

Code:
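awk '
   BEGIN {
      # One width per output field; edit this list to change the record layout.
      fields_count = split("5,5,5,5,5,5,5,5,5,5,5", fsize, ",");
      FS  = "|"
      OFS = "";
   }
   # Pad or truncate field number fld to its configured width.
   function cnv_field(fld,   fmt) {
      # Extra field with no configured width: already reported, left unchanged.
      if (!(fld in fsize)) return;
      if (length($fld) > fsize[fld]) {
         printf("Line %d, field %d is too long (%d > %d)\n", NR, fld, length($fld), fsize[fld]) | "cat >&2";
         status = 1;
      }
      # Build "%-N.Ns" from the configured width instead of hard-coding 5.
      fmt = "%-" fsize[fld] "." fsize[fld] "s";
      $fld = sprintf(fmt, $fld);
   }
   {
      if (NF != fields_count) {
         printf("Line %d, fields count is invalid (%d != %d)\n", NR, NF, fields_count) | "cat >&2";
         status = 1;
      }
      for (f=1; f<=NF; f++) cnv_field(f);
      print;
   }
   END {
      exit status;
   }
    '  "$1" > "$2"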

The length of each field is given by the comma-separated list passed to split() in the BEGIN block, which also sets fields_count. In my code all the fields are 5 characters wide.
In the output, the field separator is set to "" (no separator), but you can change it. For example, if you want a space between fields, modify the OFS assignment:
OFS = " "
Example (assuming the script file is convert.sh):

Code:
$ cat input_file
111|22|333|444|555||77|888|9999|000|1
aa|bbbbbb|cc|dd|rr|ff|ggggggg|hh|ii|jjj|hhh
xxx|yyy|zzz
$ convert.sh input_file output_file
Line 2, field 2 is too long (6 > 5)
Line 2, field 7 is too long (7 > 5)
Line 3, fields count is invalid (3 != 11)
$ echo $?
1
$ cat output_file
111  22   333  444  555       77   888  9999 000  1    
aa   bbbbbcc   dd   rr   ff   ggggghh   ii   jjj  hhh  
xxx  yyy  zzz  
$
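If the columns need different widths, the same padding idea can be sketched as a small shell function. This is only an illustration, not the script above: pipe_to_fixed and its comma-separated width argument are made-up names, it always writes exactly as many fields as there are widths (missing fields come out blank, extra fields are dropped), and it does no error reporting. It assumes an awk that accepts -v; on SunOS 5.10 that may mean nawk or /usr/xpg4/bin/awk rather than the default awk.

```shell
#!/bin/sh
# Hypothetical helper (not part of the script above):
#   pipe_to_fixed WIDTHS [FILE...]
# WIDTHS is a comma-separated list with one width per output column.
pipe_to_fixed() {
    widths=$1
    shift
    awk -v widths="$widths" '
        BEGIN { n = split(widths, w, ","); FS = "|" }
        {
            out = ""
            for (i = 1; i <= n; i++)
                # "%-W.Ws": left-justify, pad to W characters, cut at W characters
                out = out sprintf("%-" w[i] "." w[i] "s", $i)
            print out
        }
    ' "$@"
}

# Usage: widths 3, 6 and 2 for a three-column input read from stdin
printf '%s\n' 'ab|cdefgh|i' 'xyz|12|345' | pipe_to_fixed 3,6,2
```

Building the format string from the width list keeps the layout in one place, so changing a column means editing only the list.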

Jean-Pierre.

Last edited by aigles; 05-30-2007 at 05:26 AM. Reason: add info about output field separator
#3  05-30-2007  bakunin (Forum Staff)
As the "fields" in your file are separated by a constant character ("|"), use cut to separate them, then print the lines via printf (I assume Korn shell here; use 'echo' instead of 'print' if you are using something else):


Code:
# read the file line by line; -r and the empty IFS keep backslashes and blanks intact
while IFS='' read -r line ; do
     # split each input line into fields and capture them in variables
     field1="$(print -r - "$line" | cut -d'|' -f1)"
     field2="$(print -r - "$line" | cut -d'|' -f2)"
     field3="$(print -r - "$line" | cut -d'|' -f3)"
     .....
     
     # after you are done with the line, print it out again
     # i assume here that the first column should be 20 chars wide, the next
     # two 15, and so on. see the second example below.
     printf '%20s %15s %15s [...]\n' "$field1" "$field2" "$field3" [...] >> outfile
done < infile

This uses a fixed number of fixed-width columns, and you have to know the widths in advance. It is possible to size the columns dynamically, but then you have to read the infile twice:



Code:
maxlength1=0
maxlength2=0
....
# redirect into the loop (instead of 'cat |') so the maxlength variables
# are still set afterwards, even in shells that run pipeline parts in a subshell
while IFS='' read -r line ; do
     # in the first run we split and get the max width for each column
     field1="$(print -r - "$line" | cut -d'|' -f1)"
     length1=${#field1}     # string length; wc -c would also count the newline
     if [ "$length1" -gt "$maxlength1" ] ; then
          maxlength1=$length1
     fi
     field2="$(print -r - "$line" | cut -d'|' -f2)"
     length2=${#field2}
     if [ "$length2" -gt "$maxlength2" ] ; then
          maxlength2=$length2
     fi
     .....
done < infile

# put together the output template for printf
template="%${maxlength1}s   %${maxlength2}s [.....]\n"
   
while IFS='' read -r line ; do
     # in the second run we split again and print using the found widths
     field1="$(print -r - "$line" | cut -d'|' -f1)"
     field2="$(print -r - "$line" | cut -d'|' -f2)"
     ....
     printf "$template" "$field1" "$field2" "$field3" [...] >> outfile
done < infile

As a further enhancement, I'd suggest you use (dynamic) arrays instead of the numbered variables, so the script can deal with a variable number of fields in the input file. The column separator could then be provided as a parameter, making the script as widely usable as possible.
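
A rough sketch of that enhancement, under some assumptions: it swaps the shell loop for a two-pass awk call (awk arrays stand in for the shell arrays), align_columns is a made-up name, the separator is assumed to be a single character, and columns are left-justified with nothing between them, which is a choice made here rather than something taken from the examples above. The file is named twice on the awk command line so that it is read two times, as described earlier.

```shell
#!/bin/sh
# Hypothetical helper: align_columns SEP FILE
# Pass 1 (NR == FNR) records the widest value seen in each column;
# pass 2 prints every field left-justified and padded to that width.
align_columns() {
    sep=$1
    file=$2
    awk -F"$sep" '
        NR == FNR {
            for (i = 1; i <= NF; i++)
                if (length($i) > max[i]) max[i] = length($i)
            if (NF > cols) cols = NF      # remember the largest field count
            next
        }
        {
            out = ""
            for (i = 1; i <= cols; i++)
                if (max[i] > 0)           # a column that is empty in every row adds nothing
                    out = out sprintf("%-" max[i] "s", $i)
            print out
        }
    ' "$file" "$file"
}

# Usage (infile and outfile are placeholders):
#   align_columns '|' infile > outfile
```

Because the input is opened twice, it has to be a regular file and not a pipe. Rows with fewer fields than the widest row are padded with blanks so that every output line has the same length.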

bakunin