The UNIX and Linux Forums  
#1  chriss_58  06-12-2008
split based on the number of characters

Hello,

If I have a file like this:
010000890306932455804 05306977653873 0520080417010520ISMS SMT ZZZZZZZZZZZZZOC30693599000 30971360000 ZZZZZZZZZZZZZZZZZZZZ202011302942311 010000890306946317387 05306977313623 0520080417010520ISMS SMT ZZZZZZZZZZZZZOC306942190000 30971360000 ZZZZZZZZZZZZZZZZZZZZ202010300391748 010000890306945153336 05306977918990 0520080417010521ISMS SMT ZZZZZZZZZZZZZOC306942190000 30971360000 ZZZZZZZZZZZZZZZZZZZZ202011304607230 010000890306948068406 05306977404213 0520080417010523ISMS SMT ZZZZZZZZZZZZZOC306942190000 30971360000 ZZZZZZZZZZZZZZZZZZZZ202010000717971 010000890306998573372

How can I perform a split based on the number of characters?
For example, I want array[0] to hold the first 70 characters of the file, array[1] the next 70 characters, and so on.

How can I do this?
#2  jaduks  06-12-2008
A 70-character split can be done as follows (with GNU sed, which treats \n in the replacement as a newline):

Code:
sed -e 's/.\{70\}/&\n/g' file
//Jadu
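
For a sed that does not understand \n in the replacement, a backslash followed by a literal newline should work instead. The following is only a sketch: the 150-character line of zeros is made-up test data, not the poster's file, and the variable names are hypothetical. Note that when the line length is an exact multiple of 70, this approach leaves an empty line at the end.

```shell
# Hypothetical test data: one line of 150 zeros (not the poster's file).
line=$(printf '%0150d' 0)

# Portable form: backslash + literal newline in the replacement text.
chunks=$(printf '%s\n' "$line" | sed 's/.\{70\}/&\
/g')

# Report the length of every resulting line on one row.
lengths=$(printf '%s\n' "$chunks" | awk '{ print length($0) }' | tr '\n' ' ')
echo "$lengths"
```

The 150 characters come back as two lines of 70 and one of 10.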

#3  chriss_58  06-12-2008
Thank you...

What if I want to perform the split in Perl, always using 'size' as the limit?
#4  AndrewTheArt  07-05-2008
I know this isn't exactly what you wanted, but this might come in handy:

Code:
split -b 60 filename.txt
This splits the file into multiple pieces of 60 bytes each (one byte per character in a single-byte encoding such as ASCII).

(It writes the pieces to files named xaa, xab, xac, xad, etc., each holding the specified number of bytes; the last one may be shorter.)
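
To see what split leaves behind, here is a rough sketch using 70 bytes as in the original question. The temporary directory, the file name input.txt and the chunk. prefix are all assumptions for illustration, and mktemp -d is not in POSIX, although most systems ship it.

```shell
# Work in a throwaway directory (mktemp -d is widespread but not POSIX).
tmpdir=$(mktemp -d)

# Hypothetical input: 150 bytes with no trailing newline.
printf '%0150d' 0 > "$tmpdir/input.txt"

# Write 70-byte pieces named chunk.aa, chunk.ab, ... instead of xaa, xab, ...
( cd "$tmpdir" && split -b 70 input.txt chunk. )

# Collect the byte count of every piece, in name order.
sizes=$(for f in "$tmpdir"/chunk.*; do wc -c < "$f" | tr -d ' '; done | tr '\n' ' ')
echo "$sizes"

rm -rf "$tmpdir"
```

Since split counts bytes, a multi-byte character (UTF-8, for instance) can end up cut across two pieces.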

#5  Vi-Curious  07-05-2008
Quote:
Originally Posted by chriss_58
Thank you...

What about if i want to perform the split in perl always using 'size' as a limit
Here is a small sample to give you an idea.

Code:
 
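#!/usr/bin/perl
use strict;
use warnings;

my $teststring = "1234567890abcdefghij0987654321ABCDEFGHIJlmnop";
# The capturing parentheses make split return the 10-character separators too.
my @chunks = split /(.{10})/, $teststring;
foreach (@chunks) {
  printf "%s\n", $_;
}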
In this case, I'm using 10 characters as the size of the pieces to extract. The pattern given to split describes the delimiter/separator, so here every run of 10 characters is treated as a separator. Because the separators sit back to back, the fields between them are null strings. Normally split discards the separators, but we want them because they are the actual values of interest. The parentheses in the pattern tell Perl to return the separators as well.

If you execute this, you get the following:
Hostname:> testscript3.sh

1234567890

abcdefghij

0987654321

ABCDEFGHIJ
lmnop
Hostname:>

The blank lines are the null strings interspersed with the separators. There is no blank line before the final 5-character substring because it is itself a field: fewer than 10 characters were left, so it did not match as a separator and was returned as the text after the last one.

I'll leave it for you as an exercise to remove the null strings or otherwise decide how you will skip/ignore them. How exactly you end up incorporating this into your code will also be dependent on your data file. From your description, I could not tell if records spanned lines or not.
#6  drl (Forum Advisor)  07-06-2008
Hi.

Here is another method:
Code:
#!/usr/bin/perl

# @(#) p2       Demonstrate perl unpack to break apart a long line.

use warnings;
use strict;

my ( @a, $i, $nc, $nv );
my ($lines) = 0;

while (<>) {
  $lines++;
  chomp;
  # "(a70)*" repeats the a70 template: one string per 70 characters,
  # with a shorter final string holding any remainder.
  @a  = unpack( "(a70)*", $_ );
  $nc = length($_);
  $nv = scalar(@a);
  print " Unpacked $nv strings from line $lines (length $nc characters)\n";
  for ( $i = 0; $i < $nv; $i++ ) {
    print "$i: $a[$i]\n";
  }
}

print STDERR " ( Lines read: $lines )\n";

exit(0);
Producing:
Code:
% ./p2 data1
 Unpacked 9 strings from line 1 (length 572 characters)
0: 010000890306932455804 05306977653873 0520080417010520ISMS SMT ZZZZZZZZ
1: ZZZZZOC30693599000 30971360000 ZZZZZZZZZZZZZZZZZZZZ202011302942311 010
2: 000890306946317387 05306977313623 0520080417010520ISMS SMT ZZZZZZZZZZZ
3: ZZOC306942190000 30971360000 ZZZZZZZZZZZZZZZZZZZZ202010300391748 01000
4: 0890306945153336 05306977918990 0520080417010521ISMS SMT ZZZZZZZZZZZZZ
5: OC306942190000 30971360000 ZZZZZZZZZZZZZZZZZZZZ202011304607230 0100008
6: 90306948068406 05306977404213 0520080417010523ISMS SMT ZZZZZZZZZZZZZOC
7: 306942190000 30971360000 ZZZZZZZZZZZZZZZZZZZZ202010000717971 010000890
8: 306998573372
 ( Lines read: 1 )
For your data in file data1:
Code:
% wc data1
  1  29 573 data1
Excluding the trailing newline, the line is 572 characters long: 8 full chunks of 70 (560) plus a final chunk of 12 ... cheers, drl
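
The same arithmetic can be checked at the shell prompt. A small sketch, with variable names chosen only for illustration:

```shell
len=572    # characters in the line, newline excluded
size=70    # chunk size

full=$((len / size))                  # number of complete 70-character chunks
rest=$((len % size))                  # characters left for the final chunk
count=$(( (len + size - 1) / size ))  # total chunks, rounding up

echo "$full full chunks, remainder $rest, $count chunks in total"
```

This matches the 9 strings (indexes 0 to 8) reported by the script.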
#7  radoulov (Forum Staff)  07-06-2008
Using fold:

Code:
fold -b -w 70 input
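
A quick way to try this without the original file, as a sketch only: the 150-character line of zeros is made-up test data and the variable names are hypothetical.

```shell
# Hypothetical test data: one line of 150 zeros.
line=$(printf '%0150d' 0)

# -b counts bytes rather than display columns; -w sets the width.
folded=$(printf '%s\n' "$line" | fold -b -w 70)

# Report the length of every folded line on one row.
lengths=$(printf '%s\n' "$folded" | awk '{ print length($0) }' | tr '\n' ' ')
echo "$lengths"
```

Without -b, fold counts display columns, so tabs and backspaces in the data would change where the breaks fall.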