The UNIX and Linux Forums  
#1  09-30-2008  shankster (Registered User)
Split File Based on Line Number Pattern

Hello all.

Sorry, I know this question is similar to many others, but I just can't seem to put together exactly what I need.

My file is tab-delimited and contains approximately 1 million rows. I would like to send lines 1, 4, and 7 to one file, lines 2, 5, and 8 to a second file, lines 3, 6, and 9 to a third file, and line 10 to a fourth file. I then want to repeat the same pattern for every following block of ten lines, writing to the same four files. Any thoughts on the best approach?
#2  09-30-2008  joeyg (Forum Staff, moderator)
I got a start on this

But I will need some awk help (or to think a little more clearly after lunch).

Code:
> cat big_file4
a stuff to 1 file
b stuff to 2 file
c stuff to 3 file
a stuff to 1 file
b stuff to 2 file
c stuff to 3 file
a stuff to 1 file
b stuff to 2 file
c stuff to 3 file
d stuff to 4 file
a stuff to 1 file
b stuff to 2 file
c stuff to 3 file
a stuff to 1 file
b stuff to 2 file
c stuff to 3 file
a stuff to 1 file
b stuff to 2 file
c stuff to 3 file
d stuff to 4 file
What I initially wrote does not capture the full line of text, and that is where I think I need some help:
Code:
> cat -n big_file4 | awk '{printf "%1s %-15s \n", substr($1,length($1),1), $2}'
1 a               
2 b               
3 c               
4 a               
5 b               
6 c               
7 a               
8 b               
9 c               
0 d               
1 a               
2 b               
3 c               
4 a               
5 b               
6 c               
7 a               
8 b               
9 c               
0 d
Because from here, my theory is that

Code:
grep "^[147] " <infile >outfile_a
grep "^[258] " <infile >outfile_b
grep "^[369] " <infile >outfile_c
grep "^[0] " <infile >outfile_d
The leading digit may need to be cut off before writing to each output.
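
The last-digit idea above can also be tried in a single awk pass, without numbering the lines first. This is only a minimal sketch under assumptions: the scratch directory `split_demo_awk`, the generated sample input, and the `out_a.txt` to `out_d.txt` names are all made up for illustration. It relies on `NR % 10` being the last digit of the line number.

```shell
demo=split_demo_awk            # hypothetical scratch directory
mkdir -p "$demo"

# 20 numbered sample lines standing in for the real data (assumption)
awk 'BEGIN { for (i = 1; i <= 20; i++) print "line " i }' > "$demo/input.txt"

# NR % 10 is the last digit of the line number:
#   1,4,7 -> out_a   2,5,8 -> out_b   3,6,9 -> out_c   0 -> out_d
awk -v dir="$demo" '
{
    d = NR % 10
    if (d == 0)          out = "d"
    else if (d % 3 == 1) out = "a"
    else if (d % 3 == 2) out = "b"
    else                 out = "c"
    print > (dir "/out_" out ".txt")   # whole line is written, no prefix to cut
}' "$demo/input.txt"
```

Because awk prints the complete record, there is no line-number prefix to strip afterwards, and the input is read only once.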
#3  09-30-2008  era (Forum Advisor)
Perl or Python looping over a set of file handles would seem like the most efficient approach. For a more pedestrian solution, an awk script run four times with different parameters might be acceptable even if the file is big.

Does file four only contain every tenth line, and then 11, 14, and 17 go to the first file again?

Code:
perl -MIO::File -ne 'BEGIN { map { $file[$_] = IO::File->new(">file$_") || die $!} 0..3;
  # output index for each position in the 10-line cycle
  @m = (0, 1, 2, 0, 1, 2, 0, 1, 2, 3);
}
$file[$m[($. - 1) % 10]]->print($_) || die $!'
csplit has some fairly versatile options; you might be able to pull this off simply with a suitable csplit pattern as well.
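
For reference, the "awk script run four times with different parameters" idea mentioned earlier in this post might look roughly like the sketch below. The helper name `pick_lines`, the directory `split_demo_fourpass`, and the sample input are hypothetical, chosen only for the illustration. Each pass keeps the lines whose number ends in one of the given digits.

```shell
demo=split_demo_fourpass       # hypothetical scratch directory
mkdir -p "$demo"

# 20 numbered sample rows standing in for the real file (assumption)
awk 'BEGIN { for (i = 1; i <= 20; i++) print "row " i }' > "$demo/input.txt"

# pick_lines DIGITS OUTFILE
# Keep the lines whose line number ends in one of DIGITS (e.g. 147).
pick_lines() {
    awk -v digits="$1" 'index(digits, NR % 10) > 0' "$demo/input.txt" > "$2"
}

pick_lines 147 "$demo/part1.txt"
pick_lines 258 "$demo/part2.txt"
pick_lines 369 "$demo/part3.txt"
pick_lines 0   "$demo/part4.txt"
```

This reads the input four times, so it trades speed for simplicity; whether that is acceptable for a million rows depends on the machine.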

Last edited by era; 09-30-2008 at 12:56 PM.. Reason: csplit note
#4  09-30-2008  shankster (Registered User)
Yes, 11, 14, and 17 would then go to the first file again.

I am trying to use KSH to complete this task. Below is what I have so far, but the count variable does not appear to be resetting to 1 after it reaches 11. Also, I am getting output similar to:

File_split_DC.sh[42]: 2: not found.
File_split_DC.sh[42]: 3: not found.
File_split_DC.sh[42]: 4: not found.

The name of my script is "File_split_DC.sh"

#!/usr/bin/ksh

count=1

while read line
do

case $count in
1)
echo "$line" >> RT1.txt
;;
2)
echo "$line" >> RT2.txt
;;
3)
echo "$line" >> RT3.txt
;;
4)
echo "$line" >> RT1.txt
;;
5)
echo "$line" >> RT2.txt
;;
6)
echo "$line" >> RT3.txt
;;
7)
echo "$line" >> RT1.txt
;;
8)
echo "$line" >> RT2.txt
;;
9)
echo "$line" >> RT3.txt
;;
10)
echo "$line" >> RT4.txt
;;
esac
(( count+=1 ))

if $count -gt 10; then
count=1

fi
done < My_Test.txt

exit 0
#5  09-30-2008  joeyg (Forum Staff, moderator)
What about this?

Code:
> cat -n big_file4 | awk '{printf "%1s %-100s \n", substr($1,length($1),1), $0}' | cut -c1,10- | grep "^[147]" | cut -c2- >filea
> cat -n big_file4 | awk '{printf "%1s %-100s \n", substr($1,length($1),1), $0}' | cut -c1,10- | grep "^[258]" | cut -c2- >fileb
> cat -n big_file4 | awk '{printf "%1s %-100s \n", substr($1,length($1),1), $0}' | cut -c1,10- | grep "^[369]" | cut -c2- >filec
> cat -n big_file4 | awk '{printf "%1s %-100s \n", substr($1,length($1),1), $0}' | cut -c1,10- | grep "^[0]" | cut -c2- >filed
Code:
> cat big_file4
a stuff to 1 file
b stuff to 2 file
c stuff to 3 file
a stuff to 1 file
b stuff to 2 file
c stuff to 3 file
a stuff to 1 file
b stuff to 2 file
c stuff to 3 file
d stuff to 4 file
a stuff to 1 file
b stuff to 2 file
c stuff to 3 file
a stuff to 1 file
b stuff to 2 file
c stuff to 3 file
a stuff to 1 file
b stuff to 2 file
c stuff to 3 file
d stuff to 4 file
and now the four separated files
Code:
> cat filea
a stuff to 1 file                                                                             
a stuff to 1 file                                                                             
a stuff to 1 file                                                                             
a stuff to 1 file                                                                             
a stuff to 1 file                                                                             
a stuff to 1 file                                                                             
> cat fileb
b stuff to 2 file                                                                             
b stuff to 2 file                                                                             
b stuff to 2 file                                                                             
b stuff to 2 file                                                                             
b stuff to 2 file                                                                             
b stuff to 2 file                                                                             
> cat filec
c stuff to 3 file                                                                             
c stuff to 3 file                                                                             
c stuff to 3 file                                                                             
c stuff to 3 file                                                                             
c stuff to 3 file                                                                             
c stuff to 3 file                                                                             
> cat filed
d stuff to 4 file                                                                             
d stuff to 4 file                                                                             
>
#6  09-30-2008  era (Forum Advisor)
You want

Code:
if [ $count -gt 10 ]; then
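
To see why the brackets matter: without them the shell treats the value of count as a command name, which would explain messages such as "2: not found". The following is a small sketch of the counter logic alone, with a made-up `seen` variable that records the values the counter takes over twelve iterations.

```shell
count=1
seen=""
i=0
while [ "$i" -lt 12 ]; do
    seen="$seen$count "
    count=$((count + 1))
    # [ ... ] runs the test command; a bare "$count -gt 10" would instead
    # try to execute the value of count (e.g. "2") as a command.
    if [ "$count" -gt 10 ]; then
        count=1
    fi
    i=$((i + 1))
done
echo "$seen"    # prints: 1 2 3 4 5 6 7 8 9 10 1 2
```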
It would be more efficient to open four file descriptors and then just print to those descriptors; this approximates the Perl approach I suggested above.

Code:
exec 1>rt1.txt 2>rt2.txt 3>rt3.txt 4>rt4.txt
count=1
while read line; do
  case $count in
    1|4|7) print "$line" >&1;;
    2|5|8) print "$line" >&2;;
    3|6|9) print "$line" >&3;;
    10) print "$line" >&4; count=0;;
  esac
  count=`expr $count + 1`
done <My_Test.txt
Note the use of print rather than echo -- this is ksh-specific, but other than that, this script should be portable.
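
A variant sketch of the same descriptor idea, under a few assumptions that are not part of the script above: it uses descriptors 3 to 6 so that stdout and stderr stay free for messages, `printf` instead of `print` so it is not tied to ksh, and `IFS= read -r` so leading tabs and backslashes in tab-delimited data survive. The directory `split_demo_fd` and the sample input are made up for the demonstration.

```shell
demo=split_demo_fd             # hypothetical scratch directory
mkdir -p "$demo"

# 20 tab-delimited sample rows standing in for My_Test.txt (assumption)
awk 'BEGIN { for (i = 1; i <= 20; i++) printf "id%d\tvalue %d\n", i, i }' > "$demo/input.txt"

# Open descriptors 3-6 once; stdout and stderr remain untouched.
exec 3>"$demo/rt1.txt" 4>"$demo/rt2.txt" 5>"$demo/rt3.txt" 6>"$demo/rt4.txt"

count=1
# IFS= and -r keep leading whitespace and backslashes in each line intact.
while IFS= read -r line; do
    case $count in
        1|4|7) printf '%s\n' "$line" >&3 ;;
        2|5|8) printf '%s\n' "$line" >&4 ;;
        3|6|9) printf '%s\n' "$line" >&5 ;;
        10)    printf '%s\n' "$line" >&6; count=0 ;;
    esac
    count=$((count + 1))
done < "$demo/input.txt"

exec 3>&- 4>&- 5>&- 6>&-       # close the four output files
```

Keeping stderr out of the data files means any error message from the loop shows up on the terminal rather than inside rt2.txt.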

Last edited by era; 09-30-2008 at 01:15 PM.. Reason: Note print vs echo
#7  09-30-2008  shankster (Registered User)
Thanks to both of you for your input. I really don't know what I'm doing when it comes to UNIX, so I just try to piece tidbits together. I ended up using era's approach from the second posting. It was similar to what I had already put together, and it made sense. joeyg, I'm sure your approach would work as well, and I appreciate your input.
Tags
split by line number, split to files
