Split File Based on Line Number Pattern


 
# 1  
Old 09-30-2008

Hello all.

Sorry, I know this question is similar to many others, but I just can't seem to put together exactly what I need.

My file is tab delimited and contains approximately 1 million rows. I would like to send lines 1, 4, and 7 to a file, lines 2, 5, and 8 to a second file, lines 3, 6, and 9 to a third file, and then line 10 to a fourth file. I then want to repeat this pattern for every following block of ten lines, using the same four files. Any thoughts on the best approach?
# 2  
Old 09-30-2008
I got a start on this

But, I will need some awk help (or to think a little clearer after eating lunch)

Code:
> cat big_file4
a stuff to 1 file
b stuff to 2 file
c stuff to 3 file
a stuff to 1 file
b stuff to 2 file
c stuff to 3 file
a stuff to 1 file
b stuff to 2 file
c stuff to 3 file
d stuff to 4 file
a stuff to 1 file
b stuff to 2 file
c stuff to 3 file
a stuff to 1 file
b stuff to 2 file
c stuff to 3 file
a stuff to 1 file
b stuff to 2 file
c stuff to 3 file
d stuff to 4 file

What I initially wrote does not capture the full line of text, and that is where I think I need some HELP!
Code:
> cat -n big_file4 | awk '{printf "%1s %-15s \n", substr($1,length($1),1), $2}'
1 a               
2 b               
3 c               
4 a               
5 b               
6 c               
7 a               
8 b               
9 c               
0 d               
1 a               
2 b               
3 c               
4 a               
5 b               
6 c               
7 a               
8 b               
9 c               
0 d

Because from here, my theory is that

Code:
grep "^[147] " <infile >outfile_a
grep "^[258] " <infile >outfile_b
grep "^[369] " <infile >outfile_c
grep "^[0] " <infile >outfile_d

May need to cut before writing to each output.
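
A minimal sketch of the same tag-then-grep idea that keeps the whole line, assuming the tag only needs to be the last digit of the line number. awk's built-in NR stands in for cat -n and substr; the file names (sample.txt, tagged.txt, outfile_a and so on) are made up for illustration.

```shell
#!/bin/sh
# Illustrative sketch: tag each line with NR % 10, then select by tag.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical 10-line sample input.
for i in 1 2 3 4 5 6 7 8 9 10; do
  printf 'line %s\n' "$i"
done > sample.txt

# Prefix every line with the last digit of its line number and a tab.
awk '{ print (NR % 10) "\t" $0 }' sample.txt > tagged.txt

# Select by tag, then cut the tag off again (cut splits on tabs by default,
# and -f2- keeps any further tab-delimited fields of the original line).
grep '^[147]' tagged.txt | cut -f2- > outfile_a
grep '^[258]' tagged.txt | cut -f2- > outfile_b
grep '^[369]' tagged.txt | cut -f2- > outfile_c
grep '^0'     tagged.txt | cut -f2- > outfile_d
```

This still reads the tagged file four times, so it is more of a quick fix than an efficient solution.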
# 3  
Old 09-30-2008
Perl or Python looping over a set of file handles would seem like the most efficient approach. For a more pedestrian solution, an awk script run four times with different parameters might be acceptable even if the file is big.

Does file four only contain every tenth line, and then 11, 14, and 17 go to the first file again?

Code:
perl -MIO::File -ne 'BEGIN { map { $file[$_] = IO::File->new(">file$_") || die $!} 0..3;
  @m = (0, 1, 2, 0, 1, 2, 0, 1, 2, 3);
}
$file[$m[($. - 1) % 10]]->print($_) || die $!'

csplit has some fairly versatile options, you might be able to pull this off simply with a suitable csplit pattern as well.
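
The "awk script run four times with different parameters" idea might look roughly like the sketch below. The helper name pick_lines, the slot lists, and the file names are all invented for this example; it simply passes the wanted positions within each block of ten to awk through -v.

```shell
#!/bin/sh
# Illustrative sketch: one awk filter, run once per output file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical 20-line sample input (two full blocks of ten).
i=1
while [ "$i" -le 20 ]; do
  printf 'row %s\n' "$i"
  i=$((i + 1))
done > input.txt

# pick_lines SLOTS FILE: print the lines whose position within each
# block of ten (1..10) appears in the space-separated SLOTS list.
pick_lines() {
  awk -v slots="$1" '
    BEGIN { n = split(slots, s, " "); for (k = 1; k <= n; k++) want[s[k]] = 1 }
    { pos = (NR - 1) % 10 + 1 }
    (pos in want)
  ' "$2"
}

pick_lines "1 4 7" input.txt > part1.txt
pick_lines "2 5 8" input.txt > part2.txt
pick_lines "3 6 9" input.txt > part3.txt
pick_lines "10"    input.txt > part4.txt
```

Each call scans the whole input, so the file is read four times in total.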

Last edited by era; 09-30-2008 at 01:56 PM.. Reason: csplit note
# 4  
Old 09-30-2008
Yes, 11,14, and 17 would then go to the first file again.

I am trying to use KSH to complete this task. Below is what I have so far, but the count variable does not appear to be resetting to 1 after it reaches 11. Also, I am getting output similar to:

File_split_DC.sh[42]: 2: not found.
File_split_DC.sh[42]: 3: not found.
File_split_DC.sh[42]: 4: not found.

The name of my script is "File_split_DC.sh"

#!/usr/bin/ksh

count=1

while read line
do

case $count in
1)
echo "$line" >> RT1.txt
;;
2)
echo "$line" >> RT2.txt
;;
3)
echo "$line" >> RT3.txt
;;
4)
echo "$line" >> RT1.txt
;;
5)
echo "$line" >> RT2.txt
;;
6)
echo "$line" >> RT3.txt
;;
7)
echo "$line" >> RT1.txt
;;
8)
echo "$line" >> RT2.txt
;;
9)
echo "$line" >> RT3.txt
;;
10)
echo "$line" >> RT4.txt
;;
esac
(( count+=1 ))

if $count -gt 10; then
count=1

fi
done < My_Test.txt

exit 0
# 5  
Old 09-30-2008
You want

Code:
if [ $count -gt 10 ]; then
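
Without the brackets, the shell expands $count and tries to run the result as a command, which is where messages such as "2: not found" come from. A minimal illustration of the corrected test, with the starting value chosen only for the example:

```shell
#!/bin/sh
count=10
count=$((count + 1))          # count is now 11

# [ ... ] is the test command; -gt compares the two values as integers.
if [ "$count" -gt 10 ]; then
  count=1
fi
echo "$count"
```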

It would be more efficient to open four file descriptors and then just print to those descriptors; this approximates the Perl approach I suggested above.

Code:
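# Note: this also redirects the script's own stdout (1) and stderr (2).
exec 1>rt1.txt 2>rt2.txt 3>rt3.txt 4>rt4.txt
count=1
# IFS= and -r keep leading tabs and backslashes in each line intact.
while IFS= read -r line; do
  case $count in
    1|4|7) print -r -- "$line" >&1;;
    2|5|8) print -r -- "$line" >&2;;
    3|6|9) print -r -- "$line" >&3;;
    10) print -r -- "$line" >&4; count=0;;
  esac
  count=$((count + 1))
done <My_Test.txt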

Note the use of print rather than echo -- this is ksh-specific, but other than that, this script should be portable.
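
For shells without print, a plain POSIX sh variant of the same descriptor technique could look like the sketch below. It is only an illustration: the sample data and the file names (sample.txt, out1.txt to out4.txt) are invented, printf replaces print, and descriptors 3 to 6 are used so that stdout and stderr remain available for messages.

```shell
#!/bin/sh
# Illustrative POSIX sh sketch of the file descriptor technique.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample input: 12 tab-delimited rows.
i=1
while [ "$i" -le 12 ]; do
  printf 'id%s\tvalue%s\n' "$i" "$i"
  i=$((i + 1))
done > sample.txt

# Open each output file once, on descriptors 3-6.
exec 3>out1.txt 4>out2.txt 5>out3.txt 6>out4.txt

count=1
# IFS= and -r keep leading tabs and backslashes in each line intact.
while IFS= read -r line; do
  case $count in
    1|4|7) printf '%s\n' "$line" >&3 ;;
    2|5|8) printf '%s\n' "$line" >&4 ;;
    3|6|9) printf '%s\n' "$line" >&5 ;;
    10)    printf '%s\n' "$line" >&6; count=0 ;;
  esac
  count=$((count + 1))
done < sample.txt

# Close the descriptors once the loop is done.
exec 3>&- 4>&- 5>&- 6>&-
```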

Last edited by era; 09-30-2008 at 02:15 PM.. Reason: Note print vs echo
# 6  
Old 09-30-2008
What about this?

Code:
> cat -n big_file4 | awk '{printf "%1s %-100s \n", substr($1,length($1),1), $0}' | cut -c1,10- | grep "^[147]" | cut -c2- >filea
> cat -n big_file4 | awk '{printf "%1s %-100s \n", substr($1,length($1),1), $0}' | cut -c1,10- | grep "^[258]" | cut -c2- >fileb
> cat -n big_file4 | awk '{printf "%1s %-100s \n", substr($1,length($1),1), $0}' | cut -c1,10- | grep "^[369]" | cut -c2- >filec
> cat -n big_file4 | awk '{printf "%1s %-100s \n", substr($1,length($1),1), $0}' | cut -c1,10- | grep "^[0]" | cut -c2- >filed

Code:
> cat big_file4
a stuff to 1 file
b stuff to 2 file
c stuff to 3 file
a stuff to 1 file
b stuff to 2 file
c stuff to 3 file
a stuff to 1 file
b stuff to 2 file
c stuff to 3 file
d stuff to 4 file
a stuff to 1 file
b stuff to 2 file
c stuff to 3 file
a stuff to 1 file
b stuff to 2 file
c stuff to 3 file
a stuff to 1 file
b stuff to 2 file
c stuff to 3 file
d stuff to 4 file

and now the four separated files
Code:
> cat filea
a stuff to 1 file                                                                             
a stuff to 1 file                                                                             
a stuff to 1 file                                                                             
a stuff to 1 file                                                                             
a stuff to 1 file                                                                             
a stuff to 1 file                                                                             
> cat fileb
b stuff to 2 file                                                                             
b stuff to 2 file                                                                             
b stuff to 2 file                                                                             
b stuff to 2 file                                                                             
b stuff to 2 file                                                                             
b stuff to 2 file                                                                             
> cat filec
c stuff to 3 file                                                                             
c stuff to 3 file                                                                             
c stuff to 3 file                                                                             
c stuff to 3 file                                                                             
c stuff to 3 file                                                                             
c stuff to 3 file                                                                             
> cat filed
d stuff to 4 file                                                                             
d stuff to 4 file                                                                             
>
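
For comparison, the same split can probably be done in a single pass, without the padding that printf "%-100s" adds to every line. The sketch below is an alternative rather than the method used in this post; the input and output names (big_sample.txt, split1.txt to split4.txt) are invented for the example.

```shell
#!/bin/sh
# Illustrative sketch: route every line to one of four files in one awk pass.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical 20-line sample input (two full blocks of ten).
i=1
while [ "$i" -le 20 ]; do
  printf 'row %s\n' "$i"
  i=$((i + 1))
done > big_sample.txt

awk '
  {
    pos = (NR - 1) % 10                      # 0..9 within each block of ten
    n = (pos == 9) ? 4 : ((pos % 3) + 1)     # tenth line -> 4, others cycle 1,2,3
    print > ("split" n ".txt")               # awk keeps each file open between lines
  }
' big_sample.txt
```

Because awk opens each output file once and keeps it open, the input is read only one time, which matters more as the file grows toward a million rows.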

# 7  
Old 09-30-2008
Thanks to both of you for your input. I really don't know what I'm doing when it comes to UNIX, so I just try to piece tidbits together. I ended up using ERA's approach in the second posting. It was similar to what I had already put together, and made sense. JOEYG, I'm sure your approach would work as well, and I appreciate your input.