Shell Programming and Scripting: Split File Based on Line Number Pattern
Post 302242171 by radoulov on Wednesday 1st of October 2008 09:49:22 AM
Thanks, era!

As far as I know, [ngm]awk keeps output files open until the program ends or until they are closed explicitly with close(filename):

Code:
% strace  -eopen mawk '!(NR%10){print>(FILENAME 4);next}
{print>(FILENAME (++i))}i==3{i=0}' data
open("tls/i686/sse2/cmov/libm.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
[snip]
open("/lib/tls/i686/cmov/libc.so.6", O_RDONLY) = 3
open("data", O_RDONLY)                  = 3
open("data1", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 4
open("data2", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 5
open("data3", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 6
open("data4", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 7
Process 8618 detached

Code:
% strace  -eopen gawk '!(NR%10){print>(FILENAME 4);next}
{print>(FILENAME (++i))}i==3{i=0}' data
open("tls/i686/sse2/cmov/libdl.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
[snip]
open("/usr/lib/locale/en_US.utf8/LC_TIME", O_RDONLY) = 3
open("data", O_RDONLY|O_LARGEFILE)      = 3
open("data1", O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, 0666) = 4
open("data2", O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, 0666) = 5
open("data3", O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, 0666) = 6
open("data4", O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, 0666) = 7
Process 8641 detached
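In both traces each output file (data1 to data4) is opened exactly once. With only four outputs that is harmless, but depending on the awk implementation and the process's descriptor limit, a script that writes to many files can run out of open files, and the usual remedy is the close(filename) call mentioned above. The following is a minimal sketch, not taken from the thread: the input `sample.txt` and the `chunk.N` output names are hypothetical. It splits its input into 10-line chunks and closes each chunk before opening the next, so at most one output file is open at a time.

```shell
#!/bin/sh
# Sketch: split the input into 10-line chunks, closing each chunk file
# before the next one is opened, so at most one output file is open.
# "sample.txt" and the "chunk.N" names are hypothetical.
set -e

# Generate 25 numbered lines of sample input.
awk 'BEGIN { for (i = 1; i <= 25; i++) print i }' > sample.txt
rm -f chunk.*

awk '
  NR % 10 == 1 {                # first line of a new 10-line chunk
    if (out != "") close(out)   # release the previous chunk file
    out = "chunk." (++n)        # chunk.1, chunk.2, ...
  }
  { print > out }               # each chunk file is truncated once, on first open
' sample.txt

wc -l chunk.*
```

With 25 input lines this should leave 10, 10 and 5 lines in chunk.1, chunk.2 and chunk.3. Note that `>` truncates a file when awk opens it, so reopening a file with `>` after close() would truncate it again; each chunk is therefore closed only after its last line is written. If a file really must be reopened, `>>` appends instead.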

Reading the strace output, I notice some differences in the timing of the read/write calls.
I'm quite sure, however, that the output below does not show all of the time-consuming events.

Code:
% strace -c mawk '!(NR%10){print>(FILENAME 4);next}
{print>(FILENAME (++i))}i==3{i=0}' data
Process 7865 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 73.48    0.003954           0     26097           write
 25.83    0.001390           0     26313           read
  0.69    0.000037           1        57        49 open
  0.00    0.000000           0        10           close
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         1           time
  0.00    0.000000           0         4         4 access
  0.00    0.000000           0         3           brk
  0.00    0.000000           0         5         5 ioctl
  0.00    0.000000           0         5           munmap
  0.00    0.000000           0         3           mprotect
  0.00    0.000000           0        13           mmap2
  0.00    0.000000           0        16        15 stat64
  0.00    0.000000           0         7           fstat64
  0.00    0.000000           0         1           set_thread_area
------ ----------- ----------- --------- --------- ----------------
100.00    0.005381                 52536        73 total

% rm data[1-4]                                     
% sync;sync                                        
% strace -c gawk '!(NR%10){print>(FILENAME 4);next}
{print>(FILENAME (++i))}i==3{i=0}' data
Process 7883 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 72.16    0.004391           0     26097           write
 27.21    0.001656           0     26102           read
  0.62    0.000038           0        89        72 open
  0.00    0.000000           0        17           close
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         5         5 access
  0.00    0.000000           0         3           brk
  0.00    0.000000           0         6         5 ioctl
  0.00    0.000000           0         6           munmap
  0.00    0.000000           0         4           mprotect
  0.00    0.000000           0         4           _llseek
  0.00    0.000000           0         3           rt_sigaction
  0.00    0.000000           0        22           mmap2
  0.00    0.000000           0        16        15 stat64
  0.00    0.000000           0        25           fstat64
  0.00    0.000000           0         2           getgroups32
  0.00    0.000000           0        13           fcntl64
  0.00    0.000000           0         1           set_thread_area
------ ----------- ----------- --------- --------- ----------------
100.00    0.006085                 52416        97 total

% rm data[1-4]                                     
% sync;sync                                        
% strace -c newawk '!(NR%10){print>(FILENAME 4);next}
{print>(FILENAME (++i))}i==3{i=0}' data
Process 7943 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 98.90    0.123052           0   1000000           write
  1.10    0.001368           0     26101           read
  0.00    0.000000           0        64        52 open
  0.00    0.000000           0        15           close
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         4         4 access
  0.00    0.000000           0         3           brk
  0.00    0.000000           0         7           munmap
  0.00    0.000000           0         3           mprotect
  0.00    0.000000           0         1           rt_sigaction
  0.00    0.000000           0        18           mmap2
  0.00    0.000000           0        16        15 stat64
  0.00    0.000000           0        12           fstat64
  0.00    0.000000           0         1           set_thread_area
------ ----------- ----------- --------- --------- ----------------
100.00    0.124420               1026246        71 total
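The most visible difference in these tables is the write count: newawk makes 1,000,000 write calls where mawk and gawk make 26,097 each, which suggests newawk flushes its output far more often, possibly once per record. The input file `data` is not shown in the thread, so rerunning the comparison needs a comparable one. A sketch under that assumption: it generates a hypothetical `data` of 1,000,000 short numbered lines (the line format is made up, and the script overwrites any existing ./data), reruns the one-liner from the post with whatever `awk` is first in PATH, and reports how many lines each output file received.

```shell
#!/bin/sh
# Sketch: rebuild a test input and rerun the one-liner from the post.
# Assumption: "data" holds 1,000,000 short numbered lines; the real input
# used in the post is not shown. This overwrites ./data.
set -e
N=1000000

# Generate the hypothetical input file.
awk -v n="$N" 'BEGIN { for (i = 1; i <= n; i++) print "line " i }' > data
rm -f data1 data2 data3 data4

# Same program as in the post: every 10th record goes to data4; the
# other records are dealt round-robin to data1, data2, data3.
awk '!(NR%10){print>(FILENAME 4);next}
{print>(FILENAME (++i))}i==3{i=0}' data

# With N=1000000: data1, data2, data3 get 300000 lines each, data4 100000.
wc -l data1 data2 data3 data4
```

Under bash, prefixing the second awk command with the `time` keyword reports user and system CPU time for the whole run. That complements `strace -c`, which only accounts for time spent inside system calls.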

All times are GMT -4.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.