Split a file based on pattern in awk, grep, sed or perl Post: 302207522

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

Split a file with no pattern -- Split, Csplit, Awk

I have gone through all the threads in the forum and tested out different things. I am trying to split a 3GB file into multiple files. Some files are even larger than this. For example: split -l 3000000 filename.txt This is very slow and it splits the file with 3 million records in each...

2. Shell Programming and Scripting

Split File Based on Line Number Pattern

Hello all. Sorry, I know this question is similar to many others, but I just can't seem to put together exactly what I need. My file is tab delimited and contains approximately 1 million rows. I would like to send lines 1, 4, & 7 to a file. Lines 2, 5, & 8 to a second file. Lines 3, 6, & 9 to...

3. Shell Programming and Scripting

Split a file based on a pattern

Dear all, I have a large file which is composed of 8000 frames. What I would like to do is split the file into 8000 single files named file.pdb.1, file.pdb.2, etc. Each frame in the large file is separated by an "ENDMDL" flag, so my thinking is to use this flag as a point to split the files...

4. Shell Programming and Scripting

Help to search multiple pattern in file with grep/sed/awk

Hello All, I have a file containing data like the following: Jul 19 2011 | 123456 Jul 19 2011 | 123456 Jul 20 2011 | 123456 Jul 20 2011 | 123456 Here I want to grep for a date pattern as below, so that it only matches "Jul 20" OR "Jul ...

5. Shell Programming and Scripting

how to get data from hex file using SED or AWK based on pattern sign

I have a binary (hex) file I need to parse to get some data which is encoded this way: .* b4 . . . 01 12 .* af .* 83 L1 x1 x2 xL 84 L2 y1 y2 yL In other words, there is a stream of hexadecimal bytes (in my example separated by spaces for better readability). I need to get the value stored in...

6. Shell Programming and Scripting

Split a file based on pattern and size

Hello, I have a large file (2GB) that I would like to split based on pattern and size. I've used the following command to split the file (token is "HELLO") awk '/HELLO/{i++}{print > "file"i}' input.txt and the output is similar to the following (i included filesize in KB): 10 ...

7. Shell Programming and Scripting

Split the file based on pattern

Hi, I have huge files of around 400 MB, which contain CLOB data and cover different scenarios: I am trying to pass the scenario number as a parameter and get the required modified file based on the scenario number and criteria. Scenario 1: file name : scenario_1.txt ...

8. Shell Programming and Scripting

How to split a file based on pattern line number?

Hi, I have a requirement like the one below: M <form_name> sdasadasdMklkM D ...... D ..... M form_name> sdasadasdMklkM D ...... D ..... D ...... D ..... M form_name> sdasadasdMklkM D ...... M form_name> sdasadasdMklkM I want to split the file based on line number by finding...

9. Shell Programming and Scripting

sed and awk usage to grep a pattern 1 and with reference to this grep a pattern 2 and pattern 3

Hi, I have a file in which I have modified certain things compared to the original file. The difference between the original file and the modified file is as follows. # diff mir_lex.c.modified mir_lex.c.orig 3209c3209 < if(yy_current_buffer -> yy_is_our_buffer == 0) { --- >...

10. UNIX for Advanced & Expert Users

Split one file to many based on pattern

Hello All, I have records in a file in the pattern A,B,B,B,B,K,A,B,B,K. Is there any command or simple logic I can use to pull out the records into multiple files based on the A record? I want output as File1: A,B,B,B,B,K File2: A,B,B,K
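One way to approach the last question is with awk, opening a new output file each time an A record is seen. This is only a sketch: it assumes one record per line with the record type in the first column, and the names records.txt and File1, File2 are illustrative rather than taken from the thread.

```shell
#!/bin/sh
# Sketch: start a new output file at every record that begins with "A".
# Assumption: one record per line; records.txt and File<N> are made-up names.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

printf '%s\n' A B B B B K A B B K > records.txt

awk '
    /^A/ {                      # an "A" record opens a new group
        if (out) close(out)     # release the previous file before moving on
        n++
        out = "File" n
    }
    out { print > out }         # lines before the first "A" are skipped
' records.txt

paste -s -d, File1    # A,B,B,B,B,K
paste -s -d, File2    # A,B,B,K
```

Closing each file before opening the next keeps the number of open files at one, which matters when the input produces thousands of groups.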

LEARN ABOUT OSF1

csplit

csplit(1)						      General Commands Manual							 csplit(1)

NAME

       csplit - Splits files by context

SYNOPSIS

       csplit [-f prefix] [-ks] [-nnumber] file | - arg1...argn

       The  csplit  command  reads  the specified file (or standard input) and separates it into segments defined by the specified arguments.  The
       csplit command optionally prints the sizes, in bytes, of each file created.

STANDARDS

       Interfaces documented on this reference page conform to industry standards as follows:

       csplit:	XCU5.0

       Refer to the standards(5) reference page for more information about industry standards and associated tags.

OPTIONS

       -f prefix
              Specifies the prefix name (xx by default) for the created file segments.

       -k     Leaves previously created file segments intact in the event of an error.  By default, created files are removed if an error occurs.

       -n number
              Uses number decimal digits to form file names for the file pieces.  The default is 2.

       -s     Suppresses the display of file size messages.
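The -f, -n and -s options from the synopsis can be tried together on a small file. The following is a minimal sketch; the file name notes.txt, its contents and the prefix part are made up for illustration.

```shell
#!/bin/sh
# Sketch of the -f, -n and -s options; all file names here are illustrative.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

printf '%s\n' intro 'Section 1' alpha 'Section 2' beta > notes.txt

# -f part : name the pieces part... instead of xx...
# -n 3    : use three digits in the suffix (part000, part001, ...)
# -s      : do not print the size of each piece
# {1}     : apply the pattern one more time, i.e. twice in total
csplit -s -f part -n 3 notes.txt '/^Section/' '{1}'

ls part*        # part000 part001 part002
cat part001     # Section 1, then alpha
```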

OPERANDS

       file   Specifies the text file to be split.  If you specify - in place of the input file name, csplit reads from standard input.

       The operands arg1...argn can be a combination of the following:

       /pattern/[offset]
              Creates a file using the contents of the lines from the current line up to, but not including, the line that results from the
              evaluation of the regular expression with an offset, if included.  The offset argument can be any integer (positive or negative)
              that represents a number of lines.  A plus or minus sign is required.

       %pattern%[offset]
              Has the same effect as /pattern/, except that no segment file is created.

       +number
       -number
              Moves forward or backward the specified number of lines from the line matched by an immediately preceding pattern argument (for
              example, /Page/-5).

       line_number
              Creates a file containing the segment from the current line up to, but not including, line_number, which becomes the current line.

       {number}
              Repeats the preceding argument the specified number of times.  This number can follow any of the pattern or line_number arguments.
              If it follows a pattern argument, csplit reuses that pattern the specified number of times.  If it follows a line_number argument,
              csplit splits the file from that point every line_number of lines for number times.
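The operand forms described above can be compared side by side on small inputs. This is a sketch only; report.txt, numbers.txt and the prefixes skip, off and num are hypothetical names chosen for the demonstration.

```shell
#!/bin/sh
# Sketch of the csplit operand forms; every file name here is illustrative.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

printf '%s\n' header 'Page 1' one 'Page 2' two 'Page 3' three > report.txt
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 > numbers.txt

# %pattern% skips ahead to the "Page 2" line without writing a file;
# /pattern/ then writes everything up to, but not including, "Page 3".
csplit -s -f skip report.txt '%^Page 2%' '/^Page 3/'
# skip00: Page 2, two        skip01: Page 3, three

# A +1 offset moves the split point one line past the matching line.
csplit -s -f off report.txt '/^Page 2/+1'
# off00: header .. Page 2    off01: two .. three

# A line_number operand followed by {1} splits at line 4 and again at line 8.
csplit -s -f num numbers.txt 4 '{1}'
# num00: 1-3    num01: 4-7    num02: 8-10
```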

DESCRIPTION

       By default, csplit writes the file segments to files named xx00 ...xxn, where n is the number of arguments listed on the command line (n
       may not be greater than 99).  These new files get the following pieces of file:

       xx00   From the start of file up to, but not including, the line referenced by the first argument.

       xx01   From the line referenced by the first argument up to the line referenced by the second argument.

       xxn    From the line referenced by the last argument to the end of file.

       The csplit command does not alter the original file, unless a generated file overwrites the original file.

       Quote all pattern arguments that contain spaces or other characters special to the shell.  Patterns may not contain embedded newline
       characters.

       [Tru64  UNIX]  See  the	grep(1)  reference  page  for information about creating patterns.  In an expression such as [a-z], the dash means
       "through" according to the current collating sequence.  The collating sequence is determined by the value  of  the  LC_COLLATE  environment
       variable.

       Unless the -s option is specified, csplit writes one line, containing the file size in bytes, for each file created to standard output.
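The default naming and the size report can be seen with a four-line input. A minimal sketch, assuming a made-up file data.txt; the sizes are captured in a shell variable only so that they can be shown next to the file names.

```shell
#!/bin/sh
# Sketch: default xx file names and the byte counts written to standard output.
# data.txt and its contents are illustrative.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

printf 'aa\nbb\nSTOP\ncc\n' > data.txt

# Without -s, csplit prints one line per created file with its size in bytes.
sizes=$(csplit data.txt '/^STOP/')
echo "$sizes"       # 6 (xx00: aa, bb) then 8 (xx01: STOP, cc)

ls xx*              # xx00 xx01
```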

EXIT STATUS

       The following exit values are returned:

       0      Successful completion.

       >0     An error occurred.

       Unless the -k option is used, any files created before the error was detected will be removed.
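The effect of -k on a failed run can be checked by asking for more matches than the input contains. This is a sketch under the assumption that running out of matches is reported as an error, as it is in common implementations; short.txt and the prefixes nok and kept are made-up names, and the contents of the last kept piece may vary between implementations.

```shell
#!/bin/sh
# Sketch: exit status and file cleanup with and without -k.
# short.txt and the prefixes are illustrative.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

printf '%s\n' preface 'Chapter 1' text 'Chapter 2' more > short.txt

# {4} asks for five matches in total, but only two "Chapter" lines exist.
status_plain=0
csplit -s -f nok short.txt '/^Chapter/' '{4}' 2>/dev/null || status_plain=$?
echo "without -k: exit status $status_plain"
[ -e nok00 ] || echo "no nok pieces were left behind"

# With -k the pieces written before the error are kept.
status_keep=0
csplit -s -k -f kept short.txt '/^Chapter/' '{4}' 2>/dev/null || status_keep=$?
echo "with -k: exit status $status_keep"
ls kept*
```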

EXAMPLES

       To split the text of a book into a separate file for each chapter, enter:

              csplit book "/^Chapter *[0-9]/" {9}

              This creates files named xx00, xx01, xx02,...,xx09, which contain individual chapters of the file book.  Each chapter begins with a
              line that contains only the word Chapter and the chapter number.  The file xx00 contains the front matter that comes before the
              first chapter.  The {9} after the pattern causes csplit to create up to 9 individual chapters; the remainder are placed in xx10.

       To specify the prefix for the created file names, enter:

              csplit -f chap book "/^Chapter *[0-9]/" {9}

              This splits book into files named chap00, chap01,...,chap09, chap10.
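To try the chapter example without a real book, a stand-in file can be generated first. The sketch below assumes a generated file with a title page and 11 short chapters, which is not part of the reference page, and adds -s to keep the output quiet.

```shell
#!/bin/sh
# Sketch: the chapter example run against a generated stand-in for "book".
# The generated contents (title page plus 11 chapters) are an assumption.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

awk 'BEGIN {
    print "Title page"
    for (i = 1; i <= 11; i++) {
        print "Chapter " i
        print "text of chapter " i
    }
}' > book

csplit -s -f chap book '/^Chapter *[0-9]/' '{9}'

ls chap*                      # chap00 through chap10
cat chap00                    # Title page (the front matter)
head -n 1 chap09              # Chapter 9
grep -c '^Chapter' chap10     # 2: chapters 10 and 11 share the last file
```

Because the pattern is applied ten times in total, the tenth match only marks where chap09 ends; everything from Chapter 10 onward lands in chap10.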

ENVIRONMENT VARIABLES

       The following environment variables affect the execution of csplit:

       LANG   Provides a default value for the internationalization variables that are unset or null.  If LANG is unset or null, the
              corresponding value from the default locale is used.  If any of the internationalization variables contain an invalid setting,
              the utility behaves as if none of the variables had been defined.

       LC_ALL If set to a non-empty string value, overrides the values of all the other internationalization variables.

       LC_COLLATE
              Determines the locale for the behavior of ranges, equivalence classes, and multicharacter collating elements within regular
              expressions.

       LC_CTYPE
              Determines the locale for the interpretation of sequences of bytes of text data as characters (for example, single-byte as
              opposed to multibyte characters in arguments and input files) and the behavior of character classes within regular expressions.

       LC_MESSAGES
              Determines the locale for the format and contents of diagnostic messages written to standard error.

       NLSPATH
              Determines the location of message catalogues for the processing of LC_MESSAGES.
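Since ranges in patterns depend on the collating sequence, one common precaution is to pin the locale for a single command. The following sketch sets LC_ALL=C so that [A-Z] covers only the ASCII uppercase letters; mixed.txt and the prefix loc are made-up names, and behavior under other locales is not shown here.

```shell
#!/bin/sh
# Sketch: pinning the locale for one csplit run so that [A-Z] is byte-ordered.
# mixed.txt and the prefix loc are illustrative.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

printf '%s\n' lower Upper rest > mixed.txt

# LC_ALL overrides LC_COLLATE and LC_CTYPE for this command only.
LC_ALL=C csplit -s -f loc mixed.txt '/^[A-Z]/'

cat loc00     # lower
cat loc01     # Upper, then rest
```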

SEE ALSO

       Commands:  ed(1), grep(1), sed(1), sh(1b), sh(1p), split(1)

       Files:  regexp(3)

       Standards:  standards(5)

																	 csplit(1)
