Split files with formatted numbers Post: 302908837


Post 302908837 by jim mcnamara on Thursday 10th of July 2014 04:59:54 PM


I guess I missed something. Generally I think it is better to use a command that already does what you want than to write a script; in this case csplit is a possible choice. It is educational to write a script, but a better idea to use known-good commands for production work.

Code:

csplit  -f splitz -k  -n 3  csprap01.logscan 10000 {5}

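Explanation: split csprap01.logscan every 10000 lines into files named splitz000, splitz001, and so on. The split is applied once and then repeated up to five more times, so at most seven files can result; this 52092-line input yields six (splitz000..splitz005).

-f splitz is the prefix for the numbered file names: splitz000 .. splitz999

-n is the number of decimal digits in the number: -n 3 means use zero-filled numbers with 3 digits for output filenames

10000 means start from where you are in the file (usually the beginning) and copy up to, but not including, line 10000, so lines 1-9999 are in the first split and 10000-19999 in the second.

{5} means repeat the previous split five more times. {*} (Linux csplit) means keep on repeating until the input runs out. This last option may cause you to overwrite the splitz000 file (and others) if you create more than 1000 files as splits, since three digits only cover 000..999.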
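The option breakdown above can be tried at a smaller scale before pointing it at a real log. This is only a sketch: the file name sample.log, the 25-line input, and the split size of 10 are made up for illustration, and -s is added just to suppress the byte counts.

```shell
# Scaled-down sketch: 25 lines split every 10 lines
# (stand-in for 52092 lines split every 10000).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 1 25 > sample.log        # hypothetical input, one number per line

# "10 {1}": split before line 10, then once more before line 20.
# Whatever is left (lines 20-25) goes into one more file.
csplit -s -f splitz -k -n 3 sample.log 10 '{1}'

# Expect 9 lines in splitz000, 10 in splitz001, 6 in splitz002.
wc -l splitz*
```

The first piece is one line shorter than the others because the split happens before the named line, the same effect as the 9999-line splitz000 in the real run.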

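The "line number out of range on repetition 5" message in the output below means the input ran out before the last requested split could be made, so the last file came up short of lines. With -k the files already written are kept, so you lose no lines in the splits in case of error.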

Code:

csplit  -f splitz -k  -n 3  csprap01.logscan 10000 {5}
1293851
1305465
1306543
2458441
1785104
/usr/local/bin/csplit: `10000': line number out of range on repetition 5
258231
jmcnama>
jmcnama > ls -lrt splitz*
-rw-r--r--   1 jmcnama  other    1293851 Jul 10 14:39 splitz000
-rw-r--r--   1 jmcnama  other    1305465 Jul 10 14:39 splitz001
-rw-r--r--   1 jmcnama  other    1306543 Jul 10 14:39 splitz002
-rw-r--r--   1 jmcnama  other    2458441 Jul 10 14:39 splitz003
-rw-r--r--   1 jmcnama  other    1785104 Jul 10 14:39 splitz004
-rw-r--r--   1 jmcnama  other     258231 Jul 10 14:39 splitz005

Code:

 jmcnama > wc -l splitz*
    9999 splitz000
   10000 splitz001
   10000 splitz002
   10000 splitz003
   10000 splitz004
    2093 splitz005
   52092 total
jmcnama >  wc -l csprap01.logscan
   52092 csprap01.logscan
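Matching line counts are a good sign; a byte-for-byte comparison is a slightly stronger check. The following is a sketch only, assuming GNU csplit and using a made-up 25-line sample.log in place of csprap01.logscan.

```shell
# Assumes GNU csplit; sample.log is a hypothetical stand-in for the real log.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 1 25 > sample.log

# Ask for more splits than the file can supply; -k keeps the pieces anyway.
status=0
csplit -s -f splitz -k -n 3 sample.log 10 '{5}' 2>/dev/null || status=$?
echo "csplit exit status: $status"

# Zero-padded names sort correctly, so the glob lists the pieces in order.
if cat splitz* | cmp -s - sample.log; then
    reassembled=yes
else
    reassembled=no
fi
echo "pieces reassemble to the original: $reassembled"
```

The zero-filled suffix from -n is what makes the simple splitz* glob safe here; unpadded numbers would sort 10 before 2.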



LEARN ABOUT NETBSD

csplit

CSPLIT(1)						    BSD General Commands Manual 						 CSPLIT(1)

NAME

     csplit -- split files based on context

SYNOPSIS

     csplit [-ks] [-f prefix] [-n number] file args ...

DESCRIPTION

     The csplit utility splits file into pieces using the patterns args.  If file is a dash ('-'), csplit reads from standard input.

     Files are created with a prefix of ``xx'' and two decimal digits.  The size of each file is written to standard output as it is created.  If
     an error occurs whilst files are being created, or a HUP, INT, or TERM signal is received, all files previously written are removed.

     The options are as follows:

	   -f prefix   Create file names beginning with prefix, instead of ``xx''.

	   -k	       Do not remove previously created files if an error occurs or a HUP, INT, or TERM signal is received.

	   -n number   Create file names beginning with number of decimal digits after the prefix, instead of 2.

	   -s	       Do not write the size of each output file to standard output as it is created.

     The args operands may be a combination of the following patterns:

	   /regexp/[[+|-]offset]
		       Create a file containing the input from the current line to (but not including) the next line matching the given basic
		       regular expression.  An optional offset from the line that matched may be specified.

	   %regexp%[[+|-]offset]
		       Same as above but a file is not created for the output.

	   line_no     Create a file containing the input from the current line to (but not including) the specified line number.

	   {num}       Repeat the previous pattern the specified number of times.  If it follows a line number pattern, a new file will be created
		       for each line_no lines, num times.  The first line of the file is line number 1 for historic reasons.

     After all the patterns have been processed, the remaining input data (if there is any) will be written to a new file.
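As a rough illustration of the /regexp/ and {num} operands described above, the sketch below splits a small file at each line starting with "== ". The file name notes.txt, its contents, and the prefix "part" are all invented for the example.

```shell
# Hypothetical input: a short notes file whose sections start with "== ".
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'intro' '== one' 'a' 'b' '== two' 'c' '== three' 'd' > notes.txt

# /^== / is applied once and repeated twice more: three split points.
# The text before the first match and the text after the last split
# each get their own file, so four files are written in total.
csplit -s -f part -n 2 notes.txt '/^== /' '{2}'

# Show the first line of each piece.
for f in part*; do
    printf '%s: %s\n' "$f" "$(head -n 1 "$f")"
done
```

Each piece other than the first begins with its own "== " heading, because the matching line is excluded from the file being closed and starts the next one.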

     Requesting to split at a line before the current line number or past the end of the file will result in an error.

     The csplit utility exits 0 on success, and >0 if an error occurs.
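The difference -k makes on error can be seen with a split request that runs past the end of the input. This is a sketch under assumptions: it was written with GNU csplit in mind, other implementations may differ in detail, and sample.log is a made-up 25-line file.

```shell
# Assumes GNU csplit; sample.log and its 25 lines are made up for illustration.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 1 25 > sample.log

# Without -k: the third split point (line 30) is past end of file,
# so csplit fails and removes the pieces it had already written.
csplit -s sample.log 10 '{2}' 2>/dev/null || echo "csplit failed with status $?"
without_k=$(find . -name 'xx*' | wc -l)

# With -k: same error, but the pieces written so far are kept.
csplit -s -k sample.log 10 '{2}' 2>/dev/null || echo "csplit failed with status $?"
with_k=$(find . -name 'xx*' | wc -l)

echo "pieces without -k:" $without_k
echo "pieces with -k:" $with_k
```

In a script, checking the exit status is still worthwhile even with -k, since the kept files may not cover the number of splits that was requested.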

ENVIRONMENT

     The LANG, LC_ALL, LC_COLLATE, and LC_CTYPE environment variables affect the execution of csplit as described in environ(7).

EXAMPLES

     Split the mdoc(7) file foo.1 into one file for each section (up to 20):

	   $ csplit -k foo.1 '%^.Sh%' '/^.Sh/' '{20}'

     Split standard input after the first 99 lines and every 100 lines thereafter:

	   $ csplit -k - 100 '{19}'

SEE ALSO

     sed(1), split(1), re_format(7)

STANDARDS

     The csplit utility conforms to IEEE Std 1003.1-2004 (``POSIX.1'').

HISTORY

     A csplit command appeared in PWB UNIX.

BUGS

     Input lines are limited to LINE_MAX (2048) bytes in length.

BSD
								  January 4, 2009							       BSD
