Split a file using 2-D indexing system Post: 302775461

Post 302775461 by Don Cragun on Tuesday 5th of March 2013 12:38:40 AM

If the awk on your system supports only single-character values for RS, or if you'd like to base the output filenames on the input filenames, process more than one input file, and choose how many files are produced before the first numeric field in the output filename is incremented, you could try the following script:

Code:

#!/bin/ksh
cnt=3
Usage="Usage: $(basename "$0") [-n cnt] file..."
# Split input file(s) into files named file.X.Y, where X and Y both start
# at 1 for each file operand.  A new output file is started whenever a
# line in an input file starts with a <greater-than> character (">").
# Lines starting with a <greater-than> character are not included in any
# of the output files, but all other lines are copied unchanged into the
# corresponding output file.  Each time a new output file is started, Y
# is incremented until it would exceed cnt (which defaults to 3 if the -n
# option is not given on the command line).  At that point, X is
# incremented and Y is reset to 1.
while getopts n: opt
do      case $opt in
        (n)     cnt="$OPTARG";;
        (?)     echo "$Usage" >&2
                exit 1
        esac
done
shift $(($OPTIND - 1))
if [ $# -lt 1 ]
then    echo "$(basename "$0"): At least one file operand is required." >&2
        echo "$Usage" >&2
        exit 2
fi
awk -v cnt="$cnt" '
FNR == 1 {
        # This is the first record of a new input file.
        # If this is not the first input file, close the last output file for
        # the previous input file.
        if(NR != FNR) close(fn)
        # Create output filename based on input filename.
        x = y = 1
        fn = FILENAME "." x "." y
}
/^>/ {  # Close current output file
        close(fn)
        if(y == cnt) {
                y = 1
                x++
        } else  y++
        fn = FILENAME "." x "." y
        next
}
{       print > fn
}' "$@"

It uses the Korn shell, but will also work with any other shell that supports the command substitutions and arithmetic expansions specified by the POSIX standards (including bash).
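To see how the X.Y numbering plays out, here is a minimal sketch that runs the same awk logic on a small made-up input. The directory split_demo, the file sample.txt, its contents, and the choice of cnt=2 are all assumptions for illustration only; they are not part of the original question.

```shell
#!/bin/sh
# Hypothetical demo data: split_demo/sample.txt and its contents are made up.
mkdir -p split_demo
rm -f split_demo/sample.txt.*    # drop output left over from an earlier run

# Five groups of lines separated by ">" header lines.
cat > split_demo/sample.txt <<'EOF'
a1
>hdr
b1
b2
>hdr
c1
>hdr
d1
>hdr
e1
EOF

# Same awk logic as the script above, with cnt fixed at 2 for the demo.
awk -v cnt=2 '
FNR == 1 {
        # New input file: close any leftover output file and restart numbering.
        if (NR != FNR) close(fn)
        x = y = 1
        fn = FILENAME "." x "." y
}
/^>/ {  # Header line: move on to the next output filename.
        close(fn)
        if (y == cnt) { y = 1; x++ } else y++
        fn = FILENAME "." x "." y
        next
}
{       print > fn
}' split_demo/sample.txt

ls split_demo/sample.txt.*
# Lists:
#   split_demo/sample.txt.1.1
#   split_demo/sample.txt.1.2
#   split_demo/sample.txt.2.1
#   split_demo/sample.txt.2.2
#   split_demo/sample.txt.3.1
```

Since the output names are built from FILENAME, the output files land next to the input file, whatever pathname was given on the command line.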

Note that if the first line of an input file starts with a >, or if two or more adjacent lines in an input file start with a >, no empty files are created; the corresponding filenames are simply skipped.
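The skipped filenames can be checked with a short sketch like the one below. The directory skip_demo, the file in.txt, and its contents are hypothetical; cnt is left at the script's default of 3.

```shell
#!/bin/sh
# Hypothetical demo data: skip_demo/in.txt and its contents are made up.
mkdir -p skip_demo
rm -f skip_demo/in.txt.*         # drop output left over from an earlier run

# The first line is a header, and two headers are adjacent further down.
cat > skip_demo/in.txt <<'EOF'
>h1
a
>h2
>h3
b
EOF

# Same awk logic as the script above, with the default cnt of 3.
awk -v cnt=3 '
FNR == 1 {
        if (NR != FNR) close(fn)
        x = y = 1
        fn = FILENAME "." x "." y
}
/^>/ {  # The filename advances even if nothing was written to the old one.
        close(fn)
        if (y == cnt) { y = 1; x++ } else y++
        fn = FILENAME "." x "." y
        next
}
{       print > fn
}' skip_demo/in.txt

ls skip_demo/in.txt.*
# Lists only:
#   skip_demo/in.txt.1.2
#   skip_demo/in.txt.2.1
```

An output file is only created when the first line is printed to it, so in.txt.1.1 and in.txt.1.3 never appear: each header moves fn to the next name whether or not anything was written under the previous one.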
