01-16-2013
Any utility specified to read text files (including awk, grep, read, and sed) may fail on any line longer than LINE_MAX bytes. You can find the value of LINE_MAX on your system by running the command: getconf LINE_MAX. The cut, paste, and fold utilities, however, are required to work with text files with unlimited line lengths. So, one way to do this is to:
1. Use cut to extract just field 2 from your input file into a file (e.g., name_list).
2. Use cut to copy the first LINE_MAX-5 bytes of each line of your input file into a file (e.g., part001).
3. Use cut to create other files with sequential sets of LINE_MAX-5 bytes from each line of your input file (e.g., part002 ... partXXX) such that every part of your input file has been split into a file with lines less than LINE_MAX bytes long.
4. Read name_list and calculate the name of the file to contain the reassembled input line.
5. Read a line from each of the partXXX files and write it to the appropriate output file. (Note that the writes may have to be done as a separate write for each partXXX file line adding a trailing newline character to the write of the last partXXX file.) You could also create separate output_field2_partXXX files, and use paste to create the final output files from these intermediate files.
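The steps above can be sketched roughly as follows. This is only an illustration, not a tested solution for your data: it follows the alternative mentioned in step 5 (intermediate per-name files joined with paste), the helper names split_chunks and route_chunks are made up for this example, and it assumes a tab-separated input file in which every line has a field 2 that is short and safe to use as a file name.

```shell
#!/bin/sh
# Hypothetical helpers sketching the cut/paste approach; names are illustrative.
TAB=$(printf '\t')

# split_chunks INPUT CHUNK PREFIX
# Steps 2-3: write PREFIX001, PREFIX002, ... each holding CHUNK bytes per line.
split_chunks() {
    input=$1 chunk=$2 prefix=$3
    n=1 start=1
    while :; do
        end=$((start + chunk - 1))
        part=$(printf '%s%03d' "$prefix" "$n")
        cut -b "$start-$end" "$input" > "$part"
        # Chunk lines are short, so grep is safe here. Stop once a chunk
        # file holds only empty lines (every input line is exhausted).
        if ! LC_ALL=C grep -q . "$part"; then
            rm -f "$part"
            break
        fi
        n=$((n + 1)) start=$((end + 1))
    done
}

# route_chunks NAME_LIST PREFIX OUTDIR
# Steps 4-5: append each chunk line to OUTDIR/<name>.<NNN>, then paste the
# per-name pieces back together using the empty-string delimiter.
route_chunks() {
    names=$1 prefix=$2 outdir=$3
    for part in "$prefix"[0-9][0-9][0-9]; do
        suffix=${part#"$prefix"}
        paste "$names" "$part" |
        while IFS= read -r line; do
            name=${line%%"$TAB"*}    # text before the first tab
            piece=${line#*"$TAB"}    # text after the first tab
            printf '%s\n' "$piece" >> "$outdir/$name.$suffix"
        done
    done
    sort -u "$names" |
    while IFS= read -r name; do
        paste -d '\0' "$outdir/$name".[0-9][0-9][0-9] > "$outdir/$name"
        rm -f "$outdir/$name".[0-9][0-9][0-9]
    done
}
```

With an input file called input (a placeholder name), you would run cut -f2 input > name_list, then split_chunks input 1000 part, then route_chunks name_list part outdir, where outdir is an existing empty directory. The chunk size of 1000 is arbitrary; because read sees the name, a tab, and the chunk on one line, the chunk size has to leave room for the longest name under LINE_MAX. Also note that paste is only required to accept 12 file operands, so very long lines may need a larger chunk size or a paste that accepts more files.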
paste(1) User Commands paste(1)
NAME
paste - merge corresponding or subsequent lines of files
SYNOPSIS
paste [-s] [-d list] file...
DESCRIPTION
The paste utility will concatenate the corresponding lines of the given input files, and write the resulting lines to standard output.
The default operation of paste will concatenate the corresponding lines of the input files. The NEWLINE character of every line except the
line from the last input file will be replaced with a TAB character.
If an EOF (end-of-file) condition is detected on one or more input files, but not all input files, paste will behave as though empty lines
were read from the files on which EOF was detected, unless the -s option is specified.
OPTIONS
The following options are supported:
-d list Unless a backslash character (\) appears in list, each character in list is an element specifying a delimiter character. If a
backslash character appears in list, the backslash character and one or more characters following it are an element specifying a
delimiter character as described below. These elements specify one or more delimiters to use, instead of the default TAB character,
to replace the NEWLINE character of the input lines. The elements in list are used circularly. That is, when the list is
exhausted, the first element from the list is reused.
When the -s option is specified:
o The last newline character in a file will not be modified.
o The delimiter will be reset to the first element of list after each file operand is processed.
When the -s option is not specified:
o The NEWLINE characters in the file specified by the last file will not be modified.
o The delimiter will be reset to the first element of list each time a line is processed from each file.
If a backslash character appears in list, it and the character following it will be used to represent the following delimiter
characters:
\n Newline character.
\t Tab character.
\\ Backslash character.
\0 Empty string (not a null character). If \0 is immediately followed by the character x, the character X, or any character
defined by the LC_CTYPE digit keyword, the results are unspecified.
If any other characters follow the backslash, the results are unspecified.
-s Concatenate all of the lines of each separate input file in command line order. The NEWLINE character of every line except the
last line in each input file will be replaced with the TAB character, unless otherwise specified by the -d option.
OPERANDS
The following operand is supported:
file A path name of an input file. If - is specified for one or more of the files, the standard input will be used. The standard input
will be read one line at a time, circularly, for each instance of -. Implementations support pasting of at least 12 file operands.
USAGE
See largefile(5) for the description of the behavior of paste when encountering files greater than or equal to 2 Gbyte ( 2**31 bytes).
EXAMPLES
Example 1: Listing a directory in one column
example% ls | paste -d" " -
Example 2: Listing a directory in four columns
example% ls | paste - - - -
Example 3: Combining pairs of lines from a file into single lines
example% paste -s -d"\t\n" file
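The following commands are not part of the original manual page. They are a small sketch of the -d rules described under OPTIONS (circular use of the list, the reset at each output line, and the \0 empty-string element), using printf to supply made-up input on standard input.

```shell
# -s with a two-element list: delimiters are used circularly within a file.
printf 'a\nb\nc\nd\n' | paste -s -d ',;' -
# prints: a,b;c,d

# Without -s: the list restarts at its first element for every output line.
printf '1\n2\n3\n4\n5\n6\n' | paste -d ',;' - - -
# prints: 1,2;3
#         4,5;6

# \0 is the empty string, so corresponding lines are joined with nothing between.
printf 'ab\ncd\n' | paste -d '\0' - -
# prints: abcd
```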
ENVIRONMENT VARIABLES
See environ(5) for descriptions of the following environment variables that affect the execution of paste: LANG, LC_ALL, LC_CTYPE,
LC_MESSAGES, and NLSPATH.
EXIT STATUS
The following exit values are returned:
0 Successful completion.
>0 An error occurred.
ATTRIBUTES
See attributes(5) for descriptions of the following attributes:
+-----------------------------+-----------------------------+
| ATTRIBUTE TYPE | ATTRIBUTE VALUE |
+-----------------------------+-----------------------------+
|Availability |SUNWesu |
+-----------------------------+-----------------------------+
|CSI |Enabled |
+-----------------------------+-----------------------------+
|Interface Stability |Standard |
+-----------------------------+-----------------------------+
SEE ALSO
cut(1), grep(1), pr(1), attributes(5), environ(5), largefile(5), standards(5)
DIAGNOSTICS
"line too long" Output lines are restricted to 511 characters.
"too many files" Except for -s option, no more than 12 input files may be specified.
"no delimiters" The -d option was specified with an empty list.
"cannot open file" The specified file cannot be opened.
SunOS 5.10 20 Dec 1996 paste(1)