01-16-2013
Any utility that is specified to read text files (including awk, grep, read, and sed) may fail on any line longer than LINE_MAX bytes. You can find the value of LINE_MAX on your system by running the command: getconf LINE_MAX. The cut, paste, and fold utilities, however, are required to work with text files with lines of unlimited length. So, one way to do this is to:
1. Use cut to copy field 2 of each line of your input file into a file (e.g., name_list).
2. Use cut to copy the first LINE_MAX-5 bytes of each line of your input file into a file (e.g., part001).
3. Use cut to create further files holding each subsequent run of LINE_MAX-5 bytes from your input file (e.g., part002 ... partXXX), until every part of every input line has been copied into a file whose lines are shorter than LINE_MAX bytes.
4. Read name_list and calculate the name of the file that should contain each reassembled input line.
5. Read a line from each of the partXXX files and write it to the appropriate output file. (The writes may have to be done separately for each partXXX line, with a trailing newline character added only after the piece from the last partXXX file.) Alternatively, you could create separate output_field2_partXXX files and use paste to assemble the final output files from these intermediate files.
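The five steps above can be sketched as a POSIX shell function. This is a minimal sketch under stated assumptions, not a drop-in tool: the function name split_by_field2, the TAB field delimiter (cut's default), the "<field 2>.out" output names, and the work files name_list, partNNN and partNNN.sel in the current directory are all hypothetical. It follows the paste variant from step 5, so only cut and paste ever see a full-length input line.

```shell
# Hypothetical helper: split_by_field2 FILE writes every line of FILE to
# "<field 2>.out" (steps 1-5 above, paste variant). Assumes TAB-separated
# fields, at least one TAB on every line (otherwise cut -f2 passes the whole
# line through), and field-2 values that are usable file names.
# Work files are created in, then removed from, the current directory.
split_by_field2() (
  LC_ALL=C; export LC_ALL            # handle the data as plain bytes
  in=$1
  chunk=$(( $(getconf LINE_MAX) - 5 ))

  # Step 1: field 2 of every line (cut has no line-length limit).
  cut -f2 "$in" > name_list || exit 1

  # Steps 2-3: slice every line into chunk-byte pieces, one part file per
  # slice, stopping at the first slice that is empty on every line.
  n=0 start=1
  while :; do
    part=$(printf 'part%03d' $((n + 1)))
    cut -b "$start-$((start + chunk - 1))" "$in" > "$part" || exit 1
    if ! grep -q . "$part"; then rm -f "$part"; break; fi
    n=$((n + 1)) start=$((start + chunk))
  done

  # Steps 4-5: for each distinct name, keep that name's rows from every
  # part file (short lines, so awk is safe), then glue the pieces back
  # together with paste and an empty delimiter.
  sort -u name_list | while IFS= read -r name; do
    set --
    i=1
    while [ "$i" -le "$n" ]; do
      p=$(printf 'part%03d' "$i")
      # Appending "" forces a string, not numeric, comparison.
      name=$name awk 'NR == FNR { keep[FNR] = ($0 == ENVIRON["name"] ""); next }
                      keep[FNR]' name_list "$p" > "$p.sel" || exit 1
      set -- "$@" "$p.sel"
      i=$((i + 1))
    done
    paste -d '\0' "$@" > "$name.out" || exit 1
  done
  status=$?
  rm -f name_list part[0-9][0-9][0-9] part[0-9][0-9][0-9].sel
  exit "$status"
)
```

Filtering every part file once per distinct name is slow when there are many names, but it keeps every line that awk, grep, sort and read handle well under LINE_MAX bytes.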
cut(1) User Commands cut(1)
NAME
cut - cut out selected fields of each line of a file
SYNOPSIS
cut -b list [-n] [file]...
cut -c list [file]...
cut -f list [-d delim] [-s] [file]...
DESCRIPTION
Use the cut utility to cut out columns from a table or fields from each line of a file; in data base parlance, it implements the projection
of a relation. The fields as specified by list can be fixed length, that is, character positions as on a punched card (-c option) or the
length can vary from line to line and be marked with a field delimiter character like TAB (-f option). cut can be used as a filter.
Either the -b, -c, or -f option must be specified.
Use grep(1) to make horizontal "cuts" (by context) through a file, or paste(1) to put files together column-wise (that is, horizontally).
To reorder columns in a table, use cut and paste.
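As a minimal sketch of reordering columns with cut and paste, the hypothetical function swap_cols below prints the first two TAB-separated columns of a file in reverse order (columns after the second are dropped). It assumes mktemp is available for scratch files, which is common but not part of POSIX.

```shell
# Hypothetical helper: print columns 2 and 1 of FILE, in that order.
swap_cols() (
  tmp=$(mktemp -d) || exit 1
  cut -f1 "$1" > "$tmp/c1" &&
    cut -f2 "$1" > "$tmp/c2" &&
    paste "$tmp/c2" "$tmp/c1"      # paste rejoins the columns with a TAB
  status=$?
  rm -rf "$tmp"
  exit "$status"
)
```

For example, swap_cols table.txt (a placeholder file name) prints the table with its first two columns exchanged.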
OPTIONS
The following options are supported:
list A comma-separated or blank-character-separated list of integer field numbers (in increasing order), with optional - to indicate
ranges (for instance, 1,4,7; 1-3,8; -5,10 (short for 1-5,10); or 3- (short for third through last field)).
-b list The list following -b specifies byte positions (for instance, -b1-72 would pass the first 72 bytes of each line). When -b and
-n are used together, list is adjusted so that no multi-byte character is split.
-c list The list following -c specifies character positions (for instance, -c1-72 would pass the first 72 characters of each line).
-d delim The character following -d is the field delimiter (-f option only). Default is tab. Space or other characters with special
meaning to the shell must be quoted. delim can be a multi-byte character.
-f list The list following -f is a list of fields assumed to be separated in the file by a delimiter character (see -d); for
instance, -f1,7 copies the first and seventh field only. Lines with no field delimiters will be passed through intact (useful
for table subheadings), unless -s is specified.
-n Do not split characters. When -b list and -n are used together, list is adjusted so that no multi-byte character is split.
-s Suppresses lines with no delimiter characters when used with the -f option. Unless -s is specified, lines with no delimiters are passed
through untouched.
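The option entries above can be tried on a few throwaway lines. The sample data is invented for illustration and is ASCII-only, so -b and -c select the same positions; each comment shows what cut prints, with " / " separating successive output lines.

```shell
# List forms on one comma-separated line (fields a..j).
line='a,b,c,d,e,f,g,h,i,j'
printf '%s\n' "$line" | cut -d, -f1,4,7    # a,d,g
printf '%s\n' "$line" | cut -d, -f1-3,8    # a,b,c,h
printf '%s\n' "$line" | cut -d, -f-5,10    # a,b,c,d,e,j
printf '%s\n' "$line" | cut -d, -f3-       # c,d,e,f,g,h,i,j

# -b and -c on a fixed-width record.
printf '20130116ABC\n' | cut -b1-4         # 2013
printf '20130116ABC\n' | cut -c1-4,9-      # 2013ABC

# Space and | mean something to the shell, so the delimiter is quoted.
printf 'x y z\n' | cut -d' ' -f2           # y
printf 'x|y|z\n' | cut -d'|' -f3           # z

# A line with no ':' passes through -f untouched unless -s is given.
data='Heading
root:x:0
daemon:x:1'
printf '%s\n' "$data" | cut -d: -f1,3      # Heading / root:0 / daemon:1
printf '%s\n' "$data" | cut -s -d: -f1,3   # root:0 / daemon:1
```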
OPERANDS
The following operands are supported:
file A path name of an input file. If no file operands are specified, or if a file operand is -, the standard input will be used.
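A short illustration of the - operand: piped data and a named file can be mixed in one call, and cut reads them in operand order. The scratch file comes from mktemp, which is assumed to be available (common, though not in POSIX).

```shell
# "-" stands for standard input, so it is read first here, then "$f".
f=$(mktemp)
printf 'from-file\tx\n' > "$f"
out=$(printf 'from-stdin\ty\n' | cut -f1 - "$f")
printf '%s\n' "$out"    # from-stdin / from-file
rm -f "$f"
```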
USAGE
See largefile(5) for the description of the behavior of cut when encountering files greater than or equal to 2 Gbyte (2^31 bytes).
EXAMPLES
Example 1 Mapping user IDs
A mapping of user IDs to names follows:
example% cut -d: -f1,5 /etc/passwd
Example 2 Setting current login name
To set name to current login name:
example$ name=`who am i | cut -f1 -d' '`
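To see what the two examples produce without touching the live system, they can be run on made-up input. The passwd entries and the who-style line below are invented, and Example 2 is rewritten with the POSIX $(...) form, which nests more easily than backquotes.

```shell
# Example 1 on two invented /etc/passwd entries: login name and comment field.
passwd_sample='root:x:0:0:Super-User:/root:/bin/sh
alice:x:1001:10:Alice Smith:/home/alice:/bin/sh'
printf '%s\n' "$passwd_sample" | cut -d: -f1,5
# root:Super-User
# alice:Alice Smith

# Example 2 with $(...); a sample line stands in for the output of who am i.
name=$(printf 'alice    pts/1        Jan 16 10:00\n' | cut -f1 -d' ')
printf '%s\n' "$name"    # alice
```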
ENVIRONMENT VARIABLES
See environ(5) for descriptions of the following environment variables that affect the execution of cut: LANG, LC_ALL, LC_CTYPE, LC_MESSAGES, and NLSPATH.
EXIT STATUS
The following exit values are returned:
0 All input files were output successfully.
>0 An error occurred.
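A quick way to observe a >0 exit status, together with the DIAGNOSTICS note that processing continues past a file that cannot be opened. The path /nonexistent/input is made up and assumed not to exist; mktemp is assumed to be available for the scratch file.

```shell
f=$(mktemp)
printf 'ok\tx\n' > "$f"
# The first operand cannot be opened: cut reports it (silenced here),
# still processes "$f", and exits with a non-zero status.
out=$(cut -f1 /nonexistent/input "$f" 2>/dev/null)
status=$?
printf 'output=%s status=%s\n' "$out" "$status"
rm -f "$f"
```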
ATTRIBUTES
See attributes(5) for descriptions of the following attributes:
+-----------------------------+-----------------------------+
| ATTRIBUTE TYPE | ATTRIBUTE VALUE |
+-----------------------------+-----------------------------+
|Availability |SUNWcsu |
+-----------------------------+-----------------------------+
|CSI |Enabled |
+-----------------------------+-----------------------------+
|Interface Stability |Standard |
+-----------------------------+-----------------------------+
SEE ALSO
grep(1), paste(1), attributes(5), environ(5), largefile(5), standards(5)
DIAGNOSTICS
cut: -n may only be used with -b
cut: -d may only be used with -f
cut: -s may only be used with -f
cut: cannot open <file>
Either file cannot be read or does not exist. If multiple files are present, processing continues.
cut: no delimiter specified
Missing delim on -d option.
cut: invalid delimiter
cut: no list specified
Missing list on -b, -c, or -f option.
cut: invalid range specifier
cut: too many ranges specified
cut: range must be increasing
cut: invalid character in range
cut: internal error processing input
cut: invalid multibyte character
cut: unable to allocate enough memory
SunOS 5.11 29 Apr 1999 cut(1)