05-16-2010
Split large file based on last digit from a column
Hello,
What's the best way to split a large file into multiple files based on the last digit in the first column?
input file:
f
2738483300000 x0 y0
3772748378831 x1 y1
3478378358383 x2 y2
3743878383802 x3 y3
3787828282820 x4 y4
3748838383881 x5 y5
Desired Output:
f0
3738483300000 x0 y0
3787828282820 x4 y4
f1
3772748378831 x14 y14
3748838383881 x53 y51
f2
3743878383802 x28 y73
f3
3478378358383 x56 y66
The file has about 60 million records, and I'm using grep to do the splitting, but I guess there must be a faster way: grep ^3.........."$i" (where the value of i runs from 0 to 9).
Appreciate your ideas.
Alain
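One possible approach, sketched here rather than taken from the thread: the grep loop reads the whole file once per digit, so ten passes in total, whereas awk can route each record to its output file in a single pass. The sketch assumes the columns are separated by whitespace, that the input file is named f as in the post, and that the outputs should be named f0 through f9. The scratch directory and the sample rows exist only to make the example runnable.

```shell
#!/bin/sh
# Build a small sample in a scratch directory (rows copied from the post's input;
# the whitespace between columns is an assumption).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
cat > f <<'EOF'
2738483300000 x0 y0
3772748378831 x1 y1
3478378358383 x2 y2
3743878383802 x3 y3
3787828282820 x4 y4
3748838383881 x5 y5
EOF

# Single pass over the input: for every record whose first column ends in a
# digit, write the whole line to f<digit>. Records that do not end in a digit
# (including blank lines) are skipped, so the output name can never be "f"
# itself and the input file is never overwritten.
awk '$1 ~ /[0-9]$/ { out = "f" substr($1, length($1)); print > out }' f
```

Because at most ten output files are open at once, this should stay within the open-file limits of the common awk implementations. Whether it is actually faster than the grep loop on 60 million records is worth measuring on the real data.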
pr(1) General Commands Manual pr(1)
Name
pr - print files
Syntax
pr [ options ] [ files ]
Description
The pr command prints the named files on the standard output. If a file is designated by a minus sign (-), or if no files are specified,
the command assumes standard input. By default, the listing is separated into pages, each headed by the page number, a date and time, and the
name of the file.
By default, columns are of equal width, separated by at least one space. Lines that do not fit are truncated. However, if the -s option is
used, lines are not truncated and columns are separated by the separation character.
If the standard output is associated with a terminal, error messages are withheld until pr has finished printing.
Options
The following options can be used singly or in combination:
-a Prints multi-column output across the page.
-b Prints blank headers.
-d Double-spaces the output.
-eck Expands input tabs to character positions k+1, 2*k+1, 3*k+1,... n*k+1. If k is 0 or is omitted, tabs are set at every eighth
position. Tab characters in the input are expanded into the appropriate number of spaces. The default for c (any non-digit character)
is the tab character; therefore, if c is given, it is treated as the input tab character.
-f Uses form-feed character for new pages. The default is to use a sequence of line-feeds. The -f option causes the command to pause
before beginning the first page if the standard output is associated with a terminal.
-h Uses the next argument as the header to be printed instead of the file name.
-ick Replaces white space in output by inserting tabs to character positions k+1, 2*k+1, 3*k+1,... n*k+1. If k is 0 or is omitted, tabs
are set at every eighth position. The default for c (any non-digit character) is the tab character; therefore, if c is given, it
is treated as the output tab character.
+k Begins printing with page k (default is 1).
-k Produces k-column output (default is 1). The -e and -i options are assumed for multi-column output.
-lk Sets the length of a page to k lines. The default is 66 lines.
-m Merges and prints all files simultaneously, one per column (overrides the -k and -a options).
-nck Numbers lines. The default for k is 20. The number occupies the first k+1 character positions of each column of normal output or
each line of -m output. If c, which is any non-digit character, is given, it is appended to the line number to separate it from
whatever follows. The default for c is a tab.
-ok Offsets each line by k character positions (default is 0). The number of character positions per line is the sum of the width and
offset.
-p Pauses before beginning each page if the output is directed to a terminal. The command rings the bell at the terminal and awaits a
carriage return.
-r Suppresses diagnostic reports on failure to open files.
-sc Separates columns by the single character c instead of by the appropriate number of spaces (default for c is a tab).
-t Suppresses the five-line identifying header and the five-line trailer normally supplied for each page. The -t option causes the
command to quit printing after the last line of each file without spacing to the end of the page.
-wk Sets the width of a line to k character positions. The default is 72 for equal-width multi-column output; otherwise there is no
limit.
Examples
Print file1 and file2 as a double-spaced, three-column listing with the heading: file list.
pr -3dh "file list" file1 file2
Write file1 on file2, expanding tabs to columns 10, 19, 28, 37,...:
pr -e9 -t <file1 >file2
Files
/dev/tty* to suspend messages
See Also
cat(1)