02-17-2011
Help: counting delimiters in a huge file and splitting the data into 2 files
I’m new to Linux scripting and am not sure how to filter out bad records from huge flat files (over 1.3 GB each). The delimiter is a semicolon (“;”).
Here is a sample of 5 lines from the file:
Name1;phone1;address1;city1;state1;zipcode1
Name2;phone2;address2;city2;state2;zipcode2;comment
Name3;phone3;address3;city3;state3;zipcode3
Name4;phone4;address4;city4;state4;zipcode4
Name5;phone5;address5
I need a script that reads each line and counts the number of “;” characters on it:
If the delimiter count = 5 Then
Write that line to goodfile1
Else
Write the bad line to rejectedfile1
The two output files should look like this:
goodfile1 has:
Name1;phone1;address1;city1;state1;zipcode1
Name3;phone3;address3;city3;state3;zipcode3
Name4;phone4;address4;city4;state4;zipcode4
rejectedfile1 has:
Name2;phone2;address2;city2;state2;zipcode2;comment
Name5;phone5;address5
Thanks
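One possible approach, offered as a minimal sketch rather than a definitive answer: with “;” set as the awk field separator, a line that has exactly 5 delimiters has exactly 6 fields, so testing NF is equivalent to counting the semicolons. The input name sample.txt is an assumption made for illustration (the post does not name the input file), and the here-document only recreates the five sample lines so the sketch runs on its own.

```shell
#!/bin/sh
# Hypothetical input name, for illustration only; substitute the real flat file.
input=sample.txt

# Recreate the five sample lines from the post so the sketch is self-contained.
cat > "$input" <<'EOF'
Name1;phone1;address1;city1;state1;zipcode1
Name2;phone2;address2;city2;state2;zipcode2;comment
Name3;phone3;address3;city3;state3;zipcode3
Name4;phone4;address4;city4;state4;zipcode4
Name5;phone5;address5
EOF

# With ";" as the field separator, 5 delimiters means 6 fields (NF == 6).
# awk reads one line at a time, so it does not load the whole file into memory.
awk -F';' '
    NF == 6 { print > "goodfile1"; next }   # exactly 5 semicolons: good record
            { print > "rejectedfile1" }     # any other count, including empty lines
' "$input"
```

For the real data, drop the here-document and point input at the actual file. Both output files are overwritten on each run, and a line with no semicolons at all (or an empty line) ends up in rejectedfile1.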