02-17-2011
Help: counting delimiters in a huge file and splitting the data into 2 files
I’m new to Linux scripting and not sure how to filter out bad records from huge flat files (over 1.3GB each). The delimiter is a semicolon “;”.
Here is a sample of 5 lines from the file:
Name1;phone1;address1;city1;state1;zipcode1
Name2;phone2;address2;city2;state2;zipcode2;comment
Name3;phone3;address3;city3;state3;zipcode3
Name4;phone4;address4;city4;state4;zipcode4
Name5;phone5;address5
I need a script that reads each line and counts the number of “;” delimiters on it:
If delimiter count = 5 Then
Write that line to goodfile1
Else
Write the bad line to rejectedfile1
The two output files should look like this:
goodfile1 has:
Name1;phone1;address1;city1;state1;zipcode1
Name3;phone3;address3;city3;state3;zipcode3
Name4;phone4;address4;city4;state4;zipcode4
rejectedfile1 has:
Name2;phone2;address2;city2;state2;zipcode2;comment
Name5;phone5;address5
Thanks
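One possible approach, sketched with awk: if the field separator is set to the delimiter, then NF - 1 is the number of delimiters on a line, so each line can be routed to one file or the other in a single pass. awk reads the input one line at a time, so the 1.3GB size should not be a memory problem. The function name split_by_delims and its argument order are made up for this illustration; they are not from any existing tool.

```shell
#!/bin/sh
# Hypothetical helper (name and argument order are illustrative only):
#   split_by_delims INPUT GOODFILE REJECTFILE [DELIM] [COUNT]
# Lines with exactly COUNT delimiters (default 5) go to GOODFILE,
# every other line (including empty lines) goes to REJECTFILE.
split_by_delims() {
    input=$1
    good=$2
    reject=$3
    delim=${4:-";"}
    want=${5:-5}

    # Create/empty both outputs so each file exists even if it receives no lines.
    : > "$good"
    : > "$reject"

    # With FS set to the delimiter, NF is the field count,
    # so NF - 1 is the number of delimiters on the current line.
    awk -F"$delim" -v want="$want" -v good="$good" -v reject="$reject" '
        { print > ((NF - 1 == want) ? good : reject) }
    ' "$input"
}
```

After defining the function, a call such as `split_by_delims hugefile1 goodfile1 rejectedfile1` would produce the two files described above (hugefile1 is a placeholder for the real input file name). Lines keep their original order within each output file.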