I have a huge file (over 30 MB) that I am processing with Perl. I am pulling out a list of filenames and placing it in an array called @reports.
I am fine up to this point. What I then want to do is go through the array and find any duplicates. If there is a duplicate, output it to the screen.... (3 Replies)
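A minimal sketch of the duplicate check described above, written in Python rather than Perl. The list name `reports` mirrors the `@reports` array from the post; the function name and the sample filenames are made up for illustration.

```python
from collections import Counter

def find_duplicates(names):
    """Return the names that occur more than once, in first-seen order."""
    counts = Counter(names)  # Counter keeps insertion order (Python 3.7+)
    return [name for name in counts if counts[name] > 1]

# Hypothetical sample data standing in for the filenames pulled from the big file.
reports = ["a.rpt", "b.rpt", "a.rpt", "c.rpt", "b.rpt", "a.rpt"]

for name in find_duplicates(reports):
    print(name)
```

Counting in a single pass keeps the work proportional to the number of filenames, which matters more than the language when the input is large.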
I am trying to figure out how to scan a file like so:
1 ralphs office","555-555-5555","ralph@mail.com","www.ralph.com
2 margies office","555-555-5555","ralph@mail.com","www.ralph.com
3 kims office","555-555-5555","kims@mail.com","www.ralph.com
4 tims... (17 Replies)
I have millions of records, each containing exactly 50 characters, and I have to check the uniqueness of a 4-character substring of each record (position known in advance) and report any duplicates that are found.
Eg. data...
AAAA00000000000000XXXX0000 0000000000... up to 50 chars... (2 Replies)
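One possible way to check a fixed-position key for uniqueness, sketched in Python. The offset is passed in as a parameter because the post only says the position is known in advance; the function name, the offset of 8 and the short sample records are assumptions for illustration, not the poster's data.

```python
from collections import defaultdict

def duplicate_keys(lines, start, width=4):
    """Map each repeated key to the 1-based line numbers where it occurs.

    `start` is the 0-based offset of the key inside each record.
    """
    seen = defaultdict(list)
    for lineno, line in enumerate(lines, 1):
        key = line.rstrip("\n")[start:start + width]
        seen[key].append(lineno)
    # Keep only keys that appeared on more than one line.
    return {key: nums for key, nums in seen.items() if len(nums) > 1}

# Hypothetical, shortened records with the key at offset 8.
sample = ["AAAA0000XXXX", "BBBB0000YYYY", "CCCC0000XXXX"]

for key, nums in duplicate_keys(sample, start=8).items():
    print(key, nums)
```

Because `lines` can be any iterable, an open file object can be passed directly so the records are read one at a time.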
Hi,
can I do something like this to add a condition that also checks whether the 4th field is a number, a space, or blank:
awk -F, '$4 /^*||*/' MYFILE >> OTHERFILE
I also want the other part, i.e. I need to exclude all lines whose 4th field is a space, blank, or a number:
MYFILE
a,b,c,d,e
a,b,c,2,r... (2 Replies)
Hi,
I have a pipe-separated file.
I want to write code to display a count of the lines whose 20th field is not null.
nawk -F"|" '{if ($20!="") print NR,$20}' xyz..txt
This also displays records whose 20th field is null.
I would like output as: (4 Replies)
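The desired output is cut off in the post, so the following Python sketch only produces the count. It assumes, without confirmation from the thread, that the "null" fields still being matched contain only whitespace, and treats those as empty. The function name and sample rows are hypothetical.

```python
def count_nonempty_field(lines, field=20, sep="|"):
    """Count lines whose `field`-th column (1-based) has non-whitespace content."""
    count = 0
    for line in lines:
        parts = line.rstrip("\n").split(sep)
        # Lines with fewer than `field` columns are treated as having a null field.
        if len(parts) >= field and parts[field - 1].strip():
            count += 1
    return count

# Hypothetical rows: a filled 20th field, a blank one, an empty one, and a short line.
rows = [
    "|".join(["a"] * 19 + ["x"]),
    "|".join(["a"] * 19 + ["  "]),
    "|".join(["a"] * 19 + [""]),
    "a|b|c",
]

print(count_nonempty_field(rows))
```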
I was trying to use the AIX 6.1 sort command to sort fixed-length data records, sorting by specific columns only. It took some time to figure out how to get it to work, so I wanted to share the solution. The sort man page wasn't much help, because it talks about field delimiters (default space... (1 Reply)
I am currently creating a script to find filenames that are listed once in an input file (find non duplicates). I then want to report those single files in another file. Here is the function that I have so far:
function dups_filenames
{
file2=""
file1=""
file=""
dn=""
ch=""
pn=""
... (6 Replies)
Hi team,
I have CSV files with 20 columns. I want to find the duplicates in a file based on column1, column10, column4, column6, column8, and column2. If those columns have the same values, then it should be treated as a duplicate record.
Can anyone help me find the duplicates?
Thanks in advance.
... (2 Replies)
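A rough sketch of grouping records by a composite key, in Python. The key columns are the ones listed in the post (1-based); the function name and the generated sample rows are assumptions, and the sketch expects every row to have at least 10 columns.

```python
import csv
import io
from collections import defaultdict

KEY_COLUMNS = (1, 10, 4, 6, 8, 2)  # 1-based column numbers, as listed in the post

def duplicate_records(rows, key_columns=KEY_COLUMNS):
    """Return groups of rows that share the same values in all key columns."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[c - 1] for c in key_columns)
        groups[key].append(row)
    return [group for group in groups.values() if len(group) > 1]

# Hypothetical 20-column sample rows.
base = [f"v{i}" for i in range(1, 21)]
same_key = base.copy()
same_key[2] = "other"      # column 3 is not part of the key, so still a duplicate
diff_key = base.copy()
diff_key[0] = "changed"    # column 1 is part of the key, so not a duplicate
sample = "\n".join(",".join(r) for r in (base, same_key, diff_key))

for group in duplicate_records(csv.reader(io.StringIO(sample))):
    for row in group:
        print(",".join(row))
```

Using `csv.reader` instead of splitting on commas means quoted fields that contain commas are still parsed as one column.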
Hi everyone. I'm trying to help my wife with a project. She has exported 200 images from many different folders, but unfortunately there was a problem with the export and I need to find the master versions so that she doesn't have to go through and select them again.
I need to:
For each image in... (2 Replies)
Discussion started by: Rhinoskin
compalign
COMPALIGN(1) General Commands Manual COMPALIGN(1)
NAME
compalign - compare two multiple alignments
SYNOPSIS
compalign [-options] <trusted-alignment> <test-alignment>
DESCRIPTION
compalign calculates the fractional "identity" between the trusted alignment and the test alignment. The two files must contain exactly the
same sequences, in exactly the same order.
The identity of the multiple sequence alignments is defined as the averaged identity over all N(N-1)/2 pairwise alignments.
The fractional identity of two sets of pairwise alignments is in turn defined as follows (for aligned known sequences k1 and k2, and
aligned test sequences t1 and t2):
    matched columns / total columns

where total columns = the total number of columns in which there is
a valid (nongap) symbol in k1 or k2;

matched columns = the number of columns in which one of the
following is true:

    k1 and k2 both have valid symbols at a given column; t1 and t2
    have the same symbols aligned in a column of the t1/t2
    alignment;

    k1 has a symbol aligned to a gap in k2; that symbol in t1 is
    also aligned to a gap;

    k2 has a symbol aligned to a gap in k1; that symbol in t2 is
    also aligned to a gap.
Because scores for all possible pairs are calculated, the algorithm is of order (N^2)L for N sequences of length L; large sequence sets
will take a while.
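The definition above can be sketched in Python as follows. This is an illustration written from the text of this page, not compalign's actual source: the function names are hypothetical, and the set of gap characters ("-" and ".") is an assumption.

```python
from itertools import combinations

GAPS = set("-.")  # assumed gap characters; the real tool may accept others

def pair_map(a, b):
    """For each residue of a and of b, give the partner residue index or None (gap)."""
    a_to_b, b_to_a = [], []
    i = j = 0  # residue counters for a and b
    for x, y in zip(a, b):
        x_res, y_res = x not in GAPS, y not in GAPS
        if x_res and y_res:
            a_to_b.append(j)
            b_to_a.append(i)
        elif x_res:
            a_to_b.append(None)
        elif y_res:
            b_to_a.append(None)
        i += x_res
        j += y_res
    return a_to_b, b_to_a

def pairwise_identity(k1, k2, t1, t2):
    """matched columns / total columns for one trusted pair and one test pair."""
    k_ab, k_ba = pair_map(k1, k2)
    t_ab, t_ba = pair_map(t1, t2)
    if len(k_ab) != len(t_ab) or len(k_ba) != len(t_ba):
        raise ValueError("trusted and test alignments must hold the same sequences")
    # Every k1 residue is one column; k2 residues facing a gap in k1 add the rest.
    total = len(k_ab) + sum(p is None for p in k_ba)
    # A k1 residue matches if the test gives it the same partner (or the same gap).
    matched = sum(t_ab[i] == p for i, p in enumerate(k_ab))
    # A k2 residue facing a gap matches if it also faces a gap in the test.
    matched += sum(p is None and t_ba[j] is None for j, p in enumerate(k_ba))
    return matched / total if total else 0.0

def alignment_identity(known, test):
    """Average pairwise identity over all N(N-1)/2 sequence pairs (needs N >= 2)."""
    pairs = list(combinations(range(len(known)), 2))
    return sum(pairwise_identity(known[a], known[b], test[a], test[b])
               for a, b in pairs) / len(pairs)

print(pairwise_identity("ACGT", "A-GT", "ACGT", "AG-T"))
```

The loop over `combinations` is where the (N^2)L cost mentioned above comes from: each of the N(N-1)/2 pairs is scanned once along its length.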
OPTIONS
Available options:
-h Print short help and usage info.
-c Only compare under marked #=CS consensus structure.
--informat <s>
Specify that both alignments are in format <s> (MSF, for instance).
--quiet
Suppress verbose header (used in regression testing).
SEE ALSO
afetch(1), alistat(1), compstruct(1), revcomp(1), seqsplit(1), seqstat(1), sfetch(1), shuffle(1), sindex(1), sreformat(1), stranslate(1),
weight(1).
AUTHOR
Sean Eddy
HHMI/Department of Genetics
Washington University School of Medicine
4444 Forest Park Blvd., Box 8510
St Louis, MO 63108 USA
Phone: 1-314-362-7666
FAX : 1-314-362-2157
Email: eddy@genetics.wustl.edu
This manual page was written by Nelson A. de Oliveira <naoliv@gmail.com>,
for the Debian project (but may be used by others).
Mon, 01 Aug 2005 15:28:08 -0300 COMPALIGN(1)