Assigning the same frequency to each of several words in a file
I have a file of names with the following structure:
Code:
NAME [tab] FREQUENCY
NAME NAME [tab] FREQUENCY
NAME NAME NAME [tab] FREQUENCY
That is, a single frequency is assigned to a field that may contain one, two or three names. An example will make this clear:
Code:
SANDHYA DAS 6901
ARATI DAS 6201
KALPANA DAS 4714
GITA DAS 4550
BISWANATH DAS 3949
SWAPAN DAS 3941
SUKUMAR DAS 3876
GOPAL DAS 3835
SARASWATI DAS 3769
DILIP DAS 3653
TAPAN DAS 3607
ASHOKE DAS 3604
PRATIMA DAS 3558
PURNIMA DAS 3546
BASANTI DAS 3372
SHANKAR DAS 3279
SANDHYA GHOSH 3254
SANJAY DAS 3252
PRATIMA DAS 3212
KALPANA DAS 3203
ARATI GHOSH 3155
MALATI DAS 3151
SWAPAN DAS 3138
SANDHYA RANI DAS 3120
LAKSHMI DAS 3104
ANJALI DAS 3085
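The frequency is always the last token on a line, so a record can be parsed the same way whether the separator is a tab (as described above) or a space (as in the pasted sample). Here is a minimal Python sketch, assuming names contain no internal spaces and the frequency is an integer. The function name `parse_record` is hypothetical.

```python
def parse_record(line):
    """Split one 'NAME [NAME ...] <tab> FREQUENCY' line into (names, frequency).

    The frequency is taken to be the last whitespace-separated token, so this
    works whether the separator is a tab or spaces (an assumption based on
    the sample, where the tabs appear as spaces).
    """
    fields = line.split()
    if len(fields) < 2:
        raise ValueError(f"expected at least one name and a frequency: {line!r}")
    *names, freq = fields
    return names, int(freq)


print(parse_record("SANDHYA RANI DAS\t3120"))  # (['SANDHYA', 'RANI', 'DAS'], 3120)
print(parse_record("GITA DAS 4550"))           # (['GITA', 'DAS'], 4550)
```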
I want to split each field into its individual names and give each name the frequency of its field, so that, statistically, every name in a field retains that frequency.
The expected output would be:
Code:
ANJALI 3085
ARATI 6201
ARATI 3155
ASHOKE 3604
BASANTI 3372
BISWANATH 3949
DILIP 3653
GITA 4550
GOPAL 3835
KALPANA 4714
KALPANA 3203
LAKSHMI 3104
MALATI 3151
PRATIMA 3558
PRATIMA 3212
PURNIMA 3546
SANDHYA 6901
SANDHYA 3254
SANDHYA 3120
SANJAY 3252
SARASWATI 3769
SHANKAR 3279
SUKUMAR 3876
SWAPAN 3941
SWAPAN 3138
TAPAN 3607
DAS 3085
DAS 6201
GHOSH 3155
DAS 3604
DAS 3372
DAS 3949
DAS 3653
DAS 4550
DAS 3835
DAS 4714
DAS 3203
DAS 3104
DAS 3151
DAS 3558
DAS 3212
DAS 3546
DAS 6901
GHOSH 3254
RANI 3120
DAS 3252
DAS 3769
DAS 3279
DAS 3876
DAS 3941
DAS 3138
DAS 3607
DAS 3120
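Here is a minimal Python sketch of the split step, offered as an illustration rather than a tested replacement for the Excel macro. It gives every name in a field the frequency of that field. It then lists all first names (sorted by name, with higher frequencies first), followed by the second names of the same records in the same order, and then the third names. This is the layout of the expected output above. The sketch assumes whitespace-separated names, the frequency as the last token, and a UTF-8 input file. The names `split_and_assign` and `split_names.py` are hypothetical.

```python
import sys


def split_and_assign(lines):
    """Give every name in a field the field's frequency.

    Returns (name, frequency) pairs: all first names (sorted by name, then by
    descending frequency), then the second names of those same records in the
    same order, then the third names, and so on.
    """
    records = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        *names, freq = fields
        records.append((names, int(freq)))

    # Order records by first name, keeping higher frequencies first for ties.
    records.sort(key=lambda rec: (rec[0][0], -rec[1]))

    out = []
    longest = max((len(names) for names, _ in records), default=0)
    for position in range(longest):
        for names, freq in records:
            if position < len(names):
                out.append((names[position], freq))
    return out


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (hypothetical file names): python split_names.py names.txt > split.txt
    with open(sys.argv[1], encoding="utf-8") as handle:
        for name, freq in split_and_assign(handle):
            print(f"{name}\t{freq}")
```

The output is tab-separated, like the input format, so it can be fed straight into a merging tool.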
I am currently doing this field separation with a macro in Excel, but since the database is huge, the process is long and tedious.
Would it be possible to do the same with a Perl or AWK script? I have already written an AWK tool that merges frequencies, which I could then run on the result. As an example, all occurrences of
Code:
DAS
would thus have a merged frequency.
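The AWK merging tool mentioned above is not shown. As a stand-in, here is a minimal Python sketch of that step that sums the frequencies of identical names with `collections.Counter`. The name `merge_frequencies` is hypothetical. The pairs below are the split forms of the three SANDHYA records from the sample.

```python
from collections import Counter


def merge_frequencies(pairs):
    """Sum the frequencies of identical names.

    `pairs` is an iterable of (name, frequency) tuples, e.g. the output of the
    split step. Returns a Counter mapping each name to its total frequency.
    """
    totals = Counter()
    for name, freq in pairs:
        totals[name] += freq
    return totals


pairs = [("SANDHYA", 6901), ("DAS", 6901),
         ("SANDHYA", 3254), ("GHOSH", 3254),
         ("SANDHYA", 3120), ("RANI", 3120), ("DAS", 3120)]
for name, total in merge_frequencies(pairs).most_common():
    print(name, total)
# SANDHYA 13275
# DAS 10021
# GHOSH 3254
# RANI 3120
```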
I work under Windows, and UNIX (sigh) is not my OS, so no shell scripts, please.
Many thanks.