Assigning the same frequency to more than one word in a file

Shell Programming and Scripting, post 302850841 by gimley on Thursday 5th of September 2013 08:08:19 PM

I have a file of names with the following structure:

Code:

NAME [tab] FREQUENCY
NAME NAME [tab] FREQUENCY
NAME NAME NAME [tab] FREQUENCY

That is, a line can hold one, two or three names, and all the names on that line share the same frequency. An example will make this clear:

Code:

SANDHYA DAS	6901
ARATI DAS	6201
KALPANA DAS	4714
GITA DAS	4550
BISWANATH DAS	3949
SWAPAN DAS	3941
SUKUMAR DAS	3876
GOPAL DAS	3835
SARASWATI DAS	3769
DILIP DAS	3653
TAPAN DAS	3607
ASHOKE DAS	3604
PRATIMA DAS	3558
PURNIMA DAS	3546
BASANTI DAS	3372
SHANKAR DAS	3279
SANDHYA GHOSH	3254
SANJAY DAS	3252
PRATIMA DAS	3212
KALPANA DAS	3203
ARATI GHOSH	3155
MALATI DAS	3151
SWAPAN DAS	3138
SANDHYA RANI DAS	3120
LAKSHMI DAS	3104
ANJALI DAS	3085

I want to assign the line's frequency to each name separately, whether the field holds two names or three, so that statistically every name within a field retains its frequency.
The expected output would be:

Code:

ANJALI	3085
ARATI	6201
ARATI	3155
ASHOKE	3604
BASANTI	3372
BISWANATH	3949
DILIP	3653
GITA	4550
GOPAL	3835
KALPANA	4714
KALPANA	3203
LAKSHMI	3104
MALATI	3151
PRATIMA	3558
PRATIMA	3212
PURNIMA	3546
SANDHYA	6901
SANDHYA	3254
SANDHYA	3120
SANJAY	3252
SARASWATI	3769
SHANKAR	3279
SUKUMAR	3876
SWAPAN	3941
SWAPAN	3138
TAPAN	3607
DAS	3085
DAS	6201
DAS	3604
DAS	3372
DAS	3949
DAS	3653
DAS	4550
DAS	3835
DAS	4714
DAS	3203
DAS	3104
DAS	3151
DAS	3558
DAS	3212
DAS	3546
DAS	6901
DAS	3120
DAS	3252
DAS	3769
DAS	3279
DAS	3876
DAS	3941
DAS	3138
DAS	3607
GHOSH	3155
GHOSH	3254
RANI	3120
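
The split described above can be sketched roughly as follows. This is only an illustration, not a tested tool: it is written in Python (an assumption, chosen because it runs on Windows without a shell) rather than Perl or AWK, the function name `split_names` is hypothetical, and it assumes the frequency is whatever follows the last tab on each line.

```python
def split_names(lines):
    """Yield one (name, frequency) pair per word in the name field.

    Assumes each line looks like 'NAME [NAME ...]<tab>FREQUENCY'.
    Lines without a tab are skipped.
    """
    for line in lines:
        line = line.rstrip("\r\n")
        if "\t" not in line:
            continue  # blank or malformed line: nothing to split
        # Everything before the last tab is the name field.
        names, _, freq = line.rpartition("\t")
        for name in names.split():
            yield name, freq.strip()


# Small demonstration on two lines taken from the sample data.
sample = ["SANDHYA RANI DAS\t3120", "ARATI GHOSH\t3155"]
for name, freq in split_names(sample):
    print(f"{name}\t{freq}")
```

The sketch emits names in input order, one line per name, so the result would still need sorting to match the layout of the expected output.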

I currently do this field separation with an Excel macro, but since the database is huge, the process is long and tedious.
Would it be possible to do the same with a Perl or AWK script? I have already written an awk tool that merges frequencies, which I could then run on the result. As an example, all occurrences of

Code:

DAS

would thus have a merged frequency.
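
For illustration, the merging step could look something like the sketch below. It assumes that "merged" means the frequencies of identical names are summed, which is not stated explicitly above; the function name `merge_frequencies` is hypothetical, and Python is used only as a Windows-friendly stand-in for the existing awk tool.

```python
from collections import defaultdict


def merge_frequencies(pairs):
    """Sum the frequencies of identical names.

    `pairs` is an iterable of (name, frequency) tuples; frequencies may be
    strings or integers. Summing is an assumption about what "merge" means.
    """
    totals = defaultdict(int)
    for name, freq in pairs:
        totals[name] += int(freq)
    return dict(totals)


# Two DAS entries and one GHOSH entry from the sample data.
merged = merge_frequencies([("DAS", "3085"), ("DAS", "6201"), ("GHOSH", "3155")])
for name, total in merged.items():
    print(f"{name}\t{total}")
```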
I work under Windows, and UNIX (sigh) is not my OS, so no shell scripts please.
Many thanks.

gimley
