Converting a list to a row to create clusters based on numerical identity


 
# 1  
Old 09-29-2012

Hello.
I have a long list of data with the following structure:
Quote:
Number=word
The number is the unique identity of the word, and all homophones share the same numerical ID.
An example will make this clear:
Quote:
782=angelina
782=anjelina
782=angelinaa
782=aangelina
117=angoori
117=angoorie
117=angooriy
117=angooriye
The awk script I have converts a list to a row, but only on condition that each cluster is separated by a hard return (a blank line).
Code:
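# Convert a list of names to a row; clusters must be separated by blank lines
BEGIN { FS = "=" }
# Blank line: end the current row and reset the "row started" flag
NF == 0 { f = 0; printf "\n"; next }
# Otherwise print the name, with "=" before every name after the first
{ if (f == 1) { printf "=%s", $1 } else { f = 1; printf "%s", $1 } }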
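
For illustration, here is a minimal sketch of how that script behaves on the two kinds of input. The sample names and the shell variables `prog`, `separated`, and `flat` are assumptions made up for this demonstration, not part of the original data.

```shell
# The blank-line-based script from the post, stored in a shell variable
prog='
BEGIN { FS = "=" }
NF == 0 { f = 0; printf "\n"; next }    # blank line: end the current row
{ if (f == 1) { printf "=%s", $1 } else { f = 1; printf "%s", $1 } }
'

# Clusters separated by a blank line: one row per cluster
separated=$(printf 'angelina\nanjelina\n\nangoori\nangoorie\n' | awk "$prog")

# Number=word input without blank lines: only $1 (the number) is printed,
# and everything ends up in a single row
flat=$(printf '782=angelina\n782=anjelina\n117=angoori\n' | awk "$prog")

printf '%s\n' "$separated"
printf '%s\n' "$flat"
```

With blank-line separators the names are joined per cluster. With the `Number=word` list the script only repeats the IDs, because nothing resets the flag and the words sit in the second field.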

Unfortunately, my data has no hard returns between clusters. I have tried to modify the script so that a database of the type above can be clustered by ID. The expected output would be:
Quote:
782=angelina=anjelina=angelinaa=aangelina
117=angoori=angoorie=angooriy=angooriye
How do I modify the script so that the output above is generated and all homophones belonging to the same numerical ID are clustered together?
I have tried to adapt the script, but without much success.
Many thanks.
# 2  
Old 09-29-2012
Code:
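BEGIN { FS = "=" }
# First time an ID is seen, seed its row with the ID; then append the word
{ if (!a[$1]) a[$1] = $1; a[$1] = a[$1] "=" $2 }
# Note: "for (i in a)" visits the IDs in an unspecified order
END { for (i in a) print a[i] }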
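
If the rows need to come out in the order in which each ID first appears (as in the expected output), one possible variant keeps a second array with the arrival order. This is only a sketch under that assumption; the array names `row` and `order`, the counter `n`, and the shell variables are illustrative choices, not part of the answer above.

```shell
# Sample data taken from the question
input='782=angelina
782=anjelina
782=angelinaa
782=aangelina
117=angoori
117=angoorie
117=angooriy
117=angooriye'

clusters=$(printf '%s\n' "$input" | awk '
BEGIN { FS = "=" }
{
    if (!($1 in row)) {          # first time this ID appears
        order[++n] = $1          # remember the arrival order
        row[$1] = $1             # seed the row with the ID itself
    }
    row[$1] = row[$1] "=" $2     # append the word to its cluster
}
END { for (i = 1; i <= n; i++) print row[order[i]] }
')

printf '%s\n' "$clusters"
```

Looping over a numeric index instead of `for (i in a)` makes the output order depend only on the input, at the cost of one extra array.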

# 3  
Old 09-29-2012
Many thanks. It worked beautifully.
I understood the logic, which is the most important thing. My only query concerns:
Code:
BEGIN{FS="=";}

What does the semicolon do after the field separator assignment? I normally write:
Code:
BEGIN{FS="="}

The script gives the same output both with and without the semicolon.
Does the semicolon change anything?
# 4  
Old 09-29-2012
The ";" in awk is a statement terminator. An end of line, a ";", or a closing "}" each end a statement in awk.

So the two BEGIN statements shown are equivalent.
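
A small sketch to check this, assuming a two-line sample input; the variable names `with_semi` and `with_newline` and the counter `n` are made up for the demonstration:

```shell
input='782=angelina
782=anjelina'

# Statements terminated by ";" (including one right before the closing brace)
with_semi=$(printf '%s\n' "$input" | awk 'BEGIN{FS="=";} {n++; print n, $2;}')

# The same statements terminated by an end of line or by the closing brace
with_newline=$(printf '%s\n' "$input" | awk 'BEGIN{FS="="} {n++
print n, $2}')

printf '%s\n' "$with_semi"
printf '%s\n' "$with_newline"
```

Both commands should print the same two numbered lines, so the trailing ";" is a matter of style here.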

Last edited by rdrtx1; 09-29-2012 at 11:48 AM..
# 5  
Old 09-29-2012
Many thanks. I will remember that when I write the first "invocation".