Post 302962890 by Xterra on Sunday 20th of December 2015 03:16:05 PM
Collapsing similar strings

I have a file that looks like this:
Code:
BC00001	GA	2	2	3	3	2	5	1	5	3	3	2	4
BC00002	CA	2	2	3	3	2	5	1	5	3	3	2	4
BC00003	TX	2	2	3	3	2	5	1	5	3	3	2	4
BC00004	TX	2	2	4	3	2	6	2	2	3	4	3	2
BC00005	NC	2	2	4	3	2	6	2	2	3	4	3	2
BC00006	TX	3	3	3	3	2	5	1	5	3	2	2	2
BC00007	TX	2	2	3	3	2	5	1	5	4	3	2	4
BC00008	TX	3	3	3	3	2	5	1	5	3	2	2	4
BC00009	NY	3	2	3	3	2	5	1	3	3	3	2	3
BC00010	NY	1	2	3	3	2	5	1	6	4	3	3	3

Columns 1 & 2 are the identifiers. I need to scan columns 3-14 of each entry, find the entries that are identical, and 'collapse' them into one entry. I should also record the frequency by state, in "()", and the global frequency, as "Freq". Thus, my outfile should look like this:
Code:
BC00001	GA(1),CA(1),TX(1)-Freq-3	2	2	3	3	2	5	1	5	3	3	2	4
BC00004	TX(1),NC(1)-Freq-2	2	2	4	3	2	6	2	2	3	4	3	2
BC00006	TX	3	3	3	3	2	5	1	5	3	2	2	2
BC00007	TX	2	2	3	3	2	5	1	5	4	3	2	4
BC00008	TX	3	3	3	3	2	5	1	5	3	2	2	4
BC00009	NY	3	2	3	3	2	5	1	3	3	3	2	3
BC00010	NY	1	2	3	3	2	5	1	6	4	3	3	3

I put together the following awk script:
Code:
awk '{id=$1}{query=$3"\t"$4"\t"$5"\t"$6"\t"$7"\t"$8"\t"$9"\t"$10"\t"$11"\t"$12"\t"$13"\t"$14}{F[query]++;if (!I[query]) I[query]=id"\t"$2" Freq"}END{for(i in I)print I[i],F[i],i}'

but I am far from getting the expected results. This is what I am getting:
Code:
BC00010 NY Freq 1 1     2       3       3       2       5       1       6       4       3       3       3
BC00006 TX Freq 1 3     3       3       3       2       5       1       5       3       2       2       2
BC00008 TX Freq 1 3     3       3       3       2       5       1       5       3       2       2       4
BC00007 TX Freq 1 2     2       3       3       2       5       1       5       4       3       2       4
BC00004 TX Freq 2 2     2       4       3       2       6       2       2       3       4       3       2
         Freq 1
BC00001 GA Freq 3 2     2       3       3       2       5       1       5       3       3       2       4
BC00009 NY Freq 1 3     2       3       3       2       5       1       3       3       3       2       3

Any help with my code will be greatly appreciated!
 
