Collapsing similar strings
Post 302962899 by RudiC on Sunday 20th of December 2015 05:04:25 PM
I don't understand what haplotype-1 - 10 is. This is what I have so far:
Code:
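awk '
        {T = $0                                         # work on a copy of the record
         gsub ($1 FS $2 FS "|" FS "*$", "", T)          # drop the $1/$2 prefix and any trailing tabs
         FREQ[T]++                                      # records per remaining string
         ST[T] = ST[T] $2 FS                            # FS-terminated list of $2 values per string
         FQST[$2 FS T]++                                # records per ($2, string) pair
        }
END     {for (f in FREQ)        {printf "%s ", f
                                 n = split (ST[f], TMP)         # trailing FS gives one empty extra element
                                 for (i=1; i<n; i++) printf "%s(%s),", TMP[i], FQST[TMP[i] FS f]
                                 printf "-Freq-%s\n",  FREQ[f]
                                }
        }
' FS="\t" file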
1    2    3    3    2    5    1    6    4    3    3    3 NY(1),-Freq-1
2    2    4    3    2    6    2    2    3    4    3    2 TX(1),NC(1),-Freq-2
3    3    3    3    2    5    1    5    3    2    2    4 TX(1),-Freq-1
3    3    3    3    2    5    1    5    3    2    2    2 TX(1),-Freq-1
3    2    3    3    2    5    1    3    3    3    2    3 NY(1),-Freq-1
2    2    3    3    2    5    1    5    4    3    2    4 TX(1),-Freq-1
2    2    3    3    2    5    1    5    3    3    2    4 GA(1),CA(1),TX(1),-Freq-3
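
To see what the gsub() does in isolation, here is a minimal sketch. It assumes each input record holds a barcode, a state, and then the values, all tab-separated (inferred from the output above, since the input file is not shown). strip_key is a hypothetical helper name, and the sample record is made up.

```shell
# Sketch only: reduce one record to the string used as the array index.
# Assumed layout: barcode<TAB>state<TAB>value<TAB>value...
strip_key() {
    awk -F '\t' '
    {
        T = $0
        # Delete the leading "barcode<TAB>state<TAB>" and any trailing tabs.
        gsub($1 FS $2 FS "|" FS "*$", "", T)
        print T
    }' "$@"
}

# Made-up record with a trailing tab; only the twelve values remain.
printf 'BC00010\tNY\t1\t2\t3\t3\t2\t5\t1\t6\t4\t3\t3\t3\t\n' | strip_key
```

Two records collapse into one entry only if this remaining string is identical, which is why the values alone are used as the index for FREQ and ST.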

---------- Post updated at 23:04 ---------- Previous update was at 22:19 ----------

This may come closer to what you need:
Code:
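awk '
        {T = $0                                         # work on a copy of the record
         gsub ($1 FS $2 FS "|" FS "*$", "", T)          # drop the $1/$2 prefix and any trailing tabs
         FREQ[T]++                                      # records per remaining string
         ST[T] = ST[T] $2 FS                            # FS-terminated list of $2 values per string
         FQST[$2 FS T]++                                # records per ($2, string) pair
         BC[T] = $1                                     # $1 of the last record seen with this string
        }
END     {for (f in FREQ)        {printf "%s%s%s  ", BC[f], FS, f
                                 n = split (ST[f], TMP)         # trailing FS gives one empty extra element
                                 if (n == 2)    print TMP[1]    # single record: print its $2 only
                                 else   {for (i=1; i<n; i++) printf "%s(%s)%s", TMP[i], FQST[TMP[i] FS f], (i==n-1?"-":",")
                                                 printf "Freq-%s\n",  FREQ[f]
                                                }
                                }
        }
' FS="\t" file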
BC00010    1    2    3    3    2    5    1    6    4    3    3    3  NY
BC00005    2    2    4    3    2    6    2    2    3    4    3    2  TX(1),NC(1)-Freq-2
BC00008    3    3    3    3    2    5    1    5    3    2    2    4  TX
BC00006    3    3    3    3    2    5    1    5    3    2    2    2  TX
BC00009    3    2    3    3    2    5    1    3    3    3    2    3  NY
BC00007    2    2    3    3    2    5    1    5    4    3    2    4  TX
BC00003    2    2    3    3    2    5    1    5    3    3    2    4  GA(1),CA(1),TX(1)-Freq-3
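
One thing to watch: ST[T] gets $2 appended for every record, so a state that occurs more than once for the same string would be listed repeatedly. The sample output does not show that case, so the following is only a sketch of a possible variant, not something tested against your file. It appends a state only the first time it is seen for a string, and prints the bare state when the string occurs exactly once. The function name collapse and the three-line input are made up for illustration.

```shell
# Sketch only: same idea as the script above, but each state is listed once.
# Assumed layout: barcode<TAB>state<TAB>value<TAB>value...
collapse() {
    awk -F '\t' '
    {
        T = $0
        gsub($1 FS $2 FS "|" FS "*$", "", T)        # same key extraction as above
        FREQ[T]++
        BC[T] = $1                                  # barcode of the last record seen
        if (!FQST[$2 FS T]++) ST[T] = ST[T] $2 FS   # append a state only on first sight
    }
    END {
        for (f in FREQ) {
            printf "%s%s%s  ", BC[f], FS, f
            n = split(ST[f], TMP)                   # trailing FS gives one empty extra element
            if (FREQ[f] == 1) print TMP[1]
            else {
                for (i = 1; i < n; i++)
                    printf "%s(%s)%s", TMP[i], FQST[TMP[i] FS f], (i == n-1 ? "-" : ",")
                printf "Freq-%s\n", FREQ[f]
            }
        }
    }' "$@"
}

# Made-up input: TX occurs twice and NC once for the same string.
# Prints: B3<TAB>1<TAB>2  TX(2),NC(1)-Freq-3
printf 'B1\tTX\t1\t2\nB2\tTX\t1\t2\nB3\tNC\t1\t2\n' | collapse
```

As in the script above, the order of the output lines depends on how awk walks the array in "for (f in FREQ)", so pipe the result through sort if you need a fixed order.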

Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.