Collapsing similar strings
Post 302962899 by RudiC on Sunday 20th of December 2015 05:04:25 PM
I don't understand what haplotype-1 - 10 is. This is what I have so far:
Code:
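awk '
        {T = $0                                         # work on a copy of the record
         gsub ($1 FS $2 FS "|" FS "*$", "", T)          # drop barcode, state and trailing tabs; T is the haplotype
         FREQ[T]++                                      # records per haplotype
         ST[T] = ST[T] $2 FS                            # states seen for this haplotype, each followed by FS
         FQST[$2 FS T]++                                # records per state and haplotype
        }
END     {for (f in FREQ)        {printf "%s ", f
                                 n = split (ST[f], TMP) # trailing FS leaves one empty last element, hence i<n
                                 for (i=1; i<n; i++) printf "%s(%s),", TMP[i], FQST[TMP[i] FS f]
                                 printf "-Freq-%s\n",  FREQ[f]
                                }
        }
' FS="\t" file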
1    2    3    3    2    5    1    6    4    3    3    3 NY(1),-Freq-1
2    2    4    3    2    6    2    2    3    4    3    2 TX(1),NC(1),-Freq-2
3    3    3    3    2    5    1    5    3    2    2    4 TX(1),-Freq-1
3    3    3    3    2    5    1    5    3    2    2    2 TX(1),-Freq-1
3    2    3    3    2    5    1    3    3    3    2    3 NY(1),-Freq-1
2    2    3    3    2    5    1    5    4    3    2    4 TX(1),-Freq-1
2    2    3    3    2    5    1    5    3    3    2    4 GA(1),CA(1),TX(1),-Freq-3
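
For readers without the original input file, the key-building step can be tried on its own. This is only a sketch: the two records and the helper name haplotype_key are made up for illustration (tab-separated barcode, state, then marker values); the gsub call is the one from the script above.

```shell
# Hypothetical helper: print each record's state and the haplotype key derived from it.
haplotype_key() {
    awk -F'\t' '{
        T = $0
        gsub($1 FS $2 FS "|" FS "*$", "", T)   # same call as in the post
        print $2 ": [" T "]"
    }' "$@"
}

# Two made-up records; the first one carries a trailing tab.
printf 'BC00005\tTX\t2\t2\t4\t\nBC00011\tNC\t2\t2\t4\n' | haplotype_key
```

Both records end up with the same key (the three marker values joined by tabs), which is why they would be counted under one FREQ entry.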

---------- Post updated at 23:04 ---------- Previous update was at 22:19 ----------

This may come closer to what you need:
Code:
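awk '
        {T = $0                                         # work on a copy of the record
         gsub ($1 FS $2 FS "|" FS "*$", "", T)          # drop barcode, state and trailing tabs; T is the haplotype
         FREQ[T]++                                      # records per haplotype
         ST[T] = ST[T] $2 FS                            # states seen for this haplotype, each followed by FS
         FQST[$2 FS T]++                                # records per state and haplotype
         BC[T] = $1                                     # barcode of the last record with this haplotype
        }
END     {for (f in FREQ)        {printf "%s%s%s  ", BC[f], FS, f
                                 n = split (ST[f], TMP) # trailing FS leaves one empty last element
                                 if (n == 2)    print TMP[1]    # single record: state only
                                 else   {for (i=1; i<n; i++) printf "%s(%s)%s", TMP[i], FQST[TMP[i] FS f], i==n-1?"-":","
                                                 printf "Freq-%s\n",  FREQ[f]
                                                }
                                }
        }
' FS="\t" file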
BC00010    1    2    3    3    2    5    1    6    4    3    3    3  NY
BC00005    2    2    4    3    2    6    2    2    3    4    3    2  TX(1),NC(1)-Freq-2
BC00008    3    3    3    3    2    5    1    5    3    2    2    4  TX
BC00006    3    3    3    3    2    5    1    5    3    2    2    2  TX
BC00009    3    2    3    3    2    5    1    3    3    3    2    3  NY
BC00007    2    2    3    3    2    5    1    5    4    3    2    4  TX
BC00003    2    2    3    3    2    5    1    5    3    3    2    4  GA(1),CA(1),TX(1)-Freq-3
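
One thing to watch: if the same state occurs more than once for one haplotype, ST[T] collects that state once per record, so it would be printed repeatedly. The sample output above has no such case. A possible variant, assuming a repeated state should collapse to a single entry with its count, appends the state only the first time it is seen and tests FREQ instead of the split count. The function name collapse and the sample records are made up for illustration.

```shell
# Hypothetical variant of the script above; reads the named files or stdin.
collapse() {
    awk -F'\t' '
    {
        T = $0
        gsub($1 FS $2 FS "|" FS "*$", "", T)
        FREQ[T]++
        if (FQST[$2 FS T]++ == 0)       # first record of this state for this haplotype
            ST[T] = ST[T] $2 FS
        BC[T] = $1                      # last barcode wins, as in the post
    }
    END {
        for (f in FREQ) {
            printf "%s%s%s  ", BC[f], FS, f
            n = split(ST[f], TMP)
            if (FREQ[f] == 1)
                print TMP[1]
            else {
                for (i = 1; i < n; i++)
                    printf "%s(%s)%s", TMP[i], FQST[TMP[i] FS f], (i == n-1 ? "-" : ",")
                printf "Freq-%s\n", FREQ[f]
            }
        }
    }' "$@"
}

# Made-up input: TX occurs twice with the same haplotype.
printf 'BC1\tTX\t2\t2\nBC2\tTX\t2\t2\nBC3\tNC\t2\t2\nBC4\tNY\t1\t3\n' | collapse | sort
```

The sort is only there because the order of `for (f in FREQ)` is not defined in awk. With the input shown, the shared haplotype should be reported as TX(2),NC(1)-Freq-3 and the single record as plain NY.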
