Collapsing similar strings Post: 302962903

Post 302962903 by Xterra on Sunday 20th of December 2015 09:36:56 PM

Rudi
Awesome! Would you mind explaining the code a bit?
Thanks

---------- Post updated at 09:36 PM ---------- Previous update was at 05:32 PM ----------

After testing the script, I realized that it does not do exactly what I need. Using the following infile (a slight variation of my initial file):

Code:

BC00001 GA      2       2       3       3       2       5       1       5       3       3       2       4
BC00002 CA      2       2       3       3       2       5       1       5       3       3       2       4
BC00003 TX      2       2       3       3       2       5       1       5       3       3       2       4
BC00004 TX      2       2       4       3       2       6       2       2       3       4       3       2
BC00005 NC      2       2       4       3       2       6       2       2       3       4       3       2
BC00006 TX      3       3       3       3       2       5       1       5       3       2       2       2
BC00007 TX      2       2       3       3       2       5       1       5       4       3       2       4
BC00008 TX      3       3       3       3       2       5       1       5       3       2       2       4
BC00009 NY      3       2       3       3       2       5       1       3       3       3       2       3
BC00010 NY      1       2       3       3       2       5       1       6       4       3       3       3
BC00011 CA      2       2       3       3       2       5       1       5       3       3       2       4

This is what I get with Rudi's script:

Code:

BC00010 1       2       3       3       2       5       1       6       4       3       3       3  NY
BC00006 3       3       3       3       2       5       1       5       3       2       2       2  TX
BC00008 3       3       3       3       2       5       1       5       3       2       2       4  TX
BC00007 2       2       3       3       2       5       1       5       4       3       2       4  TX
BC00005 2       2       4       3       2       6       2       2       3       4       3       2  TX(1),NC(1)-Freq-2
BC00011 2       2       3       3       2       5       1       5       3       3       2       4  GA(1),CA(2),TX(1),CA(2)-Freq-4
BC00009 3       2       3       3       2       5       1       3       3       3       2       3  NY

However, this is what I need:

Code:

BC00010 1       2       3       3       2       5       1       6       4       3       3       3  NY
BC00006 3       3       3       3       2       5       1       5       3       2       2       2  TX
BC00008 3       3       3       3       2       5       1       5       3       2       2       4  TX
BC00007 2       2       3       3       2       5       1       5       4       3       2       4  TX
BC00005 2       2       4       3       2       6       2       2       3       4       3       2  TX(1),NC(1)-Freq-2
BC00011 2       2       3       3       2       5       1       5       3       3       2       4  GA(1),CA(2),TX(1)-Freq-4
BC00009 3       2       3       3       2       5       1       3       3       3       2       3  NY

As you can see, the cumulative count for CA is correct, but the CA(2) entry is listed twice on the BC00011 line.
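
The intended behavior can be sketched as follows. This is a minimal illustration in Python, not Rudi's script and not awk; the function name `collapse` and the tab-separated output are assumptions made for the example. It groups rows by their numeric columns, keeps the last ID seen in each group, and counts states in a dictionary keyed by state, so each state appears only once in the label.

```python
def collapse(lines):
    """Group rows by their numeric profile and summarize the states per group."""
    # profile tuple -> {"last_id": str, "states": {state: count}}
    groups = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed rows
        row_id, state, profile = fields[0], fields[1], tuple(fields[2:])
        group = groups.setdefault(profile, {"last_id": row_id, "states": {}})
        group["last_id"] = row_id  # the desired output shows the last ID seen
        # One dictionary entry per state, so a state is never listed twice.
        group["states"][state] = group["states"].get(state, 0) + 1

    out = []
    for profile, group in groups.items():
        states = group["states"]
        total = sum(states.values())
        if total == 1:
            # A profile seen only once keeps the bare state name.
            label = next(iter(states))
        else:
            counts = ",".join(f"{s}({n})" for s, n in states.items())
            label = f"{counts}-Freq-{total}"
        out.append("\t".join([group["last_id"], *profile, label]))
    return out
```

Groups come out in the order their profile first appears in the input, which differs from the line order in the listings above; the set of lines is the same.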
