Collapsing similar strings
Post 302962903 by Xterra on Sunday 20th of December 2015 09:36:56 PM
Rudi,
Awesome! Would you mind explaining the code a bit?
Thanks

---------- Post updated at 09:36 PM ---------- Previous update was at 05:32 PM ----------

After testing the script, I realized that it does not do exactly what I need. Using the following infile (a slight variation of my initial file):
Code:
BC00001 GA      2       2       3       3       2       5       1       5       3       3       2       4
BC00002 CA      2       2       3       3       2       5       1       5       3       3       2       4
BC00003 TX      2       2       3       3       2       5       1       5       3       3       2       4
BC00004 TX      2       2       4       3       2       6       2       2       3       4       3       2
BC00005 NC      2       2       4       3       2       6       2       2       3       4       3       2
BC00006 TX      3       3       3       3       2       5       1       5       3       2       2       2
BC00007 TX      2       2       3       3       2       5       1       5       4       3       2       4
BC00008 TX      3       3       3       3       2       5       1       5       3       2       2       4
BC00009 NY      3       2       3       3       2       5       1       3       3       3       2       3
BC00010 NY      1       2       3       3       2       5       1       6       4       3       3       3
BC00011 CA      2       2       3       3       2       5       1       5       3       3       2       4

This is what I get with Rudi's script:
Code:
BC00010 1       2       3       3       2       5       1       6       4       3       3       3  NY
BC00006 3       3       3       3       2       5       1       5       3       2       2       2  TX
BC00008 3       3       3       3       2       5       1       5       3       2       2       4  TX
BC00007 2       2       3       3       2       5       1       5       4       3       2       4  TX
BC00005 2       2       4       3       2       6       2       2       3       4       3       2  TX(1),NC(1)-Freq-2
BC00011 2       2       3       3       2       5       1       5       3       3       2       4  GA(1),CA(2),TX(1),CA(2)-Freq-4
BC00009 3       2       3       3       2       5       1       3       3       3       2       3  NY

However, this is what I need:
Code:
BC00010 1       2       3       3       2       5       1       6       4       3       3       3  NY
BC00006 3       3       3       3       2       5       1       5       3       2       2       2  TX
BC00008 3       3       3       3       2       5       1       5       3       2       2       4  TX
BC00007 2       2       3       3       2       5       1       5       4       3       2       4  TX
BC00005 2       2       4       3       2       6       2       2       3       4       3       2  TX(1),NC(1)-Freq-2
BC00011 2       2       3       3       2       5       1       5       3       3       2       4  GA(1),CA(2),TX(1)-Freq-4
BC00009 3       2       3       3       2       5       1       3       3       3       2       3  NY

As you can see, the cumulative count for CA is correct, but the CA(2) entry is repeated.
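For reference, here is a rough Python sketch of the behaviour I am after. It is not Rudi's script, just one way to express the logic: rows are grouped by their 12 numeric columns, each state is counted once per group in order of first appearance, and the last ID seen in a group is kept. The function name `collapse` and the `SAMPLE` variable are made up for illustration. Two things are my own assumptions: a group whose rows all share one state would be printed as e.g. `TX(2)-Freq-2`, and groups are printed in order of first appearance, so the row order differs from the listings above.

```python
# Sample input copied from the infile above (whitespace-separated columns).
SAMPLE = """\
BC00001 GA 2 2 3 3 2 5 1 5 3 3 2 4
BC00002 CA 2 2 3 3 2 5 1 5 3 3 2 4
BC00003 TX 2 2 3 3 2 5 1 5 3 3 2 4
BC00004 TX 2 2 4 3 2 6 2 2 3 4 3 2
BC00005 NC 2 2 4 3 2 6 2 2 3 4 3 2
BC00006 TX 3 3 3 3 2 5 1 5 3 2 2 2
BC00007 TX 2 2 3 3 2 5 1 5 4 3 2 4
BC00008 TX 3 3 3 3 2 5 1 5 3 2 2 4
BC00009 NY 3 2 3 3 2 5 1 3 3 3 2 3
BC00010 NY 1 2 3 3 2 5 1 6 4 3 3 3
BC00011 CA 2 2 3 3 2 5 1 5 3 3 2 4
"""


def collapse(lines):
    """Collapse rows with identical numeric columns, listing each state once."""
    # Maps a numeric profile to the last ID seen and a per-state counter.
    # Plain dicts keep insertion order (Python 3.7+), so states stay in
    # order of first appearance.
    groups = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        row_id, state, profile = fields[0], fields[1], tuple(fields[2:])
        group = groups.setdefault(profile, {"id": row_id, "states": {}})
        group["id"] = row_id  # keep the last ID, as in the desired output
        group["states"][state] = group["states"].get(state, 0) + 1

    out = []
    for profile, group in groups.items():
        states = group["states"]
        total = sum(states.values())
        if total == 1:
            # Unique profile: print the bare state, no counts.
            label = next(iter(states))
        else:
            # One entry per distinct state, so CA(2) appears only once.
            label = ",".join(f"{s}({n})" for s, n in states.items())
            label += f"-Freq-{total}"
        out.append("\t".join([group["id"], *profile, label]))
    return out


if __name__ == "__main__":
    print("\n".join(collapse(SAMPLE.splitlines())))
```

The point of counting into a per-group dictionary, instead of appending a label every time a state is seen, is that a state which shows up again (CA here) only bumps its counter and never adds a second entry.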
 
