Extract strings based on the value
Post 302923963 by yuejian, Wednesday 5th of November 2014, 01:43:02 PM (UNIX for Dummies Questions & Answers)

I have a file with multiple columns (in this case, the file has 3 columns):
Code:
NM_001006304 (-33.7)	XM_418228 (-38.4)	JN880447 (-33.7)
CR387600 (-33.7)	CR524203 (-36.3)	GALGA_6AKII_KRT75 (-33.7)
GALGA25_SC7 (-31.9)	CR352795 (-36.3)	NM_204172 (-31.7)
NM_204137 (-31.9)	NM_001030561 (-36.3)	
AB011672 (-31.5)	XM_414526 (-35.3)	
CR386285 (-31.3)	NM_001278076 (-35.3)	
BX930087 (-30.8)	NM_213578 (-35.0)	
CR406893 (-30.6)	NM_205141 (-34.9)	
BX930205 (-30.5)	NM_001277385 (-34.1)	
CR385278 (-30.4)	CR386046 (-34.0)	
CR406366 (-30.4)	NM_001001603 (-33.8)	
NM_001277590 (-30.3)	CR385555 (-33.5)	
XM_414551 (-30.1)	CR407317 (-33.2)	
CR386585 (-30.0)	CR391594 (-33.1)	
CR390278 (-30.0)	CR391382 (-32.8)	
NM_001277979 (-30.0)	XM_004939970 (-32.8)	
CR352458 (-29.9)	J02823 (-32.7)	
CR353040 (-29.9)	X80114 (-32.7)	
CR352882 (-29.8)	BX931544 (-32.5)	
XM_003643271 (-29.7)	CR391698 (-32.2)	
CR389895 (-29.6)	GALGA_UnR_CL2 (-29.5)	
NM_001002856 (-29.5)	L25374 (-28.6)	
BX930628 (-29.3)		
CR407317 (-29.2)		
NM_001199294 (-29.2)		
CR387217 (-28.7)		
CR389430 (-28.7)		
CR388761 (-28.5)		
NM_001185051 (-28.1)		
CR390290 (-27.9)		
GALGA25_CL8 (-27.1)		
GALGA25_CL4 (-26.8)
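
The first step in any solution is reading this layout back into separate columns. The following is a minimal Python sketch that makes two assumptions: the file is tab-separated, as the sample appears to be, and every non-empty cell has the form NAME (VALUE). The name parse_columns is hypothetical. Shorter columns are padded with empty cells in the input, so blank cells are skipped rather than treated as data.

```python
import re

# A non-empty cell looks like "NM_001006304 (-33.7)": a name, then a number in parentheses.
CELL_RE = re.compile(r"^\s*(\S+)\s+\((-?\d+(?:\.\d+)?)\)\s*$")


def parse_columns(text: str) -> list[list[tuple[str, float]]]:
    """Split tab-separated rows into per-column lists of (name, value) pairs."""
    columns: list[list[tuple[str, float]]] = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore completely blank lines
        for index, cell in enumerate(line.split("\t")):
            if not cell.strip():
                continue  # empty padding cell in a shorter column
            match = CELL_RE.match(cell)
            if match is None:
                raise ValueError(f"unexpected cell: {cell!r}")
            while len(columns) <= index:
                columns.append([])
            columns[index].append((match.group(1), float(match.group(2))))
    return columns


if __name__ == "__main__":
    sample = "NM_001006304 (-33.7)\tXM_418228 (-38.4)\tJN880447 (-33.7)\nNM_204137 (-31.9)\tNM_001030561 (-36.3)\t\n"
    for column in parse_columns(sample):
        print(column)
```

Because cells are assigned by their tab position, a row with a trailing empty cell (such as the NM_204137 row) still puts each entry into the right column.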

As you can see, the strings in each column are already sorted in ascending order by the value in parentheses. For each column, how can I extract the strings whose values are among the "n" lowest values in parentheses? Strings that share the same value count as one value, so all of them are kept. If n is 1, I just want to extract the strings with the lowest value in each column, so the output file looks like this:
Code:
NM_001006304 (-33.7)	XM_418228 (-38.4)	JN880447 (-33.7)
CR387600 (-33.7)		GALGA_6AKII_KRT75 (-33.7)
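
Writing the result back out means placing the selected columns side by side and filling the missing cells with empty strings. That is what produces the double tab in the second row above. The following is a minimal Python sketch; format_rows is a hypothetical name. It assumes each column is a list of (name, value) pairs and that values are printed with one decimal place, as in the sample.

```python
from itertools import zip_longest


def format_rows(columns: list[list[tuple[str, float]]]) -> str:
    """Lay the columns side by side as tab-separated rows.

    Columns shorter than the longest one are padded with empty cells, which
    shows up as two adjacent tabs (or a trailing tab) in the output.
    """
    # Assumption: one decimal place, e.g. "NM_001006304 (-33.7)".
    cells = [[f"{name} ({value:.1f})" for name, value in column] for column in columns]
    rows = zip_longest(*cells, fillvalue="")
    return "\n".join("\t".join(row) for row in rows)


if __name__ == "__main__":
    n1_result = [
        [("NM_001006304", -33.7), ("CR387600", -33.7)],
        [("XM_418228", -38.4)],
        [("JN880447", -33.7), ("GALGA_6AKII_KRT75", -33.7)],
    ]
    print(format_rows(n1_result))
```

zip_longest stops at the longest column, so the number of output rows is set by whichever column keeps the most strings.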

If n is 2, I want to extract the strings with the 2 lowest values in parentheses for each column, so the output file looks like this:
Code:
NM_001006304 (-33.7)	XM_418228 (-38.4)	JN880447 (-33.7)
CR387600 (-33.7)	CR524203 (-36.3)	GALGA_6AKII_KRT75 (-33.7)
GALGA25_SC7 (-31.9)	CR352795 (-36.3)	NM_204172 (-31.7)
NM_204137 (-31.9)	NM_001030561 (-36.3)

If n is 3, I want to extract the strings with the 3 lowest values in parentheses for each column. The third column has only 2 distinct values (-33.7 and -31.7), so in that case I still want to extract all of its strings. The output file looks like this:
Code:
NM_001006304 (-33.7)	XM_418228 (-38.4)	JN880447 (-33.7)
CR387600 (-33.7)	CR524203 (-36.3)	GALGA_6AKII_KRT75 (-33.7)
GALGA25_SC7 (-31.9)	CR352795 (-36.3)	NM_204172 (-31.7)
NM_204137 (-31.9)	NM_001030561 (-36.3)	
AB011672 (-31.5)	XM_414526 (-35.3)	
	NM_001278076 (-35.3)
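
The rule in these examples is as follows. For each column, find the n lowest distinct values and keep every string whose value is at or below the n-th of them. A column with fewer than n distinct values is returned whole. The following is a minimal Python sketch of that selection; lowest_n_values is a hypothetical name. It assumes each column has already been parsed into (name, value) pairs. It does not depend on the column being sorted, but it filters in input order, so a sorted column stays sorted.

```python
def lowest_n_values(column: list[tuple[str, float]], n: int) -> list[tuple[str, float]]:
    """Return the entries whose value is among the n lowest distinct values.

    Ties share one rank, so n counts distinct values rather than rows.
    A column with fewer than n distinct values is returned in full.
    """
    distinct = sorted({value for _, value in column})
    if n < 1 or not distinct:
        return []
    # The n-th lowest distinct value, or the highest one if there are fewer than n.
    threshold = distinct[min(n, len(distinct)) - 1]
    # Keep entries in their original order.
    return [entry for entry in column if entry[1] <= threshold]


if __name__ == "__main__":
    third_column = [("JN880447", -33.7), ("GALGA_6AKII_KRT75", -33.7), ("NM_204172", -31.7)]
    for n in (1, 2, 3):
        print(n, [name for name, _ in lowest_n_values(third_column, n)])
```

Applied to each column separately with n = 2, the second column keeps XM_418228, CR524203, CR352795 and NM_001030561, which matches the expected n = 2 output above. Comparing against a threshold with <= avoids testing floats for exact equality.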

Thank you in advance!

Last edited by yuejian; 11-05-2014 at 03:30 PM.
 

Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.