UNIX for Dummies Questions & Answers: Post 302923963 by yuejian on Wednesday 5th of November 2014 01:43:02 PM
Extract strings based on the value

I have a file with multiple columns (in this case, the file has 3 columns):
Code:
NM_001006304 (-33.7)	XM_418228 (-38.4)	JN880447 (-33.7)
CR387600 (-33.7)	CR524203 (-36.3)	GALGA_6AKII_KRT75 (-33.7)
GALGA25_SC7 (-31.9)	CR352795 (-36.3)	NM_204172 (-31.7)
NM_204137 (-31.9)	NM_001030561 (-36.3)	
AB011672 (-31.5)	XM_414526 (-35.3)	
CR386285 (-31.3)	NM_001278076 (-35.3)	
BX930087 (-30.8)	NM_213578 (-35.0)	
CR406893 (-30.6)	NM_205141 (-34.9)	
BX930205 (-30.5)	NM_001277385 (-34.1)	
CR385278 (-30.4)	CR386046 (-34.0)	
CR406366 (-30.4)	NM_001001603 (-33.8)	
NM_001277590 (-30.3)	CR385555 (-33.5)	
XM_414551 (-30.1)	CR407317 (-33.2)	
CR386585 (-30.0)	CR391594 (-33.1)	
CR390278 (-30.0)	CR391382 (-32.8)	
NM_001277979 (-30.0)	XM_004939970 (-32.8)	
CR352458 (-29.9)	J02823 (-32.7)	
CR353040 (-29.9)	X80114 (-32.7)	
CR352882 (-29.8)	BX931544 (-32.5)	
XM_003643271 (-29.7)	CR391698 (-32.2)	
CR389895 (-29.6)	GALGA_UnR_CL2 (-29.5)	
NM_001002856 (-29.5)	L25374 (-28.6)	
BX930628 (-29.3)		
CR407317 (-29.2)		
NM_001199294 (-29.2)		
CR387217 (-28.7)		
CR389430 (-28.7)		
CR388761 (-28.5)		
NM_001185051 (-28.1)		
CR390290 (-27.9)		
GALGA25_CL8 (-27.1)		
GALGA25_CL4 (-26.8)

The strings in each column have been sorted in ascending order by the value in parentheses, as you can see. For each column, how can I extract the strings with the "n" lowest distinct values in parentheses? If n is 1, I just want to extract the strings with the lowest value in parentheses for each column (including ties), so the output file looks like this:
Code:
NM_001006304 (-33.7)	XM_418228 (-38.4)	JN880447 (-33.7)
CR387600 (-33.7)		GALGA_6AKII_KRT75 (-33.7)

If n is 2, I want to extract the strings with the 2 lowest values in parentheses for each column, so the output file looks like this:
Code:
NM_001006304 (-33.7)	XM_418228 (-38.4)	JN880447 (-33.7)
CR387600 (-33.7)	CR524203 (-36.3)	GALGA_6AKII_KRT75 (-33.7)
GALGA25_SC7 (-31.9)	CR352795 (-36.3)	NM_204172 (-31.7)
NM_204137 (-31.9)	NM_001030561 (-36.3)

If n is 3, I want to extract the strings with the 3 lowest values in parentheses for each column. Although the third column has only 2 different values (-33.7 and -31.7), I still want to extract all available strings in that column, so the output file looks like this:
Code:
NM_001006304 (-33.7)	XM_418228 (-38.4)	JN880447 (-33.7)
CR387600 (-33.7)	CR524203 (-36.3)	GALGA_6AKII_KRT75 (-33.7)
GALGA25_SC7 (-31.9)	CR352795 (-36.3)	NM_204172 (-31.7)
NM_204137 (-31.9)	NM_001030561 (-36.3)	
AB011672 (-31.5)	XM_414526 (-35.3)	
	NM_001278076 (-35.3)

Thank you in advance!

Last edited by yuejian; 11-05-2014 at 03:30 PM.
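
One possible approach, sketched in Python rather than awk: for each column, collect the distinct values in parentheses, keep the n lowest of them, and print every string whose value is in that set. This is only a sketch under some assumptions that are not stated in the post: the columns are tab-separated, every cell ends with a number in parentheses, and cells that do not match that pattern are skipped. The function name top_n_lowest and the sample data layout are made up for illustration.

```python
import re

# A cell looks like "NM_001006304 (-33.7)"; capture the number in parentheses.
VALUE_RE = re.compile(r"\((-?\d+(?:\.\d+)?)\)\s*$")


def top_n_lowest(lines, n):
    """Return output rows keeping, per column, the cells whose value is
    among the n lowest distinct values of that column (ties included)."""
    # Assumption: columns are separated by tabs; blank lines are ignored.
    rows = [line.rstrip("\n").split("\t") for line in lines if line.strip()]
    width = max((len(row) for row in rows), default=0)

    kept_columns = []
    for col in range(width):
        # Collect (value, cell) pairs for the non-empty cells of this column.
        scored = []
        for row in rows:
            cell = row[col].strip() if col < len(row) else ""
            match = VALUE_RE.search(cell)
            if match:
                scored.append((float(match.group(1)), cell))
        # The n lowest distinct values; fewer if the column has fewer.
        wanted = set(sorted({value for value, _ in scored})[:n])
        # Keep the original top-to-bottom order of the column.
        kept_columns.append([cell for value, cell in scored if value in wanted])

    # Re-assemble the columns side by side, padding short ones with "".
    height = max((len(col) for col in kept_columns), default=0)
    out = []
    for i in range(height):
        cells = [col[i] if i < len(col) else "" for col in kept_columns]
        out.append("\t".join(cells).rstrip("\t"))
    return out


# Small demo using the first rows of the example file.
SAMPLE = (
    "NM_001006304 (-33.7)\tXM_418228 (-38.4)\tJN880447 (-33.7)\n"
    "CR387600 (-33.7)\tCR524203 (-36.3)\tGALGA_6AKII_KRT75 (-33.7)\n"
    "GALGA25_SC7 (-31.9)\tCR352795 (-36.3)\tNM_204172 (-31.7)\n"
    "NM_204137 (-31.9)\tNM_001030561 (-36.3)\t\n"
)
for line in top_n_lowest(SAMPLE.splitlines(), 1):
    print(line)
```

Because the n lowest values are computed from the set of values in each column, the sketch does not rely on the file already being sorted, and a column with fewer than n distinct values (like the third column for n = 3) simply contributes all of its strings. To run it on a real file, pass an open file object instead of SAMPLE.splitlines().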
 
