Sort file by field 1 that has text as well as a number
Post 302956822 by drl on Sunday 4th of October 2015 11:03:01 AM
Hi.

The msort utility lets a key field be declared as hybrid, that is, a mix of characters and numbers:
Code:
#!/usr/bin/env bash

# @(#) s1	Demonstrate sort of mixed field, "hybrid", with msort.
# If msort is not in repository:
# http://freecode.com/projects/msort

LC_ALL=C ; LANG=C ; export LC_ALL LANG
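# pe: print all arguments with no separator, then a newline.
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
# pl: print a blank line, a rule, then the arguments as a heading.
pl() { pe;pe "-----" ;pe "$*"; }
# db: debug output to stderr; the second definition disables it.
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
# Report the environment and utility versions if the local helper exists.
C=$HOME/bin/context && [ -f "$C" ] && "$C" msort

FILE=${1-data1}

pl " Input data file $FILE:"
cat "$FILE"

pl " Results of msort:"
msort -l -q -j -d: -n 1 -c hybrid "$FILE"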

exit 0

producing:
Code:
$ ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian 5.0.8 (lenny, workstation) 
bash GNU bash 3.2.39
msort 8.44

-----
 Input data file data1:
chr20:43625799-43625957 STK4:exon.6;STK4:exon.7 310.703
chr20:36770455-36770611 TGM2:exon.6;TGM2:exon.7 614.756
chr20:19945585-19945678 RIN2:exon.6;RIN2:exon.7 175.258
chr20:10632768-10632908 JAG1:exon.5;JAG1:exon.7 319.586
chr20:8630010-8630106 PLCB1:exon.1;PLCB1:exon.7 183.188
chr19:17952438-17952581 JAK3:exon.2;JAK3:exon.6;JAK3:exon.7 306.566
chr19:13051547-13051711 CALR:exon.3;CALR:exon.6;CALR:exon.7 337.811
chr19:13006795-13006945 GCDH:exon.5;GCDH:exon.6;GCDH:exon.7 628.62
chr19:11491549-11491657 EPOR:exon.1;EPOR:exon.6;EPOR:exon.7 301.87
chr18:3456341-3456588 TGIF1:exon.1;TGIF1:exon.2;TGIF1:exon.3 430.332
chr15:90630333-90630505 IDH2:exon.5;IDH2:exon.7 516.128

-----
 Results of msort:
chr15:90630333-90630505 IDH2:exon.5;IDH2:exon.7 516.128
chr18:3456341-3456588 TGIF1:exon.1;TGIF1:exon.2;TGIF1:exon.3 430.332
chr19:17952438-17952581 JAK3:exon.2;JAK3:exon.6;JAK3:exon.7 306.566
chr19:13051547-13051711 CALR:exon.3;CALR:exon.6;CALR:exon.7 337.811
chr19:13006795-13006945 GCDH:exon.5;GCDH:exon.6;GCDH:exon.7 628.62
chr19:11491549-11491657 EPOR:exon.1;EPOR:exon.6;EPOR:exon.7 301.87
chr20:43625799-43625957 STK4:exon.6;STK4:exon.7 310.703
chr20:10632768-10632908 JAG1:exon.5;JAG1:exon.7 319.586
chr20:8630010-8630106 PLCB1:exon.1;PLCB1:exon.7 183.188
chr20:36770455-36770611 TGM2:exon.6;TGM2:exon.7 614.756
chr20:19945585-19945678 RIN2:exon.6;RIN2:exon.7 175.258
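
If msort is not available, something similar can probably be done with plain sort, as long as the text part of the key has a fixed width. The following is only a sketch under that assumption: every key starts with the 3-character prefix "chr" followed by digits. The scratch file and the variable names are made up for the illustration, and the second key (start coordinate) is an addition that the msort run above does not use.

```shell
#!/usr/bin/env bash
# Sketch only: assumes every key begins with the fixed prefix "chr"
# followed by digits; msort's hybrid comparison does not need that.
LC_ALL=C ; export LC_ALL

sample=$(mktemp)   # hypothetical scratch file with a few lines of data1
cat > "$sample" <<'EOF'
chr20:43625799-43625957 STK4:exon.6;STK4:exon.7 310.703
chr20:8630010-8630106 PLCB1:exon.1;PLCB1:exon.7 183.188
chr19:17952438-17952581 JAK3:exon.2;JAK3:exon.6;JAK3:exon.7 306.566
chr18:3456341-3456588 TGIF1:exon.1;TGIF1:exon.2;TGIF1:exon.3 430.332
chr15:90630333-90630505 IDH2:exon.5;IDH2:exon.7 516.128
EOF

# -t:        fields are separated by ":"
# -k1.4,1n   numeric key starting at the 4th character of field 1 (after "chr")
# -k2,2n     tie-break on the leading number of field 2 (the start coordinate)
sorted=$(sort -t: -k1.4,1n -k2,2n "$sample")
printf '%s\n' "$sorted"
rm -f "$sample"
```

Run on the full data1, this would order the chromosomes the same way as msort, but the lines within each chromosome would come out by ascending start coordinate because of the second key. Keys such as chrX have no digits after the prefix, so they would need separate handling.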

See the link listed in the script if msort is not in your repository ... cheers, drl