Sponsored Content
Top Forums Shell Programming and Scripting Extraction of upstream and downstream regions from long sequence file Post 302951507 by RudiC on Friday 7th of August 2015 04:21:39 AM
Old 08-07-2015
Try this, based on the good work of Scrutinizer in this post
Code:
awk '
BEGIN           {print "sequence id\textracted region small\textracted region big upstream and downstream"
                }
NR==FNR &&
FNR>1           {CNT[$1]++
                 S[$1,CNT[$1]]=$2
                 E[$1,CNT[$1]]=$3
                 next
                }
                {split ($1, T, " ")
                }
T[1] in CNT     {i=T[1]
                 $1=x
                 for (j=1; j<=CNT[T[1]]; j++)
                        print RS i "\t" substr ($0,S[i,j],E[i,j]-S[i,j]+1) "\t" substr ($0, S[i,j]-100, E[i,j]-S[i,j]+201)
                }
' file2 RS=\> FS='\n' OFS= file1

 

10 More Discussions You Might Find Interesting

1. Programming

selecting rows with specific IDs for downstream analysis

Hi, I'm working hard on SQL and I came across a hurdle I'm hoping you can help me out with. I have two tables table1 headers: chrom start end name score strand 11 9720685 9720721 U0 0 + 21 9721043 9721079 U0 0 - 1 9721093 9721129 U0 0 + 20 ... (2 Replies)
Discussion started by: labrazil
2 Replies

2. Shell Programming and Scripting

awk: union regions

Hi all, I have difficulty to solve the followign problem. mydata: StartPoint EndPoint 22 55 2222 2230 33 66 44 58 222 240 11 25 22 60 33 45 The union of above... (2 Replies)
Discussion started by: phoeberunner
2 Replies

3. UNIX for Dummies Questions & Answers

fast sequence extraction

Hi everyone, I have a large text file containing DNA sequences in fasta format as follows: >someseq GAACTTGAGATCCGGGGAGCAGTGGATCTC CACCAGCGGCCAGAACTGGTGCACCTCCAG GCCAGCCTCGTCCTGCGTGTC >another seq GGCATTTTTGTGTAATTTTTGGCTGGATGAGGT GACATTTTCATTACTACCATTTTGGAGTACA >seq3450... (4 Replies)
Discussion started by: Fahmida
4 Replies

4. UNIX for Dummies Questions & Answers

extract regions of file based on start and end position

Hi, I have a file1 of many long sequences, each preceded by a unique header line. file2 is 3-columns list: headers name, start position, end position. I'd like to extract the sequence region of file1 specified in file2. Based on a post elsewhere, I found the code: awk... (2 Replies)
Discussion started by: pathunkathunk
2 Replies

5. Shell Programming and Scripting

find common entries and match the number with long sequence and cut that sequence in output

Hi all, I have a file like this ID 3BP5L_HUMAN Reviewed; 393 AA. AC Q7L8J4; Q96FI5; Q9BQH8; Q9C0E3; DT 05-FEB-2008, integrated into UniProtKB/Swiss-Prot. DT 05-JUL-2004, sequence version 1. DT 05-SEP-2012, entry version 71. FT COILED 59 140 ... (1 Reply)
Discussion started by: manigrover
1 Replies

6. Shell Programming and Scripting

FILE_ID extraction from file name and save it in CSV file after looping through each folders

FILE_ID extraction from file name and save it in CSV file after looping through each folders My files are located in UNIX Server, i want to extract file_id and file_name from each file .and save it in a CSV file. How do I do that? I have folders in unix environment, directory structure is... (15 Replies)
Discussion started by: princetd001
15 Replies

7. Shell Programming and Scripting

Obtain the names of the flanking regions

Hi I have 2 files; usually the end position in the file1 is the start position in the file2 and the end position in file2 will be the start position in file1 (flanks) file1 Id start end aaa1 0 3000070 aaa1 3095270 3095341 aaa1 3100822 3100894 aaa1 ... (1 Reply)
Discussion started by: anurupa777
1 Replies

8. IP Networking

Newbie BIND DNS question: resolving upstream hosts?

Old skool UNIX and Linux geek here, but newbie to the world of DNS and bind. I've recently been tasked with replacing our DNS infrastructure, currently on Windows, with a RHEL based solution. And I assume that means using bind, which I've not used before. Here's my question: Suppose our company... (3 Replies)
Discussion started by: lupin..the..3rd
3 Replies

9. Shell Programming and Scripting

Parsing and masking regions from a single fasta file with subsequence

HI, I have a Complete genome fasta file and I have list of sub sequence regions in the format as : 4353..5633 6795..9354 1034..14456 I want a script which can mask these region in a single complete genome fasta file with the alphabet N kindly help (2 Replies)
Discussion started by: margarita
2 Replies

10. Shell Programming and Scripting

Sequence extraction

i want to extract specific region of interest from big file. i have only start position, end position and seq id, see my query is: I have file1 is this >GL3482.1 GAACTTGAGATCCGGGGA GCAGTGGATCTCCACCAG CGGCCAGAACTGGTGCAC CTCCAGGCCAGCCTCGTC CTGCGTGTC >GL3550.1... (14 Replies)
Discussion started by: harpreetmanku04
14 Replies
GEOIP_REGION_NAME_BY_CODE(3)						 1					      GEOIP_REGION_NAME_BY_CODE(3)

geoip_region_name_by_code - Returns the region name for some country and region code combo

SYNOPSIS
string geoip_region_name_by_code (string $country_code, string $region_code) DESCRIPTION
The geoip_region_name_by_code(3) function will return the region name corresponding to a country and region code combo. In the United States, the region code corresponds to the two-letter abbreviation of each state. In Canada, the region code corresponds to the two-letter province or territory code as attributed by Canada Post. For the rest of the world, GeoIP uses FIPS 10-4 codes to represent regions. You can check http://www.maxmind.com/app/fips10_4 for a detailed list of FIPS 10-4 codes. This function is always available if using GeoIP Library version 1.4.1 or newer. The data is taken directly from the GeoIP Library and not from any database. PARAMETERS
o $country_code - The two-letter country code (see geoip_country_code_by_name(3)) o $region_code - The two-letter (or digit) region code (see geoip_region_by_name(3)) RETURN VALUES
Returns the region name on success, or FALSE if the country and region code combo cannot be found. EXAMPLES
Example #1 A geoip_region_name_by_code(3) example using region code for US/Canada This will print the region name for country CA (Canada), region QC (Quebec). <?php $region = geoip_region_name_by_code('CA', 'QC'); if ($region) { echo 'Region name for CA/QC is: ' . $region; } ?> The above example will output: Region name for CA/QC is: Quebec Example #2 A geoip_region_name_by_code(3) example using FIPS codes This will print the region name for country JP (Japan), region 01. <?php $region = geoip_region_name_by_code('JP', '01'); if ($region) { echo 'Region name for JP/01 is: ' . $region; } ?> The above example will output: Region name for JP/01 is: Aichi PHP Documentation Group GEOIP_REGION_NAME_BY_CODE(3)
All times are GMT -4. The time now is 10:52 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy