Extract specific contents from each line
Post 302750607 by Don Cragun on Wednesday 2nd of January 2013 03:17:34 AM
Quote:
Originally Posted by luoruicd
Thanks! It works pretty well!
I am trying to understand this code, this line puzzles me:
s = (s ? s "/" : "") substr($i, index($i, "=") + 1

You probably use some regular expressions, right? Can you recommend some documentation or a book to help me understand this better?
For a reference, I would start with the awk man page on your system and the awk man pages on this site, and then look at the Wikipedia entry for awk; it provides a good overview of the language with examples, the differences between the various versions of awk, and a decent list of reference materials.

There are several regular expressions in this script, but none of them are in this statement. (And note that you missed a closing parenthesis at the end of this statement.)

Note that the variable s is intended to be a concatenation of the values of single alleles found in fields of the form allele=X, with a / separating entries if more than one is found. Also note that awk sets all uninitialized variables to an empty string (or zero, depending on context) and that the function pr() in this script resets s to an empty string whenever it is called to print the data that has been accumulated for an SNP. And, due to the if statement on the previous line, this line of code is executed only if the ith field on the current input line matches the regular expression /allele=/, which it does if and only if the field contains the string "allele=".
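
The two behaviors described above can be checked from the command line. This is only a minimal sketch: the shell variable names (uninit, matched) and the sample input line are made up for illustration and are not part of the original script.

```shell
# An awk variable that was never assigned acts as "" in string context
# and as 0 in numeric context, so it tests false in a ?: expression.
uninit=$(awk 'BEGIN { printf "[%s] %d %s\n", s, s + 0, (s ? "true" : "false") }')
echo "$uninit"     # prints: [] 0 false

# $i ~ /allele=/ is true for any field that contains the string "allele=".
# The input line here is hypothetical and split on whitespace (awk default).
matched=$(echo 'rs123 allele=C pos=5' |
    awk '{ for (i = 1; i <= NF; i++) if ($i ~ /allele=/) print i, $i }')
echo "$matched"    # prints: 2 allele=C
```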

The ? : operators behave the same way in awk as they do in C, C++, and several other languages (i.e., in this case, if the expression before the ? is not an empty string, return the expression between the ? and the :; otherwise return the expression after the :). The substr() function returns a substring of the string named by its first argument, starting at the offset specified by its second argument (and, since there is no third argument here, it returns the rest of the string). The index() function finds the position in its first argument where the string specified by its second argument first appears. So stringing all of this together in a single statement:
Code:
s = (s ? s "/" : "") substr($i, index($i, "=") + 1)

This sets the variable s to (if s is not an empty string, the current value of s followed by a slash character; or, if s is an empty string, an empty string) concatenated with the character(s) following the = sign in allele=X.
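
To see index() and substr() in isolation, here is a small sketch. The awk variables f and p and the shell variable parts are names chosen for this example only; they do not appear in the original script.

```shell
# index() returns the 1-based position of the first "=" in the field;
# substr() with two arguments returns everything from that offset onward.
parts=$(awk 'BEGIN {
    f = "allele=C"
    p = index(f, "=")            # "=" is the 7th character of "allele=C"
    print p, substr(f, p + 1)    # everything after the "="
}')
echo "$parts"      # prints: 7 C
```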

So assume you have an SNP listing in your input file containing the following fields (all on one line or on different lines):
Code:
allele=C | allele=A | allele=G

When the first one of these is found, s will be set to "C".
When the second one is found s will be set to the concatenation of "C", "/", and "A" (i.e., "C/A").
And, when the third one is found, s will be set to "C/A/G".
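
The walk-through above can be reproduced with a stripped-down loop. This is a sketch only, not the full script from this thread: it assumes whitespace-separated fields, resets s at the start of each line instead of in pr(), and uses a made-up shell variable named joined.

```shell
# Feed the sample fields through the same assignment and print the result.
# The "|" fields do not contain "allele=", so the if test skips them.
joined=$(printf 'allele=C | allele=A | allele=G\n' | awk '{
    s = ""
    for (i = 1; i <= NF; i++)
        if ($i ~ /allele=/)
            s = (s ? s "/" : "") substr($i, index($i, "=") + 1)
    print s
}')
echo "$joined"     # prints: C/A/G
```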

Last edited by Don Cragun; 01-02-2013 at 04:22 AM.. Reason: Add suggestion to look at awk man pages