01-19-2019
Quote:
Are the ranges given in your first input file always in increasing numerical order for each $1,$4 set of values (as in your sample file f1)? If they are we can use that information to make your code run faster.
Yes, these should always be sorted as in f1.
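For anyone following along, here is a rough sketch of how sorted ranges could let the lookup stop early. The column layouts and the function name sorted_lookup are made up for illustration (key, low, high in the ranges file; key, start, end in the query file) and are not the real f1/f2 layouts from this thread.

```shell
# Hypothetical layouts, not the poster's real columns:
#   ranges file: key <TAB> low <TAB> high   (sorted by low within each key)
#   query file:  key <TAB> start <TAB> end
sorted_lookup() {
  awk -F'\t' '
    NR == FNR {                       # first file: remember each range per key
      n[$1]++
      lo[$1, n[$1]] = $2 + 0
      hi[$1, n[$1]] = $3 + 0
      next
    }
    {
      s = $2 + 0; e = $3 + 0
      for (i = 1; i <= n[$1]; i++) {
        if (lo[$1, i] > s) break      # sorted input: no later range can contain s
        if (e <= hi[$1, i]) { print; break }
      }
    }' "$1" "$2"
}

tmp=$(mktemp -d)
printf 'A\t100\t200\nA\t300\t400\n' > "$tmp/ranges"
printf 'A\t310\t390\nA\t250\t260\n' > "$tmp/query"
sorted_lookup "$tmp/ranges" "$tmp/query"
rm -r "$tmp"
```

The early break is only safe because the ranges are sorted by their low endpoint for each key; with unsorted input the loop would have to scan every range.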
Quote:
Is the fifth subfield of $4 in your second input file always identical to the $1 value on the same input line (as in your sample files)? If they are, we can use that information to make your code run faster.
Yes, this will always be the case if $4 is found as in f1.
Quote:
You note that your input files fields are separated by tabs. Do you want the output file to be tab delimited too; or do you want the output to be delimited by spaces as shown in your sample output?
f1 will always be tab-delimited, except for a whitespace character after $3 and $4, but the output should be tab-delimited. I did set OFS="\t", but I think the whitespace is keeping that from working.
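One possible way to handle the stray whitespace, as a sketch only: awk applies OFS when a record is rebuilt (for example after a field is assigned) or when print is given comma-separated arguments, so trimming each field in place both removes the spaces and keeps the output tab-delimited. The function name trim_fields and the sample line are invented for this example.

```shell
# Hypothetical helper: trim stray spaces around tab-separated fields.
trim_fields() {
  awk 'BEGIN { FS = OFS = "\t" }
       {
         for (i = 1; i <= NF; i++)
           gsub(/^ +| +$/, "", $i)   # changing a field rebuilds $0 using OFS
         print
       }' "$@"
}

# Sample line with a trailing space after the third and fourth fields.
printf 'chr1\t100\t200 \tGENE1 \n' | trim_fields
```

Setting FS and OFS to the same tab character means lines that need no trimming pass through unchanged.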
You are correct in that I meant to be looking for inclusive endpoints, so >=/<= is what I should have used.
Quote:
Is it your intent to print the line containing exon if either endpoint is in an entry in the first input file for that $1,$4 pair, or should it only print the exon line if both endpoints are in range?
I used the || operator to make sure the script works as expected, but it could be &&, as both coordinates should lie within the endpoints (I'm trying to think of a situation where that's not the case and not coming up with anything).
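To make the difference between the two tests concrete, here is a small sketch with inclusive endpoints. The helper name check and its arguments (LOW HIGH START END) are assumptions for illustration and are not taken from the script in this thread.

```shell
# Hypothetical helper: check LOW HIGH START END
# Reports whether either endpoint (||) or both endpoints (&&) fall in [LOW, HIGH].
check() {
  awk -v lo="$1" -v hi="$2" -v s="$3" -v e="$4" 'BEGIN {
    s_in = (s >= lo && s <= hi)      # inclusive test for the start coordinate
    e_in = (e >= lo && e <= hi)      # inclusive test for the end coordinate
    print "either=" (s_in || e_in), "both=" (s_in && e_in)
  }'
}

check 100 200 100 200   # exon sits exactly on the range endpoints
check 100 200 150 250   # exon starts inside the range but ends outside it
```

The second call is the kind of case where the two operators disagree: the || form would print the exon line and the && form would not.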
Thank you very much.