Improve awk code that has three separate parts


 
# 1  
Old 04-11-2016

I have a very inefficient awk script below that I need some help improving. It has three parts that, ideally, could be combined into one search and one output file. Thank you :).

Part 1:
Check whether the user-supplied string contains + or -; if it does, the input is written to a file "input" and displayed on screen.

Code:
awk '/+/{print $0,"is an intronic variant"}' c:/Users/cmccabe/Desktop/Python27/input.txt
awk '/-/{print $0,"is an intronic variant"}' c:/Users/cmccabe/Desktop/Python27/input.txt

Part 2:
Looks within that "input" file for any lines with a + or - and if found writes them to a new file (temp+ or temp-)
Code:
awk '/+/' c:/Users/cmccabe/Desktop/Python27/input.txt > c:/Users/cmccabe/Desktop/Python27/temp+.txt 
awk '/-/' c:/Users/cmccabe/Desktop/Python27/input.txt > c:/Users/cmccabe/Desktop/Python27/temp-.txt

Part 3:
Removes the + or - lines from the "input" file.
Code:
sed -i '/+/d' C:/Users/cmccabe/Desktop/Python27/input.txt
sed -i '/-/d' C:/Users/cmccabe/Desktop/Python27/input.txt


Last edited by cmccabe; 04-11-2016 at 05:23 PM. Reason: added details
# 2  
Old 04-11-2016
You say you only want one output file, but your current procedures are producing three output files (plus the text written to standard output). Which one output file do you want? Is it C:/Users/cmccabe/Desktop/Python27/input.txt, c:/Users/cmccabe/Desktop/Python27/temp+.txt, or c:/Users/cmccabe/Desktop/Python27/temp-.txt?
This User Gave Thanks to Don Cragun For This Post:
# 3  
Old 04-11-2016
Currently, there are three output files because the awk searches for the + and writes it to the "input" file, then searches for the - and writes it to the "input" file. The sed then outputs a separate file for each: one for +, one for -, and the original input. Only the original "input" file is needed, but I am not sure how to search for both + and - at the same time. Then both, if found, would end up in the "input" file. I hope this helps, and thank you :).
# 4  
Old 04-11-2016
Quote:
Originally Posted by cmccabe
Currently, there are three output files because the awk searches for the + and writes it to the "input" file, then searches for the - and writes it to the "input" file. The sed then outputs a separate file for each: one for +, one for -, and the original input. Only the original "input" file is needed, but I am not sure how to search for both + and - at the same time. Then both, if found, would end up in the "input" file. I hope this helps, and thank you :).
No...

If we go back and look at the steps in your 1st post:
Quote:
Part 1:
Check whether the user-supplied string contains + or -; if it does, the input is written to a file "input" and displayed on screen.
Code:
awk '/+/{print $0,"is an intronic variant"}' c:/Users/cmccabe/Desktop/Python27/input.txt
awk '/-/{print $0,"is an intronic variant"}' c:/Users/cmccabe/Desktop/Python27/input.txt

These commands select lines in the file input.txt (ignoring the directory part) that contain a + or a -, append the text "is an intronic variant", and write the result to your terminal (or to whatever file you redirect the output to). Nothing is read from a user-supplied string, and nothing is written to any file unless you redirect the output of the above commands.

This can be simplified to:
Code:
sed -n '/[-+]/s/$/ is an intronic variant/p' c:/Users/cmccabe/Desktop/Python27/input.txt

if you don't mind the output for lines containing + and lines containing - being mixed together (rather than all + lines printed before any - lines), and if you don't mind getting just one output line when a line in input.txt contains both a + and a -.
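For anyone who would rather stay in awk, the same single-pass match can be sketched with a bracket expression. This is only an illustration: the scratch directory and the sample variant names below are made up and stand in for the real input.txt.

```shell
# Hypothetical sample data standing in for input.txt; the variant names are invented.
workdir=$(mktemp -d)
printf '%s\n' 'c.79+5G>A' 'c.100A>T' 'c.88-2A>G' 'c.12+3_15-2del' > "$workdir/input.txt"

# One pass over the file: the bracket expression matches either character.
# Listing "-" first keeps it literal, and "+" is not special inside brackets.
annotated=$(awk '/[-+]/ { print $0, "is an intronic variant" }' "$workdir/input.txt")
printf '%s\n' "$annotated"
```

A line containing both characters (the last sample line) is reported once, and the output follows the order of the input file, which is the trade-off described above.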
Quote:
Part 2:
Looks within that "input" file for any lines with a + or - and if found writes them to a new file (temp+ or temp-)
Code:
awk '/+/' c:/Users/cmccabe/Desktop/Python27/input.txt > c:/Users/cmccabe/Desktop/Python27/temp+.txt 
awk '/-/' c:/Users/cmccabe/Desktop/Python27/input.txt > c:/Users/cmccabe/Desktop/Python27/temp-.txt

Again, this looks at the same input file (not at any user-supplied string and not at the output produced by Part 1). It copies lines from that input file that contain + to temp+.txt and lines that contain - to temp-.txt, without the text that Part 1 adds to the ends of the selected lines. Part 1 and Part 2 can be combined into the single script:
Code:
cd c:/Users/cmccabe/Desktop/Python27
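awk '
# Report every line that contains + or - (or both) on standard output.
/[-+]/{	print $0, "is an intronic variant"}
# Copy lines containing - to temp-.txt.
/-/{	print > "temp-.txt"}
# Copy lines containing + to temp+.txt; the brackets keep + literal in every awk.
/[+]/{	print > "temp+.txt"}
' input.txt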

Quote:
Part 3:
Removes the + or - lines from the "input" file.
Code:
sed -i '/+/d' C:/Users/cmccabe/Desktop/Python27/input.txt
sed -i '/-/d' C:/Users/cmccabe/Desktop/Python27/input.txt

Yes, that is what this does (assuming that you are using a system that has a sed that includes the -i option and that no errors occur while either of those sed commands is running).
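On a system whose sed lacks -i, the same deletion can be sketched with an explicit temporary file. The scratch directory, the input.tmp name, and the sample lines here are assumptions for illustration only.

```shell
# Hypothetical sample data in a scratch directory; nothing real is modified.
workdir=$(mktemp -d)
printf '%s\n' 'c.79+5G>A' 'c.100A>T' 'c.88-2A>G' > "$workdir/input.txt"

# Write the lines that survive the delete to a temp file, then replace the
# original only if sed exited successfully.
sed '/[-+]/d' "$workdir/input.txt" > "$workdir/input.tmp" &&
    mv "$workdir/input.tmp" "$workdir/input.txt"
cat "$workdir/input.txt"
```

Using one bracket expression also removes both kinds of lines in a single pass instead of two.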

So, if what you want to do is:
  1. Copy lines from input.txt that contain + to temp+.txt,
  2. copy lines from input.txt that contain - to temp-.txt,
  3. print lines from input.txt that contain + or - or both to standard output with the added text "is an intronic variant", and
  4. remove lines from input.txt that contain a + or a - or both.
you could try just using:
Code:
cd c:/Users/cmccabe/Desktop/Python27
tmpf="input$$.txt"
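awk -v tmpf="$tmpf" '
# Report every line that contains + or - (or both) on standard output.
/[-+]/{	print $0, "is an intronic variant"}
# Copy lines containing - to temp-.txt and lines containing + to temp+.txt.
/-/{	print > "temp-.txt"}
/[+]/{	print > "temp+.txt"}
# Keep every other line in the temporary file that will replace input.txt.
!/[-+]/{print > tmpf}
' input.txt && cp "$tmpf" "input.txt" && rm -f "$tmpf"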
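
One case worth guarding against: if every line of input.txt contains a + or a -, awk never opens the temporary file, so the cp fails and input.txt is left unchanged. A minimal sketch of one way around that, assuming it is acceptable to create the temporary file before awk runs, is shown below. The scratch directory, the report.txt name, and the sample lines are hypothetical.

```shell
# Work in a scratch directory with made-up data so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'c.79+5G>A' 'c.100A>T' 'c.88-2A>G' > input.txt

tmpf="input$$.txt"
: > "$tmpf"    # create the temp file up front so cp succeeds even if no line is kept
awk -v tmpf="$tmpf" '
/[-+]/  { print $0, "is an intronic variant" }  # report on standard output
/-/     { print > "temp-.txt" }                 # lines containing -
/[+]/   { print > "temp+.txt" }                 # lines containing +
!/[-+]/ { print > tmpf }                        # every other line is kept
' input.txt > report.txt && cp "$tmpf" input.txt && rm -f "$tmpf"
cat report.txt
```

Standard output is redirected to report.txt here only so the annotated lines can be inspected afterwards; drop the redirection to see them on the terminal as in the script above.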

This User Gave Thanks to Don Cragun For This Post:
# 5  
Old 04-12-2016
I will try it out today. If the search of the input file finds both the + and -, then another command would remove any lines with a + or - from the input file and copy the + or - lines to a temp file. Thank you very much :).