awk to change value in field according to another


 
# 15  
Old 12-08-2018
I truly appreciate the helpful troubleshooting tips and explanations. I will use them to debug and post back. I learned as a researcher and am always trying to refine and improve my technique. I don't know if I will get there, but I will always try and practice. Biology and programming are essential and interesting. Thank you again 😀.

--- Post updated 12-08-18 at 07:39 AM ---

The variables are all being populated correctly, but using set -xv (thank you) I can see that the script never gets to or past the done (it stalls on the bold line).

Code:
for file in /home/cmccabe/folder/less/*.txt ; do
     bname=$(basename "$file")
     pref="${bname%%_*.txt}"
     set -xv
     /home/cmccabe/folder/less/exon.sh /home/cmccabe/folder/less/all_cdsV2 "$file" \> /home/cmccabe/folder/less/${pref}_output.txt
done
set +xv

So since the first .txt file never completes, the second one never starts processing. Why this happens seems to be my issue; now it's a matter of figuring out the why. Thank you for the helpful "talk", I appreciate it.

I ran bash -x

Code:
for file in /home/cmccabe/folder/less/*.txt ; do
     bname=$(basename "$file")
     pref="${bname%%_*.txt}"
     bash -x /home/cmccabe/folder/less/exon.sh /home/cmccabe/folder/less/all_cdsV2 "$file" > /home/cmccabe/folder/less/${pref}_output.txt
done

and confirmed the .txt files are not being passed to the exon script as input.

Last edited by Don Cragun; 12-08-2018 at 10:22 AM.. Reason: Fix CODE tags.
# 16  
Old 12-08-2018
Did you try running the logical equivalent of:
Code:
for file in path/to/*.txt ; do
     echo "$file"
     bname=$(basename "$file")
     pref="${bname%%_*.txt}"
     echo "bname: \"$bname\"   pref:\"$pref\""
     # bash /path/to/exon.sh static "$file" > "path/to/${pref}_output.txt"
done

with your actual pathnames and operands, as bakunin suggested? Please show us the output that produced! Like bakunin, I find it hard to believe that pref is being set to the value that I would assume you are trying to set (which isn't at all clear to me).

And, when you run the loop:
Code:
for file in /home/cmccabe/folder/less/*.txt ; do
     bname=$(basename "$file")
     pref="${bname%%_*.txt}"
     set -xv
     /home/cmccabe/folder/less/exon.sh /home/cmccabe/folder/less/all_cdsV2 "$file" \> /home/cmccabe/folder/less/${pref}_output.txt
done
set +xv

(which has the set +xv line after the done instead of before it like bakunin suggested), the whole purpose of enabling tracing on the invocations of exon.sh is so we can all see the trace output produced. But, you haven't shown us any of the trace output???

Please try this slight modification to the above, and show us the output that it produces:
Code:
for file in /home/cmccabe/folder/less/*.txt
do   bname=$(basename "$file")
     pref=${bname%%_*.txt}
     echo "file:\"$file\"    bname:\"$bname\"    pref:\"$pref\""
     echo "output will be directed to:\"/home/cmccabe/folder/less/${pref}_output.txt\""
     #/home/cmccabe/folder/less/exon.sh /home/cmccabe/folder/less/all_cdsV2 "$file" > "/home/cmccabe/folder/less/${pref}_output.txt"
done

If, and only if, that produces the values that you expect for bname, pref, and the pathname of the file you want to produce, then also try running the following:
Code:
for file in /home/cmccabe/folder/less/*.txt
do   bname=$(basename "$file")
     pref=${bname%%_*.txt}
     #echo "file:\"$file\"    bname:\"$bname\"    pref:\"$pref\""
     #echo "output will be directed to:\"/home/cmccabe/folder/less/${pref}_output.txt\""
     set -xv
     /home/cmccabe/folder/less/exon.sh /home/cmccabe/folder/less/all_cdsV2 "$file" > "/home/cmccabe/folder/less/${pref}_output.txt"
     set +xv
done

and show us the output that produces.

Note that I am not sure why bakunin suggested using \> in the exon.sh command. Doing that makes the redirection operator and the intended output file become operands to exon.sh instead of being a redirection.
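The difference between an escaped and an unescaped > can be seen in a minimal sketch like the one below. The scratch file and the some_command name are hypothetical and only serve the illustration.

```shell
# Hypothetical scratch file; nothing here comes from the thread.
tmpdir=$(mktemp -d)
out="$tmpdir/out.txt"

# Escaped: \> is passed to echo as an ordinary word, so nothing is redirected.
shown=$(echo some_command arg \> "$out")
echo "$shown"
[ -e "$out" ] || echo "no file was created"

# Unescaped: the shell opens the file and echo never sees the > or the name.
echo some_command arg > "$out"
saved=$(cat "$out")
echo "file now contains: $saved"
```

With the escape, the > and the pathname show up in the printed text; without it, they disappear from the argument list and the text lands in the file instead.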

I also note that the exon.sh can't be the script that I supplied in post #11. That script didn't look at any of its operands; it only used the presence of one or more operands as a flag to enable debugging printouts. I will assume that you removed the debugging printf statements and the d variable and are using the two operands you are passing to exon.sh as the two filenames processed by that script.
# 17  
Old 12-08-2018
There are two .txt files in the directory:

Code:
00-0000_regions.txt and 11-1111_regions.txt

These two .txt files are the values of $bname, and $pref is the digits that are left after _regions.txt is removed.
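As a small sketch of what the two assignments in the loop do, the expansion can be tried on its own. The first pathname is one of the two files named above; the name with two underscores is hypothetical and is only there to show why %% (longest match) and % (shortest match) can differ.

```shell
# Sample pathname copied from the thread; the two-underscore name is hypothetical.
file=/home/cmccabe/folder/less/00-0000_regions.txt
bname=$(basename "$file")     # strip the directory part
pref=${bname%%_*.txt}         # %% removes the LONGEST suffix matching _*.txt
echo "bname=$bname pref=$pref"

other=11-1111_a_b.txt
longest=${other%%_*.txt}      # cut from the first underscore
shortest=${other%_*.txt}      # cut from the last underscore
echo "longest=$longest shortest=$shortest"
```

For the file names in this thread, which contain a single underscore, both forms give the same prefix.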

Code:
for file in /home/cmccabe/folder/less/*.txt ; do
     echo "$file"
     bname=$(basename "$file")
     pref="${bname%%_*.txt}"
     echo "bname: \"$bname\"   pref:\"$pref\""
     # /home/cmccabe/folder/less/exon.sh /home/cmccabe/folder/less/all_cdsV2 $file > /home/cmccabe/folder/less/${pref}_output.txt
done
/home/cmccabe/folder/less/00-0000_regions.txt
bname: "00-0000_regions.txt"   pref:"00-0000"
/home/cmccabe/folder/less/11-1111_regions.txt
bname: "11-1111_regions.txt"   pref:"11-1111"

Code:
for file in /home/cmccabe/folder/less/*.txt ; do   
       bname=$(basename "$file")
       pref=${bname%%_*.txt}
       echo "file:\"$file\"    bname:\"$bname\"    pref:\"$pref\""
       echo "output will be directed to:\"/home/cmccabe/folder/less/${pref}_output.txt\""
       #/home/cmccabe/folder/less/exon.sh /home/cmccabe/folder/less/all_cdsV2 "$file" > "/home/cmccabe/folder/less/${pref}_output.txt"
done
    file:"/home/cmccabe/folder/less/00-0000_regions.txt"    bname:"00-0000_regions.txt"    pref:"00-0000"
    output will be directed to:"/home/cmccabe/folder/less/00-0000_output.txt"
    file:"/home/cmccabe/folder/less/11-1111_regions.txt"    bname:"11-1111_regions.txt"    pref:"11-1111"

Code:
for file in /home/cmccabe/folder/less/*.txt ; do
       bname=$(basename "$file")
       pref=${bname%%_*.txt}
       #echo "file:\"$file\"    bname:\"$bname\"    pref:\"$pref\""
       #echo "output will be directed to:\"/home/cmccabe/folder/less/${pref}_output.txt\""
       set -xv
       /home/cmccabe/folder/less/exon.sh /home/cmccabe/folder/less/all_cdsV2 "$file" > "/home/cmccabe/folder/less/${pref}_output.txt"
       set +xv
done
+ /home/cmccabe/folder/less/exon.sh /home/cmccabe/folder/less/all_cdsV2 /home/cmccabe/folder/less/00-0000_output.txt

The process seems to stall and get stuck on the first file 00-0000_regions.txt.
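Separately from the stall, the traces in this thread show /home/cmccabe/folder/less/00-0000_output.txt being passed as "$file". Because the loop writes its ${pref}_output.txt results into the same directory it globs with *.txt, a later run picks up earlier output as input. A minimal sketch, assuming the real inputs all end in _regions.txt (as the two files listed above do) and using a hypothetical scratch directory:

```shell
# Hypothetical scratch directory with one input and one leftover output file.
tmpdir=$(mktemp -d)
touch "$tmpdir/00-0000_regions.txt" "$tmpdir/00-0000_output.txt"

# The broad pattern matches the leftover output file as well.
all=0
for f in "$tmpdir"/*.txt; do all=$((all + 1)); done

# A narrower pattern (an assumption about the input naming) matches inputs only.
inputs=0
for f in "$tmpdir"/*_regions.txt; do inputs=$((inputs + 1)); done

echo "*.txt matched $all file(s); *_regions.txt matched $inputs file(s)"
```

Writing the output files to a different directory would avoid the overlap as well.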

I did also run bash -x /home/cmccabe/folder/less/exon.sh and can see that the .txt files are not getting passed to the script as input. I'm not sure why but I believe this may help:

Quote:
I also note that the exon.sh can't be the script that I supplied in post #11. That script didn't look at any of its operands; it only used the presence of one or more operands as a flag to enable debugging printouts. I will assume that you removed the debugging printf statements and the d variable and are using the two operands you are passing to exon.sh as the two filenames processed by that script.
I'm not sure I follow, but I will re-read and maybe that will help. Thank you.

Code:
for file in /home/cmccabe/folder/less/*.txt ; do      bname=$(basename "$file");      pref="${bname%%_*.txt}";      bash -x /home/cmccabe/folder/less/exon.sh /home/cmccabe/folder/less/all_cdsV2 "$file" > /home/cmccabe/folder/less/${pref}_output.txt; done
+ for file in '/home/cmccabe/folder/less/*.txt'
basename "$file"
++ basename /home/cmccabe/folder/less/00-0000_output.txt
+ bname=00-0000_output.txt
+ pref=00-0000
+ bash -x /home/cmccabe/folder/less/exon.sh /home/cmccabe/folder/less/all_cdsV2 /home/cmccabe/folder/less/00-0000_output.txt
+ awk -v d=2 '
BEGIN {	FS = "[\t_]"
	OFS = "\t"
}
FNR == NR {
	m[$1, $4, ++c[$1, $4]] = $2 + 0
	M[$1, $4, c[$1, $4]] = $3 + 0
	if(d) printf("m[%s,%s,%d]=%s,M[%s,%s,%d]=%s\n",
		$1, $4, c[$1, $4], m[$1, $4, c[$1, $4]],
		$1, $4, c[$1, $4], M[$1, $4, c[$1, $4]])
	next
}
{	if(d) printf("FNR=%d:\"%s\"\n",FNR,$0)
	for(i = 1; i <= c[$1, $4]; i++) {
		if(d) printf("m[%d]=%d,M[%d]=%d,$2=%d\n",
			i, m[$1, $4, i],
			i, M[$1, $4, i],
			$2)
		if(m[$1, $4, i] <= $2 && $2 <= M[$1, $4, i]) {
			$5 = "exon"
			break
		} else {if(m[$1, $4, i] > $2 + 0) {
				if(m[$1, $4, i] - 10 <= $2 + 0) {
					$5 = "splicing"
					break
				} else {$5 = "intron"
					break
				}
		}
	}
}
	if(i > c[$1, $4])
		$5 = "intron"
}
1'

# 18  
Old 12-09-2018
One might note that the script I provided for you in post #11 had file2 and file1 as operands to awk after the awk code operand. The trace output you have shown us for the awk command at the end of post #17 shows that awk is called with only one operand (the awk code as its first and only operand). That code is, as you say, stalling because it is waiting for you to type input into it since it has no file operands specifying what files it is supposed to process!

Clearly, the assumption I stated at the end of post #16 was wrong. I trust that you will fix it.
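The point about file operands can be sketched with a tiny stand-in script. Everything below (demo.sh, the two sample files, the one-line awk program) is hypothetical and is not the exon.sh from this thread; it only shows a script handing its own operands "$1" and "$2" to awk after the awk code.

```shell
# Hypothetical scratch files standing in for the two input files.
tmpdir=$(mktemp -d)
printf 'a\n' > "$tmpdir/first.txt"
printf 'b\n' > "$tmpdir/second.txt"

# Hypothetical stand-in for exon.sh: the script's two operands become
# awk's file operands, so awk reads them instead of waiting on stdin.
cat > "$tmpdir/demo.sh" <<'EOF'
#!/bin/sh
awk '{ print FILENAME ": " $0 }' "$1" "$2"
EOF

result=$(sh "$tmpdir/demo.sh" "$tmpdir/first.txt" "$tmpdir/second.txt")
echo "$result"
```

If the "$1" "$2" after the closing quote were left out, awk would have no file operands and would read standard input, which is the stall described above.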
# 19  
Old 12-09-2018
Quote:
Originally Posted by Don Cragun
Note that I am not sure why bakunin suggested using \> in the exon.sh command. Doing that makes the redirection operator and the intended output file become operands to exon.sh instead of being a redirection.
Exactly this was the point. When I suggested debugging the command, I suggested putting an echo command in front of it. Without the escape, the output of that echo would have been redirected instead of displayed. That is why I suggested escaping the redirection: to display, for debugging purposes, the commands that would be issued by the script.

Quote:
The last thing, if the variables do produce the correct values, is to test the command itself: instead of running it we just display it. Notice that we need to escape the redirection:
[...]
Code:
echo /path/to/exon.sh static "$file" \> "path/to/${pref}_output.txt"

Of course the escaped redirection has to be reverted back to normal once the debugging is over.

bakunin

Last edited by bakunin; 12-09-2018 at 06:59 AM..
# 20  
Old 12-10-2018
In the original code below, the bold is the static file, that is, the file that is always used. The italics output is set by pref, and $file is the underlined portion, which depends on each .txt file in the directory.

Quote:
I also note that the exon.sh can't be the script that I supplied in post #11. That script didn't look at any of its operands; it only used the presence of one or more operands as a flag to enable debugging printouts. I will assume that you removed the debugging printf statements and the d variable and are using the two operands you are passing to exon.sh as the two filenames processed by that script.
Since one of the operands will never change and the other is set by the for loop, is this what you are saying (sorry for my confusion)? I added comments as well. Thank you.

Code:
#!/bin/sh
awk -v d=$# '    # (the d=$# is removed because the files are dependent on the for loop)
BEGIN {	FS = "[\t_]"
	OFS = "\t"
}
FNR == NR {
	m[$1, $4, ++c[$1, $4]] = $2 + 0
	M[$1, $4, c[$1, $4]] = $3 + 0
	if(d) printf("m[%s,%s,%d]=%s,M[%s,%s,%d]=%s\n",
		$1, $4, c[$1, $4], m[$1, $4, c[$1, $4]],
		$1, $4, c[$1, $4], M[$1, $4, c[$1, $4]])
	next
}
{	#if(d) printf("FNR=%d:\"%s\"\n",FNR,$0) (remove this line as it assumes d is typed in)
	for(i = 1; i <= c[$1, $4]; i++) {
		if(d) printf("m[%d]=%d,M[%d]=%d,$2=%d\n",
			i, m[$1, $4, i],
			i, M[$1, $4, i],
			$2)
		if(m[$1, $4, i] <= $2 && $2 <= M[$1, $4, i]) {
			$5 = "exon"
			break
		} else {if(m[$1, $4, i] > $2 + 0) {
				if(m[$1, $4, i] - 10 <= $2 + 0) {
					$5 = "splicing"
					break
				} else {$5 = "intron"
					break
				}
		}
	}
}
	if(i > c[$1, $4])
		$5 = "intron"
}
1' all_cdsV2 00-0000low > 00-0000_filter


Last edited by cmccabe; 12-10-2018 at 04:05 PM.. Reason: corrected typo
# 21  
Old 12-10-2018
Hi cmccabe,
There were three debugging printfs in the code I suggested; you commented out one of them and removed the code that enabled those debugging statements to be controlled by the presence of operands passed to your script. That seems like a strange combination.

I would have thought that you would realize that we can't tell if your changes to exon.sh are going to work without seeing how you invoke exon.sh. Giving parameters to exon.sh that are ignored when you invoke it doesn't make much sense. (And it seems that the script you showed us in post #20 ignores any parameters that you give it when you invoke it.)

Having a for loop in a script that invokes exon.sh has no effect on exon.sh unless something in that loop passes one or more parameters to exon.sh that exon.sh uses to adjust its behavior (which the code you showed us in post #20 does not do) or you copy the input file you want to process into a file named 00-0000low before you invoke it and copy the data written into 00-0000_filter by exon.sh into whatever file you want to contain the output of that iteration through your loop. That seems like it is a lot of extra copying of data to get what you want, but it would mesh with the code you showed us in post #20.