Read a list, find items in a file from the list, change each item
Hello,
I have some tab delimited text data,
file: final_temp1
Code:
aname val
NAME;r'(1,) 3.28584
r'(2,)<tab>
NAME;r'(3,) 6.13003
NAME;r'(4,) 4.18037
r'(5,)<tab>
You can see that the data is incomplete in some cases. There is a trailing tab after the first column of each incomplete row; I have added the <tab> notation above to make that clear.
I also have a list of the incomplete cases.
file: incomplete_case_list
Code:
r'(2,)
r'(5,)
What I need to do is work through the list of incomplete cases, find the matching row in my file, and alter it. I need to add "NAME;" as a prefix to the first column value, followed by a tab, followed by the word "failed".
I thought I could just loop through the incomplete file list and make sed substitutions,
Code:
# loop through incomplete file list
while read line; do
# remove tab from end of line
clean_line=$(echo $line | sed "s/\t//1")
# create new line
new_line='NAME;'$clean_line'\t''failed'
# find original line and replace with modified version
sed "s/$line/$new_line/1" final_temp1 > final_temp2
# overwrite original file with modified file to propagate changes forward
mv final_temp2 final_temp1
done < incomplete_case_list
I am getting a sed error,
Code:
sed: -e expression #1, char 160: Invalid range end
sed: -e expression #1, char 168: Invalid range end
sed: -e expression #1, char 134: Invalid range end
I don't think this is from the first sed command (substituting the tab), but the error is not very clear to me. In my real files, the values in the name column can contain characters such as commas, unmatched single quotes, parentheses, square brackets, and curly braces. I am wondering if sed is rejecting some of these characters. I tried putting double quotes around $line and $new_line in the second sed command, but that doesn't help.
awk: cmd. line:1: (FILENAME=final_temp1 FNR=1) fatal: Invalid range end: /1-[10-(4-amino-2-methylquinolyl)decyl]-2-methyl-4-quinolylamine_4Np.mol/
This is one of the messy names from the actual data. Is there something in this string that needs to be handled differently? I frequently use both sed and awk with data like this and I have not seen this error before.
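One likely reading of the "Invalid range end" message, offered as a sketch rather than a confirmed diagnosis: when the name is used as a regular expression, the text between the square brackets is parsed as a bracket expression, and "0-(" inside it is a range whose end sorts before its start. The snippet below compares a regex match with a fixed-string match (grep -F). The variable names are only illustrative.

```shell
# One of the messy names from the post, used as both the data and the pattern.
name='1-[10-(4-amino-2-methylquinolyl)decyl]-2-methyl-4-quinolylamine_4Np.mol'

# As a regular expression: "[10-(4-...]" is read as a bracket expression and
# "0-(" is a backwards range, so grep is expected to reject the pattern.
regex_status=0
printf '%s\n' "$name" | LC_ALL=C grep -q -- "$name" 2>/dev/null || regex_status=$?

# As a fixed string (-F): no character is special, so the line matches itself.
literal_hits=$(printf '%s\n' "$name" | grep -c -F -- "$name")

echo "regex exit status: $regex_status"
echo "fixed-string matches: $literal_hits"
```

The same applies to sed and awk patterns built from the name: either escape the regex metacharacters first, or use a comparison that does not involve a regex at all.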
I am not sure whether sed will find the pattern, because the line terminates with a tab and I am not sure that the tab is being read into "line" during the while loop. I also don't know whether there is still an end-of-line character there or not. I suppose I could strip out all trailing whitespace characters first.
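On whether the trailing tab survives the read: with the default IFS, read strips leading and trailing whitespace (tabs included) from the line, and the newline is consumed as the delimiter. A minimal sketch, assuming a POSIX-style shell, with a made-up three-character value standing in for a name:

```shell
# Plain "read": the trailing tab is trimmed because tab is in the default IFS.
stripped=$(printf 'abc\t\n' | { read line; printf '%s' "$line"; })

# "IFS= read -r": nothing is trimmed and backslashes are left alone.
kept=$(printf 'abc\t\n' | { IFS= read -r line; printf '%s' "$line"; })

echo "plain read length:    ${#stripped}"
echo "IFS= read -r length:  ${#kept}"
```

So inside the original while loop, $line most likely no longer contains the tab, and there is no end-of-line character left in it either.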
The repetitive overwriting of the files is also expensive, but it is unlikely that there will ever be very many entries in the incomplete_case_list.
Are there any comments on what I am doing wrong here, or a better method altogether?
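One possible alternative, sketched under the assumption that the first column can be compared as a plain string: load the incomplete names into an awk array, then make a single pass over the data file. This avoids both the regex problem and the repeated overwriting. The scratch directory and the sample rows are only there to make the example self-contained; the file names follow the post.

```shell
# Scratch directory so the sketch does not touch real files.
tmpdir=$(mktemp -d)
data="$tmpdir/final_temp1"
list="$tmpdir/incomplete_case_list"
out="$tmpdir/final_temp2"

# Sample data from the post; incomplete rows end with a trailing tab.
{
    printf 'aname\tval\n'
    printf '%s\t%s\n' "NAME;r'(1,)" "3.28584"
    printf '%s\t\n'   "r'(2,)"
    printf '%s\t%s\n' "NAME;r'(3,)" "6.13003"
    printf '%s\t%s\n' "NAME;r'(4,)" "4.18037"
    printf '%s\t\n'   "r'(5,)"
} > "$data"
printf '%s\n' "r'(2,)" "r'(5,)" > "$list"

awk -F'\t' '
    # First file: remember each incomplete name (array keys are plain strings).
    NR == FNR { if ($1 != "") fail[$1] = 1; next }
    # Second file: rewrite rows whose first column is in the list.
    ($1 in fail) { print "NAME;" $1 "\tfailed"; next }
    # Everything else is printed unchanged.
    { print }
' "$list" "$data" > "$out"

cat "$out"
```

Note that the NR == FNR idiom assumes the list file is not empty; with an empty list, the data file would be treated as the list.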
Thanks,
LMHmedchem
Last edited by LMHmedchem; 12-02-2016 at 12:36 AM..
Thank you for the suggestion. I don't see the name of the file that I am processing here, just the name of the file with the failed rows. Am I missing something?
The script below works and is pretty fast.
Code:
#!/bin/bash
# (bash is needed here: the script uses arrays, here-strings, and $'\t')
# file with list of names with incomplete output
incomplete_case_list=$1
# file being processed (replace incomplete rows with modified data)
final_temp1=$2
# output file
final_temp2=$3
# read in fail file and create array of names
fail_list=()
while IFS= read -r line; do
    # read tab separated line into array
    unset FIELD; IFS=$'\t' read -r -a FIELD <<< "$line"
    # add each name to array
    fail_list+=("${FIELD[0]}")
done < "$incomplete_case_list"
# flag to avoid second print if line was replaced
replaced='0'
# loop through all rows of file to check for fail names
# check the name for each row against all names in name array, look for match
while IFS= read -r line; do
    # read tab separated line into array
    unset FIELD; IFS=$'\t' read -r -a FIELD <<< "$line"
    # check current line against each element in array of fail names
    for fail_name in "${fail_list[@]}"
    do
        # check name field (0), if a match is found, print modified line
        if [ "${FIELD[0]}" == "$fail_name" ]; then
            # output modified row to next temp file
            echo -e 'NAME;'"${FIELD[0]}"'\t''failed' >> "$final_temp2"
            # set flag to indicate row has been replaced, don't print again
            replaced='1'
        fi
    done
    # if name was not found in the fail array, print original line
    if [ "$replaced" == '0' ]; then
        echo -e "${FIELD[0]}"'\t'"${FIELD[1]}" >> "$final_temp2"
    fi
    # reinitialize flag
    replaced='0'
done < "$final_temp1"
For lines that were printed unchanged, I was going to just echo $line,
Code:
echo -e $line >> final_temp2
This works, but I get space delimited output and not tab. I thought that using echo -e would address that. It is almost like IFS=$'\t' read -a is converting the tabs to spaces when the line is read in. Is there a way to address that situation?
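A likely explanation, shown as a small sketch: the tabs are still in the variable, but an unquoted $line is split into words by the shell and echo joins those words with single spaces. The -e option does not affect this. Quoting the expansion keeps the tab. The two-field value below is made up for illustration.

```shell
# A line with one embedded tab (built with printf for portability).
line=$(printf 'a\tb')

# Unquoted: the shell splits on the tab and echo rejoins with a space.
unquoted=$(echo $line)

# Quoted: the value is passed through unchanged, tab included.
quoted=$(printf '%s' "$line")

printf 'unquoted: [%s]\n' "$unquoted"
printf 'quoted:   [%s]\n' "$quoted"
```

In the script, that would mean writing the unchanged rows with something like printf '%s\n' "$line" rather than echo -e $line.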