Read a list, find items in a file from the list, change each item


 
# 1  
Old 12-02-2016

Hello,

I have some tab delimited text data,
file: final_temp1
Code:
aname	val
NAME;r'(1,)	3.28584
r'(2,)<tab>
NAME;r'(3,)	6.13003
NAME;r'(4,)	4.18037
r'(5,)<tab>

You can see that the data is incomplete in some cases. There is a trailing tab after the first column for each incomplete row. I have added the <tab> notation above to make that clear.

I also have a list of the incomplete cases.
file: incomplete_case_list
Code:
r'(2,)	
r'(5,)

What I need to do is to work through the list of incomplete cases to find the matching row in my file and alter it. I need to add "NAME;" as a prefix to the first column value, followed by tab, followed by the word "failed"
Code:
aname	val
NAME;r'(1,)	3.28584
NAME;r'(2,)	failed
NAME;r'(3,)	6.13003
NAME;r'(4,)	4.18037
NAME;r'(5,)	failed

I thought I could just loop through the incomplete file list and make sed substitutions,
Code:
# loop through incomplete file list
while read line; do 

   # remove tab from end of line
   clean_line=$(echo  $line | sed "s/\t//1")

   # create new line
   new_line='NAME;'$clean_line'\t''failed'

   # find original line and replace with modified version
   sed "s/$line/$new_line/1" final_temp1 > final_temp2

   # overwrite original file with modified file to propagate changes forward
   mv final_temp2  final_temp1

done < incomplete_case_list

I am getting a sed error,
Code:
sed: -e expression #1, char 160: Invalid range end
sed: -e expression #1, char 168: Invalid range end
sed: -e expression #1, char 134: Invalid range end

I don't think this is from the first sed command (substituting the tab), but the error is not very clear to me. In my real files, the values in the name column can contain characters such as commas, unmatched single quotes, parentheses, square brackets, and curly braces. I am wondering if sed is rejecting some of these characters. I tried putting double quotes around $line and $new_line in the second sed command, but that doesn't help.

I tried replacing the sed line with awk,
Code:
awk -v var1="$line" -v var2="$new_line" '{gsub(var1, var2, $0); print}' final_temp1 > final_temp2

This gives me the error,
Code:
awk: cmd. line:1: (FILENAME=final_temp1 FNR=1) fatal: Invalid range end: /1-[10-(4-amino-2-methylquinolyl)decyl]-2-methyl-4-quinolylamine_4Np.mol/

This is one of the messy names from the actual data. Is there something in this string that needs to be handled differently? I frequently use both sed and awk with data like this and I have not seen this error before.
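On the "Invalid range end" messages: both sed and awk treat the search string as a regular expression, so the square brackets in a name like the one in the awk error open a bracket expression, and a run such as `0-(` inside it is read as a character range whose end sorts before its start. A minimal sketch of one way around this, assuming the goal is an exact match on the first column, is to compare the field as a plain string so that no pattern is ever built from the name. The sample rows and the variable names (`tmpdir`, `name`, `fixed`) are made up for illustration.

```shell
# Build a small stand-in for final_temp1 in a scratch directory.
tmpdir=$(mktemp -d)
printf "aname\tval\n" > "$tmpdir/data"
printf "NAME;r'(1,)\t3.28584\n" >> "$tmpdir/data"
printf "1-[10-(4-amino-2-methylquinolyl)decyl]-2-methyl-4-quinolylamine_4Np.mol\t\n" >> "$tmpdir/data"

# The name to look for, taken from the awk error message.
name='1-[10-(4-amino-2-methylquinolyl)decyl]-2-methyl-4-quinolylamine_4Np.mol'

# Pass the name through the environment so awk does not process escapes in it,
# and compare it to field 1 as a string ($1 "" forces a string comparison).
fixed=$(name="$name" awk -F'\t' -v OFS='\t' '
    ($1 "") == ENVIRON["name"] { print "NAME;" $1, "failed"; next }
    { print }
' "$tmpdir/data")

printf '%s\n' "$fixed"
rm -rf "$tmpdir"
```

Because the comparison is literal, brackets, parentheses, commas and quotes in the name need no escaping.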

I am not sure if sed will find the pattern, because the line terminates with a tab and I am not sure that the tab is being read into "line" during the while loop. I also don't know if there is still an end-of-line character there or not. I suppose I could strip out all trailing whitespace characters first.
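Whether the trailing tab survives depends on how `read` is called. A small sketch, assuming a POSIX-style shell: with the default IFS, `read` trims leading and trailing blanks (so the tab is dropped), while an empty IFS keeps the line as written. In both cases the newline itself is not stored in the variable. The variable names `trimmed`, `kept` and `tab` are illustrative only.

```shell
tab=$(printf '\t')

# Default IFS: the trailing tab is trimmed before the value reaches $line.
trimmed=$(printf "r'(2,)\t\n" | { read -r line; printf '[%s]' "$line"; })

# Empty IFS: the line is kept verbatim, trailing tab included.
kept=$(printf "r'(2,)\t\n" | { IFS= read -r line; printf '[%s]' "$line"; })

# The brackets make the presence or absence of the tab visible.
printf '%s\n%s\n' "$trimmed" "$kept"
```

So in the loop above, `$line` most likely no longer ends in a tab, and a pattern built from it would not include one.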

The repetitive overwriting of the files is also expensive, but it is unlikely that there will ever be very many entries in the incomplete_case_list.
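The repeated rewrite can be avoided by reading the list once and making a single pass over the data. The following is only a sketch of that idea using awk's two-file idiom, assuming both files are tab delimited and that the name is always the first column; the scratch directory and the `result` variable are illustrative, and the sample rows are the ones shown earlier in this post.

```shell
# Recreate the two sample files in a scratch directory.
tmpdir=$(mktemp -d)
printf "r'(2,)\t\nr'(5,)\n" > "$tmpdir/incomplete_case_list"
printf "aname\tval\nNAME;r'(1,)\t3.28584\nr'(2,)\t\nNAME;r'(3,)\t6.13003\nNAME;r'(4,)\t4.18037\nr'(5,)\t\n" > "$tmpdir/final_temp1"

result=$(awk -F'\t' -v OFS='\t' '
    # First file: remember every listed name as an array key.
    NR == FNR { failed[$1]; next }
    # Second file: rewrite rows whose first column is a listed name.
    $1 in failed { print "NAME;" $1, "failed"; next }
    # Everything else passes through unchanged.
    { print }
' "$tmpdir/incomplete_case_list" "$tmpdir/final_temp1")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

The names are used as array keys rather than as patterns, so special characters in them are not interpreted. Note that the `NR == FNR` test assumes the list file is not empty.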

Are there any comments on what I am doing wrong here, or a better method altogether?

Thanks,

LMHmedchem

Last edited by LMHmedchem; 12-02-2016 at 01:36 AM..
# 2  
Old 12-02-2016
try this...

Code:
awk 'NF<2{$0="NAME;"$0"\tfailed"}1' incomplete_case_list
aname   val
NAME;r'(1,)     3.28584
NAME;r'(2,)     failed
NAME;r'(3,)     6.13003
NAME;r'(4,)     4.18037
NAME;r'(5,)     failed
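The output shown matches the rows of final_temp1, so the one-liner was presumably meant to be run on that file rather than on the list. A hedged variant, assuming every row with an empty second column is an incomplete case: splitting on tabs and rebuilding the row from `$1` avoids a doubled tab when the row already ends in one. The scratch file and the `patched` variable are illustrative only.

```shell
# A shortened stand-in for final_temp1.
tmpdir=$(mktemp -d)
printf "aname\tval\nNAME;r'(1,)\t3.28584\nr'(2,)\t\nNAME;r'(3,)\t6.13003\n" > "$tmpdir/final_temp1"

# Rows whose second tab-separated field is empty get the prefix and "failed";
# the trailing 1 prints every row, changed or not.
patched=$(awk -F'\t' -v OFS='\t' '$2 == "" { $0 = "NAME;" $1 OFS "failed" } 1' "$tmpdir/final_temp1")

printf '%s\n' "$patched"
rm -rf "$tmpdir"
```

This does not consult incomplete_case_list at all, so it only fits if the empty column alone identifies the rows to change.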

# 3  
Old 12-02-2016
Quote:
Originally Posted by itkamaraj
Code:
awk 'NF<2{$0="NAME;"$0"\tfailed"}1' incomplete_case_list

Thank you for the suggestion. I don't see the name of the file that I am processing here, just the name of the file with the failed rows. Am I missing something?

The script below works and is pretty fast.
Code:
#!/bin/bash

# file with list of name with incomplete output
incomplete_case_list=$1
# file being processed (replace incomplete rows with modified data)
final_temp1=$2
# output file
final_temp2=$3

# read in fail file and create array of names
while read line; do 

   # read tab separated line into array
   unset FIELD;   IFS=$'\t' read -a FIELD <<< "$line"

   # add each name to array
   fail_list=("${fail_list[@]}" "${FIELD[0]}")

done < "$incomplete_case_list"

# flag to avoid second print if line was replaced
replaced='0'

# loop through all rows of file to check for fail names
# check the name for each row against all names in name array, look for match
while read line; do 

   # read tab separated line into array
   unset FIELD;   IFS=$'\t' read -a FIELD <<< "$line"

   # check current line against each element in array of fail names
   for fail_name in "${fail_list[@]}"
   do

      # check name field (0), if a match is found, print modified line
      if [ "${FIELD[0]}" == "$fail_name" ]; then

         # output modified row to next temp file
         printf 'NAME;%s\tfailed\n' "${FIELD[0]}" >> "$final_temp2"

         # set flag to indicate row has been replaced, don't print again
         replaced='1'
      fi
   done

   # if name was not found in the  fail array, print original line
   if [ "$replaced" == '0' ]; then
      printf '%s\t%s\n' "${FIELD[0]}" "${FIELD[1]}" >> "$final_temp2"
   fi
   # reinitialize flag
   replaced='0'

done < "$final_temp1"

For lines that were printed unchanged, I was going to just echo $line,
Code:
echo -e $line >> final_temp2

This works, but I get space delimited output and not tab. I thought that using echo -e would address that. It is almost like IFS=$'\t' read -a is converting the tabs to spaces when the line is read in. Is there a way to address that situation?
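A likely explanation, sketched under the assumption of a bash or POSIX sh shell: the tab is still in `$line`, but an unquoted `$line` is split into words on whitespace and `echo` joins those words with single spaces. Quoting the expansion keeps the tab. The variable names `unquoted` and `quoted` are illustrative.

```shell
tab=$(printf '\t')
line=$(printf "NAME;r'(1,)\t3.28584")

# Unquoted: the shell splits on the tab and echo rejoins the words with a space.
unquoted=$(echo $line)

# Quoted: the value is passed as one argument, so the tab is preserved.
quoted=$(printf '%s\n' "$line")

printf '%s\n%s\n' "$unquoted" "$quoted"
```

Using `printf '%s\n' "$line"` instead of `echo -e` also avoids any backslash sequences in a name being interpreted.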

LMHmedchem