Search, and add if present


 
# 1  
Old 02-07-2014

Dear All,

I have to find a way to reorganize a table file according to the last column. The input file looks like this:
Code:
cat Input1.txt:
ID:12:23:00Q    EU232    2342    234    123    231    aa1;ab2
ID:11:22:00E    EU112    1232    211    112    233    ab2;ac3
ID:19:24:00S    EU121    569    100    101    244    aa1;ac3
ID:11:33:00S    EU456    332    120    99    221    ac3

My output file should contain the information of the last column in newly created columns like this:
Code:
cat Output:
ID:12:23:00Q    EU232    2342    234    123    231    aa1    ab2    na    aa1;ab2
ID:11:22:00E    EU112    1232    211    112    233    na    ab2    ac3    ab2;ac3
ID:19:24:00S    EU121    569    100    101    244    aa1    na    ac3    aa1;ac3
ID:11:33:00S    EU456    332    120    99    221    na    na    ac3    ac3

My solution:
In a first step I introduced three new columns containing "na" values.
Code:
awk '{ print $1,$2,$3,$4,$5,$6,"na","na","na",$7 }' Input1.txt > input2.txt

This resulted in the following output:
Code:
cat input2.txt
ID:12:23:00Q    EU232    2342    234    123    231    na    na    na    aa1;ab2
ID:11:22:00E    EU112    1232    211    112    233    na    na    na    ab2;ac3
ID:19:24:00S    EU121    569    100    101    244    na    na    na    aa1;ac3
ID:11:33:00S    EU456    332    120    99    221    na    na    na    ac3

Next, I replaced each "na" with the corresponding item whenever that item is present in the last column.

Code:
awk '$10 ~ /aa1/ { $7 = "aa1" } { print }' input2.txt > output_aa1.txt
awk '$10 ~ /ab2/ { $8 = "ab2" } { print }' output_aa1.txt > output_aa1_ab2.txt
awk '$10 ~ /ac3/ { $9 = "ac3" } { print }' output_aa1_ab2.txt >

This kind of works, but it is only feasible for a limited number of items (three in my example). Is there a way to scale it up? Something along these lines:

Code:
for ITEM in `cat item.list`
do
   ???
done

Thanks for your help!
# 2  
Old 02-07-2014
Nothing fancy in the script below; it just does everything in one pass:
Code:
awk '{$10=$7;$7=$8=$9="na"}
$10~/aa1/{$7="aa1"}
$10~/ab2/{$8="ab2"}
$10~/ac3/{$9="ac3"}
1' input_file > output_file
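
A note on separators, as a minimal sketch: assigning to any field makes awk rebuild the record using OFS, which defaults to a single space, so the script above writes space-separated output. If tab-separated output is wanted (an assumption; the thread does not say which separator the real file uses), OFS can be set on the command line. The sample line and the variable name `result` are only for illustration.

```shell
# One sample line from the thread, written with tabs (assumed separator).
line=$(printf 'ID:11:33:00S\tEU456\t332\t120\t99\t221\tac3')

# Same logic as the script above; OFS is set so that the rebuilt
# record is joined with tabs instead of single spaces.
result=$(printf '%s\n' "$line" | awk -v OFS='\t' '
{ $10 = $7; $7 = $8 = $9 = "na" }
$10 ~ /aa1/ { $7 = "aa1" }
$10 ~ /ab2/ { $8 = "ab2" }
$10 ~ /ac3/ { $9 = "ac3" }
1')

printf '%s\n' "$result"
```

Without the `-v OFS='\t'` part, the same fields would be printed with one space between them.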


Last edited by tukuyomi; 02-07-2014 at 10:04 AM..
# 3  
Old 02-08-2014
Dear tukuyomi,

Thank you for your help. The problem is that my item list contains a few hundred entries for each position: position $7 has over 400 possible values, $8 has about 100, and $9 has more than 800. Any idea how I could do this?
# 4  
Old 02-08-2014
Just to be clear: input_file's 7th column is what you call the 'item list', isn't it?
Do you also mean that you might have aa1 as well as foo, bar, etc. as entries only for $7, and ab2, fool, bars, etc. only for $8, and so on?
Thanks for clarifying.
# 5  
Old 02-08-2014
Quote:
Originally Posted by loba
Dear tukuyomi,

Thank you for your help. The problem is that my item list contains a few hundred entries for each position: position $7 has over 400 possible values, $8 has about 100, and $9 has more than 800. Any idea how I could do this?
When we see one of your ~1300 values, how do we know whether it is supposed to go into field 7, 8, or 9?
# 6  
Old 02-08-2014
Dear tukuyomi,

Yes, position $7 can have a whole list of possible values, not just aa1. The same is true for $8 and $9. I was thinking I could first work on position $7, save the file, and continue with the next position. Your suggestion works; the problem is that I would have to write a script containing every possible value, save it, and run it. I was wondering if you know a better way? I tried a for loop (and an array) to get all the items for, e.g., $7, but it did not work.

---------- Post updated at 12:37 PM ---------- Previous update was at 12:31 PM ----------

Dear Don Cragun,

You are right. I was thinking of doing one list after the other, as I tried to describe in my last post.

Code:
for ITEMS in 'cat item_at_position_7.list'
do
   ???
done < in.txt > out7.txt

I know the for loop does not work :)
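
The one-list-at-a-time idea can be sketched without a shell loop: awk can read the list file first, remember its entries as array keys, and then test each value of the last column against that array. This is only a sketch under assumptions: the list has one value per line and is not empty, the data file is whitespace-separated with the values in column 7, and the scratch file names are made up for the demonstration.

```shell
# Scratch directory with a tiny list and data file (hypothetical names).
tmp=$(mktemp -d)
printf '%s\n' aa1 aa2 > "$tmp/item_at_position_7.list"
printf '%s\n' 'ID:12:23:00Q EU232 2342 234 123 231 aa1;ab2' \
              'ID:11:33:00S EU456 332 120 99 221 ac3' > "$tmp/in.txt"

# First file: store every list entry as an array key.
# Second file: insert a new column 7 holding the first value of the
# old column 7 that is in the list, or "na" if none is.
awk '
NR == FNR { wanted[$1]; next }
{
        n = split($7, values, /;/)
        hit = "na"
        for (i = 1; i <= n; i++)
                if (values[i] in wanted) { hit = values[i]; break }
        $8 = $7
        $7 = hit
        print
}' "$tmp/item_at_position_7.list" "$tmp/in.txt" > "$tmp/out7.txt"

cat "$tmp/out7.txt"
```

A second pass for the next list would need its column numbers shifted by one, since each pass adds a column, and every pass rereads the whole data file.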
# 7  
Old 02-08-2014
Assuming that you have files item_at_position_[789].list containing the valid values for each of those fields (one value per line), such as:
item_at_position_7.list:
Code:
aa1
aa2
aa3
aa4

item_at_position_8.list:
Code:
ab1
ab2
ab3
ab4

and item_at_position_9.list:
Code:
ac1
ac2
ac3
ac4

and an input file (Input1.txt) containing:
Code:
ID:12:23:00Q    EU232    2342    234    123    231    aa1;ab2
ID:11:22:00E    EU112    1232    211    112    233    ab2;ac3
ID:19:24:00S    EU121    569    100    101    244    aa1;ac3
ID:11:33:00S    EU456    332    120    99    221    ac3
ID:12:34:00D    DWC11    1    2    3    4    aa1;aa2;abc;ac1
ID:23:45:00D    DWC22    5    6    7    8    ad1;aa1;ab2;ac3
ID:23:59:00D    DWC33    9    10    11    12    aa1;aa2;aa3;ab2;ab3;ab4;ac1;ac3;ac4

then the following awk script:
Code:
awk -v outf="Output.txt" '
BEGIN { OFS="    " }
# Replace "na" in field fieldnum with value.
# With alternative else clause, print diagnostic if the field has already been set.
function add(value, fieldnum) {
        if($fieldnum == "na")   $fieldnum = value
        else                    $fieldnum = $fieldnum ";" value
# Replace above line with the following four lines if only one value is allowed per output field.
#       else {  printf("Line %d, multiple value %s for field %d dropped\n",
#                       FNR, value, fieldnum)
#               exitcode = 1
#       }
}
FNR == 1 {      # Increment input file number...
        file++
}
file <= 3 {     # Read values from one of the list files...
        list[file,$1]
        next
}
                # Process main input file...
{       # Split the last field into list values.
        n = split($7, values, /;/)
        # Initialize last four fields.
        $10 = $7
        $7 = $8 = $9 = "na"
        # Process values found on this line.
        for(i = 1; i <= n; i++)
                if((1,values[i]) in list) add(values[i], 7)
                else if((2,values[i]) in list) add(values[i], 8)
                else if((3,values[i]) in list) add(values[i], 9)
                else {  printf("Line %d: value: %s not recognized\n",
                                FNR, values[i])
                        exitcode = 1
                }
        # Print the updated line.
        print > outf
}
END {   exit exitcode
}' item_at_position_[789].list Input1.txt >&2

prints the following diagnostic to the standard error output:
Code:
Line 5: value: abc not recognized
Line 6: value: ad1 not recognized

and stores the following output in Output.txt:
Code:
ID:12:23:00Q    EU232    2342    234    123    231    aa1    ab2    na    aa1;ab2
ID:11:22:00E    EU112    1232    211    112    233    na    ab2    ac3    ab2;ac3
ID:19:24:00S    EU121    569    100    101    244    aa1    na    ac3    aa1;ac3
ID:11:33:00S    EU456    332    120    99    221    na    na    ac3    ac3
ID:12:34:00D    DWC11    1    2    3    4    aa1;aa2    na    ac1    aa1;aa2;abc;ac1
ID:23:45:00D    DWC22    5    6    7    8    aa1    ab2    ac3    ad1;aa1;ab2;ac3
ID:23:59:00D    DWC33    9    10    11    12    aa1;aa2;aa3    ab2;ab3;ab4    ac1;ac3;ac4    aa1;aa2;aa3;ab2;ab3;ab4;ac1;ac3;ac4

and, with your original sample input file, stores the following output in Output.txt:
Code:
ID:12:23:00Q    EU232    2342    234    123    231    aa1    ab2    na    aa1;ab2
ID:11:22:00E    EU112    1232    211    112    233    na    ab2    ac3    ab2;ac3
ID:19:24:00S    EU121    569    100    101    244    aa1    na    ac3    aa1;ac3
ID:11:33:00S    EU456    332    120    99    221    na    na    ac3    ac3

without producing any diagnostic messages.

Is this what you wanted to do?

If you want to try this on a Solaris/SunOS system, use /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk instead of the default /usr/bin/awk.
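
If more than three list files are ever needed, the chain of else-if tests could be replaced by a loop over the list numbers. The following is a sketch, not a drop-in replacement: the variable `nlists`, the scratch file names, and the short sample lists are assumptions made for illustration, and unrecognized values are silently skipped here rather than reported.

```shell
# Scratch files (hypothetical names): three short lists and two data lines.
tmp=$(mktemp -d)
printf '%s\n' aa1 aa2 > "$tmp/l1.list"
printf '%s\n' ab1 ab2 > "$tmp/l2.list"
printf '%s\n' ac1 ac3 > "$tmp/l3.list"
printf '%s\n' 'ID:12:23:00Q EU232 2342 234 123 231 aa1;ab2' \
              'ID:12:34:00D DWC11 1 2 3 4 aa1;aa2;abc;ac1' > "$tmp/in.txt"

awk -v nlists=3 '
FNR == 1 { file++ }                      # count input files as they start
file <= nlists { list[file, $1]; next }  # remember (list number, value)
{
        n = split($7, values, /;/)
        orig = $7
        # One "na" column per list, then the original last column.
        for (k = 1; k <= nlists; k++) $(6 + k) = "na"
        $(7 + nlists) = orig
        # Put each value into the column of the first list containing it.
        for (i = 1; i <= n; i++)
                for (k = 1; k <= nlists; k++)
                        if ((k, values[i]) in list) {
                                f = 6 + k
                                $f = ($f == "na") ? values[i] : $f ";" values[i]
                                break
                        }
        print
}' "$tmp/l1.list" "$tmp/l2.list" "$tmp/l3.list" "$tmp/in.txt" > "$tmp/out.txt"

cat "$tmp/out.txt"
```

As with the script above, the file counter relies on every list file containing at least one line; an empty list file would shift the numbering.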
This User Gave Thanks to Don Cragun For This Post: