Search, and add if present

02-07-2014

Dear All,

I have to find a way to reorganize a table file according to the last column. The input file looks like this:

Code:

cat Input1.txt:
ID:12:23:00Q    EU232    2342    234    123    231    aa1;ab2
ID:11:22:00E    EU112    1232    211    112    233    ab2;ac3
ID:19:24:00S    EU121    569    100    101    244    aa1;ac3
ID:11:33:00S    EU456    332    120    99    221    ac3

My output file should contain the information of the last column in newly created columns like this:

Code:

cat Output:
ID:12:23:00Q    EU232    2342    234    123    231    aa1    ab2    na    aa1;ab2
ID:11:22:00E    EU112    1232    211    112    233    na    ab2    ac3    ab2;ac3
ID:19:24:00S    EU121    569    100    101    244    aa1    na    ac3    aa1;ac3
ID:11:33:00S    EU456    332    120    99    221    na    na    ac3    ac3

My solution:
In a first step I introduced three new columns containing "na" values.

Code:

awk '{ print $1,$2,$3,$4,$5,$6,"na","na","na",$7 }' Input1.txt > Input2.txt

This resulted in the following output:

Code:

cat Input2.txt
ID:12:23:00Q    EU232    2342    234    123    231    na    na    na    aa1;ab2
ID:11:22:00E    EU112    1232    211    112    233    na    na    na    ab2;ac3
ID:19:24:00S    EU121    569    100    101    244    na    na    na    aa1;ac3
ID:11:33:00S    EU456    332    120    99    221    na    na    na    ac3

Next, I replaced each "na" with the matching item whenever that item is present in the last column.

Code:

awk '$10 ~ /aa1/ { $7 = "aa1" } { print }' Input2.txt > output_aa1.txt
awk '$10 ~ /ab2/ { $8 = "ab2" } { print }' output_aa1.txt > output_aa1_ab2.txt
awk '$10 ~ /ac3/ { $9 = "ac3" } { print }' output_aa1_ab2.txt >

This kind of works, but it is only feasible for a limited number of items (three in my example). Is there a way to scale it up? Something along these lines:

Code:

for ITEM in `cat item.list`
do
   ???
done

Thanks for your help!

loba

02-07-2014

Nothing fancy in the script below; it simply does the whole task in one pass:

Code:

awk '{$10=$7;$7=$8=$9="na"}
$10~/aa1/{$7="aa1"}
$10~/ab2/{$8="ab2"}
$10~/ac3/{$9="ac3"}
1' input_file > output_file
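
One caveat with the regular-expression tests: `$10~/aa1/` matches any substring, so an item named aa10 would also set field 7 to aa1. A minimal sketch of an exact-match variant follows, assuming that item names can be prefixes of one another; the item aa10 and the sample row are made up for illustration and do not come from the original data.

```shell
# Sketch only: exact-match variant of the one-pass approach.
# The row below and the item "aa10" are hypothetical test data.
printf 'ID:1 EU1 1 2 3 4 aa10;ab2\n' > sample_exact.txt

result=$(awk '{
    n = split($7, items, ";")      # break the last field into single items
    $10 = $7                       # keep the original list as field 10
    $7 = $8 = $9 = "na"            # default value for the three new fields
    for (i = 1; i <= n; i++) {     # compare whole items, not substrings
        if (items[i] == "aa1")      $7 = "aa1"
        else if (items[i] == "ab2") $8 = "ab2"
        else if (items[i] == "ac3") $9 = "ac3"
    }
    print
}' sample_exact.txt)

echo "$result"
rm -f sample_exact.txt
```

Here aa10 is not mistaken for aa1, so field 7 stays "na" while field 8 becomes ab2.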

Last edited by tukuyomi; 02-07-2014 at 10:04 AM..

tukuyomi

02-08-2014

Dear tukuyomi,

Thank you for your help. The problem is that my item list contains a few hundred entries for each position: over 400 for position $7, about 100 for $8, and more than 800 for $9. Any idea how I could do this?

loba

02-08-2014

Just to be clear: the 7th column of input_file is what you call the 'item list', isn't it?
Do you also mean that $7 might hold aa1 as well as other entries such as foo, bar, etc., that $8 might hold ab2, fool, bars, etcs, and so on?
Thanks for clarifying.

tukuyomi

02-08-2014

Quote:

Originally Posted by loba

Dear tukuyomi,

Thank you for your help. The problem is that my item list contains a few hundred entries for each position: over 400 for position $7, about 100 for $8, and more than 800 for $9. Any idea how I could do this?

When we see one of your ~1300 values, how do we know whether it is supposed to go into field 7, 8, or 9?

Don Cragun

02-08-2014

Dear tukuyomi,

Yes, position $7 can hold any of a list of possible values, not just aa1. The same is true for $8 and $9. I was thinking I could first work on position $7, save the file, and continue with the next one. Your suggestion works; the problem is that I would have to create a text file with a line for every possible value, save it, and run it. I was wondering if you know a better way? I tried a for loop (and an array) to get all the items for e.g. $7, but it did not work.

---------- Post updated at 12:37 PM ---------- Previous update was at 12:31 PM ----------

Dear Don Cragun,

You are right. I was thinking of doing one list after the other, as I tried to describe in my last post.

Code:

for ITEMS in 'cat item_at_position_7.list'
do
   ???
done < in.txt > out7.txt

I know the for loop does not work
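
A note on why the loop cannot work as written: the single quotes make 'cat item_at_position_7.list' a literal string rather than a command, so the body would run exactly once with that string as the item. A minimal sketch of the difference follows, assuming one item per line; the file items_demo.list and its contents are made up for illustration. With hundreds of items, starting one awk per item would also be slow, so loading the list inside a single awk run is probably the better route.

```shell
# Hypothetical list file with one item per line.
printf 'aa1\naa2\naa3\n' > items_demo.list

# Single quotes: the loop sees one literal word, not the file contents.
count_quoted=0
for ITEM in 'cat items_demo.list'; do
    count_quoted=$((count_quoted + 1))
done

# while/read: one iteration per line of the file.
count_read=0
while IFS= read -r ITEM; do
    count_read=$((count_read + 1))
done < items_demo.list

echo "single-quoted loop ran $count_quoted time(s); read loop ran $count_read time(s)"
rm -f items_demo.list
```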

loba

02-08-2014

Assuming that you have files item_at_position_[789].list containing valid values for each of those fields (one value per line), such as:
item_at_position_7.list:

Code:

aa1
aa2
aa3
aa4

item_at_position_8.list:

Code:

ab1
ab2
ab3
ab4

and item_at_position_9.list:

Code:

ac1
ac2
ac3
ac4

and an input file (Input1.txt) containing:

Code:

ID:12:23:00Q    EU232    2342    234    123    231    aa1;ab2
ID:11:22:00E    EU112    1232    211    112    233    ab2;ac3
ID:19:24:00S    EU121    569    100    101    244    aa1;ac3
ID:11:33:00S    EU456    332    120    99    221    ac3
ID:12:34:00D    DWC11    1    2    3    4    aa1;aa2;abc;ac1
ID:23:45:00D    DWC22    5    6    7    8    ad1;aa1;ab2;ac3
ID:23:59:00D    DWC33    9    10    11    12    aa1;aa2;aa3;ab2;ab3;ab4;ac1;ac3;ac4

then the following awk script:

Code:

awk -v outf="Output.txt" '
BEGIN { OFS="    " }
# Replace "na" in field fieldnum with value.
# With alternative else clause, print diagnostic if the field has already been set.
function add(value, fieldnum) {
        if($fieldnum == "na")   $fieldnum = value
        else                    $fieldnum = $fieldnum ";" value
# Replace above line with the following four lines if only one value is allowed per output field.
#       else {  printf("Line %d, multiple value %s for field %d dropped\n",
#                       FNR, value, fieldnum)
#               exitcode = 1
#       }
}
FNR == 1 {      # Increment input file number...
        file++
}
file <= 3 {     # Read values from one of the list files...
        list[file,$1]
        next
}
                # Process main input file...
{       # Split the last field into list values.
        n = split($7, values, /;/)
        # Initialize last four fields.
        $10 = $7
        $7 = $8 = $9 = "na"
        # Process values found on this line.
        for(i = 1; i <= n; i++)
                if((1,values[i]) in list) add(values[i], 7)
                else if((2,values[i]) in list) add(values[i], 8)
                else if((3,values[i]) in list) add(values[i], 9)
                else {  printf("Line %d: value: %s not recognized\n",
                                FNR, values[i])
                        exitcode = 1
                }
        # Print the updated line.
        print > outf
}
END {   exit exitcode
}' item_at_position_[789].list Input1.txt >&2

prints the following diagnostics to the standard error output:

Code:

Line 5: value: abc not recognized
Line 6: value: ad1 not recognized

and stores the following output in Output.txt:

Code:

ID:12:23:00Q    EU232    2342    234    123    231    aa1    ab2    na    aa1;ab2
ID:11:22:00E    EU112    1232    211    112    233    na    ab2    ac3    ab2;ac3
ID:19:24:00S    EU121    569    100    101    244    aa1    na    ac3    aa1;ac3
ID:11:33:00S    EU456    332    120    99    221    na    na    ac3    ac3
ID:12:34:00D    DWC11    1    2    3    4    aa1;aa2    na    ac1    aa1;aa2;abc;ac1
ID:23:45:00D    DWC22    5    6    7    8    aa1    ab2    ac3    ad1;aa1;ab2;ac3
ID:23:59:00D    DWC33    9    10    11    12    aa1;aa2;aa3    ab2;ab3;ab4    ac1;ac3;ac4    aa1;aa2;aa3;ab2;ab3;ab4;ac1;ac3;ac4

and, with your original sample input file, stores the following output in Output.txt:

Code:

ID:12:23:00Q    EU232    2342    234    123    231    aa1    ab2    na    aa1;ab2
ID:11:22:00E    EU112    1232    211    112    233    na    ab2    ac3    ab2;ac3
ID:19:24:00S    EU121    569    100    101    244    aa1    na    ac3    aa1;ac3
ID:11:33:00S    EU456    332    120    99    221    na    na    ac3    ac3

without producing any diagnostic messages.

Is this what you wanted to do?

If you want to try this on a Solaris/SunOS system, use /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk instead of the default /usr/bin/awk.
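
If keeping three separate list files turns out to be awkward, the same idea might also work with a single map file. The following is only a sketch under that assumption: the file item_map.txt, its two-column layout (item name, then target field number), and the array name field_of are hypothetical and not part of the script above. Unlike the script above, it silently skips unrecognized items and writes single-space separated output.

```shell
# Hypothetical map file: item name, then the output field it belongs to.
printf 'aa1 7\nab2 8\nac3 9\n' > item_map.txt
# Two rows taken from the sample input in this thread.
printf 'ID:12:23:00Q EU232 2342 234 123 231 aa1;ab2\nID:11:33:00S EU456 332 120 99 221 ac3\n' > map_demo_input.txt

mapped=$(awk '
NR == FNR { field_of[$1] = $2; next }   # first file: load the item map
{
    n = split($7, items, ";")           # single items from the last field
    $10 = $7                            # keep the original list as field 10
    $7 = $8 = $9 = "na"
    for (i = 1; i <= n; i++)
        if (items[i] in field_of) {     # unknown items are skipped here
            f = field_of[items[i]]
            $f = ($f == "na") ? items[i] : ($f ";" items[i])
        }
    print
}' item_map.txt map_demo_input.txt)

echo "$mapped"
rm -f item_map.txt map_demo_input.txt
```

The NR == FNR test is true only while awk reads the first file named on the command line, which is what lets one program load the map and then process the data.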

Don Cragun
