Extract values from a delimited file based on a whitelist


 
# 1  
Old 08-01-2010

I would like to store the IDs I want to extract in a variable, and then extract the values of those IDs from a delimited string. I am using bash.

The file format is:
Code:
id1=we1;id2=er2;id3=rt3;id4=yu4

The number of fields and records is not fixed. There could be just one id stored, or there could be many. Records are separated by newlines, and each record starts with some redundant text that I am not interested in.
Code:
~redundant text not interested in ~
id1=1;id2=2;id3=4;
.................;id80=8

~redundant text not interested in ~
id1=1; ................;id9=11

I am trying out the code below:
Code:
list="id1,id2" #extract only id1 and id2, can be changed. each id is delimited by ','

#loop the list and extract the values of the id out
for i in $list
...
#store the value (eg 1 for id1) so that I can do a comparison 
#

If the id is less than or equal to 10, I would like to print the value.
If the id value is more than 10, I would like to print the id name as well, e.g. id9=11.

How do I approach this problem? Thank you.
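
One detail about the attempt above: `for i in $list` splits on whitespace by default, so a comma-separated list such as "id1,id2" is seen as a single word. The following is a minimal sketch of one way to split it, assuming bash. The helper names `split_list` and `in_list` are hypothetical and do not come from this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (names are illustrative, not from the thread).

# Split a comma-separated whitelist into the global array "ids".
split_list() {
    IFS=',' read -r -a ids <<< "$1"   # the IFS change applies to this read only
}

# Succeed if the id name in $1 is one of the elements of "ids".
in_list() {
    local wanted
    for wanted in "${ids[@]}"; do
        [[ $wanted == "$1" ]] && return 0
    done
    return 1
}

list="id1,id2"
split_list "$list"
for i in "${ids[@]}"; do
    echo "wanted: $i"
done
```

Setting IFS only for the `read` command avoids changing word splitting for the rest of the script.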
# 2  
Old 08-01-2010
This will work for what you have described above. It will not work if the junk at the beginning of each line contains " id", or if the idn= strings contain characters that are not legal in shell variable names (e.g. id2.3=33). If the latter is true, you'll have to use an array rather than individual variables, and frankly awk or ruby would be better suited (easier) than bash.

Code:
#!/usr/bin/env bash

# command line args are: ["list"] data-file

list=" id1 id2 "                # default list: leading/trailing blanks are important
if (( $# > 1 ))                 # assume both list and filename supplied on command line
then
        list="$1"
        list=" ${list//,/ } "           # blank separate; lead/trail blanks important
        shift
fi

while IFS= read -r buf
do
        if [[ $buf == *" id"* ]]        # lop off leading garbage only when ' id' is present
        then
                buf="id${buf#* id}"     # (assumes the garbage itself does not contain ' id')
        fi
        buf="${buf//;/ }"       # ditch the semicolons
        for x in $buf
        do
                name=${x%%=*}           # split idx=y into name value pair
                value=${x#*=}
                if [[ $list == *" $name "* ]]   # its in the list if true
                then
                        printf -v "$name" '%s' "$value" # capture value without eval
                fi
        done
done <"$1"

# you could print these as you capture them above, but if you need to do 
# other work before printing, etc., then this illustrates how you can dig them 
# out and print them later. 
for x in $list
do
        value=${!x}                     # suss out captured value (indirect expansion)
        
        if [[ -n $value ]]              # allow for cases where requested id was not in data (silently ignore)
        then
                if (( ${x#id} > 10 ))
                then
                        echo "$x=$value"
                else
                        echo "$value"
                fi
        fi
done
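
If the id names may contain characters that are not valid in variable names (the id2.3=33 case mentioned above), the captured values can be kept in an associative array instead of individual variables. The following is a minimal sketch, assuming bash 4 or later. The function name `capture_ids` and the array name `values` are illustrative and not part of the script above.

```shell
#!/usr/bin/env bash
# Sketch only: requires bash 4+ for associative arrays.
declare -A values                      # id name -> last value seen

# capture_ids LIST RECORD
#   LIST    blank-separated id names with leading/trailing blanks, e.g. " id1 id2.3 "
#   RECORD  one line of name=value pairs separated by semicolons
capture_ids() {
    local list=$1 record=$2 pair name
    local -a pairs
    IFS=';' read -r -a pairs <<< "$record"
    for pair in "${pairs[@]}"; do
        pair=${pair// /}                        # drop blanks around a pair
        [[ $pair == *=* ]] || continue          # skip junk without an '='
        name=${pair%%=*}
        if [[ $list == *" $name "* ]]; then     # same membership test as above
            values[$name]=${pair#*=}
        fi
    done
}

capture_ids " id1 id2.3 " "id1=1;id2.3=33;id4=7"
for name in id1 id2.3; do
    echo "$name=${values[$name]}"
done
```

As in the script above, a later record overwrites the value captured from an earlier one.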


Last edited by agama; 08-01-2010 at 04:20 PM.. Reason: Corrected a typo
# 3  
Old 08-01-2010
Like this?

Code:
# ./justdoit
id1 has more value -> Printing Values.. " 1 1 "
Printing id2 Value = 2
Printing id3 Value = 4
Printing id4 Value = 1
Printing id5 Value = 2
Printing id6 Value = 4
Printing id7 Value = 1
Printing id8 Value = 21
id9 has more value -> Printing Values.. " 1 11 "
Printing id10 Value = 2
 
Printing id(s) and value(s) id11=4
Printing id(s) and value(s) id12=1
Printing id(s) and value(s) id13=2
Printing id(s) and value(s) id14=44
id15 value is nondefined
id16 value is nondefined
id17 value is nondefined
id18 value is nondefined
id19 value is nondefined

Code:
# cat infile
~redundant text not interested in ~
id1=1;id2=2;id3=4;id4=1;id5=2;id6=4;id7=1;id8=21;id9=1;id10=2;
id11=4;id12=1;id13=2;id14=44;
.................;id80=8
~redundant text not interested in ~
id1=1; ................;id9=11


Code:
# cat justdoit
#!/usr/bin/env bash
list="id1,id2,id3,id4,id5,id6,id7,id8,id9,id10,id11,id12,id13,id14,id15,id16,id17,id18,id19"
IFS=',' read -r -a ids <<< "$list"              # split on commas without changing IFS globally
for i in "${ids[@]}"
do
    num=${i#id}                                 # numeric part of the id name
    matches=( $(grep -o "$i=[0-9]*" infile) )   # one array element per occurrence in infile
    if [ "$num" -lt 11 ]; then
        if [ "${#matches[@]}" -gt 1 ]; then
            # quoted so that "->" is printed instead of being parsed as a redirection
            echo "$i has more value -> Printing Values.. \" ${matches[*]#*=} \""
        else
            echo "Printing $i Value = ${matches[0]#*=}"
        fi
    fi
    if [ "$i" = "id11" ]; then
        echo ""
    fi
    if [ "$num" -gt 10 ]; then
        if [ "${#matches[@]}" -eq 0 ]; then
            echo "$i value is nondefined"
        else
            echo "Printing id(s) and value(s) ${matches[*]}"
        fi
    fi
done
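
The script above runs grep once per id. As a hedged alternative, the whole file can be scanned in a single pass with awk. The sketch below is an illustration under stated assumptions rather than a drop-in replacement: the function name `extract_ids` is hypothetical, it prints every occurrence in file order, and it follows the id-number rule used in this thread (ids up to 10 print only the value, higher ids print name=value).

```shell
#!/usr/bin/env bash
# extract_ids LIST FILE
#   LIST  comma-separated id names, e.g. "id1,id9,id11"
#   FILE  data file with ';'-separated name=value pairs
extract_ids() {
    awk -F';' -v list="$1" '
        BEGIN {
            n = split(list, names, ",")
            for (k = 1; k <= n; k++) wanted[names[k]] = 1
        }
        {
            for (f = 1; f <= NF; f++) {
                pair = $f
                gsub(/[ \t]/, "", pair)          # ignore blanks around a pair
                eq = index(pair, "=")
                if (eq == 0) continue            # not a name=value pair
                name  = substr(pair, 1, eq - 1)
                value = substr(pair, eq + 1)
                if (!(name in wanted)) continue
                num = name
                sub(/^id/, "", num)              # numeric part of the id name
                if (num + 0 > 10) print name "=" value
                else              print value
            }
        }
    ' "$2"
}
```

It could be called as `extract_ids "id1,id9,id11" infile`. Lines without any name=value pair, such as the redundant text, are skipped because they contain no "=".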
