Help to retrieve data from two files matching a string


 
# 1  
Old 09-23-2011

Hello Experts,

I have come back to this forum after a while, since I need a better way to get my result. My query is below.

I have 3 files: 1 input file and 2 data files. Based on the input file, data has to be retrieved by matching across the two data files, which share one common key.
For example:

Input file

Code:
 
919868191075
919868191028

Data File1

Code:
 
 
subsD,00 0A 02 48 4A 6B 92 01 58 06 5E 00,hlr,common,404686600492447,919868191075,919868191075,00000000,FALSE,FALSE,3,10,11|
subsD,01 40 46 85 50 47 15 87 40 00 00 00,hlr,common,404685504715874,919868191008,919868191008,00000000,FALSE,FALSE,1,10,11|
subsD,01 40 46 86 60 04 72 06 80 00 00 00,hlr,common,404686600473790,919868191085,919868191085,00000000,FALSE,FALSE,1,10,11|
subsD,00 0A 01 4A FB A6 C4 01 CD 0E 9A 00,hlr,common,404685505866829,919868191028,919868191028,00000000,FALSE,FALSE,6,10,11|

Data file2

Code:
imsiD,404686600667311,,7F0413BE=subsD|70400F2=00 0A 02 48 4A 6B 92 01 58 06 5E 00
imsiD,404686600369463,,7F0413BE=subsD|70400F2=01 40 46 85 50 47 15 87 40 00 00 00
imsiD,404685507343909,,7F0413BE=subsD|70400F2=01 40 46 86 60 04 72 06 80 00 00 00
imsiD,404685504666094,,7F0413BE=subsD|70400F2=01 40 46 85 50 46 66 09 40 00 00 00
imsiD,404686600708986,,7F0413BE=subsD|70400F2=00 0A 01 4A FB A6 C4 01 CD 0E 9A 00

Output should be like

Code:
 
Number Data1 Data2
919868191075 000A02484A6B920158065E00 404686600667311
919868191028 000A014AFBA6C401CD0E9A00 404686600708986

The first column is the string from the input file (919868191075). The next is the 2nd field of CSV file1, taken from the line whose 6th field matches that string (000A02484A6B920158065E00). The last is the 2nd field of CSV file2, taken from the line that matches the value retrieved from the first file (404686600667311).

Thanks in advance. Looking forward to your help.
# 2  
Old 09-23-2011
Code:
for i in `cat inputfile`
do
        val1=`grep $i datafile1 | awk -F, '{print $2}'`
        echo "$i $val1 `grep "$val1" datafile2 | awk -F, '{print $2}'`"
done

# 3  
Old 09-23-2011
Code:
printf "%10s%25s%25s\n" "Number" "Data1" "Data2"
while read -r nrs; do
    for d1 in "$(sed -n "/$nrs/p" file1 | sed 's/subsD,\([^,]*\),.*/\1/')"; do
        d2=$(sed -n "/$d1/s/imsiD,\([^,]*\),.*/\1/p" file2)
    done
    d1n=$(echo "$d1" | sed 's/ //g')
    printf "%10s %30s %20s\n" "$nrs" "$d1n" "$d2"
done < numbers

Output:

Code:
    Number                    Data1                    Data2
919868191075       000A02484A6B920158065E00      404686600667311
919868191028       000A014AFBA6C401CD0E9A00      404686600708986

# 4  
Old 09-23-2011
@jayan_jay: Thanks a lot. It works for entries that exist in the data file, but if an entry does not match, I get an error:

Code:
 
919968531998 00 0A 02 48 4A 6B 92 01 58 06 5E 00 404686600667311
grep: RE error 41: No remembered search string.
7878787878787 
919968754673 00 0A 01 4A FB A6 C4 01 CD 0E 9A 00 404686600708986

In the actual files there are millions of lines.

@ygemi: Thanks a lot, I will check and get back to you.
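
The grep error shown above is probably triggered when a number is missing from datafile1: val1 is then empty, and some grep implementations reject an empty pattern. Below is a minimal sketch of the loop from post # 2 with a guard added. It assumes a grep that supports -F; the sample files and the NOT_FOUND marker are made up for illustration only.

```shell
#!/bin/sh
# Sketch: the per-number grep loop, skipping the second lookup when the first finds nothing.
# Sample files are created in a temporary directory purely for demonstration.
tmp=$(mktemp -d)
printf '%s\n' 919868191075 7878787878787 > "$tmp/inputfile"
cat > "$tmp/datafile1" <<'EOF'
subsD,00 0A 02 48 4A 6B 92 01 58 06 5E 00,hlr,common,404686600492447,919868191075,919868191075,00000000,FALSE,FALSE,3,10,11|
EOF
cat > "$tmp/datafile2" <<'EOF'
imsiD,404686600667311,,7F0413BE=subsD|70400F2=00 0A 02 48 4A 6B 92 01 58 06 5E 00
EOF

lookup() {
    while read -r i
    do
        # -F treats the number as a fixed string, not a regular expression
        val1=$(grep -F -- "$i" "$tmp/datafile1" | awk -F, '{print $2}')
        if [ -z "$val1" ]; then
            # Number not present in datafile1: report it and skip the second grep
            echo "$i NOT_FOUND NOT_FOUND"
            continue
        fi
        val2=$(grep -F -- "$val1" "$tmp/datafile2" | awk -F, '{print $2}')
        echo "$i $val1 ${val2:-NOT_FOUND}"
    done < "$tmp/inputfile"
}

result=$(lookup)
printf '%s\n' "$result"
rm -rf "$tmp"
```

Note that this still starts two grep processes per number and matches the number anywhere on the line, so it is unlikely to scale well to millions of lines.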
# 5  
Old 09-23-2011
join the files

I think join is the easiest (and most elegant) solution for this. Join requires that the files be sorted by the joining column, so ... some pre-work.

Code:
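#!/bin/sh
LC_ALL=C; export LC_ALL    # sort and join must agree on collation order
sort input > sorted.input
sort -t, -k6,6 file1 > sorted.file1    # sort on field 6 only, the join key
tr = , <file2 | sort -t, -k6,6 > sorted.file2    # "=" becomes ",", so the hex key is field 6
join -t, -o 1.1,2.2,2.5 -1 1 -2 6 sorted.input sorted.file1 | sort -t, -k2,2 > sorted.temp1
printf 'Number\tData1\tData2\n'    # printf handles \t portably, echo may not
join -t, -o 1.1,1.2,2.2 -1 2 -2 6 sorted.temp1 sorted.file2 | tr -d ' ' | tr ',' '\t'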

The question is: how do you want to handle lines which do not match? You can handle this in different ways, depending on whether there is a line in "input" with no line in file1, and then again the other way round if there is no line in input but some lines in file1 or file2. If you are sure there will always be pairs, then my work here is done.

But you have already hinted at lines that were not matched, so have a look at the -e option as well as -a 1 and/or -a 2 in the man page of the join command to see what is possible. The man page calls these unpairable lines.
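
As a rough sketch of how those options behave, the script below joins a small sample input against file1 and keeps the number that has no partner. The temporary files and the NOT_FOUND marker are assumptions for illustration; -a, -e and -o are standard join options.

```shell
#!/bin/sh
# Sketch: keep unpairable input numbers in the join output, marked with a placeholder.
LC_ALL=C; export LC_ALL          # sort and join must agree on collation order
tmp=$(mktemp -d)

# Sample data: the number 7878787878787 has no entry in file1.
printf '%s\n' 919868191075 7878787878787 > "$tmp/input"
cat > "$tmp/file1" <<'EOF'
subsD,00 0A 02 48 4A 6B 92 01 58 06 5E 00,hlr,common,404686600492447,919868191075,919868191075,00000000,FALSE,FALSE,3,10,11|
subsD,01 40 46 85 50 47 15 87 40 00 00 00,hlr,common,404685504715874,919868191008,919868191008,00000000,FALSE,FALSE,1,10,11|
EOF

sort "$tmp/input" > "$tmp/sorted.input"
sort -t, -k6,6 "$tmp/file1" > "$tmp/sorted.file1"

# -a 1      : also print lines of the first file that found no partner
# -e STRING : fill output fields that are missing with STRING
# -o LIST   : the output fields; -e only applies to fields named here
result=$(join -t, -a 1 -e NOT_FOUND -o 1.1,2.2 -1 1 -2 6 \
    "$tmp/sorted.input" "$tmp/sorted.file1")
printf '%s\n' "$result"
rm -rf "$tmp"
```

With this sample data the unmatched number comes out as `7878787878787,NOT_FOUND`, while the matched one carries its hex key from file1. Using -a 2 instead would report lines of file1 that no input number refers to.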

If the options confuse you, explain what you need and someone will surely help.
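
If sorting millions of lines is not attractive, a single awk pass is another possibility. The following is only a sketch under assumptions: it builds its own small sample files in a temporary directory, prints "-" for anything it cannot find (a made-up convention), and keeps the wanted numbers and their keys in memory, which should be fine as long as the input list itself fits in memory.

```shell
#!/bin/sh
# Sketch: resolve every number in one awk run, reading each file once.
# The temporary sample files stand in for the real input, file1 and file2.
tmp=$(mktemp -d)
printf '%s\n' 919868191075 7878787878787 919868191028 > "$tmp/input"
cat > "$tmp/file1" <<'EOF'
subsD,00 0A 02 48 4A 6B 92 01 58 06 5E 00,hlr,common,404686600492447,919868191075,919868191075,00000000,FALSE,FALSE,3,10,11|
subsD,01 40 46 85 50 47 15 87 40 00 00 00,hlr,common,404685504715874,919868191008,919868191008,00000000,FALSE,FALSE,1,10,11|
subsD,01 40 46 86 60 04 72 06 80 00 00 00,hlr,common,404686600473790,919868191085,919868191085,00000000,FALSE,FALSE,1,10,11|
subsD,00 0A 01 4A FB A6 C4 01 CD 0E 9A 00,hlr,common,404685505866829,919868191028,919868191028,00000000,FALSE,FALSE,6,10,11|
EOF
cat > "$tmp/file2" <<'EOF'
imsiD,404686600667311,,7F0413BE=subsD|70400F2=00 0A 02 48 4A 6B 92 01 58 06 5E 00
imsiD,404686600369463,,7F0413BE=subsD|70400F2=01 40 46 85 50 47 15 87 40 00 00 00
imsiD,404685507343909,,7F0413BE=subsD|70400F2=01 40 46 86 60 04 72 06 80 00 00 00
imsiD,404685504666094,,7F0413BE=subsD|70400F2=01 40 46 85 50 46 66 09 40 00 00 00
imsiD,404686600708986,,7F0413BE=subsD|70400F2=00 0A 01 4A FB A6 C4 01 CD 0E 9A 00
EOF

result=$(awk -F, '
    FILENAME == ARGV[1] {            # input: remember wanted numbers in their original order
        if ($1 != "") { want[$1] = 1; order[++n] = $1 }
        next
    }
    FILENAME == ARGV[2] {            # file1: field 6 is the number, field 2 the hex key
        if ($6 in want) { hex[$6] = $2; need[$2] = 1 }
        next
    }
    FILENAME == ARGV[3] {            # file2: hex key follows the last "=", field 2 is Data2
        key = $NF
        sub(/^.*=/, "", key)
        if (key in need) imsi[key] = $2
        next
    }
    END {
        print "Number", "Data1", "Data2"
        for (i = 1; i <= n; i++) {
            num = order[i]
            d1 = (num in hex) ? hex[num] : ""
            d2 = (d1 != "" && (d1 in imsi)) ? imsi[d1] : "-"
            gsub(/ /, "", d1)        # print the hex key without spaces
            print num, (d1 == "" ? "-" : d1), d2
        }
    }
' "$tmp/input" "$tmp/file1" "$tmp/file2")
printf '%s\n' "$result"
rm -rf "$tmp"
```

Unlike the join approach, this keeps the numbers in the order of the input file and needs no sorted copies, at the cost of holding the lookup arrays in memory.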