Find matches and write the data before it


# 1  
11-07-2012

Hi all,

I am here for help once again.

I have two files.

The first file has one column:

Code:
F2
B2
CAD
KGM
HTC
CSP


The second file has 5 columns. Its first column sometimes contains entries from the first file, separated by spaces, along with other entries:


Code:
F2 XYZ CDT CAD          it is part of agriculture    it is part of university   it is part of ...             it is used for.... 

KGM HTC CSP      it is part of agriculture    it is part of university   it is part of ...             it is used for....

If there is a match, I need to split the entries like this, so that each matched entry gets its own line with 5 columns:

Code:
F2  it is part of agriculture    it is part of university   it is part of ...             it is used for.... 
CAD  it is part of agriculture    it is part of university   it is part of ...             it is used for.... 


KGM it is part of agriculture    it is part of university   it is part of ...             it is used for.... 

HTC  it is part of agriculture    it is part of university   it is part of ...             it is used for.... 

CSP  it is part of agriculture    it is part of university   it is part of ...             it is used for....

Please help me out.
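A minimal sketch of the transformation requested above, assuming (as the poster clarifies later in the thread) that the columns of the second file are tab-separated and that the IDs inside the first column are separated by single spaces. The file names keys.txt and data.tsv and the sample rows are hypothetical. Everything after the first column is copied verbatim, so any number of columns is preserved.

```shell
#!/bin/sh
# Hypothetical sample files: keys.txt holds one key per line; data.tsv is
# tab-separated and its first column lists several space-separated IDs.
dir=$(mktemp -d)
printf 'F2\nB2\nCAD\nKGM\nHTC\nCSP\n' > "$dir/keys.txt"
printf 'F2 XYZ CDT CAD\tpart of agriculture\tpart of university\n' > "$dir/data.tsv"
printf 'KGM HTC CSP\tpart of agriculture\tused for...\n' >> "$dir/data.tsv"

out=$(awk -F'\t' '
  # First file: remember every key.
  NR == FNR { keys[$1] = 1; next }
  {
    # Keep everything after column 1 (including its leading tab) verbatim.
    rest = substr($0, length($1) + 1)
    # Column 1 may hold several IDs separated by spaces; print each match.
    n = split($1, ids, " ")
    for (i = 1; i <= n; i++)
      if (ids[i] in keys)
        print ids[i] rest
  }' "$dir/keys.txt" "$dir/data.tsv")

printf '%s\n' "$out"
rm -r "$dir"
```

Non-matching IDs such as XYZ and CDT are simply dropped, and each matched ID is printed on its own line followed by the untouched remaining columns.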
# 2  
11-07-2012
awk

Code:
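gawk '{
if(NR==FNR){
	# First file (a): remember every key
	_[$1] = 1
}
else{
	# Second file (b): check every field against the keys
	for(i=1;i<=NF;i++){
		if(_[$i] == 1){
			# Print from the matching field to the end of the line
			for(j=i;j<=NF;j++){
				printf "%s ", $j
			}
			print ""
		}
	}
}
}
' a b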
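To see what this logic does, here is a small run of it on hypothetical files a and b, written with plain awk instead of gawk and with a "%s" format so that data containing % is not treated as a format string. Every field of every line is checked, and a match prints everything from that field onward, joined by single spaces.

```shell
#!/bin/sh
# Hypothetical inputs: file a holds the keys, file b a single data line.
dir=$(mktemp -d)
printf 'F2\nCAD\n' > "$dir/a"
printf 'F2 XYZ CAD rest of line\n' > "$dir/b"

out=$(awk '
  NR == FNR { _[$1] = 1; next }
  {
    for (i = 1; i <= NF; i++)
      if (_[$i] == 1) {
        # Print from the matching field to the end of the line.
        for (j = i; j <= NF; j++) printf "%s ", $j
        print ""
      }
  }' "$dir/a" "$dir/b")

printf '%s\n' "$out"
rm -r "$dir"
```

The first output line still contains the non-matching ID XYZ, and the original column separators are replaced by single spaces, which matches what the next two replies report.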

# 3  
11-08-2012
Thank you very much, dear.

It seems like good code, but it is not working completely: if F2 or HTC matches, my output looks like this:

Code:
F2 XYZ CDT CAD          it is part of agriculture    it is part of university   it is part of ...             it is used for.... 

KGM HTC CSP      it is part of agriculture    it is part of university   it is part of ...             it is used for...

But I want to remove the other, non-matched entries from the first column, so that the output will be:

Code:
F2  it is part of agriculture    it is part of university   it is part of ...             it is used for.... 
CAD  it is part of agriculture    it is part of university   it is part of ...             it is used for.... 


KGM it is part of agriculture    it is part of university   it is part of ...             it is used for.... 

HTC  it is part of agriculture    it is part of university   it is part of ...             it is used for.... 

CSP  it is part of agriculture    it is part of university   it is part of ...             it is used for....

That is, only the matched entry should appear in the first column of the output.

Please guide me if possible.
# 4  
11-08-2012
awk

Hi,

Try this one,

Code:
awk 'BEGIN{FS=OFS="\t";}NR==FNR{a[$0]=1;next;}{n=split($1,f," ");for(i=1;i<=n;i++){p=f[i];if(a[p]==1){print p,$2,$3,$4,$5;}}}' file1 file2

Assumptions:

1. The field separator is a tab (\t).
2. The number of fields is fixed (5 fields).

Cheers,
Ranga
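The thread does not say why this one-liner produced an empty file. One possible cause, offered only as an assumption, is a key file saved with Windows (CRLF) line endings: each key then ends in an invisible \r and never equals an ID from file2. The sketch below reproduces that on hypothetical data, then runs the same logic with carriage returns and surrounding blanks stripped from each key before it is stored.

```shell
#!/bin/sh
# Hypothetical inputs: file1 saved with CRLF line endings, file2 tab-separated.
dir=$(mktemp -d)
printf 'F2\r\nCAD\r\n' > "$dir/file1"
printf 'F2 XYZ CAD\tc2\tc3\tc4\tc5\n' > "$dir/file2"

# Keys are stored with their trailing \r, so no lookup succeeds.
before=$(awk 'BEGIN { FS = OFS = "\t" }
  NR == FNR { a[$0] = 1; next }
  { n = split($1, f, " ")
    for (i = 1; i <= n; i++) if (a[f[i]] == 1) print f[i], $2, $3, $4, $5 }' \
  "$dir/file1" "$dir/file2")

# Same logic, but each key is cleaned of \r and leading/trailing blanks first.
after=$(awk 'BEGIN { FS = OFS = "\t" }
  NR == FNR { k = $0; gsub(/\r|^[ \t]+|[ \t]+$/, "", k); a[k] = 1; next }
  { n = split($1, f, " ")
    for (i = 1; i <= n; i++) if (a[f[i]] == 1) print f[i], $2, $3, $4, $5 }' \
  "$dir/file1" "$dir/file2")

printf 'before: [%s]\nafter:\n%s\n' "$before" "$after"
rm -r "$dir"
```

If the real key file has plain Unix line endings, this is not the cause, and the field layout of file2 is the next thing to check.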
# 5  
11-08-2012
Hi,

Thanks for the reply.

But this time the output file is completely blank!


But yes, the second input file has more than 5 columns. What I want is to write whatever comes after the common match exactly as it is, keeping the same columns as in the input.

Also, I checked the previous output file: there are no columns at all; instead, the entries of the 5 columns are written row-wise.

Regarding tab separation, the entries look like this. Here each colour represents one column, so the input file has 8 columns.

Quote:
FCGR2A FCGR2B FCGR2C EGFR FCGR3B C1R C1QA C1QB C1QC FCGR3A C1S FCGR1A Cetuximab Erbitux FCGR2A FCGR2B FCGR2C EGFR FCGR3B C1R C1QA C1QB C1QC FCGR3A C1S FCGR1A Cetuximab binds to the epidermal growth factor receptor (EGFr) on both normal and tumor cells. EGFr is over-expressed in many colorectal cancers. Cetuximab competitively inhibits the binding of epidermal growth factor (EGF) and TGF alpha, thereby reducing their effects on cell growth and metastatic spread. Epidermal growth factor receptor binding FAB. Cetuximab is composed of the Fv (variable; antigen-binding) regions of the 225 murine EGFr monoclonal antibody specific for the N-terminal portion of human EGFr with human IgG1 heavy and kappa light chain constant (framework) regions. For treatment of EGFR-expressing metastatic colorectal cancer in patients who are refractory to other irinotecan-based chemotherapy regimens. Cetuximab is also indicated for treatment of squamous cell carcinoma of the head and neck in conjucntion with radiation therapy. Used in the treatment of colorectal cancer, cetuximab binds specifically to the epidermal growth factor receptor (EGFr, HER1, c-ErbB-1) on both normal and tumor cells. EGFr is over-expressed in many colorectal cancers. Cetuximab competitively inhibits the binding of epidermal growth factor (EGF) and other ligands, such as transforming growth factor-alpha. Binding of cetuximab to the EGFr blocks phosphorylation and activation of receptor-associated kinases, resulting in inhibition of cell growth, induction of apoptosis, decreased matrix metalloproteinase secretion and reduced vascular endothelial growth factor production.
So if FCGR2A is present in the first file, the output will be:

Quote:
FCGR2A Cetuximab Erbitux FCGR2A FCGR2B FCGR2C EGFR FCGR3B C1R C1QA C1QB C1QC FCGR3A C1S FCGR1A Cetuximab binds to the epidermal growth factor receptor (EGFr) on both normal and tumor cells. EGFr is over-expressed in many colorectal cancers. Cetuximab competitively inhibits the binding of epidermal growth factor (EGF) and TGF alpha, thereby reducing their effects on cell growth and metastatic spread. Epidermal growth factor receptor binding FAB. Cetuximab is composed of the Fv (variable; antigen-binding) regions of the 225 murine EGFr monoclonal antibody specific for the N-terminal portion of human EGFr with human IgG1 heavy and kappa light chain constant (framework) regions. For treatment of EGFR-expressing metastatic colorectal cancer in patients who are refractory to other irinotecan-based chemotherapy regimens. Cetuximab is also indicated for treatment of squamous cell carcinoma of the head and neck in conjucntion with radiation therapy. Used in the treatment of colorectal cancer, cetuximab binds specifically to the epidermal growth factor receptor (EGFr, HER1, c-ErbB-1) on both normal and tumor cells. EGFr is over-expressed in many colorectal cancers. Cetuximab competitively inhibits the binding of epidermal growth factor (EGF) and other ligands, such as transforming growth factor-alpha. Binding of cetuximab to the EGFr blocks phosphorylation and activation of receptor-associated kinases, resulting in inhibition of cell growth, induction of apoptosis, decreased matrix metalloproteinase secretion and reduced vascular endothelial growth factor production.
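Since the reply above assumed 5 fields and this post says there are 8, one quick check is to count the fields per line under each kind of splitting. A sketch on a shortened, hypothetical version of the row above (columns 5 to 8 replaced by placeholders c5 to c8): with -F'\t' awk sees the 8 tab-separated columns, while awk's default blank splitting, as used in the gawk reply, breaks the multi-word columns apart.

```shell
#!/bin/sh
# Hypothetical sample: one tab-separated line with 8 columns, where
# columns 1 and 4 each contain two space-separated gene names.
dir=$(mktemp -d)
printf 'FCGR2A FCGR2B\tCetuximab\tErbitux\tFCGR2A FCGR2B\tc5\tc6\tc7\tc8\n' > "$dir/file2"

# Number of fields per line when splitting on tabs only.
tabs=$(awk -F'\t' '{ print NF }' "$dir/file2")
# Number of fields per line with the default (spaces and tabs) splitting.
blanks=$(awk '{ print NF }' "$dir/file2")

printf 'tab-separated: %s, blank-separated: %s\n' "$tabs" "$blanks"
rm -r "$dir"
```

The two counts differ (8 versus 10 here), which is why code that splits on all whitespace and rejoins fields with spaces cannot reproduce the original columns.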
# 6  
11-08-2012
Hmm, seems complex!
# 7  
11-08-2012
Based on your inputs from post #1, this should work:

Code:
awk 'NR==FNR{X[$1]=$0;next}{n=split($1,P," ");rest=substr($0,length($1)+1);for(i=1;i<=n;i++){if(P[i] in X){print P[i] rest}}}' file1 FS="  +" file2

If not, please provide real input from your files.

pamu
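A quick check on the sample data from post #1, reduced to hypothetical rows whose columns are separated by runs of 3 spaces. It uses a variant of the one-liner above that cuts off the first column with substr() rather than sub(), so IDs containing regex characters such as . or + are not misread as patterns, and that tests key membership with `in`. The FS="  +" assignment on the command line applies only to file2, so file1 is still split on blanks.

```shell
#!/bin/sh
dir=$(mktemp -d)
printf 'F2\nB2\nCAD\nKGM\nHTC\nCSP\n' > "$dir/file1"
# Hypothetical file2: columns separated by two or more spaces, as in post #1.
printf 'F2 XYZ CDT CAD   part of agriculture   part of university\n' > "$dir/file2"
printf 'KGM HTC CSP   part of agriculture   used for...\n' >> "$dir/file2"

# Column 1 ends at the first run of 2+ spaces; the rest is kept verbatim.
out=$(awk 'NR==FNR{X[$1]=$0;next}{n=split($1,P," ");rest=substr($0,length($1)+1);for(i=1;i<=n;i++){if(P[i] in X){print P[i] rest}}}' "$dir/file1" FS='  +' "$dir/file2")

printf '%s\n' "$out"
rm -r "$dir"
```

For the real data in post #5, which is tab-separated, FS="  +" would have to become FS="\t".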