Extract Pattern Sequence


 
# 1  
Old 12-10-2007

Dear Colleagues,
I have to extract some patterns from a raw text file using Perl.

The input will be raw text.
Pattern to get: sequences of capitalized words. For example, from "he is working in Center for Perl Studies. He will come tomorrow..." I have to extract sequences like "Center for Perl Studies" and abbreviations like "C.P.S".

Is there any logic to implement this in Perl?

Jaganadh G
Linguist
# 2  
Old 12-10-2007
Define "sequence". Your example leaves "He" out of the results. Must a match contain at least 2, 3, 4... capitalized words?
# 3  
Old 12-10-2007
Sequence: any sequence of capitalized words, e.g. names of companies, and abbreviations.
I have to extract names of companies and abbreviations from a text using Perl.

Jaganadh G
# 4  
Old 12-10-2007
It is a decidedly non-trivial problem.
For example:


Code:
billym.>X="he is working in Center for Perl Studies. He will come tomorrow."
billym.>echo "$X" | perl -pe 's/([A-Z])\S*/$1\./g'
he is working in C. for P. S. H. will come tomorrow.

You could maybe keep a large list of words to ignore, such as "for", "the", "he", "it" and "and".

Interesting. Will I get paid?
# 5  
Old 12-10-2007
Code:
#!/usr/bin/perl
use strict;
use warnings;

# Keep only the words that start with a capital letter.
my @arraytest  = ("Test", "for", "Capital", "letters");
my @resulttest = grep { /^[A-Z]/ } @arraytest;
print "@resulttest\n";

# 6  
Old 12-10-2007
From the sentence I have to get only "Center for Perl Studies", i.e. sequences like that one.
Jaganadh G
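
One possible starting point, offered as a rough sketch rather than a finished answer: treat a "sequence" as two or more capitalized words, optionally joined by a few lowercase linking words, and treat an abbreviation as single capital letters separated by dots. The function name `extract_names` and the short list of linking words (for, of, the, and, in) are assumptions chosen for this example. A capitalized word at the start of a sentence that happens to precede a name would be swallowed into the match, so real text will need more rules.

```shell
# Print capitalized word sequences and dotted abbreviations, one per line.
extract_names() {
  perl -ne '
    my $cap  = qr/[A-Z][a-z]+/;              # one capitalized word
    my $link = qr/(?:for|of|the|and|in)/;    # assumed linking words
    # Two or more capitalized words, with optional linking words between them.
    my $name = qr/\b$cap(?:\s+(?:$link\s+)*$cap)+\b/;
    # Dotted capitals such as C.P.S, with an optional trailing dot.
    my $abbr = qr/\b(?:[A-Z]\.)+[A-Z]\b\.?/;
    print "$1\n" while /($abbr|$name)/g;
  '
}

echo "he is working in Center for Perl Studies. He will come tomorrow. C.P.S is hiring." | extract_names
# prints:
# Center for Perl Studies
# C.P.S
```

Because a match needs at least two capitalized words, a lone "He" at the start of a sentence is not reported.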