regular expression with shell script to extract data out of a text file


 
# 1  
Old 06-26-2012

Hi,
I am trying to extract some specific data from a text file using regular expressions in a shell script.

I am doing this with a multiline grep. The tool I am using is pcregrep, so that I get compatibility with Perl's regular expressions.

Quote:
[58]Walid Chamoun Architects WLL
* [59]Map
* [60]Website
* [61]Email
* [62]Profile
* [63]Display Ad

Walid Chamoun Architects WLL

PO Box:
55803, Doha, Qatar

Location:
D-Ring Road, New Salata Shamail 40, Villa 340, Doha, Qatar

Tel:
(00974) 44568833

Fax:
(00974) 44568811

Mob:
(00974) 44568822

* Accurate Budget Costing
* Eco-Friendly Structural Design
* Exclusive & Unique Design
* Quality Architecture & Design

Company Profile

Walid Chamoun Architects (WCA) was founded in Beirut, Lebanon, in 1992,
committed to the concept of fully integrated design-build delivery of
projects. In late '90s, company established in-house architectural and
engineering services. As a full service provider, WCA expanded from
multi-family projects to industrial and office construction, which
added development services, including site acquisition and financing.
In 2001, WCA had opportunity and facilities to experience European
market and establish office in Puerto Banus, Marbella, Spain. By 2005,
WCA refined its structure to focus on specific market segments and new
office was opened in Doha, state of Qatar. From a solid foundation and
reputation built over eighteen years, WCA continually to provide
leadership in design-build through promotion of benefits and education
to its practitioners.
Project Planning: Project planning and investigation occurs before
design begins has greatest impact on cost, schedule and ultimately the
success of project. Creativity in Design: You can rely on our in-house
designers for design excellence in all aspects of the project. Our
designs have received recommendations and appreciations on national and
international levels. Creativity in Execution: Experienced in close
collaboration with the designers as part of the integrated team, our
construction managers, superintendents and field staff create value
throughout the project. Post Completion Services: Your needs can be
served through our skills and experience long after the last
construction crew has left the site. Performance: Corporate and
institutional clients, developers and public agencies repeatedly select
WCA on the basis of its consistent record of performance excellence.
Serving clients throughout the Middle East and GCC, WCA provides
complete planning for architectural, interior design and construction
on a single-responsibility basis. Our expertise spans industrial,
commercial, institutional, public and residential projects. Benefits of
Design-Build: Design-build is a system of contracting under which one
entity performs both design and construction. Benefits of design-build
project delivery include: Single point responsibility Early knowledge
of cost Time and Cost savings

Classification:
Architects - [64]Architects

[65]Al Ali Consulting & Engineering
* [66]Map
* Website
* Email
* Profile
* Display Ad

Is this your company?
[67]Upgrade this free listing here

PO Box:
467, Doha, Qatar

Tel:
(00974) 44360011

Company Profile

Classification:
Architects - [68]Architects

[69]Al Gazeerah Consulting Engineering
* [70]Map
* Website
* Email
* Profile
* Display Ad

Is this your company?
[71]Upgrade this free listing here

PO Box:
22414, Doha, Qatar

Tel:
(00974) 44352126

Company Profile

Classification:
Architects - [72]Architects

[73]Al Murgab Consulting Engineering
* [74]Map
* Website
* Email
* Profile
* Display Ad

Is this your company?
[75]Upgrade this free listing here

PO Box:
2856, Doha, Qatar

Tel:
(00974) 44448623

Company Profile

Classification:
Architects - [76]Architects
References

Visible links
1. Login
2. Register for an account
3. The BIGGEST business website in Qatar, Qatcom is The Online Yellow Pages Directory, with over 36'000 adverts for companies in Doha
4. The BIGGEST business website in Qatar, Qatcom is The Online Yellow Pages Directory, with over 36'000 adverts for companies in Doha
5. Interactive Map of Doha
6. Qatcom | Qatars most comprehensive business database
7. Qatar Advertising Solutions for YOUR Business
8. Advertiser_testimonials
9. Login
10. Register for an account
11. Get in touch with Qatcom today
12. Companies
13. Qatar Business and Directory Listings
14. Qatar Business and Directory Listings
15. Qatar Business and Directory Listings
16. Qatar Business and Directory Listings
17. Qatar Business and Directory Listings
18. Qatar Business and Directory Listings
19. Qatar Business and Directory Listings
20. Qatar Business and Directory Listings
21. Qatar Business and Directory Listings
22. Qatar Business and Directory Listings
23. Qatar Business and Directory Listings
24. Qatar Business and Directory Listings
25. Qatar Business and Directory Listings
26. Qatar Business and Directory Listings
27. Qatar Business and Directory Listings
28. Qatar Business and Directory Listings
For sample data like this, I am trying to grab the following details of each company:
  • company name
  • PO Box
  • Tel
  • Fax
  • Mob
  • company profile
and write them into a .csv file.
I am new to regular expressions, and to Linux too.
All I could manage to come up with was something like this:



Code:
\[\d*\][^\.]*[\(\d*\)\s\d*)]




Can anyone help me out with this, please?

Last edited by radoulov; 06-26-2012 at 06:38 AM.. Reason: web links removed
# 2  
Old 06-26-2012
How do you identify the company name? If it is the line that begins with a number in square brackets, the line below doesn't seem to be a company name:

Code:
[75]Upgrade this free listing here

# 3  
Old 06-26-2012
Error

Quote:
Originally Posted by clx
How do you identify the company name? If it is the line that begins with a number in square brackets, the line below doesn't seem to be a company name:

Code:
[75]Upgrade this free listing here

I had the same problem, buddy.
I have been trying regular expressions since yesterday, but in vain!
That's the reason I posted it here.
# 4  
Old 06-26-2012
OK, does this seem to work for you?

Code:
$ awk '/^\[/ && ! /Upgrade this free listing/ {print $0} /:$/ && ! /Classification/ {printf "%s", $0 ; getline x ; print x}' file
[58]Walid Chamoun Architects WLL
PO Box:55803, Doha, Qatar
Location:D-Ring Road, New Salata Shamail 40, Villa 340, Doha, Qatar
Tel:(00974) 44568833
Fax:(00974) 44568811
Mob:(00974) 44568822
[65]Al Ali Consulting & Engineering
PO Box:467, Doha, Qatar
Tel:(00974) 44360011
[69]Al Gazeerah Consulting Engineering
PO Box:22414, Doha, Qatar
Tel:(00974) 44352126
[73]Al Murgab Consulting Engineering
PO Box:2856, Doha, Qatar
Tel:(00974) 44448623
$

# 5  
Old 06-26-2012
Quote:
Originally Posted by clx
OK, does this seem to work for you?

The data looks good, but when I ran the awk command you posted on my file, it gave me only the PO Box numbers, fax numbers and phone numbers.

I also need a separator so that I can save the output as a CSV file.
The output I got looks something like this:
Quote:
PO Box: 1151, Doha, Qatar
Tel: (00974) 44664396
PO Box: 3529, Doha, Qatar
Tel: (00974) 44417477
PO Box: 24863, Doha, Qatar
Tel: (00974) 44954607
PO Box: 22160, Doha, Qatar
Tel: (00974) 44131818
PO Box: 9553, Doha, Qatar
Tel: (00974) 44311758
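
One possible explanation for output like this, offered only as an assumption because the attached file is not shown in the thread: if every line in the real file is indented, `/^\[/` can never match, while `/:$/` still does. That would produce labels and values but no company names. A minimal sketch of a whitespace-tolerant variant follows; `extract_fields` is a hypothetical helper name, and the indented sample is modelled on the data quoted in the first post.

```shell
#!/bin/sh
# Sketch only: assumes the real file indents its lines (not confirmed in the thread).
# extract_fields is a hypothetical helper name.
extract_fields() {
    awk '
        # Trim leading blanks and trailing blanks or carriage returns first.
        { sub(/^[ \t]+/, ""); sub(/[ \t\r]+$/, "") }

        # Company name: starts with [number] and is not the "Upgrade" link.
        /^\[[0-9]+\]/ && !/Upgrade this free listing/ { print; next }

        # Label line such as "Tel:"; its value is on the following line.
        /:$/ && !/^Classification:/ {
            label = $0
            if ((getline value) > 0) {
                sub(/^[ \t]+/, "", value)
                sub(/[ \t\r]+$/, "", value)
                print label value
            }
        }
    ' "$1"
}

# Small indented sample to try the function on.
sample=$(mktemp)
cat > "$sample" <<'EOF'
   [65]Al Ali Consulting & Engineering
     * [66]Map
   [67]Upgrade this free listing here
   PO Box:
   467, Doha, Qatar
   Tel:
   (00974) 44360011
   Classification:
   Architects - [68]Architects
EOF
extract_fields "$sample"
rm -f "$sample"
```

If the company names appear with this variant but not with the original one-liner, leading whitespace is the likely culprit. If they still do not appear, the real file probably marks company names differently from the posted sample.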
# 6  
Old 06-26-2012
And this is the file I am working on.
Please check the attachment.
# 7  
Old 06-26-2012
I used the sample you posted above. Are you sure the file you are running it on contains the same patterns?
For the company name, I used this trick: a line is treated as a company name when
Quote:
the line starts with a "[" AND does not contain the text "Upgrade this free listing"
The PO Box fields already contain commas, so you must use some other separator. But that comes later; first we need to get the correct output.

EDIT: OK, I saw your next post. I will check the attachment.
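
The separator point can be sketched as follows. This is a minimal example, not a tested solution for the attached file: it assumes the layout of the sample in the first post, picks `|` as the separator (an arbitrary choice, made only because the PO Box values contain commas), and uses a hypothetical helper name, `to_records`. It emits one line per company with the name, PO Box, Tel, Fax, Mob and the company profile joined onto a single line.

```shell
#!/bin/sh
# Sketch only: to_records is a hypothetical helper; "|" is an assumed separator.
to_records() {
    awk -v OFS='|' '
        # Print the company collected so far, then reset the state.
        function emit() {
            if (name != "")
                print name, f["PO Box"], f["Tel"], f["Fax"], f["Mob"], profile
            name = ""; profile = ""; in_profile = 0
            split("", f)                       # clear the per-company fields
        }

        { sub(/^[ \t]+/, ""); sub(/[ \t\r]+$/, "") }   # trim each line

        # A new company starts: [number]Name, but not the "Upgrade" link.
        /^\[[0-9]+\]/ && !/Upgrade this free listing/ {
            emit()
            name = $0
            sub(/^\[[0-9]+\]/, "", name)       # drop the [number] prefix
            next
        }

        /^Classification:$/ { in_profile = 0; next }   # profile text ends here
        /^Company Profile$/ { in_profile = 1; next }   # profile text starts here

        # Inside the profile: join non-empty lines with single spaces.
        in_profile {
            if ($0 != "") profile = profile (profile == "" ? "" : " ") $0
            next
        }

        # Label on one line, value on the next.
        /^(PO Box|Tel|Fax|Mob):$/ {
            key = substr($0, 1, length($0) - 1)
            if ((getline value) > 0) {
                sub(/^[ \t]+/, "", value)
                sub(/[ \t\r]+$/, "", value)
                f[key] = value
            }
        }

        END { emit() }
    ' "$1"
}

# Try it on one listing copied from the sample in the first post.
sample=$(mktemp)
cat > "$sample" <<'EOF'
[65]Al Ali Consulting & Engineering
* [66]Map
Is this your company?
[67]Upgrade this free listing here

PO Box:
467, Doha, Qatar

Tel:
(00974) 44360011

Company Profile

Classification:
Architects - [68]Architects
EOF
to_records "$sample"
rm -f "$sample"
```

For that listing the function prints `Al Ali Consulting & Engineering|467, Doha, Qatar|(00974) 44360011|||`, with the empty trailing fields standing for the missing Fax, Mob and profile. A profile that itself contains `|` would break the columns, so a different separator or proper CSV quoting would be needed in that case.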