regex to split a line correctly


 
# 1  
Old 12-08-2010

Hello,

I'm working on a little project to extract weather data from a website.
At this point, I've stored each piece of weather information on one line of a plain UNIX text file.
Each line has the structure:

Code:
dd-mm-yy:City:dd-mm-yy:kind Of Data:value Of Data

The first dd-mm-yy (it's the French format, because I'm French) is the date the forecast was downloaded; the second dd-mm-yy is the day the forecast is for (it could be the download day, the day after, two days after, etc.).

Here is an example of a forecast, using 5 lines:

Code:
08-12-10:Paris:10-12-10:Windspeed:20 km/h
08-12-10:Paris:10-12-10:Windcourse:Wind north north-east
08-12-10:Paris:10-12-10:Sky:Some breaks
08-12-10:Paris:10-12-10:Max:5°C
08-12-10:Paris:10-12-10:Min:-3°C

Now my problem: I need to capture the different values in variables.
For example, I'd like to have the 5 fields of a line (only 1 line!) in 5 variables ($downloadDay, $City, $forDate, etc.), and I need a few tips.

Currently, I get the line with the sed command and store it in a variable.
Then I extract the values with the expr command. My commands are:

Code:
downloadDay=`expr "$myline" : '\(........\)'`
city=`expr "$myline" : '.*\([A-Z][a-z]*\):[0-9]'`
forecastday=`expr "$myline" : '.*[a-z]:\([0-9][0-9]-[0-9][0-9]-[0-9][0-9]-[0-9][0-9]\):[A-Z]'`
kindOfData=`expr "$myline" : '.*[0-9]:\([A-Z][a-z]*\)'` 
valueOfData=`expr "$myline" : '.*[a-z]:\(.*\)'`

The two elements that fail are forecastday and valueOfData.

For forecastday: I have more than 100 lines in my weather database. Sometimes it works, but sometimes it fails:

Works:

Code:
03-12-10:Mulhouse:04-12-10:Ciel:Soleil voilé

Fail:
Code:
03-12-10:Versaille:03-12-10:Ciel:Quelques éclaircies

For valueOfData:

The data is cut at the first space: "50 km/h" becomes "50",
"Some breaks" becomes "Some", etc.

I think my problem comes from the 5 regexes used with expr. I'm new to this, so if somebody could have a look and explain what's wrong and how to fix it (if possible with an explanation of the regex, so I understand why it doesn't work), that would be very, very nice!

Big thanks in advance!

Moderator's Comments:
Please use code tags when posting data and code samples!

Last edited by vgersh99; 12-08-2010 at 04:24 PM.. Reason: code tags, please!
# 2  
Old 12-08-2010
One way...
Depending on the final objective, you might be better off with other tools (e.g. awk), but this is a start.
Code:
#!/bin/ksh

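# Split each line on ':' into five fields; 'junk' absorbs any extra fields.
# -r keeps backslashes in the data literal.
while IFS=: read -r downloadDay City forecastDay kindOfData valueOfData junk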
do
     # do your stuff with the extracted fields here
     echo "downloadDay -> [${downloadDay}]  City -> [${City}]"
done < myForecastFile
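Since the question asks for the fields of only one line that is already held in a variable, the same idea can be sketched without a loop. This is a minimal sketch, assuming the line sits in `$myline` as in the first post; the sample value is copied from the example data. Without a `junk` variable, the last variable simply receives the rest of the line.

```sh
#!/bin/sh
# Sample line taken from the example data in the first post.
myline='08-12-10:Paris:10-12-10:Windspeed:20 km/h'

# IFS=: applies only to this one read command.
# The here-document feeds the content of $myline to read on standard input.
# Only ':' separates fields here, so a value with spaces stays in one piece.
IFS=: read -r downloadDay City forecastDay kindOfData valueOfData <<EOF
$myline
EOF

echo "downloadDay -> [$downloadDay]"
echo "City        -> [$City]"
echo "forecastDay -> [$forecastDay]"
echo "kindOfData  -> [$kindOfData]"
echo "valueOfData -> [$valueOfData]"
```

A here-document is used rather than `echo "$myline" | read ...` because in many shells the last command of a pipeline runs in a subshell, so the variables would be lost afterwards.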


Last edited by vgersh99; 12-08-2010 at 06:32 PM..
# 3  
Old 12-08-2010
Thanks for this first answer.

Actually, my "strategy" was to store the data line by line and to separate each element with a flag, typically the
Code:
:

So my idea was to read the content between two flags, and to do this for each element.
Unfortunately, I don't know how to proceed, and I don't know advanced tools such as awk; I commonly use expr, sed and grep.
So I tried to write some regexes, but they don't work as well as I would like...

When I extracted the data from the web page, I used the same regex structure, with the markup tags as delimiters, like this:
Code:
'FlagOne\(.*\)FlagTwo'

and that worked well, but in my case here, I haven't the slightest idea...

Are my regexes invalid or inefficient, and should I try another method?

Thanks!
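For the "content between two flags" idea, here is a minimal sketch that stays with expr, assuming no field except possibly the last one contains a colon. `expr STRING : REGEX` always matches from the start of the string, and `[^:]*` means "a run of characters that are not a colon", so each pattern just counts colons instead of guessing which letters or digits sit next to them. Note also that the forecastday pattern as posted asks for four two-digit groups, while the dates only have three; that may be a transcription slip. The sample line is copied from the first post.

```sh
#!/bin/sh
# Sample line taken from the example data in the first post.
myline='08-12-10:Paris:10-12-10:Windcourse:Wind north north-east'

# Each pattern skips N complete fields ("[^:]*:") and captures the next one.
downloadDay=$(expr "$myline" : '\([^:]*\):')
city=$(expr "$myline" : '[^:]*:\([^:]*\):')
forecastDay=$(expr "$myline" : '[^:]*:[^:]*:\([^:]*\):')
kindOfData=$(expr "$myline" : '[^:]*:[^:]*:[^:]*:\([^:]*\):')
# The last field takes everything that is left, spaces included.
valueOfData=$(expr "$myline" : '[^:]*:[^:]*:[^:]*:[^:]*:\(.*\)')

echo "[$downloadDay] [$city] [$forecastDay] [$kindOfData] [$valueOfData]"
```

This runs expr five times per line, so it is mainly useful for understanding the patterns; letting the shell split the line is cheaper.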
# 4  
Old 12-08-2010
You have the right strategy, but a bad choice of tools: they are both insufficient and would consume quite a bit of CPU.

Think of your data as records (lines) whose fields are separated by ':'. The shell's IFS (Internal Field Separator) variable controls how fields are split, so the shell itself (without any other tools) provides the means to deal with field-separated records.
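To see the shell's own field splitting in isolation, here is a small sketch. `split_fields` is a hypothetical helper invented for illustration, not something from this thread; it relies only on IFS and `set --`, and the sample line comes from the first post.

```sh
#!/bin/sh
# split_fields is a hypothetical helper, named here for illustration only.
split_fields() {
    # The parentheses start a subshell, so the IFS and noglob changes
    # disappear when it exits and the caller's settings are untouched.
    (
        IFS=:
        set -f          # no filename expansion on the unquoted $1 below
        set -- $1       # unquoted on purpose: the shell splits it on ':'
        echo "$# fields, last one: [$5]"
    )
}

summary=$(split_fields '08-12-10:Paris:10-12-10:Sky:Some breaks')
echo "$summary"
```

Because only ':' is in IFS during the split, the space in "Some breaks" does not start a new field.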

If/when your processing requirements become more 'mature', you might consider migrating to other *NIX scripting tools, e.g. awk, perl, python, etc.

Take a look at what I've provided and try to run it to see if it satisfies your requirements 'as-is'.
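As a taste of the awk route mentioned above, here is a minimal sketch, assuming the same colon-separated layout. The two input lines are copied from the first post, and the output wording is made up for the example.

```sh
#!/bin/sh
# -F: makes ':' the field separator, so $1..$5 are the five fields.
# Only the line whose 4th field is "Windspeed" is printed.
result=$(printf '%s\n' \
    '08-12-10:Paris:10-12-10:Windspeed:20 km/h' \
    '08-12-10:Paris:10-12-10:Sky:Some breaks' |
  awk -F: '$4 == "Windspeed" { print $2 " on " $3 ": " $5 }')

echo "$result"
```

With a real file, the printf pipeline would be replaced by passing the file name to awk as its last argument.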
# 5  
Old 12-08-2010
Thanks for the advice, I think it's going well!

I've just found this example:

http://www.livefirelabs.com/unix_tip...3/10132003.htm and that's typically what I want to do, with one exception: I don't want to "echo" the fields, I'd like to store them in 5 variables. If I tinker again, I'll declare my element variables and a "tmp" counter outside the for loop, increment the counter on each pass, and use an "if" structure inside the loop to select which variable gets filled by the current record.

But I find that very dirty and a waste of CPU time (exactly as you said).

Do you have a more elegant way to set and fill my element variables?

Thanks again!
# 6  
Old 12-08-2010
Look at what I posted originally.
# 7  
Old 12-08-2010
Woo oo,
I thought that was the awk command, but you actually gave me a solution using the shell itself.

I'll try it tomorrow because it's late in my country and I'm tired.
I'll come back later if necessary...

You've probably fixed the problem, so thanks a lot again.
 