I'm working on a small project to extract weather data from a website.
At this point, I've stored each piece of weather information on one line of a standard UNIX text file.
Each line has the structure:
The first dd-mm-yy (the French date format, because I'm French) is the date the forecast was downloaded; the second dd-mm-yy is the day the forecast is for (it could be the download day, the day after, the day after that, etc.).
Here is an example of a forecast, using 5 lines:
Now my problem: I need to get the different values into variables.
For example, I'd like to have the 5 fields of a line (one line at a time!) in 5 variables ($downloadDay, $City, $forDate, etc.), and I need a few tips.
Currently, I get the line with the sed command and store it in a variable.
Then I get the values with the expr command; my commands are:
The two elements that fail are the forecastday and the valueOfData.
For the forecastday: I have more than 100 lines in my weather database; sometimes it works, but sometimes it fails:
Works:
Fails:
For the valueOfData:
The data is cut off wherever there is a space: 50 km/h -> 50
some breaks -> some, etc.
I think my problem comes from the 5 regexes I use with expr. I'm new to this, so if somebody could have a look and explain what's wrong and how to fix it (if possible with an explanation of the regex, so I understand why it doesn't work), that would be very, very nice!
Big thanks in advance!
Moderator's Comments:
Please use code tags when posting data and code samples!
Last edited by vgersh99; 12-08-2010 at 04:24 PM..
Reason: code tags, please!
Actually, my "strategy" was to store the data line by line and to separate each element with a flag, typically the
So my idea was to read the content between two flags, and to do this for each element.
Unfortunately, I don't know how to proceed, and I don't know advanced tools such as awk; I commonly use expr, sed and grep.
So I tried to write some regexes, but they don't work as well as I would like...
When I extracted the data from the web page, I used the same regex structure, using the markup and tags to do the extraction, such that:
and that worked well, but in this case I don't have the slightest idea...
Are my regexes invalid or inefficient, and should I try another method?
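On the values being cut at a space: the expr commands from the post above are not shown, so the following is only a guess at the mechanism, using a hypothetical colon-separated line and hypothetical patterns. With `expr STRING : REGEX`, the regex is anchored at the start of the string and the result is whatever the `\( \)` group captured. If the character class inside the group does not allow a space, the capture stops at the first space.

```shell
#!/bin/sh
# Hypothetical sample line; the real line structure is not shown in the thread.
line='08-12-10:Paris:09-12-10:wind:50 km/h'

# The class [0-9a-zA-Z/] excludes the space, so the capture ends at "50".
short=$(expr "$line" : '.*:\([0-9a-zA-Z/]*\)')

# [^:]* accepts every character except the separator, and the trailing $
# forces the capture to run to the end of the line.
full=$(expr "$line" : '.*:\([^:]*\)$')

printf 'short=[%s]\nfull=[%s]\n' "$short" "$full"
```

Keeping `"$line"` in double quotes also matters: unquoted, the shell would split the value at the space before expr ever sees it.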
You have the right strategy, but a poor choice of tools: they are insufficient for the job and would consume quite a bit of CPU.
Think of your data as records (lines) whose fields are separated by ':', and set the shell's Internal Field Separator (IFS) to that character. The shell itself (without using any other tools) provides the means to deal with field-separated records.
If/when your processing requirements become more 'mature', you might consider migrating to other *NIX scripting tools, e.g. awk, perl, python, etc.
Take a look at what I've provided and try to run it to see if it satisfies your requirements 'as-is'.
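The code that accompanied this reply is not preserved here. As a stand-in, a minimal sketch of the IFS idea could look like the following; the sample lines, the temporary file and the names f1..f5 are all assumptions made for illustration, not the original data or code.

```shell
#!/bin/sh
# Build a small hypothetical data file: 5 fields per line, separated by ':'.
datafile=$(mktemp)
printf '%s\n' \
  '08-12-10:Paris:08-12-10:wind:50 km/h' \
  '08-12-10:Paris:09-12-10:sky:some breaks' > "$datafile"

count=0
lastValue=''
# IFS=: applies only to this read, so each line is split on ':' and nothing else.
while IFS=: read -r f1 f2 f3 f4 f5; do
  count=$((count + 1))
  lastValue=$f5
  printf 'downloaded=%s city=%s for=%s %s=%s\n' "$f1" "$f2" "$f3" "$f4" "$f5"
done < "$datafile"

rm -f "$datafile"
```

Because only ':' is a separator here, a value such as "50 km/h" stays in one piece, and no external command is started per line.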
http://www.livefirelabs.com/unix_tip...3/10132003.htm and that's typically what I want to do, with one exception: I don't want to "echo" the fields, I'd like to store them in 5 variables. If I improvise again, I would declare my element variables and a "tmp" counter outside the for loop, increment the counter on each pass, and use an "if" structure inside the loop to choose which variable is filled by the current field.
But I find that very dirty and a waste of CPU time (exactly as you said).
Do you have a more elegant method to set and fill my element variables?
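One possible way to do this without a counter or an "if" chain is to let `read` assign all five variables in a single call. This is a sketch under assumptions: the sample line is invented, and apart from $downloadDay, $City and $forDate (named earlier in the thread) the variable names are hypothetical.

```shell
#!/bin/sh
# Hypothetical line, e.g. the one previously fetched with sed into a variable.
line='08-12-10:Paris:09-12-10:wind:50 km/h'

# One read, five variables. The here-document feeds the line to read,
# and IFS=: limits the splitting to the ':' separator for this command only.
IFS=: read -r downloadDay City forDate dataName dataValue <<EOF
$line
EOF

printf '%s\n' "$downloadDay" "$City" "$forDate" "$dataName" "$dataValue"
```

If a line ever holds more than four separators, the last variable receives the whole remainder, so a value that itself contains ':' is not lost.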