The UNIX and Linux Forums  
#1, 07-08-2007, tkrahn
automatic tar xf of file with unknown name

Hi all,
With curl I can fetch a tar archive from a web server; it contains a file ending in .scf that I am interested in. Unfortunately the file name may vary, and so may the subdirectory inside the tar archive. I can browse the directory structure manually, extract the file, and then rename it to a name that another program can read according to certain hard-coded rules.
Here's my actual example:

curl "http://www.ncbi.nlm.nih.gov/Traces/trace.fcgi?cmd=retrieve&save=1&srcf=1&scfrcf=scf&file=trace&val=%EID%&ti=%EID%\" -s -o .tracecache/%EID%.tar

where %EID% is, for example, 458767001.

This tar archive contains the file
/2007_07_08_02h41m29s/BACILLUS_ANTHRACIS_STR._AMES_0581/TIGR/traces/BAEI181TF.scf
which I want to copy to the .tracecache directory with the filename 458767001.

The only fixed parameter is that the file I'm interested in always ends in .scf, and there should always be exactly one such file, in an arbitrary subdirectory. tar shouldn't create the corresponding directory; it should place all files in the same directory.

Is there any way to automate this?

Thanks in advance,

Thomas
#2, 07-08-2007, era
There are some things you could try, but I think they will all require two passes through the tar file: one to read the file names and locate the one you want, and another to actually extract it.

To find the one you want, something like "tar tvf $tarfile | grep '\.scf$'" should work. That gets the listing of archive members (the "t" command lists the archive's table of contents, and "v" is the usual "verbose" option, which here adds the permissions, owner, size, and date to each file name) and greps for the one with the required extension. You might want to check that there is exactly one match.
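The "exactly one match" check could look like the minimal sketch below, assuming a POSIX shell. It lists with plain `tar tf` (no `v`), which prints only the member names, so the result can be handed straight back to tar. The function name `scf_member` is hypothetical.

```sh
#!/bin/sh
# scf_member TARFILE
# Prints the single archive member whose name ends in .scf.
# Returns status 1 if there is not exactly one such member.
scf_member() {
    # "tar tf" lists bare member names, one per line (no long listing).
    matches=$(tar tf "$1" | grep '\.scf$')
    # Count non-empty lines; an empty result counts as 0.
    count=$(printf '%s\n' "$matches" | grep -c .)
    if [ "$count" -ne 1 ]; then
        echo "expected exactly one .scf member in $1, found $count" >&2
        return 1
    fi
    printf '%s\n' "$matches"
}
```

Usage: `member=$(scf_member .tracecache/458767001.tar) || exit 1`.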

Given the file name of the archive member you want to extract, either extract it to a temporary location, move it where you want it, and rm -rf the temporary tree; or, at least with GNU tar, use the -O option to extract to standard output and redirect the output wherever is convenient.
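For a tar without -O, the temporary-directory route might look like this sketch. It assumes `mktemp -d` and `find` are available (common, though `mktemp` is not in POSIX), and `extract_scf_via_tmp` is a hypothetical name. Here `find` locates the .scf file after unpacking, so no separate listing pass is needed, at the cost of unpacking the whole archive.

```sh
#!/bin/sh
# extract_scf_via_tmp TARFILE DEST
# Unpacks TARFILE into a scratch directory, moves the single *.scf file
# found anywhere inside it to DEST, then removes the scratch tree.
extract_scf_via_tmp() {
    # Make the archive path absolute, because we cd into the scratch dir.
    case $1 in
        /*) archive=$1 ;;
        *)  archive=$(pwd)/$1 ;;
    esac
    tmpdir=$(mktemp -d) || return 1
    # Unpack in a subshell so the caller's working directory is unchanged.
    if ! (cd "$tmpdir" && tar xf "$archive"); then
        rm -rf "$tmpdir"
        return 1
    fi
    found=$(find "$tmpdir" -type f -name '*.scf')
    count=$(printf '%s\n' "$found" | grep -c .)
    if [ "$count" -ne 1 ]; then
        echo "expected exactly one .scf file, found $count" >&2
        rm -rf "$tmpdir"
        return 1
    fi
    mv "$found" "$2"
    status=$?
    rm -rf "$tmpdir"
    return "$status"
}
```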

So to summarize, something like this perhaps.

<source>
#!/bin/sh

# TODO: check that the EID is passed in as the sole argument
EID=$1

curl "http://www.ncbi.nlm.nih.gov/Traces/trace.fcgi?cmd=retrieve&save=1&srcf=1&scfrcf=scf&file=trace&val=$EID&ti=$EID" -s -o .tracecache/$EID.tar
tar tvf .tracecache/$EID.tar | grep '\.scf$' | xargs tar xOf .tracecache/$EID.tar >.tracecache/$EID
</source>

I'm assuming that the backslash in the curl command line was a mistake, and that the DOS-style variable name %EID% is not a symptom of something more sinister.
#3, 07-09-2007, tkrahn

Thanks very much, era!

The curl string was embedded in C source code and hard-coded into the application (Hawkeye viewer for Amos, see http://amos.sourceforge.net). The author used the % signs to mark string segments that a string function replaces with variables. It has nothing to do with DOS, I think.
That is also why there was a backslash in front of the ": to escape it inside the C string. I forgot to delete it before I posted the thread.

Hawkeye only has room for a single command line, so I put everything into an external script, fetchscf.sh, building on the one you started. The command line in Hawkeye is now only

/usr/local/bin/fetchscf.sh %EID% %TRACECACHE%

fetchscf.sh contains:

<source>
#!/bin/sh

EID=$1
tracecache=$2

curl "http://www.ncbi.nlm.nih.gov/Traces/trace.fcgi?cmd=retrieve&save=1&srcf=1&scfrcf=scf&file=trace&val=$EID&ti=$EID" -s -o $tracecache/$EID.tar
tar tvf $tracecache/$EID.tar | grep '\.scf$' | cut -d: -f2 | cut -b4- | xargs tar xOf $tracecache/$EID.tar >$tracecache/$EID.scf
</source>

Because tar tvf returns the whole info line, including the date etc.:
-rw-rw-r-- 0/0 106207 2007-07-09 00:29 2007_07_09_01h29m40s/BACILLUS_ANTHRACIS_STR._AMES_0581/TIGR/traces/BAEAT42TR.scf

I had to cut the string from the left up to the first character of the path name.
I wasn't sure whether that prefix always has exactly the same length, but I assumed that the time is always separated by ":", so that in the field after the first colon the path name starts at the fourth byte. That explains the double cut in the pipe.
Maybe this is not perfectly elegant, but it works fine. It isn't worth perfecting, because NCBI will change the path names and URL parameters every three months or so.
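The cut step exists only because of the verbose listing. A sketch of the same fetch without it, assuming the URL from this thread (which NCBI may since have changed) and a tar that supports -O, as GNU tar does. The function name `fetch_scf` is hypothetical, and filling in the argument check is my own addition.

```sh
#!/bin/sh
# fetch_scf EID TRACECACHE
# Downloads the trace tar for EID into TRACECACHE and writes the single
# .scf member to TRACECACHE/EID.scf.
fetch_scf() {
    if [ $# -ne 2 ]; then
        echo "usage: fetch_scf EID TRACECACHE" >&2
        return 2
    fi
    eid=$1
    cache=$2
    tarfile="$cache/$eid.tar"
    mkdir -p "$cache" || return 1
    # -s: silent; -f: fail on HTTP errors instead of saving an error page.
    curl -s -f -o "$tarfile" \
        "http://www.ncbi.nlm.nih.gov/Traces/trace.fcgi?cmd=retrieve&save=1&srcf=1&scfrcf=scf&file=trace&val=$eid&ti=$eid" \
        || return 1
    # "tar tf" prints bare member names, so nothing has to be cut away.
    member=$(tar tf "$tarfile" | grep '\.scf$')
    if [ "$(printf '%s\n' "$member" | grep -c .)" -ne 1 ]; then
        echo "no unique .scf member in $tarfile" >&2
        return 1
    fi
    # -O: write the extracted member to standard output.
    tar -x -O -f "$tarfile" "$member" > "$cache/$eid.scf"
}
```

Called from fetchscf.sh as `fetch_scf "$1" "$2"`, this keeps the Hawkeye command line unchanged. Because the member name is quoted and passed to tar directly rather than through xargs, a path containing spaces would also survive intact.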

Thanks again for your help!

Thomas
#4, 07-11-2007, era
Ah yes, sorry for missing the mangling of the output from tar; glad I could help.