automatic tar xf of file with unknown name
Hi all,
With curl I can fetch a tar archive from a web server; it contains a file ending in .scf that I am interested in. Unfortunately, the file name may vary and the subdirectory inside the tar archive may change. I can manually browse the directory structure, extract the file, and then rename it to a name that another program can read according to certain hard-coded rules.

Here's my actual example:

    curl "http://www.ncbi.nlm.nih.gov/Traces/trace.fcgi?cmd=retrieve&save=1&srcf=1&scfrcf=scf&file=trace&val=%EID%&ti=%EID%\" -s -o .tracecache/%EID%.tar

where %EID% is, for example, 458767001.

This tar archive contains the file

    /2007_07_08_02h41m29s/BACILLUS_ANTHRACIS_STR._AMES_0581/TIGR/traces/BAEI181TF.scf

which I want to copy to the .tracecache directory with the filename 458767001. The only fixed parameter is that the file I'm interested in always ends in .scf, and there should always be exactly one file of this kind in an arbitrary subdirectory. tar shouldn't create a corresponding directory; it should place all files in the same directory.

Is there any way to automate this?

Thanks in advance,
Thomas
There are some things you could try, but I think they will all require two passes through the tar file; one to read the file names and locate the one you want, and another to actually extract it.
To find the one you want, something like "tar tvf $tarfile | grep '\.scf$'" should work. That gets the listing of archive members (the "t" command lists the archive's table of contents, and the "v" option is the usual "verbose" option, which prints a longer listing line for each archive member) and greps for one with the required extension. You might want to check that there is exactly one match.

Given the file name of the archive member you want to extract, either extract it to a temporary location, move it where you want it, and rm -rf the temporary tree; or, at least with GNU tar, there's an option -O to extract to standard output, so you can redirect the output to a convenient place.

So to summarize, something like this perhaps.

    #!/bin/sh
    # TODO: check that the EID is passed in as the sole argument
    EID=$1
    curl "http://www.ncbi.nlm.nih.gov/Traces/trace.fcgi?cmd=retrieve&save=1&srcf=1&scfrcf=scf&file=trace&val=$EID&ti=$EID" -s -o .tracecache/$EID.tar
    tar tvf .tracecache/$EID.tar | grep '\.scf$' | xargs tar xOf .tracecache/$EID.tar >.tracecache/$EID

I'm assuming that the backslash in the curl command line was a mistake, and that the DOS-style variable name %EID% is not a symptom of something more sinister.
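The "check that there is exactly one match" step and the -O extraction could be sketched roughly as follows. This is a minimal sketch, not the script from the thread: the function names find_scf_member and extract_scf are made up for illustration, it lists members with plain "tar tf" (bare member names, one per line), and it assumes a tar that supports -O, such as GNU tar.

```shell
#!/bin/sh
# Hypothetical helper: print the name of the single .scf member of a tar
# archive, or fail if there is none or more than one.
find_scf_member() {
    archive=$1
    # "tar tf" prints bare member names, one per line.
    members=$(tar tf "$archive" | grep '\.scf$' || true)
    # Count the matching names; grep -c prints 0 when nothing matches.
    count=$(printf '%s\n' "$members" | grep -c '\.scf$' || true)
    if [ "$count" -ne 1 ]; then
        echo "expected exactly one .scf member, found $count" >&2
        return 1
    fi
    printf '%s\n' "$members"
}

# Hypothetical helper: write that member's contents to the given output file.
extract_scf() {
    archive=$1
    output=$2
    member=$(find_scf_member "$archive") || return 1
    # -O writes the member to standard output instead of recreating
    # its directory tree on disk.
    tar xOf "$archive" "$member" >"$output"
}
```

With these definitions, something like extract_scf .tracecache/458767001.tar .tracecache/458767001 would place the trace file next to the archive without creating subdirectories. Passing the member name as a quoted argument rather than through xargs also keeps names containing spaces intact.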
Thanks very much era!
The curl string was embedded in C source code and compiled into the application as a hard-coded string (Hawkeye viewer for Amos, see http://amos.sourceforge.net). The author used the % signs to mark string segments that a string function replaces with variables. It has nothing to do with DOS, I think. For this reason the backslash was in front of the " to escape it, and I forgot to delete it before I posted the thread.

Hawkeye has only space for a single command line, so I decided to put everything into an external script "fetchscf.sh", as you had already started. The command line in Hawkeye is now only

    /usr/local/bin/fetchscf.sh %EID% %TRACECACHE%

fetchscf.sh contains:

<source>
#!/bin/sh
EID=$1
tracecache=$2
curl "http://www.ncbi.nlm.nih.gov/Traces/trace.fcgi?cmd=retrieve&save=1&srcf=1&scfrcf=scf&file=trace&val=$EID&ti=$EID" -s -o $tracecache/$EID.tar
tar tvf $tracecache/$EID.tar | grep '\.scf$' | cut -d: -f2 | cut -b4- | xargs tar xOf $tracecache/$EID.tar >$tracecache/$EID.scf
</source>

Because tar tvf returned the whole info line, including the date etc.,

    -rw-rw-r-- 0/0 106207 2007-07-09 00:29 2007_07_09_01h29m40s/BACILLUS_ANTHRACIS_STR._AMES_0581/TIGR/traces/BAEAT42TR.scf

I had to cut the string from the left up to the first character of the path name. I wasn't sure whether this length was always exactly constant, but I assumed that the time always contains a ":" and that the path name then begins at the fourth byte after it. This explains the double cut in the pipe.

Maybe this is not perfectly elegant, but it works fine. It isn't worth polishing further, because NCBI will change the path names and URL parameters every three months or so.

Thanks again for your help!
Thomas
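To make the double cut easier to follow, here is a small sketch that applies the same two cut stages to the sample listing line quoted above. The function name strip_listing_columns is hypothetical, and the sketch assumes the listing format shown in the post (time printed as HH:MM, one space before the path); a listing that prints seconds, or a path containing ":", would break it.

```shell
#!/bin/sh
# Hypothetical helper: strip the metadata columns from "tar tvf" lines read
# on standard input, leaving only the member path.
strip_listing_columns() {
    # Stage 1: keep the text after the first ":" (minutes, space, path).
    # Stage 2: drop the two minute digits and the space, so the path
    # starts at byte 4.
    cut -d: -f2 | cut -b4-
}

# Sample line taken from the post above.
sample='-rw-rw-r-- 0/0 106207 2007-07-09 00:29 2007_07_09_01h29m40s/BACILLUS_ANTHRACIS_STR._AMES_0581/TIGR/traces/BAEAT42TR.scf'
printf '%s\n' "$sample" | strip_listing_columns
```

For the sample line this prints only the path, 2007_07_09_01h29m40s/BACILLUS_ANTHRACIS_STR._AMES_0581/TIGR/traces/BAEAT42TR.scf. Listing with "tar tf" instead of "tar tvf" prints bare member names in the first place, which would make the slicing unnecessary.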