awk to extract digit in line of text and create link

# 1  
Old 10-28-2016

I am trying to extract the number at the end of the Medexome_xx_ directory name in each line of file (032 and 028 in the sample below), remove its leading zero, and create an output link using that number. In the output, the only thing that changes is the number; the rest of the text is static and is the same each time. Thank you.

file
Code:
http://xxx.xx.xxx.xx/output/Home/Auto_user_S5-00580-6-Medexome_67_032/plugin_out/FileExporter_out.67/R_2016_09_20_10_12_41_user_S5-00580-6-Medexome.tar.bz2
http://xxx.xx.xxx.xx/output/Home/Auto_user_S5-00580-4-Medexome_65_028/plugin_out/FileExporter_out.52/R_2016_09_01_10_24_52_user_S5-00580-4-Medexome.tar.bz2

desired output
Code:
http://xxx.xx.xxx.xx/report/latex/32.pdf
http://xxx.xx.xxx.xx/report/latex/28.pdf

awk
Code:
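awk '{
    # find the "_digits/" that ends the Medexome directory name
    if (match($0, /_[0-9]+\//)) {
        # skip the leading "_", drop the trailing "/", and add 0 to strip leading zeros
        num = substr($0, RSTART + 1, RLENGTH - 2) + 0
        print "http://xxx.xx.xxx.xx/report/latex/" num ".pdf"
    }
}' file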

# 2  
Old 10-28-2016
Hello cmccabe,

If your Input_file always has exactly this format, the following may help.
Code:
awk '{
  match($0, /.*\/output/)
  VAL = substr($0, RSTART, RLENGTH - 7)   # host part, without the trailing "/output"
  match($0, /Auto.*_[0-9]+\//)
  VAL1 = substr($0, RSTART, RLENGTH)      # directory name ending in "_digits/"
  gsub(/.*_0|.*_|\//, X, VAL1)            # keep only the digits, minus one leading zero
  print VAL "/report/latex/" VAL1 ".pdf"
}' Input_file

Output will be as follows.
Code:
http://xxx.xx.xxx.xx/report/latex/32.pdf
http://xxx.xx.xxx.xx/report/latex/28.pdf

Thanks,
R. Singh
This User Gave Thanks to RavinderSingh13 For This Post:
# 3  
Old 10-28-2016
Code:
awk -F'/' '
{
  n=split($6,a,"_")
  pdf=a[n]+0
  print $1"//"$3 "/report/latex/" pdf ".pdf"
}' myFile
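
To see why $6 and a[n] are the right pieces, it can help to print the fields that -F'/' produces. The following is only an illustrative sketch: the shell variable url is an assumption made for the demo and simply holds the first sample line from the question.

```shell
url='http://xxx.xx.xxx.xx/output/Home/Auto_user_S5-00580-6-Medexome_67_032/plugin_out/FileExporter_out.67/R_2016_09_20_10_12_41_user_S5-00580-6-Medexome.tar.bz2'

# Print the first six "/"-separated fields with their index.
# $2 is empty because of the "//" after "http:".
printf '%s\n' "$url" | awk -F'/' '{ for (i = 1; i <= 6; i++) printf "$%d=[%s]\n", i, $i }'

# Split $6 on "_" and force the last piece to a number: "032" becomes 32.
printf '%s\n' "$url" | awk -F'/' '{ n = split($6, a, "_"); print a[n], a[n] + 0 }'
```

Adding 0 makes awk treat the string as a number, which drops any count of leading zeros.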

This User Gave Thanks to vgersh99 For This Post:
# 4  
Old 10-28-2016
A very short script:
Code:
$ awk -F'[/_]' -v OFS=/ '{$10=$10+0; print "http:","",$3,"report/latex",$10 ".pdf"}' urls.txt
http://xxx.xx.xxx.xx/report/latex/32.pdf
http://xxx.xx.xxx.xx/report/latex/28.pdf
$

This User Gave Thanks to blastit.fr For This Post:
# 5  
Old 10-28-2016
With any POSIX-conforming shell, you can do this just using shell variable expansions without needing to invoke awk:
Code:
while IFS= read -r url
do	head=${url%%/output/*}/report/latex/
	number=${url%%/plugin*}
	number=${number##*_}
	number=${number#0}
	number=${number#0}
	printf '%s%s.pdf\n' "$head" "$number"
done < file

which, if file contains:
Code:
http://xxx.xx.xxx.xx/output/Home/Auto_user_S5-00580-6-Medexome_67_032/plugin_out/FileExporter_out.67/R_2016_09_20_10_12_41_user_S5-00580-6-Medexome.tar.bz2
http://xxx.xx.xxx.xx/output/Home/Auto_user_S5-00580-4-Medexome_65_028/plugin_out/FileExporter_out.52/R_2016_09_01_10_24_52_user_S5-00580-4-Medexome.tar.bz2
http://xxx.xx.xxx.xx/output/Home/Auto_user_S5-00580-4-Medexome_65_728/plugin_out/FileExporter_out.52/R_2016_09_01_10_24_52_user_S5-00580-4-Medexome.tar.bz2
http://xxx.xx.xxx.xx/output/Home/Auto_user_S5-00580-4-Medexome_65_008/plugin_out/FileExporter_out.52/R_2016_09_01_10_24_52_user_S5-00580-4-Medexome.tar.bz2
http://xxx.xx.xxx.xx/output/Home/Auto_user_S5-00580-4-Medexome_65_000/plugin_out/FileExporter_out.52/R_2016_09_01_10_24_52_user_S5-00580-4-Medexome.tar.bz2

produces the output:
Code:
http://xxx.xx.xxx.xx/report/latex/32.pdf
http://xxx.xx.xxx.xx/report/latex/28.pdf
http://xxx.xx.xxx.xx/report/latex/728.pdf
http://xxx.xx.xxx.xx/report/latex/8.pdf
http://xxx.xx.xxx.xx/report/latex/0.pdf
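
The loop above removes at most two leading zeros, which is enough for three-digit numbers. If longer zero runs are possible, one option is to strip the whole run with nested parameter expansions. This is a sketch only; strip_zeros is a hypothetical helper name that does not appear in the original post.

```shell
# strip_zeros is a hypothetical helper, shown only to illustrate the idiom.
strip_zeros() {
	n=$1
	# ${n%%[!0]*} is the run of leading zeros; remove exactly that prefix.
	n=${n#"${n%%[!0]*}"}
	# An all-zero input leaves an empty string, so fall back to a single 0.
	printf '%s\n' "${n:-0}"
}

strip_zeros 032
strip_zeros 0000123
strip_zeros 000
```

Inside the loop, the two number=${number#0} lines could then be replaced by number=$(strip_zeros "$number"), at the cost of one command substitution per line.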

This User Gave Thanks to Don Cragun For This Post:
# 6  
Old 10-28-2016
Hi,
Just for fun, with sed (works with the example input):
If the destination URL keeps the same host as the source URL:
Code:
sed -e 's/^\(\([^/]*\/\)\{3\}\).*_0*\([0-9]\+\)\/.*/\1report\/latex\/\3.pdf/' file

If the destination host is fixed and is not taken from the source URL:
Code:
sed -e 's/^.*_0*\([0-9]\+\)\/.*/http:\/\/xxx.xx.xxx.xx\/report\/latex\/\1.pdf/' file
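
Note that \+ in a basic regular expression is a GNU extension. Where a non-GNU sed might be in use, the first command can be sketched with POSIX-only syntax by writing [0-9][0-9]* instead. The wrapper name to_link is hypothetical, and | is used as the delimiter here only to avoid escaping every slash.

```shell
# to_link is a hypothetical wrapper name used only for this illustration.
# It reads URLs on standard input and writes the rewritten links.
to_link() {
	sed -e 's|^\(\([^/]*/\)\{3\}\).*_0*\([0-9][0-9]*\)/.*|\1report/latex/\3.pdf|'
}

printf '%s\n' 'http://xxx.xx.xxx.xx/output/Home/Auto_user_S5-00580-6-Medexome_67_032/plugin_out/FileExporter_out.67/R_2016_09_20_10_12_41_user_S5-00580-6-Medexome.tar.bz2' | to_link
```

With a file of URLs, the equivalent call would be to_link < file.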

Regards.
This User Gave Thanks to disedorgue For This Post:
# 7  
Old 10-28-2016
In case the format isn't quite so fixed, you could try something like this:
Code:
awk -F'/plug.*|/outp|_' '{print $1 "/report/latex/" $(NF-1)+0 ".pdf"}' file



--
Quote:
Originally Posted by blastit.fr
A very short script:
Code:
$ awk -F'[/_]' -v OFS=/ '{$10=$10+0; print "http:","",$3,"report/latex",$10 ".pdf"}' urls.txt
http://xxx.xx.xxx.xx/report/latex/32.pdf
http://xxx.xx.xxx.xx/report/latex/28.pdf
$

Yet it could be reduced a little further still:
Code:
awk -F'[/_]' '{print "http://" $3 "/report/latex/" $10+0 ".pdf"}' file


Last edited by Scrutinizer; 10-29-2016 at 07:10 AM..
This User Gave Thanks to Scrutinizer For This Post: