alex.blackbit Advocate
Joined: 26 Jul 2005 Posts: 2397
|
Posted: Mon Sep 05, 2011 5:14 pm Post subject: wget - filename after redirect [SOLVED] |
|
|
hi,
i'm asking here because i didn't find a solution elsewhere. i have a shell script that downloads a long list of URLs, all from the same site, and all of them redirect. wget names each downloaded file after the original URL, not after the URL it was redirected to. i would like the files to be named as in the URL _after_ the redirect. is that possible ?
Last edited by alex.blackbit on Tue Sep 06, 2011 11:22 am; edited 1 time in total |
|
truc Advocate
Joined: 25 Jul 2005 Posts: 3199
|
Posted: Mon Sep 05, 2011 5:38 pm Post subject: |
|
|
Code: | while IFS= read -r url ; do
# name the file after the last path component of the listed URL
wget -O "${url##*/}" "$url"
done < URL_LIST.txt |
|
alex.blackbit Advocate
Joined: 26 Jul 2005 Posts: 2397
|
Posted: Mon Sep 05, 2011 7:08 pm Post subject: Re: wget - filename after redirect |
|
|
alex.blackbit wrote: | _after_ the redirect. is that possible ? |
truc, thanks for your answer, but i think you got me wrong. |
|
Hu Administrator
Joined: 06 Mar 2007 Posts: 23062
|
Posted: Mon Sep 05, 2011 9:52 pm Post subject: |
|
|
Use wget --trust-server-names. If I recall correctly, there are some potential security concerns with this in that a malicious server could refer you to a URL which causes you to overwrite something important. |
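One way to limit that risk is to let wget write into a directory reserved for these downloads, so a server-chosen name should not land next to anything that matters. A minimal sketch, assuming wget's -P (--directory-prefix) option; the function name fetch_redirected and the directory name in the usage line are made up for illustration:

```shell
# hypothetical helper: fetch one URL and name the file after the redirect
# target, but only ever write below the directory given as first argument
fetch_redirected() {
    # $1 = download directory, $2 = URL
    wget --trust-server-names -P "$1" "$2"
}
```

It would be called as `fetch_redirected downloads "$url"` inside the download loop.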
|
truc Advocate
Joined: 25 Jul 2005 Posts: 3199
|
Posted: Tue Sep 06, 2011 6:48 am Post subject: Re: wget - filename after redirect |
|
|
alex.blackbit wrote: | alex.blackbit wrote: | _after_ the redirect. is that possible ? |
truc, thanks for your answer, but i think you got me wrong. |
Oh, you're right, sorry. |
|
alex.blackbit Advocate
Joined: 26 Jul 2005 Posts: 2397
|
Posted: Tue Sep 06, 2011 10:54 am Post subject: |
|
|
Hu wrote: | Use wget --trust-server-names. If I recall correctly, there are some potential security concerns with this in that a malicious server could refer you to a URL which causes you to overwrite something important. |
Code: |
--trust-server-names
If this is set to on, on a redirect the last component of the redirection URL will
be used as the local file name. By default it is used the last component in the
original URL. |
yes. it's really amazing how one can overlook such obvious things in a manpage.
thanks a lot for the pointer!
wget now uses the last file name it sees (the one it was redirected to).
surprisingly the result is very different from firefox.
unfortunately firefox's result is the one i want, not wget's.
i am talking about the flac files from here.
when i click in firefox on a download button (the down arrow), the file is saved as e.g. "2011.05.07 - Essential Mix - Seth Troxler.flac".
with wget --trust-server-names the file is saved as this fucking string "oBPmrhuqWBJV?AWSAccessKeyId=AKIAJBHW5FB4ERKUQUOQ&Expires=1315306316&Signature=WYbHo1xaEnWc5ECCqQFuK9BFQxA=&__gda__=1315306316_05d3a805fa3acfa25baf7f55c7de46d8".
with plain, optionless wget the file is just saved as "download", as in the original URL.
any ideas left? |
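For what it's worth, that long name is just the last path component of the redirect URL, query string included. The sketch below reproduces it with plain shell parameter expansion; the host in redirect_url is invented, and the query string is shortened from the one above:

```shell
# invented redirect target; only the part after the last "/" matters here
redirect_url='http://media.example.com/oBPmrhuqWBJV?AWSAccessKeyId=AKIAJBHW5FB4ERKUQUOQ&Expires=1315306316'

# last path component, query string and all
with_query=${redirect_url##*/}

# same component with everything from the first "?" removed
without_query=${with_query%%\?*}

printf '%s\n' "$with_query" "$without_query"
```

Neither form is the name firefox shows, so the readable name has to come from somewhere other than the URL.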
|
alex.blackbit Advocate
Joined: 26 Jul 2005 Posts: 2397
|
Posted: Tue Sep 06, 2011 11:21 am Post subject: |
|
|
i got it.
the missing hint turned up when i looked at the http traffic.
the filename is sent in a "Content-Disposition" header.
wget has a command line option --content-disposition, marked experimental, but it works as expected.
Code: | --content-disposition
If this is set to on, experimental (not fully-functional) support for
"Content-Disposition" headers is enabled. This can currently result in extra
round-trips to the server for a "HEAD" request, and is known to suffer from a few
bugs, which is why it is not currently enabled by default.
This option is useful for some file-downloading CGI programs that use
"Content-Disposition" headers to describe what the name of a downloaded file
should be. |
again, the hints are in the manpage. what a shame.
as a reference, i am using this commandline: Code: |
$ lynx -dump -listonly "http://soundcloud.com/das-boy/sets/essential-mix/" | grep "^[[:digit:]]\{1,4\}\.\ http" | grep "/download$" | awk '{ print $2; }' | while read i; do wget --content-disposition "${i}"; done |
thanks for your thoughts. |
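To see what wget picks up, the header can also be inspected by hand. A rough sketch of pulling the name out of a Content-Disposition line; the helper name cd_filename is made up, the sample header is reconstructed from the file name firefox showed (the real response was not captured here), and the pattern only handles a plain lowercase filename= parameter, not the encoded filename*= form:

```shell
# hypothetical helper: print the filename parameter of one header line
cd_filename() {
    printf '%s\n' "$1" |
        tr -d '\r' |    # HTTP header lines end in CR LF; drop the CR
        sed -n 's/.*filename="\{0,1\}\([^";]*\)"\{0,1\}.*/\1/p'
}

cd_filename 'Content-Disposition: attachment; filename="2011.05.07 - Essential Mix - Seth Troxler.flac"'
```

The live headers can be shown with something like `wget -S --spider URL`, though a server may answer a HEAD request differently from a real download.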
|
Hu Administrator
Joined: 06 Mar 2007 Posts: 23062
|
Posted: Wed Sep 07, 2011 2:16 am Post subject: |
|
|
alex.blackbit wrote: | Code: | $ lynx -dump -listonly "http://soundcloud.com/das-boy/sets/essential-mix/" | grep "^[[:digit:]]\{1,4\}\.\ http" | grep "/download$" | awk '{ print $2; }' | while read i; do wget --content-disposition "${i}"; done |
thanks for your thoughts. | You could compact that down to a single gawk pattern instead of a pair of greps and an awk. gawk uses extended regular expressions, so the interval braces are not escaped. Try (untested): Code: | lynx ... | gawk '/^[[:digit:]]{1,4}\. http.*\/download$/ { print $2; }' | while ... |
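A variant that sidesteps interval expressions altogether (not every older awk supports them) matches on fields instead of the whole line. This is a sketch checked only against made-up sample input in the shape lynx -dump -listonly prints, not against the live page; download_links is an invented name:

```shell
# hypothetical filter: keep the URL of every numbered link ending in /download
download_links() {
    # $1 is the link number ("12."), $2 is the URL; awk ignores leading blanks
    awk '$1 ~ /^[0-9]+\.$/ && $2 ~ /^http.*\/download$/ { print $2 }'
}

printf '%s\n' \
    '   1. http://example.com/artist/track-one' \
    '   2. http://example.com/artist/track-one/download' \
    '  10. http://example.com/artist/track-two/download' |
    download_links
```

In the real pipeline it would sit between lynx and the loop: `lynx ... | download_links | while IFS= read -r i; do wget --content-disposition "$i"; done`.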
|