UrlExtract  
Extract URLs from a web page
Labels: Script, Shell
Updated Aug 3, 2011

About

This is a very simple script that extracts the URLs of all the links in a web page.

Requirements

The only non-standard tool needed is lynx (the script uses lynx -dump); sh and sed are available on any Unix-like system.

Script

url-extract.sh:

#!/bin/sh

# For each URL given on the command line, dump the page with lynx and keep
# only the numbered entries of its trailing "References" section, stripping
# the leading numbering and percent-encoding any spaces in the extracted URLs.
for URL
do
	lynx -dump "$URL" | sed -n -e '/^References$/,$s/^ *[0-9]\+\. \+//p' | sed -e 's/ /%20/g'
done
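
lynx -dump renders the page as plain text and appends a "References" section that lists every link as a numbered entry. The first sed keeps only those numbered lines and strips the leading number; the second sed percent-encodes any spaces in the extracted URLs. Against a hypothetical page at http://www.example.com/, a run might look like this:

$ lynx -dump http://www.example.com/
   ... rendered page text ...

References

   1. http://www.example.com/about
   2. http://www.example.com/contact

$ ./url-extract.sh http://www.example.com/
http://www.example.com/about
http://www.example.com/contact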

Usage

Usage is straightforward:

url-extract.sh <url> ...

but I find it most useful when combined with grep, xargs, and wget, to download only the URLs contained in a page that match a certain regular expression:

url-extract.sh <url> | grep <regexp> | xargs wget
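
For example, to download every PDF linked from a (hypothetical) page of papers:

url-extract.sh http://www.example.com/papers.html | grep '\.pdf$' | xargs wget

Since the script percent-encodes spaces, each extracted URL is a single whitespace-free word, so xargs hands it to wget intact.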
