#!/usr/bin/perl

######################################################################
# Swignition/0.1-alpha15 - a toolkit for the semantic web
# Copyright (c) 2008, 2009 Toby Inkster.
######################################################################
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
#
######################################################################

use utf8;
use strict;
use lib '.';
use Swignition::Misc;
use Getopt::Long
    qw(:config no_ignore_case bundling permute);
use Pod::Usage;

my ($givenUrl, $parser, %opts, $output_format, $help, $man, $ver, $from_stdin, $stdin, @nofollow);
my ($host, $port, $proto) = ('localhost', '26464', 'tcp');
GetOptions(
    'option|o=s%' => \%opts,
    'format|f=s' => \$output_format,
    'help|usage|h' => \$help,
    'man' => \$man,
    'version|V' => \$ver,
    'host=s' => \$host,
    'port=i' => \$port,
    'proto=s' => \$proto,
    'nofollow=s' => \@nofollow
);
pod2usage(-verbose => 2) && exit if defined $man;
pod2usage(-verbose => 1) && exit if defined $help;

if (defined $ver)
{
    print $Swignition::Misc::version . ' ' . $Swignition::Misc::releaseDate . "\n";
    exit;
}

$givenUrl = shift @ARGV or (pod2usage(-verbose=>1) && exit);
if ($givenUrl eq '-')
{
    $givenUrl = shift @ARGV || $Swignition::Misc::defaultBaseURI;
    while (<>) { s/^\.\r?$/ ./; $stdin .= $_; }
    $from_stdin = 1;
}
$givenUrl = "http://localhost/~tai/$1.html" if ($givenUrl =~ /^T:(.*)$/);


# First, attempt to connect to Swignition daemon.
if (length $host && length $proto && $port>0)
{
    require IO::Socket::INET;
    my $socket = IO::Socket::INET->new(
        PeerHost => $host,
        PeerPort => $port,
        Proto => $proto,
    );
    if ($socket)
    {
        $socket->autoflush(1);
        print $socket "SET AUTOCLOSE 1\r\n";
        print $socket "SET FORMAT $output_format\r\n" if (length $output_format);
        # Iterate over the option names only; a bare %opts in list
        # context would yield the values as well as the keys.
        foreach my $k (keys %opts)
        {
            print $socket sprintf("SET OPTION %s %s\r\n", $k, $opts{$k})
                if (length $k && length $opts{$k});
        }
        foreach my $h (@nofollow)
        {
            print $socket sprintf("SET NOFOLLOW %s\r\n", $h)
                if (length $h);
        }
        if ($from_stdin)
        {
            print $socket "COGNIFY STDIN AS $givenUrl\r\n";
            print $socket "$stdin\r\n.\r\n";
        }
        else
        {
            print $socket "COGNIFY $givenUrl\r\n";
        }
        while(<$socket>)
        {
            print $_;
        }
        close($socket);
        exit;
    }
    else
    {
        warn("Using Swignition HTML parsing library locally. This will be slow.");
    }
}

# Otherwise, use local library (slow)
require Swignition::GenericParser;
require Swignition::Export::Contact;
require Swignition::Export::Feed;
require Swignition::Export::Location;
require Swignition::Export::Calendar;
require Swignition::Export::CalComponent;
require Swignition::Export::Recording;
require Swignition::Export::Recipe;
require HTTP::Request;

my ($pageUrl, $subjectUrl) = Swignition::Misc::url_split($givenUrl);

if ($from_stdin)
{
    $opts{base} = $pageUrl;
    $opts{nofollow} = \@nofollow;
    $parser = Swignition::GenericParser::new_by_type($stdin, \%opts);
}
else
{
    $opts{ua} = Swignition::Misc::get_ua;
    $opts{request} = HTTP::Request->new(GET => $pageUrl);
    $opts{response} = $opts{ua}->request($opts{request});
    $opts{nofollow} = \@nofollow;
    $parser = Swignition::GenericParser::new_by_type($opts{response}->content, \%opts);
}
print Swignition::Misc::do_export($output_format, $parser, $subjectUrl);


############################################################################

__END__

=head1 NAME

swignition.pl

=head1 SYNOPSIS

Usage: swignition.pl [options] url

Options:

    --option KEY=VALUE   pass additional options to semantics extractor
    --format=FMT         specify output format
    --help, --usage      basic help message
    --man                detailed help message
    --version            print current version number
    --host               host name or IP address for daemon
    --port               port for daemon
    --proto              protocol for daemon
    --nofollow=HOST      don't recursively follow links to this host

=head1 OPTIONS

=over 8

=item B<--option>, B<-o>

Sets an additional option to pass to the semantics extraction engine. Currently
supported options:

  * erdf_strict_profiles
                  Require eRDF profile to be explicitly linked to.
                  Boolean; default on.
  * grddl_fetch   Fetch profile URLs and parse for profileTransformation links.
                  Boolean; default off.
  * grddl_strict_profiles
                  Require GRDDL profile to be explicitly linked to.
                  Boolean; default off.
  * p_comments    Check for RDF in <!--comments-->. Boolean; default on.
  * p_erdf        Parse eRDF. Boolean; default on.
  * p_grddl       Attempt GRDDL gleaning. Boolean; default on.
  * p_http        Parse HTTP headers; default on.
  * p_metatags    Parse <meta> and <title> tags. Boolean; default on.
  * p_rdf         Support RDF embedded in XHTML. Boolean; default on.
  * p_rdfa        Parse RDFa. Boolean; default on.
  * p_rdfx        Support <link rel="meta" type="application/rdf+xml">.
                  Boolean; default on.
  * p_role        Parse XHTML role attribute; default on.
  * p_structure   Parse document heading structure. Boolean; default on.
  * p_uf          Support for some microformats. Boolean; default on.
  * rdfa_strict_doctype
                  Require XHTML+RDFa DOCTYPE. Boolean; default off.
  * rdfa_strict_version
                  Require <html version>. Boolean; default off.
  * rdfa_strings  Convert nodes to strings using RDFa-specified method.
  * uf_strict_profiles
                  Require microformat profile to be explicitly linked to.
                  Boolean; default off.

Boolean options should be set to '0' (off) or '1' (on). Options are
case-sensitive.

Example:
    swignition.pl -o p_erdf=1 -o p_structure=0 [url]
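
The way repeated B<-o> switches accumulate can be sketched in isolation. This is a minimal illustration, assuming the same C<'option|o=s%'> specification and the C<no_ignore_case> and C<bundling> settings the script passes to Getopt::Long. The helper name C<parse_extractor_options> and the example URL are hypothetical and not part of Swignition.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Same case-sensitivity and bundling behaviour the script asks for.
Getopt::Long::Configure(qw(no_ignore_case bundling));

# Hypothetical helper: collect every "-o KEY=VALUE" pair into a hash
# and hand back whatever arguments are left over (normally the URL).
sub parse_extractor_options {
    my @args = @_;
    my %opts;
    GetOptionsFromArray(\@args, 'option|o=s%' => \%opts)
        or die "bad command line\n";
    return (\%opts, \@args);
}

my ($opts, $rest) = parse_extractor_options(
    qw(-o p_erdf=1 -o p_structure=0 http://example.com/));

# Print the collected options in a stable order, then the leftover URL.
for my $key (sort keys %$opts) {
    print "$key = $opts->{$key}\n";
}
print "url: $rest->[0]\n";
```

Because the destination is a hash, a key given twice keeps only its last value, so later B<-o> switches override earlier ones.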

=item B<--format>, B<-f>

Set output format. Currently supported formats:

  * atom        Output first feed found as Atom 1.0 (RFC 4287).
  * ics         Output events as iCalendar (RFC 2445).
  * jcard       Output contacts as jCard.
  * kml         Output locations as Keyhole Markup Language (Google/OGC).
  * m3u         Output recordings as M3U playlist.
  * n3          Output as Notation3.
  * rdf-json    Output as RDF/JSON (Talis).
  * recipebook  Output recipes as RecipeBook XML.
  * trix        Output as TriX.
  * turtle      Output as Turtle.
  * uf-json     Output microformats as JSON.
  * vcf         Output contacts as vCard (RFC 2426).
  * xml         Output as RDF/XML.

Format strings are case-insensitive.

=item B<--host>, B<--port>, B<--proto>

These default to 'localhost', '26464' and 'tcp' respectively. Alternative
values for proto include 'udp' and 'unix'.

The swignition command-line tool will attempt to use a daemonised copy of
swignition (a.k.a. swignitiond), and only if that can't be found will it then
try to parse the URL itself. To prevent Swignition from looking for swignitiond,
set port to -1.

Generally, swignitiond will provide a significant speed-up, as the Swignition
library will not need to be loaded and compiled by Perl on every run.
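
The request this client sends to the daemon can be sketched as a pure function. The command names (C<SET AUTOCLOSE>, C<SET FORMAT>, C<SET OPTION>, C<SET NOFOLLOW>, C<COGNIFY>) are the ones this script prints to the socket; how swignitiond interprets them is not shown here. The helper name C<build_request> and its named parameters are hypothetical. When a document is sent inline, a line consisting of a single dot is rewritten to C< .> so that it cannot be mistaken for the terminating dot.

```perl
use strict;
use warnings;

# Hypothetical helper: assemble the text the client would write to the
# daemon socket, one CRLF-terminated command per line.
sub build_request {
    my (%arg) = @_;
    my @lines = ('SET AUTOCLOSE 1');
    push @lines, "SET FORMAT $arg{format}" if length($arg{format} // '');

    my $opts = $arg{options} || {};
    for my $k (sort keys %$opts) {
        next unless length $k && defined $opts->{$k} && length $opts->{$k};
        push @lines, "SET OPTION $k $opts->{$k}";
    }
    push @lines, "SET NOFOLLOW $_" for grep { length } @{ $arg{nofollow} || [] };

    if (defined $arg{body}) {
        # Escape lone-dot lines, then finish the body with a bare dot.
        (my $body = $arg{body}) =~ s/^\.(\r?)$/ .$1/mg;
        push @lines, "COGNIFY STDIN AS $arg{url}", $body, '.';
    }
    else {
        push @lines, "COGNIFY $arg{url}";
    }
    return join '', map { "$_\r\n" } @lines;
}

print build_request(
    format  => 'turtle',
    options => { p_erdf => 1 },
    url     => 'http://example.com/',
);
```

Keeping the request construction separate from the socket code makes the framing easy to inspect without a running daemon.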

=item B<--help>, B<--usage>, B<--man>, B<--version>, B<-h>, B<-V>

Various forms of help. Hopefully self-explanatory.

=back

=head1 DESCRIPTION

Swignition aims to be a full-fledged graphical semantic web browser. At the
moment it is a command-line semantic web tool, able to extract semantic data
from HTML files and output it in several formats.

=head1 AUTHOR

Toby Inkster <http://tobyinkster.co.uk/>.

=head1 CREDITS

This project makes extensive use of the GNOME XML library (libxml2) and the
related XSLT library (libxslt).

<http://xmlsoft.org/contribs.html>
<http://xmlsoft.org/XSLT/contribs.html>

The project also makes use of LibWWW, developed by the World Wide Web
Consortium.

<http://www.w3.org/Library/>

Numerous suggestions have been gratefully received from contributors to the
W3C semantic web mailing list and Microformats discussion list.

<http://lists.w3.org/Archives/Public/semantic-web/>
<http://microformats.org/mailman/listinfo/microformats-discuss/>

=head1 SEE ALSO

Semantic Web <http://www.w3.org/2001/sw/>.
Microformats <http://microformats.org/>.
Buzzword UK <http://buzzword.org.uk/>.
Swignition Services <http://srv.buzzword.org.uk/>.

=head1 BUGS

Yes.


Change log

r146 by m...@tobyinkster.co.uk on Jan 31, 2009
Lots of propset stuff; a few updates to S::DM::Node.

Older revisions

r133 by m...@tobyinkster.co.uk on Jan 27, 2009
XHTML @role in new data model.
r125 by m...@tobyinkster.co.uk on Jan 25, 2009
Prepare for Swignition/0.1-alpha15 release.
r77 by m...@tobyinkster.co.uk on Jan 12, 2009
NoFollow fixes.

File info

Size: 8904 bytes, 285 lines

File properties

svn:executable
*
svn:keywords
Id