Version History
===============


cognition/0.1-alpha1 (2008-02-15) :-

Initial release.

* initial release
* metadata: <meta>, <link>, <title>, @role, eRDF
* eRDF does not support rdf:type syntax
* RFC 2731 is supported for namespaces
* microformats: hcard, hcalendar, adr, geo
- hcalendar support assumes page is one giant calendar
- no support for rel-tag, so no support for categories in hcard or
hcalendar
- geo support includes body, altitude and reference-frame extensions
- microformats patterns: include-pattern, abbr-pattern, extensions
+ include-pattern supports my alternative syntax
+ abbr-pattern supports Andy Mabbett's alternative
* RDF output of namespaced metadata


cognition/0.1-alpha2 (2008-02-20) :-

Stop using XML::XPath; support for @xmlns; support hCalendar, rel=tag,
rel=license, figure, XOXO; parse document structure from headings.

* drop usage of XML::XPath module, using XML::DOM instead
- might use XML::DOM::XPath in future if XPath support is needed
* support XML namespaces used as metadata namespaces.
* microformats: hcalendar (complete), rel-tag, rel-license, figure, xoxo
- rel-license extended to support searches for 'license' in CC or
DCTERMS namespaces; or 'rights.license' in DC or DCTERMS namespaces
- experimental figure microformat based on current brainstorming
* parse document structure (headings + semantic tables + semantic
images/figures microformat? + xoxo lists)


cognition/0.1-alpha2.1 (2008-02-21) :-

Bugfixes.

* Fix handling for entities.
* Fix delay on LWP::RobotUA.


cognition/0.1-alpha3 (2008-03-01) :-

Use GNOME XML library; support for CURIEs; use RDF triples to internally
represent data; RDFa support!

* Switch from XML::DOM to XML::LibXML. Should be my last big parser change!
* Restructure object to be more tuple-like.
* URLs:
- Support for CURIEs.
- support for geo: and tag: URIs
- use XPointer to provide URLs for document fragments without identifiers
* RDF:
- use <rdf:Bag> to wrap multiple tuples with the same subject and property
- Remove duplicate values within bags
- add support for microformats to RDF output
- RDF subjects may have multiple URIs defined to help match up properties
that actually belong to the same subject (e.g. some properties might be
attached to a fragment identifier, and others to an hcard, but if we
know that the hcard root element has an id attribute which matches the
fragment identifier, then we can equate the subjects)
- support "vocabularies" for RDF
- convert document structure to RDF <http://purl.org/dc/terms/hasPart>,
<http://purl.org/dc/terms/isPartOf>.
* Improve STRINGIFY to prevent spurious leading and trailing spaces.
* Recognise (X)HTML predefined link types and put them in XHTML namespace.
* More reliable support for namespaces.
* Microformats:
- Properly parse DateTimes found in microformats.
- support table cell header pattern
- support hcalendar 1.1 draft
* Complete support for RDFa
* Much improved support for eRDF, support rdf:type. Any bugs?
* Improved support for XHTML role attribute


cognition/0.1-alpha4 (2008-03-07) :-

Rudimentary GRDDL; better charset handling; better support for tag soup.

* Support rel=meta: retrieve additional document metadata, parse as RDF
* GRDDL:
- Beginnings of GRDDL support.
- Support for rel=transformation linking to XSLT to transform doc to RDF
- Support for grddl:transformation="" style transformations.
- No support for <head profile> yet.
* Microformats:
- Table cell header pattern has been changed on wiki. Implement changes.
- Better microformat nesting handling.
* Improvements in charset handling and support for tag-soup HTML.
* Comment out pre-RDFa <link rel>, <a rel> support. It's not really useful.
* Disable eRDF by default as it seems to generate too many false positives.


cognition/0.1-alpha5 (2008-03-16) :-

vCard export; KML export; improved command-line client; support commented-out
RDF in (X)HTML.

* Various minor improvements to hCard and hCalendar parsing.
* Export framework
- Add vCard export option.
+ Parses data: URIs and outputs as base64 embedded data.
+ Pulls in data from full gamut of supported semantics, so that, say,
RDFa FOAF data may end up as part of the vCard output.
+ Test input: <http://examples.tobyinkster.co.uk/hcard>.
- Add KML export option.
+ Data can come from hCard, (e)RDF(a) vCard, (e)RDF(a) GeoRSS, etc.
* Re-enabled eRDF by default, but eRDF parsing is now stricter. It *requires*
a profile of <http://purl.org/NET/erdf/profile> to be found on the <head>
element.
* Improved command-line client. Use Getopt::Long, Pod::Usage.
* Support RDF embedded in HTML <!-- comments -->. (Trackback uses this.)


cognition/0.1-alpha6 (2008-03-29) :-

Profile URIs; Support for hAtom; Improved GRDDL; Atom and iCalendar output;
Improved stringification.

* Microformats:
- Add option (disabled by default) to require <head profile> for microformat
support. Microformat profiles are treated as OPAQUE STRINGS! Supports the
following profiles:
+ http://purl.org/uF/2008/03/
+ http://www.w3.org/2006/03/hcard or http://purl.org/uF/hCard/1.0/
+ http://dannyayers.com/microformats/hcalendar-profile or
http://purl.org/uF/hCalendar/1.0/
+ http://purl.org/uF/hAtom/0.1/
+ http://purl.org/uF/rel-tag/1.0/
+ http://purl.org/uF/rel-license/1.0/
+ No profiles required for rel-enclosure, adr or geo (yet).
- Support for hAtom, WebSlices.
+ In addition to hAtom 0.1, rel-enclosure is supported within hEntries.
- Improve include-pattern support to prevent some infinite loops.
* GRDDL:
- Add option (disabled by default) to require <head profile> for GRDDL.
- Add option to check profile URLs for profileTransformation links.
* Export:
- Atom output. (Supports RDF/RSS and hAtom as input.)
- iCalendar export option.
+ hCalendar 1.1 events.
+ hCalendar 1.1 todo items
+ hCalendar 1.1 freebusy info.
+ hCalendar 1.1 alarms.
+ hAtom entries (as VJOURNAL).
+ W3C's iCal RDF vocab (but see note in Cognition/Export/Calendar.pm)
+ RSS Event Module <http://web.resource.org/rss/1.0/modules/event/>
* Added a "--nofollow" option to prevent secondary fetching from particular
hosts. (Secondary fetching = requesting <head profile>, <link rel="meta">,
<link rel="transformation">.)
* Support <rdf:RDF> elements found directly in (X)HTML.
* Much improved HTML->Text conversion. Namely: word wrapping, line breaks added
after block elements, quote marks around <q> elements, bullet points and
numbers before <li> elements in unordered and ordered lists, brackets around
superscript text, parentheses around subscripts, tab characters between table
cells, usenet-style quoting for <blockquote>, alt text from <img> and <input
type="image">, values from other <input> tags. Should be able to handle nested
elements like //ul/li/ol/li/dl/dd/blockquote/img[@alt]. Won't be completely
foolproof, but should be an improvement over what was there before!
* Fix so that the entire page is not given an rdf:type of ical:vcalendar unless
it contains some bona fide vevent/vtodo/valarm/vfreebusy nodes.


cognition/0.1-alpha7 (2008-04-21) :-

hCard extensions using vCard 4.0; XFN support; jCard export; RDF/XML output is
refactored; RDF/JSON export; improved @lang handling; BNodes.

* Set '_xmllang' attribute on all elements, a la '_xpath'.
* Microformats:
- hCard:
+ Rename date-of-death "dday", and implement other properties from vCard
4.0 draft
<http://www.ietf.org/internet-drafts/draft-resnick-vcarddav-vcardrev-01.txt>.
+ Empty TEL, EMAIL and IMPP no longer parsed. (e.g. telephone numbers
with usages but no actual number.)
+ Automatically detect the representative hCard and contact hCard.
<http://microformats.org/wiki/representative-hcard>
- hCalendar:
+ support rel="vcalendar-(parent|sibling|child)" and class="related-to".
+ support implicit relationships gleaned from nesting.
+ Explicitly set RDF datatype for integers.
+ Better support for vfreebusys.
+ @title on root element parsed as dc:title.
+ Support x-wr-calname/x-wr-caldesc/calscale/prodid/method.
- XFN: <http://microformats.org/wiki/xfn-to-foaf>.
* Exports:
- Cognition::Export::findSubject - I won't go into an explanation of why
this is important, but it is.
- jCard export.
- vCard improvements:
+ Set TYPE parameter when ENCODING=b.
+ Output vCard 4.0 properties. Detect instant messaging protocols which
have been forced into the URLs and output them as IMPP properties.
- iCalendar improvements:
+ Set TYPE parameter when ENCODING=b.
+ Add RELATED-TO properties.
+ Support X-WR-CALDESC/CALSCALE/PRODID/METHOD/VERSION.
+ Big improvements for ATTENDEE/CONTACT/ORGANIZER.
- RDF output no longer handled by HTMLParser -- it is in an Export module:
+ Output RDF datatypes (e.g. <http://www.w3.org/2001/XMLSchema#date>).
+ Output xml:lang where we can.
+ s/rdf:Description/FOO/ where FOO is the rdf:type.
+ Improved output for rdf:XMLLiterals.
+ Instead of <foo:bar rdf:nodeID="X">, nest the RDF description for X.
- RDF JSON <http://n2.talis.com/wiki/RDF_JSON_Specification> export.
* RDFa:
- RDFa DTD has s/instanceof/typeof/. Cognition supports both (for now), but
prefers @typeof. Fixed this attribute to allow whitespace-delimited list
of (CURIE|URI)s.
- In accordance with RDFa rules, drop resolution of absolute URIs from
relative URIs specified in @xmlns. This actually makes parsing dumber, but
it's in the recommended algorithm.
- Improved parsing of rdf:XMLLiterals.
- Extension to RDFa: @title parsed as rdfs:label.
* When parsing and outputting dates, retain "resolution".
* Create a data type Cognition::MagicString used in place of strings in many
places which retains the language and XML representation of a string.
MagicString-aware code can then pick up this data and use it if required.
Non-MagicString-aware code should usually be able to treat the MagicString
as if it were a string, and not notice any difference, as MagicString
overloads the stringify function.
* More improvements to STRINGIFY:
- Better algorithm for inserting whitespace between CDATA and inline element
nodes. Should prevent words from accidentally running together.
- Implement @start and @type for lists. For unordered lists, disc markers are
implemented as asterisks, circle markers as hyphens, and square markers as
plus signs. (Much like the markers used in this ChangeLog.) For ordered
lists, roman numeral markers work up to 3999, and alphabetical markers up
to 26 -- after that, the list will revert to numeric markers.
- Better support for microformats "value excerpting".
- Stringify now takes care of value excerpting and the ABBR pattern.
* Better HTML->XHTML conversion routine.
* Better framework for namespaces. Old system didn't handle scoped namespaces
(e.g. xmlns attribute on a non-root element).
* Introduce a BNode concept into the Cognition RDF model. Stored in the RDF
triple store with dummy URIs like <bnode:///string>. This pretty much
eliminates those ugly XPointers which littered the RDF output previously. As
a deliberate change, <div class="vcard vcalendar"> will now result in two
different RDF subjects, however they can be united into one subject by giving
that node an ID attribute (because then they have proper URIs, not node IDs).
- Adjust "->uri" methods for microformats.
- Adjust RDFa parser to create BNodes instead of #fakeid URIs.
- Adjust RDF export to use rdf:nodeID instead of rdf:resource/rdf:about.
* Document structure parsing was disabled in alpha4 as it made the RDF output
ugly. Because of improvements in RDF output, and ability to use BNodes, it
is now re-enabled by default without uglying everything up. It can still be
disabled via options.
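
The list-marker rules in the STRINGIFY entry above can be illustrated with a short sketch. It is written in Python rather than Cognition's Perl, and the names `to_roman` and `marker_for` are hypothetical; the trailing "." after ordered markers and the "*" default for unknown unordered types are assumptions, not documented behaviour.

```python
# Hypothetical sketch of the STRINGIFY list-marker rules (not Cognition's code).

UNORDERED_MARKERS = {"disc": "*", "circle": "-", "square": "+"}

ROMAN_NUMERALS = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def to_roman(n: int) -> str:
    """Convert 1..3999 to an upper-case Roman numeral."""
    parts = []
    for value, symbol in ROMAN_NUMERALS:
        count, n = divmod(n, value)
        parts.append(symbol * count)
    return "".join(parts)


def marker_for(list_type: str, n: int, ordered: bool) -> str:
    """Return the marker for item number n (already offset by any @start).

    list_type is the HTML @type value: "disc", "circle", "square" for <ul>;
    "1", "a", "A", "i", "I" for <ol>.
    """
    if not ordered:
        return UNORDERED_MARKERS.get(list_type, "*")  # assumed default: disc
    if list_type in ("i", "I") and 1 <= n <= 3999:
        roman = to_roman(n)
        return (roman.lower() if list_type == "i" else roman) + "."
    if list_type in ("a", "A") and 1 <= n <= 26:
        letter = chr(ord("a") + n - 1)
        return (letter if list_type == "a" else letter.upper()) + "."
    # type="1", or out of range for roman/alphabetical: plain numbers.
    return f"{n}."


# <ol type="i" start="3"> with three items:
print([marker_for("i", 3 + k, ordered=True) for k in range(3)])
# ['iii.', 'iv.', 'v.']
```

Falling back to numbers, rather than raising an error, keeps long lists readable once the roman or alphabetical range is exhausted.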


cognition/0.1-alpha8 (2008-05-04) :-

xFolk support; ICBM; OpenURL COinS.

* Microformats:
- XFN:
+ Fix XFN rel values to match case-insensitively.
+ Smarter support for "mailto:", "urn:sha1:" and pictorial link targets.
- hCalendar:
+ Fix Cognition::uF::hFreebusy::fb::uri to issue BNodes instead of
XPointers.
+ Modify rdf:type URIs s/^([a-z])/uc($1)/ which is more best-practicey.
+ Fix bug with documents being given rdf:type of ical:Vcalendar, even if
they do not use hCalendar.
- hCard:
+ Modify rdf:type URIs s/^([a-z])/uc($1)/ which is more best-practicey.
+ Implement Andy Mabbett's suggestion allowing the "fn" class to be
attached to address sub-properties, thus allowing hCards to easily
represent places rather than organisations or people.
- xFolk: introduce support for this microformat. Using a similar internal
representation to the model used by Digg's new RDFa -- i.e. dc:source,
dc:title and dc:abstract. Perhaps should extend xFolk to allow for
dc:date and dc:creator?
- Rel-Tag: restructured RDF output to mostly use Dublin Core.
- figure:
+ Improvements to title/legend minimisation.
+ Restructured RDF output to use Dublin Core and FOAF.
- geo: parse <meta name="ICBM"> as if it were an instance of geo.
* Exports:
- Corrections to support for both of the W3C RDF vocabs, and also the W3C
iCalendar vocab.
* Fix white space trimming bug in STRINGIFY.
* Fix contact exporters to use foaf:name when no better name is available.
* Support for COinS <http://ocoins.info/>, including obsolete rel="Z3988".


cognition/0.1-alpha9 (2008-06-01) :-

Switch to client/server model. Add support for hReview.

* Introduce (optional) client/server model for Cognition. cognitiond.pl runs in
the background; cognition.pl attempts to connect to it, asks the daemon to
parse the URL, consumes the result and returns it. In many cases this
significantly speeds up results. By default cognition.pl looks for a server
using TCP on localhost:26464, but --host, --port and --proto parameters may be
used to configure a different daemon to connect to. cognitiond.pl will look at
/etc/cognition/cognitiond.conf to read its options. See sample config file.
* Parsing improvements:
- Improvements to white space handling.
- Improvements to oddball ISO date formats such as 2 digit years, missing
years, dates specified by week number or by ordinal day number.
* Exports:
- vCard:
+ Multiple vCard output now returns hCard contacts in same order as
encountered on the page.
+ Cope better with more structured names.
- jCard:
+ Multiple jCard output now returns hCard contacts in same order as
encountered on the page.
+ Cope better with more structured names.
- iCalendar:
+ Add VCARDURL parameter support for CONTACT, ORGANIZER and ATTENDEE
properties, as described in this draft spec:
<http://xml.coverpages.org/draft-royer-ical-vcard-01.txt>
+ Datetime fixes: convert to UTC and format correctly.
* Microformats:
- Implement support for hReview.
- Rewrote support for N (structured names) in hCard parser to create vcard:N
objects to wrap vcard:given-name, etc.
- Allow explicit plus signs in geo microformat.
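
The week-number and ordinal-day formats mentioned above map directly onto Python's standard library, as the sketch below shows. It is an illustration only, not Cognition's parser: `parse_iso_oddball` is a hypothetical name, treating a bare week such as "2008-W30" as that week's Monday is an assumption, and two-digit or missing years are not handled. `date.fromisocalendar` needs Python 3.8 or later.

```python
import re
from datetime import date, timedelta

# Hypothetical sketch: ISO 8601 week dates (YYYY-Www[-D]) and ordinal dates
# (YYYY-DDD), in extended or basic form.
WEEK_DATE = re.compile(r"^(\d{4})-?W(\d{2})(?:-?([1-7]))?$")
ORDINAL_DATE = re.compile(r"^(\d{4})-?(\d{3})$")


def parse_iso_oddball(text: str) -> date:
    text = text.strip()
    m = WEEK_DATE.match(text)
    if m:
        year, week = int(m.group(1)), int(m.group(2))
        weekday = int(m.group(3) or 1)  # assumption: a bare week means its Monday
        return date.fromisocalendar(year, week, weekday)  # Python 3.8+
    m = ORDINAL_DATE.match(text)
    if m:
        year, day_of_year = int(m.group(1)), int(m.group(2))
        result = date(year, 1, 1) + timedelta(days=day_of_year - 1)
        if result.year != year:  # catches day 000, and day 366 in common years
            raise ValueError(f"day {day_of_year:03d} out of range for {year}")
        return result
    raise ValueError(f"not a week or ordinal date: {text!r}")


print(parse_iso_oddball("2008-W30-4"), parse_iso_oddball("2008-206"))
# 2008-07-24 2008-07-24
```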


cognition/0.1-alpha10 (2008-06-27) :-

Document structure parsing overhaul; improvements to rel=tag; better support for
some RDF nuances like rdf:value and rdfs:subPropertyOf.

* Completely rewritten document structure parsing, using HTML 5 outlines
algorithm <http://www.whatwg.org/specs/web-apps/current-work/#outlines> as
a guide. Thanks to Ryan King and Geoffrey Sneddon for pointing me towards
this algorithm. I also used Geoffrey's python implementation as a crib sheet
to help me figure out what was supposed to happen when the HTML 5 spec was
ambiguous.
<http://hg.gsnedders.com/spec-gen/file/tip/specGen/processes/outliner.py>
* Microformats:
- rel-tag:
+ Support for class="tag".
+ Internal representation now uses Richard Newman's RDF Tag ontology.
<http://www.holygoat.co.uk/owl/redwood/0.1/tags/>
- XFN:
+ Explicit XFN 1.0 support. If you give an explicit profile URI pointing
to the XFN 1.0 profile, but not to the XFN 1.1 profile, then newer XFN
terms such as 'me', 'kin' and 'contact' are ignored. (But rel="me" is
still used for determining the representative hCard of a page.)
- hCard:
+ Support for fax: and modem: URIs.
+ Support "type"/"value" subproperties for "label" properties.
- hCalendar:
+ Support for XOXO vtodo-list optimisation. Very nifty.
- Experimental support for data-X classes.
<http://purl.org/uF/pattern-data-class/1>
- xFolk:
+ Merged support for xFolk into hReview. xFolk.pm is gone now.
<http://buzzword.org.uk/cognition/uf-plus.html#xfolk-hreview>
- hReview:
+ Support "xfolkentry" as an alias for "hreview".
+ Support "taggedlink" as an alias for "item".
+ Allow multiple instances of class "description".
* Exports:
- Special support for rdf:value, such that if an export module is
looking for a literal value, but finds a resource which itself has an
rdf:value literal, will use that literal. Indeed, it is capable of
drilling down through rdf:value properties several layers deep. e.g. the
following RDFa can be successfully exported as vCard:
<div typeof="foaf:Person">
<div rel="foaf:name">
<p rel="rdf:value">
<b property="rdf:value">Toby Inkster</b>
</p>
</div>
</div>
- vCard: add support for vCard 4.0 "RELATED" property. XFN, foaf:knows and
the RDF relationship vocab <http://vocab.org/relationship/> can all be
used to supply the data.
* Cognition understands rdfs:subPropertyOf, and will make use of a list of any
rdfs:subPropertyOf relationships found in "~/.cognition/subPropertyOf.rdf".
(It will also take heed of any such relationships found parsing the page, but
won't go looking for them specially.) That is, if Cognition is outputting a
vCard, and so is looking for a foaf:name for a person, and you have stated that
custom:moniker is an rdfs:subPropertyOf foaf:name, and this person has a
custom:moniker property defined, then the custom:moniker property is used.
(Note: this was a lot more work than it should have been. I'm on the lookout for a
third-party triple store that can take the headache out of this sort of thing
for me.)
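
The rdf:value drill-down described in the alpha10 exports entry can be sketched as a bounded recursive lookup. This is a hypothetical Python model, not Cognition's export code: the graph is a plain dict of subject -> predicate -> objects, the blank-node labels are made up, and the depth limit of 10 is an arbitrary assumption used to stop rdf:value cycles. The sample graph holds the triples the RDFa example in that entry is expected to yield.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union

RDF_VALUE = "rdf:value"


@dataclass(frozen=True)
class Literal:
    text: str


@dataclass(frozen=True)
class Node:
    ident: str  # URI or blank-node label


Term = Union[Literal, Node]
Graph = Dict[str, Dict[str, List[Term]]]  # subject -> predicate -> objects


def literal_value(graph: Graph, term: Term, max_depth: int = 10) -> Optional[str]:
    """Return term's literal, following rdf:value links if term is a node."""
    if isinstance(term, Literal):
        return term.text
    if max_depth == 0:
        return None  # hypothetical guard against rdf:value cycles
    for obj in graph.get(term.ident, {}).get(RDF_VALUE, []):
        found = literal_value(graph, obj, max_depth - 1)
        if found is not None:
            return found
    return None


def find_literal(graph: Graph, subject: str, predicate: str) -> Optional[str]:
    """First literal reachable from subject via predicate (and rdf:value)."""
    for obj in graph.get(subject, {}).get(predicate, []):
        found = literal_value(graph, obj)
        if found is not None:
            return found
    return None


# Triples from the <div typeof="foaf:Person"> example (labels invented).
graph: Graph = {
    "_:p": {"rdf:type": [Node("foaf:Person")], "foaf:name": [Node("_:n")]},
    "_:n": {RDF_VALUE: [Node("_:v")]},
    "_:v": {RDF_VALUE: [Literal("Toby Inkster")]},
}
print(find_literal(graph, "_:p", "foaf:name"))  # Toby Inkster
```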


cognition/0.1-alpha11 (2008-07-24) :-

Improved microformats parsing across the board. Add support for hAudio, hResume,
hMeasure, species and XEN. Datetime parsing improvements.

* Microformats:
- Improved and more consistent parsing. A lot of parsing code that was
repeated between the different microformat modules has been moved to
Cognition::uF::simple_parse(). It includes better support for embedded
microformats like:
<div class="vcard">
<div class="agent">
<p class="vcard"></p>
</div>
</div>
and proper support for ISO 8601 durations (not just treated as strings).
- hResume
+ Add support for this draft <http://microformats.org/wiki/hResume>.
+ Mostly uses DOAC <http://ramonantonio.net/doac/0.1/doac.rdfs> to map
to RDF.
+ LanguageSkills can be specified as ".hresume .contact.vcard .lang".
+ "affiliation" translated to vCard 4.0 draft "MEMBER" property.
- hAudio:
+ Add support for this draft <http://microformats.org/wiki/hAudio>.
- hMeasure / hMoney:
+ Add support for this draft <http://microformats.org/wiki/measure>.
+ Units currently treated as an opaque string, though I do have some
experimental unit-conversion code that I may include in a future
release of Cognition.
+ Nest within an hCard or hCalendar event to associate the measurement
with that contact/event.
- species:
+ Add experimental support for this proposed microformat.
+ Use the "biota" class to mark up a binomial/trinomial, plus (optionally)
other taxonomic data.
+ Nest within an hCard to mark up the species of the hCard's owner.
+ Include class="attendee biota" within an hCalendar event to mark up a
sighting of a member of the species.
- XFN:
+ Refinements to implied foaf:knows. e.g. if Alice is Bob's parent, it is
not necessarily implied that Alice and Bob know each other. For just a
handful of relationships (e.g. friend, spouse, etc), foaf:knows is still
implied.
+ Implements the XHTML Enemies Network (XEN). It's a spoof, but some
people may find it useful. XEN relationships are only processed on
pages that include the profile URI <http://xen.adactio.com/>.
- figure:
+ Support rel-tag and rel-license nested inside figures.
- hCard:
+ Make "lang" plural.
+ Support vCard 4.0 "member" property - either contains a nested hCard
or a URI.
* Exports:
- vCard: keep up with improvements to hCard.
- jCard: keep up with improvements to hCard.
* DateTime parsing:
- General datetime parsing improvements - I've bundled the Perl
DateTime::Format::ISO8601 module within the Cognition distribution,
renaming it to Cognition::DTParse. It includes several modifications to
make it more tolerant, especially in the case of timezone handling and
dealing with whitespace.
- Support HTML 5 <time> element.
- In conjunction with the smarter microformat parsing mentioned above, the
STRINGIFY function now knows when the property it's reading is supposed to
be a datetime and can tailor its behaviour accordingly. In particular it
will attempt to read values from the "datetime" attribute if it exists.
This allows, in hCalendar:
<time class="dtstart" datetime="2008-07-24">Thursday</time>
and also:
<span class="dtstart">
<time class="value" datetime="2008-07-24">Thursday</time> at
<time class="value" datetime="21:00:00">9pm</time>
<time class="value" datetime="+0100">(UK)</time>
</span>
Note that <time> is not the only HTML element that supports a "datetime"
attribute. The following might be useful in hCard:
<ins class="tel rev" datetime="2008-07-24T21:00:00">
My new <span class="type">home</span> phone number is
<span class="value">01632 960 123</span>
</ins>
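
The value-excerpting example above splits a dtstart across three <time> elements. One plausible way to merge their "datetime" attributes is sketched below in Python. It is not Cognition's algorithm (which relies on the bundled Cognition::DTParse); the function names are hypothetical, and the classification heuristics (a leading sign or "Z" marks an offset, a colon marks a time, anything else is the date) are assumptions.

```python
from datetime import datetime
from typing import Iterable, Optional


def normalise_offset(tz: str) -> str:
    """Turn "+0100", "+01" or "+01:00" into "+01:00"; map "Z" to "+00:00"."""
    tz = tz.strip()
    if tz in ("Z", "z"):
        return "+00:00"
    sign, digits = tz[0], tz[1:].replace(":", "")
    if sign not in "+-" or not digits.isdigit() or len(digits) not in (2, 4):
        raise ValueError(f"bad timezone offset: {tz!r}")
    digits = digits.ljust(4, "0")
    return f"{sign}{digits[:2]}:{digits[2:]}"


def combine_value_parts(parts: Iterable[str]) -> datetime:
    """Merge the datetime attributes of class="value" children (hypothetical)."""
    date_part: Optional[str] = None
    time_part: Optional[str] = None
    offset: Optional[str] = None
    for part in parts:
        part = part.strip()
        if not part:
            continue
        if part[0] in "+-" or part in ("Z", "z"):
            offset = normalise_offset(part)
        elif ":" in part:
            time_part = part
        else:
            date_part = part
    if date_part is None:
        raise ValueError("no date component found")
    text = date_part
    if time_part is not None:
        # Only attach the offset when there is a time for it to qualify.
        text += "T" + time_part + (offset or "")
    return datetime.fromisoformat(text)


parts = ["2008-07-24", "21:00:00", "+0100"]  # from the hCalendar example above
print(combine_value_parts(parts).isoformat())  # 2008-07-24T21:00:00+01:00
```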


cognition/0.1-alpha12 (2008-08-20) :-

Tonnes and tonnes of bugfixes, little improvements, and refactoring,
particularly in RDFa parsing and handling nested microformats. Turtle output;
M3U output; intelligent parsing and output of durations and intervals.

* Bugfix work...
- Fix XEN namespace.
- In document structure, if <header> is found and contains a heading
element (e.g. <h2>) then let <header>'s rank be the same as the contained
heading.
- Species, figure MFO.
- Eliminate unneeded 'use' lines.
- Ability to export individual calendar components in iCalendar format. This
is some old functionality that disappeared a few versions ago, but is now
back.
- KML export bugfixes.
- HTML (Detect) export bugfixes.
- Last version broke rel=me => representative hCard detection. Fixed.
- RDF/XML output sometimes tried to define xmlns:rdf twice. Fixed.
- Lots of RDFa bug fixes. Cognition nearly passes all the tests in the W3C
test suite. The ones it fails are:
+ 0032: Weakness in test suite. Cognition performs URI canonicalisation,
but test suite fails to check for this.
+ 0033: See 0032.
+ 0093: Cognition's text/html to text/plain conversion is different from
the one specified by RDFa. I'm not changing this - it would be a
regression IMHO.
+ 0094
+ 0099: See 0093.
+ 0100
+ 0101
+ 0108: See 0093.
+ 0112: See 0093.
- Resolved conflict over hCard 'member' property. When hCard is parsed as an
attendee/contact/organizer within hCalendar, then 'member' is treated as
per RFC 2445. Otherwise, treated as vCard 4.0.
* Exports
- RDF/Turtle added.
- KML: when a geo microformat is nested within an adr microformat, only
output one placemark for them both.
- HTML (Detect) improved, includes <pre> elements containing turtle.
- M3U output from audio:Recording and audio:Album, including media:position
support for ordering and media:duration support. Some very basic support
for the music ontology <http://musicontology.com/>.
- jCard 'rev' should be an array.
* cognitiond has a new SHA1 command. Given a URI it will return the SHA1 of the
URI. Given a URI structured like <foo#subject(bar)> it will return the SHA1
of "bar". This is used by the Cognition web service to provide SHA1-based
filenames.
* Microformats
- hReview:
+ Set "type" to "place" if item hCard appears to be for a place, unless
"type" is explicitly set.
+ Ditto to "product" if item is an hAudio.
+ Find the "reviewer" if it is outside the root hReview element.
+ Support for "inside-out ratings" where the rel=tag is wrapped *around*
the rating.
- Improvements in dealing with triply-embedded microformats. e.g. in the
following, Jane Doe is no longer considered an agent of Joe Bloggs.
<div class="vcard">
<span class="fn">Joe Bloggs</span>,
<span class="birth vcard">
Born at
<span class="fn org">Kingdom Hospital</span>
<span class="agent vcard">
(<span class="role">Midwife</span>:
<span class="fn">Jane Doe</span>)
</span>
</span>
</div>
- Added a few more profile URIs.
- No longer use "uid" property as RDF URI. It simply doesn't work well with
most examples in the wild. As somebody once said: "The creator of me is my
mother. The creator of my web page is me. If you get me mixed up with my
web page, then you would conclude that I am my own mother."
- Better efficiency parsing microformats. Previously an element with classes
"agent vcard" would be parsed twice - once in its own right, and again
as the agent for its parent vcard. Now it should be parsed just once,
resulting in faster parsing and reduced memory consumption.
- hAtom entries will now take the page's title as their own title if their
own title is blank, and they are the sole hAtom entry on the page, and
there is no interleaving hfeeds.
- Support for three new hCard properties:
+ Support vCard 4.0 draft "fburl" property. This may be either a link, or
an embedded hCalendar. (Note: not an embedded hCalendar event, or
hCalendar freebusy. The embedded hCalendar must have class name
"vcalendar".) This is a plural property.
+ Support vCard 4.0 draft "caluri" property with same parsing rules as
"fburl". This is a plural property.
+ Support vCard 4.0 draft "caladruri" property. This should be a link.
This is a plural property.
* Refactoring:
- Removed a few dependencies.
+ s/URI::Escape::uri_escape/CGI::Util::escape/g.
- Moved RDFa implementation to Cognition::HTMLParser::RDFa.
- Moved eRDF implementation to Cognition::HTMLParser::eRDF.
- Moved RDF/GRDDL implementation to Cognition::HTMLParser::RDF::*.
- Moved some metadata stuff to Cognition::HTMLParser::Metadata.
- Moved @role support to Cognition::HTMLParser::RoleAttr.
- Rearranged much of Cognition::HTMLParser.
* Use <http://www.w3.org/2006/link#uri> instead of dcterms:identifier to
internally represent (alternative) RDF subject URIs.
* Durations are now a first-class citizen in Cognition. That is, much like
datetime values have been handled for a while, durations are now parsed and
represented as their own data type (not simply a string). This will allow for
more intelligent handling of durations in the future.
- Microformat durations now support not just ISO 8601 strings as duration but
also:
+ A simple duration measured in seconds
<span class="duration">123 s</span> (SI-style, using seconds only)
+ Using ISO 31-1-style class names:
<span class="duration">
<span class="h">1</span> hour,
<span class="min">23</span> minutes and
<span class="s">45.6</span> seconds
</span>
(Classes are: d, h, min, s.)
+ Embedded hMeasure. The hMeasure must have "type" equal to "duration" or
null, and item set to null. Units can be "seconds"/"s", "minutes"/"min",
"hours"/"h" or "days"/"d". The numeric component does not need to be an
integer.
- This introduces a new dependency on DateTime::Duration, but as that's
bundled with DateTime, it shouldn't be a problem. (Cognition already had
a dependency on DateTime.)
- Non-ISO-8601 durations should be seen as EXPERIMENTAL for now.
* Intervals are also now first-class citizens. As it happens, the only
microformat that *uses* intervals is hCalendar's freebusy objects.
- Intervals may be specified using:
+ ISO 8601 format.
+ An hMeasure duration (see above) plus one of a 'start', 'end', 'before'
or 'after' class, which contain ISO 8601 datetimes. 'start'/'end' are
inclusive.
+ An ISO 31-1-style duration as above, with one of 'start', 'end',
'before' or 'after'.
+ Both 'start'/'after' and 'before'/'end', with no duration.
* Understands <meta http-equiv="Content-Language"> and HTTP header.
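
The three non-ISO duration forms in the alpha12 durations entry all reduce to a count of seconds, so a sketch can convert each one to a Python `timedelta`. This is not Cognition's code (which uses DateTime::Duration); the function names are hypothetical, and rejecting unknown units or classes with a ValueError is an assumption.

```python
from datetime import timedelta
from typing import Dict

# Hypothetical sketch: the unit names listed for hMeasure durations, plus the
# ISO 31-1 class names d/h/min/s, mapped to seconds.
UNIT_SECONDS = {
    "s": 1, "seconds": 1,
    "min": 60, "minutes": 60,
    "h": 3600, "hours": 3600,
    "d": 86400, "days": 86400,
}
ISO31_CLASSES = ("d", "h", "min", "s")


def seconds_for(unit: str) -> int:
    try:
        return UNIT_SECONDS[unit.strip()]
    except KeyError:
        raise ValueError(f"unsupported duration unit: {unit!r}") from None


def from_seconds_string(text: str) -> timedelta:
    """SI style, seconds only: '123 s' -> 0:02:03."""
    number, unit = text.split()
    if unit != "s":
        raise ValueError(f"expected a value in seconds, got {text!r}")
    return timedelta(seconds=float(number))


def from_iso31_classes(parts: Dict[str, str]) -> timedelta:
    """Class name -> text content, e.g. {'h': '1', 'min': '23', 's': '45.6'}."""
    total = 0.0
    for cls, value in parts.items():
        if cls not in ISO31_CLASSES:
            raise ValueError(f"unknown duration class: {cls!r}")
        total += float(value) * seconds_for(cls)
    return timedelta(seconds=total)


def from_hmeasure(num: str, unit: str) -> timedelta:
    """Embedded hMeasure (type 'duration' or null); num need not be an integer."""
    return timedelta(seconds=float(num) * seconds_for(unit))


print(from_hmeasure("1.5", "hours"))  # 1:30:00
```

Normalising every form to a `timedelta` means later code (such as building an interval from a 'start' plus a duration) does not need to know which markup style supplied the value.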


cognition/0.1-alpha14 (2008-12-14) :-

Ability to parse HTML from STDIN; integrate validation; refactored and improved
namespace and CURIE handling; improved rel=meta support; approximate datetimes;
better HTML 5 support; hRecipe support and RecipeBook XML export; integrated
Google SocialGraph Node Mapper; less namespace squatting; HTTP in RDF vocab;
Notation3 output and specialised JSON output for Microformats.

* Microformats:
- Cognition has had hCard validation functions built in for a while, but no
interface to access them. I've started adding this information to the RDF
output now. Also, simple_parse is able to log some validation errors.
- hAudio:
+ remove rel=license support
+ title of work is now the "fn" property
- figure:
+ "legend" plural
+ remove rel=license support
+ @longdesc support
+ profile URI
- hRecipe: experimental support for this draft microformat.
- hAtom:
+ entries now support class="hfeed replies" and class="in-reply-to"
allowing Atom threading support. This feature is EXPERIMENTAL.
+ Use <http://bblfish.net/work/atom-owl/2006-06-06/#> namespace instead
of squatting on <urn:ietf:rfc:4287#>.
+ Improve the "author" hunt.
- rel-enclosure: Use <http://www.iana.org/assignments/relation/enclosure>.
- rel-tag:
+ Support for class="tag" is now contingent upon finding a profile
URI of <http://purl.org/uF/rel-tag/class>.
+ Use <http://bblfish.net/work/atom-owl/2006-06-06/#scheme> to represent
tagspaces instead of squatting on
<http://microformats.org/wiki/rel-tag#tagSpace>.
- XFN: Switch to using Sindice's XFN vocabulary instead of squatting on the
XFN profile document as a namespace.
- hReview: support profile <http://www.purl.org/stuff/rev#>.
- hCard now uses <http://www.w3.org/2006/vcard/ns#> as namespace instead of
squatting on <urn:ietf:rfc:2426#>.
- hCalendar now uses <http://www.w3.org/2002/12/cal/ical#> as a namespace
instead of squatting on <urn:ietf:rfc:2445#>.
- geo no longer uses <http://microformats.org/wiki/geo-extension-nonWGS84#>
as its namespace for non-WGS84 coördinates.
* RDFa:
- Add support for @prefix as alternative method of defining RDFa prefixes.
(See <http://rdfa.info/wiki/RDFainHTML4>.)
- Spec-compliant whitespace handling if "rdfa_strings" option is set to "1".
- Correctly ignore @id for subjects.
- See also CURIE handling improvements.
* RDF Input:
- Improved support for rel="meta".
- Can handle links to RDF/XML, N3, Turtle and (X)HTML. The last of those is
triplified by calling another instance of Cognition on it.
- Sends better HTTP "Accept" header on request.
* HTTP: Parse HTTP headers. (Yes, some older versions of Cognition handled HTTP
headers, but not very well.)
- Support for the latest draft of Mark Nottingham's HTTP Header Linking
standard. (And yes, HTTP Link headers with rel=profile are a supported
mechanism for linking to metadata profiles.)
- Uses HTTP Vocabulary in RDF <http://www.w3.org/TR/HTTP-in-RDF/>.
- Most datetime-related headers are properly parsed. This introduces a
dependency on HTTP::Date, but that's a standard part of LWP, which is
already widely used.
- <meta http-equiv> parsed similarly.
* HTML5: added support for new elements such as <section> to html2xhtml.
Previously, this function (based on HTML::TreeBuilder) would have stripped
those elements as they would not have been recognised. Input code is only
passed through html2xhtml if it cannot be parsed as well-formed XML, so
strict XHTML5 would have worked already.
* Doc structure:
- Changed result of parsing <h2 id="foo"> so that instead of this being
interpreted as "#foo is a section", it is interpreted as meaning "there
is a section, which has a heading #foo". This seems to be semantically
the most sensible interpretation, and works better in practice too.
* Doc metadata:
- Special support for HTML5 metadata terms.
* Output:
- iCalendar:
+ Use the new firstOfLiteral/allOfLiteral functions as appropriate.
+ Pay better attention to date resolution.
- RecipeBook XML <http://www.happy-monkey.net/recipebook/> export.
- RDF/TriX export.
- N3 export: the same as Turtle, except that some of the BNodes are nested,
with " = _:Node".
- RDF: support for collections.
- Microformats JSON output.
* Dates and times:
- Support for approximate dates in microformats. The class "approx" must be
included on the datetime property element, or any descendant element. Two
easy syntaxes:
+ <span class="approx bday">1665</span>
+ <span class="bday">
<span class="approx">ca.</span>
<span class="value">1665</span>
</span>
- Better support for ISO 8601 "end of day" notation, e.g. 2008-08-26T24:00.
- Improved support for datetimes outside the range AD 1 to 9999.
- Cleverer support for value excerpting when parsing datetimes.
* NetNewsWire plugins:
- "extras" directory includes some plugins for NNW.
* Documentation:
- Installation help for the Cognition daemon on Mac OS X and Linux.
* Refactoring and bug fixes for URI and CURIE handling:
- Cognition::HTMLParser::abs_url is now Cognition::HTMLParser::uri and no
longer handles CURIEs or BNodes.
- Cognition::HTMLParser::uri detects absolute URIs and avoids
canonicalising them.
- Dropped function: Cognition::HTMLParser::_fq2pfx
- New function: Cognition::Namespace::to_curie, slightly smarter than _fq2pfx.
- Dropped function: Cognition::HTMLParser::_pfx2fq
- New function: Cognition::HTMLParser::eRDF::curie
- New function: Cognition::HTMLParser::RDFa::curie
- New function: Cognition::HTMLParser::RDFa::uriOrSafeCurie
- New function: Cognition::HTMLParser::RDFa::reservedWordOrCurie
- New function: Cognition::HTMLParser::RoleAttr::reservedWordOrCurie
- New function: Cognition::HTMLParser::Metadata::reservedWordOrCurie
- Change the prefix used by default for undefined prefixes in RDF (an error
condition) from <http://undefined-namespace-prefix.invalid/> to
<http://invalid.invalid/ns#>.
- Support question-mark and equals-sign namespaces in addition to the usual
hash and slash types.
* STRINGIFY:
- Support param@value.
- Improved <pre> handling. (Where the <pre> is *outside* the property.)
* Infrastructure:
- Allow daemon to parse code passed via STDIN.
+ Syntax is "COGNIFY STDIN AS http://example.com/".
+ In that example, <http://example.com/> is taken to be the base URI.
+ Indicate end of input using a line containing a lone full stop.
- Command-line client has similar capabilities.
+ In place of the URL parameter on the command line, pass '-'.
+ Then pass a second URL parameter which represents the base URI
for the document.
- Moved a bunch of stuff that's shared by daemon and client into a new
Cognition::Misc module.
* Now uses thing-described-by.org a lot for generating URIs for people, places,
events, etc.
* Integrate Google's SocialGraph::NodeMapper Perl module.
- This is a little tricky to install as it relies on JavaScript::SpiderMonkey
which in turn relies on Mozilla's libjs. Therefore, I've made this module
optional. If SocialGraph::NodeMapper is installed, it will be used.
Otherwise, those bits of code which call it will be ignored.
- hCard uri, uid and impp properties are passed through it.
- XFN links are passed through it.
- See <http://code.google.com/p/google-sgnodemapper/>.
* Cognition is now on Google Code:
- See <http://code.google.com/p/cognition-parser/>.
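
The two "approx" syntaxes listed under "Dates and times" above can be illustrated with a short Python sketch using xml.etree.ElementTree. The functions parse_snippet and read_datetime_property are hypothetical, and the class="value" handling is a simplified take on value excerpting, not Cognition's actual rules.

```python
import xml.etree.ElementTree as ET

# Hypothetical sketch; not Cognition's actual parser.


def parse_snippet(markup):
    """Parse a well-formed XHTML fragment (tag soup would need an HTML parser)."""
    return ET.fromstring(markup)


def _classes(elem):
    """Return the element's class attribute as a list of tokens."""
    return (elem.get("class") or "").split()


def read_datetime_property(prop_elem):
    """Return (text, approximate) for a microformat datetime property.

    approximate is True if class="approx" is on the property element
    itself or on any descendant (Element.iter() includes the element).
    If descendants carry class="value", only their text is used, a
    simplified form of value excerpting; otherwise all text is used.
    """
    approximate = any("approx" in _classes(e) for e in prop_elem.iter())
    values = [e for e in prop_elem.iter()
              if e is not prop_elem and "value" in _classes(e)]
    if values:
        text = "".join("".join(v.itertext()) for v in values)
    else:
        text = "".join(prop_elem.itertext())
    return text.strip(), approximate
```

With the second syntax, the function returns ("1665", True): the "ca." text is skipped because a class="value" element is present.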
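
The daemon's STDIN mode described under "Infrastructure" above can be sketched as a small Python function that only builds the request text. The name build_stdin_request is hypothetical, LF line endings are an assumption, and no networking code is shown because the daemon's host and port are not given here. Since no escaping is described for a literal "." line, the sketch rejects such documents.

```python
# Hypothetical sketch of the request framing; not part of Cognition.


def build_stdin_request(base_uri, document):
    """Frame a document for the daemon's STDIN mode.

    First line: "COGNIFY STDIN AS <base URI>"; then the document;
    then a line containing a lone full stop to mark the end of input.
    Assumes LF line endings.
    """
    lines = document.splitlines()
    if "." in lines:
        # No escaping mechanism is documented for a literal "." line,
        # so refuse rather than let the daemon truncate the input.
        raise ValueError("document contains a line consisting of a lone '.'")
    return "\n".join([f"COGNIFY STDIN AS {base_uri}", *lines, "."]) + "\n"
```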


swignition/0.1-alpha15 (2009-01-25) :-

Renamed Cognition to Swignition; allow Swignition to be pointed directly at
non-HTML files (including various RDF serialisations, RSS feeds and JSON);
various GRDDL improvements, including support for RDF-EASE; improved recursive
parse.

* Previously, Swignition always expected to be pointed at an HTML file. It can
now be pointed at:
- RDF/XML, Notation 3, TriG, Turtle and N-Triples.
- RDF/JSON (but only recognised if the JSON schema is included)
- JSON (via jsonGRDDL, requires JavaScript::SpiderMonkey)
- Tag soup RSS / Atom
+ Includes support for RDFa in item <description>.
+ Includes support for Microformats in item <description>.
- TriX, including XSLT transformations.
* Outputs:
- Include a JSON schema in RDF/JSON output.
- Atom bugfixes.
* Improvements in GRDDL:
- Support XML namespace GRDDL.
- Support XML attribute GRDDL.
- Move GRDDL code out of the RDF/XML module and clean up.
- Support RDF-EASE as a transformation language.
* User-Agent:
- Better 'Accept' header sent.
- More descriptive 'User-Agent' header sent.
* Plain old metadata:
- Allow <title> and <meta> parsing to be turned off.
* Recursive parsing:
- rel="meta" can now point at any file format understood by Swignition,
including feeds, JSON, etc.
- Moved to Swignition::GenericParser::Recursive.
- The NoFollow feature actually works now.
* Microformats:
- hAtom: closer conformance to AtomOwl.


swignition/0.1-alpha16 (2009-02-28) :-

* Replace alpha3's RDFModel with a new DataModel for internal storage of data.
- Includes support for multiple graphs.
- Includes support for some graphs which are outside the standard RDF model
(e.g. literal subjects).
- Easy rdfs:Container and rdf:List support.
- SPARQL support (by wrapping around Redland).
* Microformats:
- XFN:
+ Add support for the Relationship Vocabulary as HTML link types. See
<http://purl.org/vocab/relationship/>.
- hCalendar: allow 'rdate' and 'exdate' to be intervals.
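
For readers unfamiliar with rdf:List, the following Python sketch shows the standard RDF collection encoding (rdf:first, rdf:rest, rdf:nil) using plain tuples as triples. It does not use Swignition's DataModel API; list_to_triples and bnode_factory are hypothetical helpers.

```python
import itertools

# Hypothetical sketch; not Swignition's DataModel API.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def bnode_factory():
    """Return a callable producing fresh blank-node labels _:b1, _:b2, ..."""
    counter = itertools.count(1)
    return lambda: f"_:b{next(counter)}"


def list_to_triples(items, new_bnode):
    """Encode a sequence as an RDF collection (rdf:List).

    Returns (head, triples). Each list node has rdf:first (the item) and
    rdf:rest (the next node); the last rdf:rest is rdf:nil. An empty
    sequence is rdf:nil itself, with no triples.
    """
    if not items:
        return RDF + "nil", []
    nodes = [new_bnode() for _ in items]
    triples = []
    for i, (node, item) in enumerate(zip(nodes, items)):
        rest = nodes[i + 1] if i + 1 < len(nodes) else RDF + "nil"
        triples.append((node, RDF + "first", item))
        triples.append((node, RDF + "rest", rest))
    return nodes[0], triples
```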

$Id$


Change log

r164 by m...@tobyinkster.co.uk on Mar 14, 2009
Changelog: Allow RDATE and EXDATE to be
intervals instead of times; support for
relationships vocab.

Older revisions

r159 by m...@tobyinkster.co.uk on Feb 07, 2009
Update ChangeLog.
r152 by m...@tobyinkster.co.uk on Feb 01, 2009
Changelog update
r146 by m...@tobyinkster.co.uk on Jan 31, 2009
Lots of propset stuff; a few updates
to S::DM::Node.

File info

Size: 38319 bytes, 817 lines

File properties

svn:keywords
Id
Hosted by Google Code