PhiloLine Bugs and To Do List
Updated Feb 5, 2010 by MarkyMay...@gmail.com

PhiloLine version 0d.

  • To Do: Add handler for nested divs to shingler and aligner to allow access to subarticles and other objects below "div1".
  • Bug/Logic error: I think I found a few minor bugs in mkshinglesbatch.pl. If $DoFlattenAccents is true, it should also call &FlattenAccents on each of the stop words; otherwise a flattened word no longer matches its accented stop word entry. For example, I ran it on FILENAME and found many instances of "etait" in the shingles even though "était" is in the stop words file.
  • Bug/Logic error: The length check for $DeleteShortWords should be moved further down in &SelectAndMassageWord, after the call to &DoVirtNorm. Otherwise, the word might be replaced with a short one that never gets filtered.
  • Bug?: mkshinglesbatch.pl: When mapping bytes to div objects, a div with no words (which is really a tagging error) generates an out-of-range sequence error. This may not be a bug, since completely empty divs are an encoding error, but the patch is to change:
		if ($DoDivLevelObjects) {
			$thisdivid = $divid[$z];           # Do byte mapping to div ids.
			if ($offset > $divsbyte[$z]) {
				$z++;
				$thisdivid = $divid[$z];
				if ($offset > $divsbyte[$z]) {
					print "ERROR Seq $z $offset $divsbyte[$z] \n";
					exit;
					}
				}
			}

to

		if ($DoDivLevelObjects) {
                        $thisdivid = $divid[$z];          # Do byte mapping to div ids.
                        if ($offset > $divsbyte[$z]) {
                                $z++;
                                $thisdivid = $divid[$z];
                                if ($offset > $divsbyte[$z]) {
                                        $z++;
                                        $thisdivid = $divid[$z];
                                        print "WARNING Remapped counter 1 time\n";
                                        }
                                if ($offset > $divsbyte[$z]) {
                                        print "ERROR Seq $z $offset $divsbyte[$z] \n";
                                        exit;
                                        }
                                }
                        }
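
The patch above tolerates exactly one empty div. A minimal sketch of a more general form advances the counter in a loop and fails only when the div list is exhausted. It assumes, as the fragment implies, that @divid and @divsbyte are parallel arrays and that $divsbyte[$i] is the last byte offset belonging to div $i. The sub name map_offset_to_div and the sample data are hypothetical and not part of mkshinglesbatch.pl.

```perl
use strict;
use warnings;

# Hypothetical helper: starting at index $z, find the div that covers
# $offset. Returns (index, number of steps advanced), or (undef, steps)
# if the offset lies beyond the last div boundary.
sub map_offset_to_div {
    my ($offset, $z, $divsbyte) = @_;
    my $skipped = 0;
    while ($z <= $#{$divsbyte} && $offset > $divsbyte->[$z]) {
        $z++;
        $skipped++;
    }
    return (undef, $skipped) if $z > $#{$divsbyte};
    return ($z, $skipped);
}

# Made-up sample data: divs "b" and "c" are empty, so they share the
# boundary of div "a" (an assumption about how empty divs show up).
my @divid    = qw(a b c d);
my @divsbyte = (100, 100, 100, 250);

my $z = 0;
for my $offset (10, 90, 120, 240) {
    my ($found, $skipped) = map_offset_to_div($offset, $z, \@divsbyte);
    if (!defined $found) {
        print "ERROR Seq $z $offset\n";
        exit 1;
    }
    # One step is the normal move to the next div; more means empty divs.
    print "WARNING Remapped counter ", $skipped - 1, " time(s)\n"
        if $skipped > 1;
    $z = $found;
    print "$offset => $divid[$z]\n";
}
```

The loop keeps the original behavior for well-formed input (zero or one step per word) and still stops with an error when no div covers the offset.
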
  • mkshinglesbatch.pl: div mapping does not include structural encoding such as <front>, so object maps can get out of sequence.
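
The two word-filtering items above (stop words and short words) can be sketched together. This is a minimal illustration, not the code in mkshinglesbatch.pl: flatten_accents, virt_norm, and select_word are hypothetical stand-ins for &FlattenAccents, &DoVirtNorm, and &SelectAndMassageWord, and the stop list, normalization table, minimum length of 3, and the order of normalization before flattening are all assumptions.

```perl
use strict;
use warnings;
use utf8;
use Unicode::Normalize qw(NFD);

my $DoFlattenAccents = 1;
my $DeleteShortWords = 1;
my $MinWordLength    = 3;    # hypothetical threshold

# Stand-in for &FlattenAccents: decompose, then drop combining marks.
sub flatten_accents {
    my ($word) = @_;
    my $decomposed = NFD($word);
    $decomposed =~ s/\p{Mn}//g;
    return $decomposed;
}

# Stand-in for &DoVirtNorm: a made-up normalization table.
my %virt_norm = (estoit => 'était', aulx => 'aux', ung => 'un');
sub virt_norm {
    my ($word) = @_;
    return exists $virt_norm{$word} ? $virt_norm{$word} : $word;
}

# Build the stop list, flattening each entry the same way words are
# flattened, so "était" in the stop file matches "etait" in the text.
my %stop;
for my $sw ('était', 'les', 'aux') {
    my $key = $DoFlattenAccents ? flatten_accents($sw) : $sw;
    $stop{$key} = 1;
}

# Return the massaged word, or nothing if it should be dropped.
sub select_word {
    my ($word) = @_;
    $word = virt_norm($word);
    $word = flatten_accents($word) if $DoFlattenAccents;
    # The length check runs after normalization, so a word that was
    # replaced by a shorter form is still caught.
    return if $DeleteShortWords && length($word) < $MinWordLength;
    return if $stop{$word};
    return $word;
}

binmode STDOUT, ':encoding(UTF-8)';
for my $w ('était', 'estoit', 'ung', 'maison') {
    my $kept = select_word($w);
    print defined $kept ? "$w -> $kept\n" : "$w -> (dropped)\n";
}
```

In this sketch "ung" is three characters long and would pass a length check placed before normalization, but its normalized form "un" is dropped because the check comes afterwards.
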
