Skip to content
This repository has been archived by the owner on Apr 21, 2023. It is now read-only.

add option for newline/CR removal to collapse_whitespace filter #464

Open
GoogleCodeExporter opened this issue Apr 6, 2015 · 13 comments
Open

Comments

@GoogleCodeExporter
Copy link

enabling

    ModPagespeedEnableFilters  collapse_whitespace

results in reduction of

    ----------------------------
                    content
                </div>
            </div>
        </div>
    </div>
    ----------------------------

to

    ----------------------------
    content
    </div>
    </div>
    </div>
    </div>
    ----------------------------

requesting a further cosmetic toggle to remove new-lines/returns.  if toggled 
ON, the desired result would be, ideally,

    ----------------------------
    content</div></div></div></div>
    ----------------------------

Original issue reported on code.google.com by pgnet.dev on 12 Jul 2012 at 5:19

@GoogleCodeExporter
Copy link
Author

The whitespace between the </div> is significant, or it can be, depending on 
the layout.  You are proposing to remove all whitespace.  It would be possible 
to yield this:
content </div> </div> </div> </div>

but that would not be any smaller in bytes than what we have now.  Our feeling 
is that leaving newlines as is, rather than converting them to spaces, aids in 
readability of the resulting doc without compromising size.

Of course in some contexts it is possible for us to remove all the whitespace 
but mod_pagespeed doesn't currently do the sort of layout analysis needed to 
make that determination.

Original comment by jmara...@google.com on 12 Jul 2012 at 5:26

@GoogleCodeExporter
Copy link
Author

is the following, then, consistent with the 'sort of layout analysis' that MPS 
is intended to do?

with filter DISabled

    #   ModPagespeedEnableFilters  collapse_whitespace

my app-autogenerated page contains,
-------------------------------------------
...
        </a>
    </div>

     </div>
    </div>
        </div>
  </div>
</div>  </div>
</div>  </div>
</div></footer>  </div>
    <div class="region region-page-bottom" id=region-page-bottom>
...
-------------------------------------------

with filter ENabled

        ModPagespeedEnableFilters  collapse_whitespace

that's reduced to,
-------------------------------------------
</a>
</div>
</div>
</div>
</div>
</div>
</div> </div>
</div> </div>
</div></footer> </div>
<div class="region region-page-bottom" id=region-page-bottom>
-------------------------------------------

Is there a good doc that explains the rules of what whitespace gets removed and 
which gets left alone?  Or, is the source the doc?

Original comment by pgnet.dev on 12 Jul 2012 at 5:59

@GoogleCodeExporter
Copy link
Author

From 
https://developers.google.com/speed/docs/mod_pagespeed/filter-whitespace-collaps
e:

The filter reduces the transfer size of HTML files by replacing contiguous 
whitespace with a single whitespace character.

Original comment by sligocki@google.com on 12 Jul 2012 at 6:00

  • Changed state: Invalid

@GoogleCodeExporter
Copy link
Author

If you think that's clear ...

Original comment by pgnet.dev on 12 Jul 2012 at 6:06

@GoogleCodeExporter
Copy link
Author

Sorry if it's not clear. We'd like the doc to be as useful as possible, do you 
have any suggestion for making the doc more clear? Maybe we should elaborate 
more than this one sentence?

Note, this is the reason that we call it collapse_whitespace rather than 
strip_whitespace, because it collapses a block of whitespace into a single char.

Also, I just noticed the title of this bug. Would you prefer that there was a 
strip_whitespace rewriter even if it was dangerous and could cause your site to 
be re-formatted?

Original comment by sligocki@google.com on 12 Jul 2012 at 6:49

  • Changed state: New

@GoogleCodeExporter
Copy link
Author

Original comment by jmara...@google.com on 16 Jul 2012 at 4:36

  • Changed state: RequestClarification

@GoogleCodeExporter
Copy link
Author

Issue 577 has been merged into this issue.

Original comment by jmara...@google.com on 3 Dec 2012 at 7:27

@GoogleCodeExporter
Copy link
Author

Hello, 

Could I continue requesting feature through this issue? I think that 
collapse_whitespace is not same with remove whitespace. IMHO, collapse 
whitespace is just make beautiful indentation, but they didn't optimize the 
HTML resource.

For the example, take a look from the collapse whitespace faq here[1].
They just remove collapse unwanted whitespace, but not optimize HTML resource.

-- collapse whitespace filter is looks like this --
<html>
<head>
<title>Hello, world!</title>
<script> var x = 'Hello,   world!';</script>
</head>
<body>
Hello, World!
<pre>
      Hello,
        World!
</pre>
</body>
</html>

-- remove whitespace filter is looks like this --
<html><head><title>Hello, world!</title><script> var x = 'Hello,   
world!';</script></head><body>Hello, World!<pre>Hello, 
World!</pre></body></html>

AFAIK, the following risk if the filters is enabled are the content 
(whitespace) inside <pre> or <code> tags will be optimized. But, it can be 
controlled using a triggered command such as prevent_pre_command_optimized or 
something like that.

So, the <pre> or <code> tags still can be executed as ussual (having a 
whitespace).

[1] 
https://developers.google.com/speed/docs/mod_pagespeed/filter-whitespace-collaps
e

Original comment by dewangg...@xtremenitro.org on 21 Mar 2013 at 11:26

@GoogleCodeExporter
Copy link
Author

Update:

Sorry, not only the <pre> or <code> tags. But also, the other tags have 
individual function. (eg. inline <script>, <style>, etc). 

For example with inline script and style on a page.

-- unoptimized page --
<html>
<head>
<title>UnOptimized Pages</title>
<style>
a {font-size: 12pt; font-weigt: bold;}
</style>
</head>
<body>
<script>var x = 'Yow bro !';</script>
<p>Yow bro!</p>
</body>
</html>

-- optimized page --
<html><head><title>Optimized Pages</title><style>a {font-size: 12pt; 
font-weigt: bold;}</style></head><body><script>var x = 'Yow bro 
!';</script><p>Yow bro!</p></body></html>

In the other hand, the filters is only rewrite the html tags only. Not the 
content. So, it could be eliminate any risk and give a small risk rank. :) So 
the function of scripts, style, and the other content is still running normally.

Original comment by dewangg...@xtremenitro.org on 21 Mar 2013 at 11:39

@GoogleCodeExporter
Copy link
Author

We certainly like and encourage requests for enhancement, so thank you! :), but 
... as we explained this is problematic and currently not a high priority 
because of the engineering effort vs potential savings ratio. I also think 
you'll find that if you gzip your pages with "whitespace collapsed" (per the 
filter) and "whitespace removed" (done manually) there will be negligible 
savings.

Original comment by matterb...@google.com on 21 Mar 2013 at 1:04

@GoogleCodeExporter
Copy link
Author

Ah ic, it's just about engineering effort vs savings ratio. I just see my issue 
is merged into this one, and the status is still RequestClarification. :D 

Thanks for your time matt.

Original comment by dewangg...@xtremenitro.org on 21 Mar 2013 at 3:37

@GoogleCodeExporter
Copy link
Author

No more clarification is required; switching status back to 'New'.

Original comment by jmara...@google.com on 31 Jan 2014 at 3:46

  • Changed state: New

@GoogleCodeExporter
Copy link
Author

Hi josh,

Thanks for setting this issue up, currently I remove newline/whitespace on 
whole pages by inserting a 'hook' on my framework. You could check 
http://www.lintas.me to see the result,

Hope this features come in next release :)

Original comment by dewangg...@xtremenitro.org on 8 Feb 2014 at 3:57

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

1 participant