Gentoo Forums
Can squid change content-encoding on the fly?
szatox
Advocate


Joined: 27 Aug 2013
Posts: 3477

Posted: Mon Nov 18, 2024 9:34 pm    Post subject: Can squid change content-encoding on the fly?

Does anyone know whether squid is capable of changing the Content-Encoding of cached responses, and how to enable it?
I haven't dug too deep, but it looks like Cloudflare is trying to be smart while some HTTP clients are not, and the result is brotli-compressed files being fed to applications which don't understand brotli.

Since I already have a squid proxy set up for caching, I'd like it to decompress the response body before sending it to the client.
_________________
Make Computing Fun Again
Banana
Moderator


Joined: 21 May 2004
Posts: 1830
Location: Germany

Posted: Tue Nov 19, 2024 6:41 am

I don't know the real answer, but if you control your Cloudflare zone you could just deactivate it: https://developers.cloudflare.com/rules/compression-rules/examples/disable-all-brotli/

But what makes me wonder: why does the client get brotli if it does not understand it?
The Accept-Encoding header tells the server what can be sent.
_________________
Forum Guidelines

PFL - Portage file list - find which package a file or command belongs to.
My delta-labs.org snippets do expire
szatox
Advocate


Joined: 27 Aug 2013
Posts: 3477

Posted: Tue Nov 19, 2024 10:11 am

Yeah, well, it's not _my_ Cloudflare.
From what I found during a quick search on the internet, CF strips this header completely and enables or disables compression on its own proxy based on the user agent and the zone policy. I haven't tested that yet.
Still, I have several clients using the same resources, some of which understand brotli and some of which don't. Assuming everything is working as expected:

1. Brotli-enabled client happens to go first, allowing a compressed response
2. Squid stores the compressed response
3. Dumb client requests the same resource
4. Squid serves the same compressed response from cache
5. Dumb client goes WTF?!
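
The sequence above can be reproduced with a toy model. This is only a sketch, assuming a cache that keys on the URL alone; `NaiveCache`, `origin` and the example URL are made up for illustration and say nothing about how squid is implemented internally.

```python
def offered_codings(accept_encoding):
    """Split an Accept-Encoding value into bare coding names (q-values ignored)."""
    return {
        item.split(";")[0].strip().lower()
        for item in accept_encoding.split(",")
        if item.strip()
    }


def origin(url, accept_encoding):
    """Hypothetical origin: answers with brotli whenever the client offers it."""
    encoding = "br" if "br" in offered_codings(accept_encoding) else "identity"
    return {"Content-Encoding": encoding, "body": f"<{encoding} copy of {url}>"}


class NaiveCache:
    """Cache keyed by URL only, so the first client decides what everyone gets."""

    def __init__(self):
        self.store = {}

    def fetch(self, url, accept_encoding):
        if url not in self.store:
            self.store[url] = origin(url, accept_encoding)
        return self.store[url]


cache = NaiveCache()
url = "http://example.test/data.json"
first = cache.fetch(url, "gzip, deflate, br")  # brotli-capable client goes first
second = cache.fetch(url, "")                  # dumb client offers nothing
print(second["Content-Encoding"])
```

The second client receives the stored brotli variant although it never offered `br`, which is the failure described in the list.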
Banana
Moderator


Joined: 21 May 2004
Posts: 1830
Location: Germany

Posted: Tue Nov 19, 2024 10:24 am

Ah, now I understand, partly. I still do not get your setup.

Anyway, have a look here: https://wiki.gentoo.org/wiki/Privoxy
Quote:
It may be combined with caching proxies like squid to improve its overall speed.

And it has a brotli USE flag with the description "Decompress brotli compressed data using app-arch/brotli before filtering"

So, maybe this can help you.

CF doing something the server does not know about is not just something random: https://forum.palemoon.org/viewtopic.php?f=17&t=27188 (as far as I know)
Hu
Administrator


Joined: 06 Mar 2007
Posts: 22976

Posted: Tue Nov 19, 2024 1:49 pm

In the observed failure case, what HTTP headers were sent in the Cloudflare response and in the dumb client's request? I am curious whether Squid was given enough context to understand that it should not use the cached brotli response when serving the dumb client. If the dumb client does not list br in its Accept-Encoding header and the Cloudflare response included Vary: Accept-Encoding, then I would expect Squid to consider the cached response unusable, because the dumb client's Accept-Encoding does not match the Accept-Encoding of the request that originally populated the cache.
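
For reference, the check described above boils down to comparing the request headers named by Vary. The following is a minimal sketch of that rule as the HTTP caching specification describes it, not squid's actual code; `vary_matches` and the lowercase-keyed dicts are hypothetical names chosen for the example.

```python
def _normalize(value):
    """Collapse whitespace around commas so equivalent header values compare equal."""
    if value is None:
        return None
    return ",".join(part.strip() for part in value.split(","))


def vary_matches(vary, stored_request, new_request):
    """Return True if a stored response may be reused for new_request.

    Both request arguments are dicts with lowercase header names.
    """
    if vary.strip() == "*":
        return False  # "Vary: *" never matches a later request
    for name in vary.split(","):
        name = name.strip().lower()
        if not name:
            continue
        if _normalize(stored_request.get(name)) != _normalize(new_request.get(name)):
            return False
    return True


stored = {"accept-encoding": "gzip, deflate, br"}
same_client = {"accept-encoding": "gzip,deflate,br"}
dumb_client = {}  # sends no Accept-Encoding at all

print(vary_matches("Accept-Encoding", stored, same_client))
print(vary_matches("Accept-Encoding", stored, dumb_client))
```

With Vary: Accept-Encoding present, the dumb client's request does not match the stored one, so a cache following this rule would have to fetch or revalidate rather than reuse the brotli copy.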
szatox
Advocate


Joined: 27 Aug 2013
Posts: 3477

Posted: Tue Nov 19, 2024 2:33 pm

> Anyway, have a look here: https://wiki.gentoo.org/wiki/Privoxy
Yeah, it looks like I'm gonna need another instance of mitmdump on top of squid. It handles decompression just fine. I feel like I got to the point where I need to start making notes of which proxy is doing what on which port though, and it sucks.

Hu, to satisfy your curiosity, I dumped the headers (and removed some identifying information):
Code:
 request        Host:
 request        User-Agent:
 request        Accept: */*
 request        Accept-Language: en-US,en;q=0.5
 request        Accept-Encoding: gzip, deflate, br
 request        Referer:
 request        Origin:
 request        Connection: keep-alive
 request        Sec-Fetch-Dest: empty
 request        Sec-Fetch-Mode: cors
 request        Sec-Fetch-Site: cross-site
 request        Host:
 request        User-Agent:
 request        Accept: application/json, text/javascript, */*; q=0.01
 request        Accept-Language: en-US,en;q=0.5
 request        Accept-Encoding: gzip, deflate, br
 request        X-Requested-With: XMLHttpRequest
 request        Referer:
 request        Cookie:
 request        Connection: keep-alive
 request        Sec-Fetch-Dest: empty
 request        Sec-Fetch-Mode: cors
 request        Sec-Fetch-Site: same-origin

Code:
 response       Date:
 response       Content-Type:
 response       Content-Length:
 response       Last-Modified:
 response       ETag:
 response       Access-Control-Allow-Origin:
 response       Access-Control-Allow-Credentials: true
 response       Access-Control-Allow-Methods: GET, POST, PUT, DELETE, OPTIONS
 response       Access-Control-Allow-Headers: DNT,X-Mx-ReqToken,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Range
 response       CF-Cache-Status: MISS
 response       Expires:
 response       Cache-Control: public, max-age=14400
 response       Accept-Ranges: bytes
 response       Vary: Accept-Encoding
 response       Server: cloudflare
 response       CF-RAY:
 response       X-Cache: MISS from squid.local
 response       X-Cache-Lookup: MISS from squid.local:3128
 response       Connection: keep-alive
 response       Server: ddos-guard
 response       Set-Cookie:
 response       Strict-Transport-Security: max-age=31536000, max-age=63072000;includeSubDomains;preload
 response       Content-Security-Policy: upgrade-insecure-requests;
 response       Content-Type: text/html; charset=UTF-8
 response       Vary: Accept-Encoding
 response       Cache-Control: no-cache
 response       Date:
 response       Expires:
 response       access-control-allow-methods: GET, HEAD, OPTIONS
 response       access-control-allow-headers: Origin,Range,Accept-Encoding,Referer,Cache-Control
 response       access-control-expose-headers: Server,Content-Length,Content-Range,Date
 response       Content-Encoding: gzip
 response       referrer-policy: no-referrer-when-downgrade
 response       x-xss-protection: 1; mode=block
 response       x-content-type-options: nosniff
 response       Age: 0
 response       DDG-Cache-Status: MISS
 response       X-Cache: MISS from squid.local
 response       X-Cache-Lookup: HIT from squid.local:3128
 response       Transfer-Encoding: chunked
 response       Connection: keep-alive

So it looks like it should have worked, but it didn't.
Still, I don't think downloading it again would be a good solution. It is a caching proxy; it exists solely to reuse previously seen responses. Yeah, it definitely could do a better job.
Anyway, I'll leave it for another day or two, and if nobody comes up with any clever ideas for handling it with squid alone, I'll just add another proxy to the chain.
Banana
Moderator


Joined: 21 May 2004
Posts: 1830
Location: Germany

Posted: Tue Nov 19, 2024 7:25 pm

So this is your use case, right?

client -> squid -> any website on the internet (which could have CF in front of it)

A client with br capabilities gets br content, either from CF or directly.
A client with no br capabilities also gets the br content, since it requests the same URI, which is the cache key in squid.

I'm no squid expert, but maybe this can help: https://wiki.squid-cache.org/ConfigExamples/DynamicContent/Coordinator

However, I used varnish a long time ago, and it offered the possibility to build the cache key from additional information and not just the URI:
https://varnish-cache.org/docs/trunk/users-guide/vcl-hashing.html
https://www.varnish-software.com/developers/tutorials/http-caching-basics/#cache-variations

nginx can do that as well:
http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache_key

This would result in a cache key which differs between the br client and the non-br client, so each would get the correct content.
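
To make the idea concrete, here is a rough sketch of such a key in Python rather than in VCL or nginx configuration. It assumes the proxy reduces Accept-Encoding to just two buckets before hashing; `accepts_br` and `cache_key` are hypothetical helpers, not functions from varnish, nginx or squid.

```python
def accepts_br(accept_encoding):
    """True if the header offers br with a non-zero q-value."""
    for item in accept_encoding.split(","):
        parts = [p.strip().lower() for p in item.split(";")]
        if parts[0] != "br":
            continue
        quality = 1.0
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    quality = float(param[2:])
                except ValueError:
                    quality = 0.0  # treat a malformed q-value as "not acceptable"
        return quality > 0
    return False


def cache_key(url, accept_encoding):
    """Build a key with one variant for br clients and one for everybody else."""
    variant = "br" if accepts_br(accept_encoding) else "plain"
    return f"{variant}:{url}"


url = "http://example.test/data.json"
print(cache_key(url, "gzip, deflate, br"))
print(cache_key(url, "gzip, deflate"))
```

Bucketing the header instead of hashing its raw value keeps the number of variants at two per URL at most, since clients that merely order their codings differently still share an entry.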
szatox
Advocate


Joined: 27 Aug 2013
Posts: 3477

Posted: Wed Nov 20, 2024 1:15 pm

Yes, that's the gist of it.

The thing about ICAP was certainly an interesting read, though it doesn't seem applicable to my case. I want the response to be modified by my proxy, not the request. I'm still saving it as a new tool for the future.
Changing the cache key doesn't seem like a good option either. I don't want to re-download the same content in a different wrapping. It would double the bandwidth used and halve the cache space.

So far it seems like adding another proxy for decompressing stuff is still the best option. It is ugly and kinda annoying, but it is within my reach.
szatox
Advocate


Joined: 27 Aug 2013
Posts: 3477

Posted: Wed Nov 20, 2024 3:55 pm

So, I came up with this little mitmproxy hack:
Code:
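import brotli
from mitmproxy import http


def response(flow: http.HTTPFlow) -> None:
    # Only touch responses that are brotli-compressed.
    if flow.response.headers.get('Content-Encoding', '').strip().lower() == 'br':
        # Grab the undecoded bytes first; .content decodes according to Content-Encoding.
        compressed = flow.response.raw_content
        flow.response.headers.pop('Content-Encoding')
        flow.response.content = brotli.decompress(compressed)
        # Keep Content-Length in sync with the decompressed body.
        if 'Content-Length' in flow.response.headers:
            flow.response.headers['Content-Length'] = str(len(flow.response.content))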


It's not perfect, since processing the response body inside mitmproxy requires enough RAM to load the full response, but it should be good enough for this particular pipeline.
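
The RAM limitation comes from decompressing the whole body in one call. Below is a sketch of the incremental alternative, using gzip from the standard library as a stand-in because brotli is a third-party package. `decode_incrementally` is a hypothetical helper, and wiring it into mitmproxy's streaming mode or adapting it to the brotli package's incremental decoder is left as an assumption, not shown here.

```python
import gzip
import zlib


def decode_incrementally(chunks, decompressor):
    """Yield decompressed pieces one chunk at a time.

    `decompressor` is any object with decompress(bytes) and flush() methods,
    such as the one returned by zlib.decompressobj().
    """
    for chunk in chunks:
        piece = decompressor.decompress(chunk)
        if piece:
            yield piece
    tail = decompressor.flush()
    if tail:
        yield tail


# Simulate a compressed response body arriving in 64-byte network chunks.
payload = b"squid " * 10000
compressed = gzip.compress(payload)
chunks = [compressed[i:i + 64] for i in range(0, len(compressed), 64)]

# wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
decoder = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)
restored = b"".join(decode_incrementally(chunks, decoder))
print(len(restored))
```

Only one chunk and its decompressed output need to be held at a time, as long as the consumer forwards each piece instead of joining them as this demo does.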

Thanks
All times are GMT