szatox Advocate
Joined: 27 Aug 2013 Posts: 3477
|
Posted: Mon Nov 18, 2024 9:34 pm Post subject: Can squid change content-encoding on the fly? |
|
|
Does anyone know whether Squid can change the Content-Encoding of cached responses, and how to enable that?
I haven't dug too deep, but it looks like Cloudflare is trying to be smart while some HTTP clients are not, which results in brotli-compressed files being fed to applications that don't understand brotli.
Since I already have a squid proxy set up for cache, I'd like it to decompress the response body before sending it to the client. _________________ Make Computing Fun Again |
|
|
|
Banana Moderator
Joined: 21 May 2004 Posts: 1831 Location: Germany
|
|
|
|
szatox Advocate
Joined: 27 Aug 2013 Posts: 3477
|
Posted: Tue Nov 19, 2024 10:11 am Post subject: |
|
|
Yeah, well, it's not _my_ cloudflare.
From what I found during my quick search on the internet, CF strips this header completely and enables or disables compression on its own proxy based on user agent and the zone policy. Didn't test it yet.
Still, I have several clients using the same resources, some of which do understand brotli and some don't. Assuming everything is working as expected:
1. A brotli-enabled client happens to go first, allowing a compressed response.
2. Squid stores the compressed response.
3. A dumb client requests the same resource.
4. Squid serves the same compressed response from cache.
5. The dumb client goes WTF?! _________________ Make Computing Fun Again |
|
|
|
Banana Moderator
Joined: 21 May 2004 Posts: 1831 Location: Germany
|
|
|
|
Hu Administrator
Joined: 06 Mar 2007 Posts: 22965
|
Posted: Tue Nov 19, 2024 1:49 pm Post subject: |
|
|
In the observed failure case, what HTTP headers were sent in the Cloudflare response and in the dumb client's request? I am curious whether Squid was given enough context to understand that it should not use the cached brotli response when serving the dumb client. If the dumb client does not list br in Accept-Encoding and the Cloudflare response included Vary: Accept-Encoding, then I would expect Squid to consider the cached response unusable, because the dumb client's Accept-Encoding does not match the Accept-Encoding of the original request that populated the cache. |
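To make the Vary reasoning above concrete, here is a rough Python sketch of the check a cache is expected to make before reusing a stored response. It follows general HTTP caching semantics and is not Squid's actual code; the function names `normalize` and `variant_matches` are made up for illustration, and the whitespace/case normalization is a simplifying assumption.

```python
def normalize(value):
    """Collapse whitespace and case so 'gzip, br' and 'GZIP,br' compare equal."""
    parts = (value or "").split(",")
    return ",".join(p.strip().lower() for p in parts if p.strip())


def variant_matches(vary, stored_request_headers, new_request_headers):
    """Return True if a cached response may be reused for the new request.

    `vary` is the Vary header value of the cached response. Both header
    arguments are dicts keyed by lowercase header name (an assumption of
    this sketch, not a real Squid data structure).
    """
    if not vary:
        return True  # no Vary header: the response does not depend on request headers
    for name in vary.split(","):
        name = name.strip().lower()
        if not name:
            continue
        if name == "*":
            return False  # "Vary: *" never matches a later request
        old = normalize(stored_request_headers.get(name))
        new = normalize(new_request_headers.get(name))
        if old != new:
            return False  # a listed request header differs, so do not reuse
    return True
```

With the headers from this thread, a response stored for `Accept-Encoding: gzip, deflate, br` would not match a later request that sends only `gzip` or no Accept-Encoding at all, so the cache would be expected to go back to the origin rather than hand out the brotli body.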
|
|
|
szatox Advocate
Joined: 27 Aug 2013 Posts: 3477
|
Posted: Tue Nov 19, 2024 2:33 pm Post subject: |
|
|
> Anyway, have a look here: https://wiki.gentoo.org/wiki/Privoxy
Yeah, it looks like I'm gonna need another instance of mitmdump on top of squid. It handles decompression just fine. I feel like I got to the point where I need to start making notes of which proxy is doing what on which port though, and it sucks.
Hu, to satisfy your curiosity, I dumped the headers (and removed some identifying information):
Code: | request Host:
request User-Agent:
request Accept: */*
request Accept-Language: en-US,en;q=0.5
request Accept-Encoding: gzip, deflate, br
request Referer:
request Origin:
request Connection: keep-alive
request Sec-Fetch-Dest: empty
request Sec-Fetch-Mode: cors
request Sec-Fetch-Site: cross-site
request Host:
request User-Agent:
request Accept: application/json, text/javascript, */*; q=0.01
request Accept-Language: en-US,en;q=0.5
request Accept-Encoding: gzip, deflate, br
request X-Requested-With: XMLHttpRequest
request Referer:
request Cookie:
request Connection: keep-alive
request Sec-Fetch-Dest: empty
request Sec-Fetch-Mode: cors
request Sec-Fetch-Site: same-origin
|
Code: | response Date:
response Content-Type:
response Content-Length:
response Last-Modified:
response ETag:
response Access-Control-Allow-Origin:
response Access-Control-Allow-Credentials: true
response Access-Control-Allow-Methods: GET, POST, PUT, DELETE, OPTIONS
response Access-Control-Allow-Headers: DNT,X-Mx-ReqToken,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Range
response CF-Cache-Status: MISS
response Expires:
response Cache-Control: public, max-age=14400
response Accept-Ranges: bytes
response Vary: Accept-Encoding
response Server: cloudflare
response CF-RAY:
response X-Cache: MISS from squid.local
response X-Cache-Lookup: MISS from squid.local:3128
response Connection: keep-alive
response Server: ddos-guard
response Set-Cookie:
response Strict-Transport-Security: max-age=31536000, max-age=63072000;includeSubDomains;preload
response Content-Security-Policy: upgrade-insecure-requests;
response Content-Type: text/html; charset=UTF-8
response Vary: Accept-Encoding
response Cache-Control: no-cache
response Date:
response Expires:
response access-control-allow-methods: GET, HEAD, OPTIONS
response access-control-allow-headers: Origin,Range,Accept-Encoding,Referer,Cache-Control
response access-control-expose-headers: Server,Content-Length,Content-Range,Date
response Content-Encoding: gzip
response referrer-policy: no-referrer-when-downgrade
response x-xss-protection: 1; mode=block
response x-content-type-options: nosniff
response Age: 0
response DDG-Cache-Status: MISS
response X-Cache: MISS from squid.local
response X-Cache-Lookup: HIT from squid.local:3128
response Transfer-Encoding: chunked
response Connection: keep-alive
|
So, looks like it should have worked, but didn't.
Still, I don't think downloading it again would be a good solution. It is a caching proxy, it exists solely to reuse previously seen responses. Yeah, it definitely could do a better job.
Anyway, I'll leave it for another day or two, and if nobody comes up with any clever ideas to handle it with squid alone, I'll just add another proxy to the chain. _________________ Make Computing Fun Again |
|
|
|
Banana Moderator
Joined: 21 May 2004 Posts: 1831 Location: Germany
|
|
|
|
szatox Advocate
Joined: 27 Aug 2013 Posts: 3477
|
Posted: Wed Nov 20, 2024 1:15 pm Post subject: |
|
|
Yes, that's the gist of it.
The thing about ICAP was certainly an interesting read, though it doesn't seem applicable to my case. I want the response to be modified by my proxy, not the request. I'm still saving it as a new tool for the future.
Changing the cache key doesn't seem like a good option either. I don't want to re-download the same content in a different wrapping. It would double the bandwidth used and halve the cache space.
So far it seems like adding another proxy for decompressing stuff is still the best option. It is ugly and kinda annoying, but it is within my reach. _________________ Make Computing Fun Again |
|
|
|
szatox Advocate
Joined: 27 Aug 2013 Posts: 3477
|
Posted: Wed Nov 20, 2024 3:55 pm Post subject: |
|
|
So, I came up with this little mitmproxy hack:
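Code: | import mitmproxy
import mitmproxy.http
import brotli

def response(flow):
    # Only touch responses that are marked as brotli-compressed.
    if flow.response.headers.get('Content-Encoding') == 'br':
        # Drop the header first, then replace the body with the decoded bytes.
        flow.response.headers.pop('Content-Encoding')
        flow.response.content = brotli.decompress(flow.response.content)
        # Keep Content-Length consistent with the decompressed body.
        if 'Content-Length' in flow.response.headers:
            flow.response.headers['Content-Length'] = str(len(flow.response.content))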
|
It's not perfect, since processing the response body inside mitmproxy requires enough RAM to load the full response, but it should be good enough for this particular pipeline.
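A possible refinement, sketched here under assumptions: decompress only for clients that did not advertise brotli, so brotli-capable clients still get the compressed body. The helper below parses an Accept-Encoding value using only the standard library. The name `accepts_encoding` is made up for this sketch, and treating a missing header as "not accepted" is a deliberately conservative assumption for this pipeline, not what the HTTP spec says by default.

```python
def accepts_encoding(accept_encoding, coding):
    """Return True if `coding` is acceptable per an Accept-Encoding value."""
    if accept_encoding is None:
        # Assumption for this sketch: no header means "send identity only".
        return False
    explicit = {}
    for item in accept_encoding.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, params = item.partition(";")
        q = 1.0  # default weight when no q parameter is given
        for param in params.split(";"):
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # unparsable weight: treat as not acceptable
        explicit[name.strip().lower()] = q
    if coding.lower() in explicit:
        return explicit[coding.lower()] > 0
    # Fall back to the wildcard entry, if the client sent one.
    return explicit.get("*", 0.0) > 0
```

In the `response` hook above, this could be called as `accepts_encoding(flow.request.headers.get('Accept-Encoding'), 'br')`, skipping the decompression when it returns True. I have not tested that combination against a live mitmproxy.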
Thanks _________________ Make Computing Fun Again |
|