Weekly Check In - 6

Published: 07/15/2020

What did I do till now?

Last week I was implementing:

  • H2Agent
  • H2ConnectionPool 
  • H2DownloadHandler (Work In Progress)

The above classes add the following features:

  • H2ConnectionPool maintains a pool of all HTTP/2 connections. It works by mapping (uri.scheme, uri.host, uri.port) to an H2ClientProtocol instance. Suppose we get N requests in total, each with its own remote URL, and there are M unique base URLs among them. Then the pool maintains at most M connections, and M <= N always holds. For each request we check whether an H2 connection is already established for its key: if so we reuse it, otherwise we create a new connection.
  • H2Agent is responsible for issuing requests. Internally it uses the H2ConnectionPool to establish a new connection if required, or to reuse a cached one. H2Agent also wraps the context factory passed to its constructor in H2WrappedContextFactory, which updates the ClientTLSOptions context to accept only h2 as the protocol during NPN or ALPN. The constructor signature of H2Agent is exactly the same as that of Twisted's Agent class, so that it is easy to integrate into Scrapy.
  • H2DownloadHandler is Scrapy's way of issuing requests. There are similar download handlers for HTTP/1.x and other protocols. I have completed a basic implementation which supports HTTPS requests, and I'm still working on integrating it fully into Scrapy.
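
The pooling idea described for H2ConnectionPool can be sketched roughly as follows. This is a simplified, standard-library-only illustration and not the actual implementation: the names ConnectionPoolSketch, pool_key and the connect callback are hypothetical, and a plain object stands in for an H2ClientProtocol instance.

```python
from typing import Callable, Dict, Tuple
from urllib.parse import urlparse

# Hypothetical key type: (scheme, host, port), as described above.
PoolKey = Tuple[str, str, int]

DEFAULT_PORTS = {"http": 80, "https": 443}


def pool_key(url: str) -> PoolKey:
    """Reduce a full URL to the (scheme, host, port) triple used as the pool key."""
    parts = urlparse(url)
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    return (parts.scheme, parts.hostname, port)


class ConnectionPoolSketch:
    """Keeps at most one connection per (scheme, host, port)."""

    def __init__(self, connect: Callable[[PoolKey], object]) -> None:
        # `connect` is a stand-in for opening a real HTTP/2 connection.
        self._connect = connect
        self._connections: Dict[PoolKey, object] = {}

    def get_connection(self, url: str) -> object:
        key = pool_key(url)
        # Reuse the cached connection if there is one, otherwise create it.
        if key not in self._connections:
            self._connections[key] = self._connect(key)
        return self._connections[key]

    def remove_connection(self, url: str) -> None:
        # Forget a connection, e.g. after it has been closed.
        self._connections.pop(pool_key(url), None)

    def __len__(self) -> int:
        return len(self._connections)


# N = 3 requests, but only M = 2 unique base URLs.
pool = ConnectionPoolSketch(connect=lambda key: object())
urls = ["https://example.com/a", "https://example.com/b", "https://example.org/"]
connections = [pool.get_connection(url) for url in urls]
```

Here the first two requests share one connection and the third gets its own, so the pool ends up holding two connections for three requests.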

Apart from the above classes, I added an idle timeout to H2ClientProtocol using Twisted's TimeoutMixin. If the connection is idle for too long (~240 seconds), it closes itself and fires a Deferred that is handled by H2ConnectionPool, so that upcoming requests do not use the closed connection and instead create a new one if required.
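
To illustrate the idle timeout logic, here is a minimal sketch that uses only the standard library. It is not the Twisted code: TimeoutMixin schedules the timeout through the reactor (via setTimeout, resetTimeout and timeoutConnection), whereas this sketch polls an injectable clock to stay self-contained. The class name IdleTimeoutSketch, the on_timeout callback (standing in for the Deferred) and the dictionary used as a pool are all hypothetical.

```python
import time
from typing import Callable


class IdleTimeoutSketch:
    """Tracks activity and closes a connection that has been idle too long."""

    def __init__(
        self,
        timeout: float,
        on_timeout: Callable[[], object],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._timeout = timeout
        self._on_timeout = on_timeout  # stand-in for firing a Deferred
        self._clock = clock
        self._last_activity = clock()
        self.closed = False

    def reset(self) -> None:
        """Call whenever data is sent or received."""
        self._last_activity = self._clock()

    def check(self) -> None:
        """Close the connection once it has been idle for the full timeout."""
        idle_for = self._clock() - self._last_activity
        if not self.closed and idle_for >= self._timeout:
            self.closed = True
            self._on_timeout()


# A fake clock makes the behaviour easy to follow without waiting.
now = [0.0]
key = ("https", "example.com", 443)
pool = {key: "connection"}
timer = IdleTimeoutSketch(240, on_timeout=lambda: pool.pop(key), clock=lambda: now[0])

now[0] = 100.0
timer.reset()            # some traffic at t = 100
now[0] = 300.0
timer.check()            # idle for 200 s: still open
still_open = not timer.closed
now[0] = 340.0
timer.check()            # idle for 240 s: closed, pool entry removed
```

Once the timeout fires, the entry is gone from the pool, so the next request for that key would open a fresh connection instead of reusing the closed one.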

What's coming up next?

This week I plan to

  • Write tests for the idle timeout and H2ClientProtocol
  • Complete the implementation of H2DownloadHandler

Did I get stuck anywhere?

Yes. I spent most of last week on a bug where _StandardEndpointFactory wouldn't establish a proper HTTP/2 connection. The only error I got was "Connection was closed in an un-clean manner", which did not really help, and the stack trace was not very helpful either. I really had to dive deep into this, which gave me some amazing insights into how Twisted and the TLS handshake work internally.

I found that the connection was actually established and that the problem was in the TLS handshake. For some reason, specifying h2 as the acceptable protocol in SSL.Context before the connection even starts to be established works, but anything else, including updating the acceptable protocols list during the handshake, does not! I still don't know what the exact problem is, but I do have a working fix now. I do wonder why the connection fails when we specify the acceptable protocols list during the TLS handshake in Twisted 🤔. I'll probably look into it again if I find some time this week.

To integrate the fix into my codebase, I created a wrapper class that wraps any context factory implementing IPolicyForHTTPS and updates the acceptable protocols list to [b'h2'].

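from twisted.internet._sslverify import ClientTLSOptions, _setAcceptableProtocols
from twisted.web.iweb import IPolicyForHTTPS
from zope.interface.verify import verifyObject


class H2WrappedContextFactory:
    def __init__(self, context_factory) -> None:
        # Fail early if the wrapped object does not provide IPolicyForHTTPS
        verifyObject(IPolicyForHTTPS, context_factory)
        self._wrapped_context_factory = context_factory

    def creatorForNetloc(self, hostname, port) -> ClientTLSOptions:
        options = self._wrapped_context_factory.creatorForNetloc(hostname, port)
        # Restrict protocol negotiation (NPN/ALPN) to HTTP/2 only
        _setAcceptableProtocols(options._ctx, [b'h2'])
        return options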

Apart from the above bug, I did have some minor issues, but those were quick to fix 🙂