Articles on adityaa30's Blog https://blogs.python-gsoc.org Updates on different articles published on adityaa30's Blog en Sun, 30 Aug 2020 12:59:23 +0000 Weekly Check In - 12 https://blogs.python-gsoc.org/en/adityaa30s-blog/weekly-check-in-12-5/ <h2>What did I do till now?</h2> <p>Last week I was working on finishing up the <strong>HTTPNegotiateDownloadHandler</strong>. The download handler currently uses ALPN or NPN (whichever is available) to negotiate a protocol (for now, one of HTTP/1.1 or HTTP/2) with the remote server and issues each request through the corresponding download handler. For now, all requests made via a proxy are issued directly using the <strong>HTTP11DownloadHandler</strong>.</p> <h2>What's coming up next?</h2> <p>I plan to continue implementing the CONNECT method for HTTP/2.</p> <h2>Did I get stuck anywhere?</h2> <p>Yep. I was stuck for almost a week on the CONNECT protocol. I have now managed to fix the bug where the raw TCP connection instance could not be switched to HTTP/2. However, there are still some issues during the TLS handshake with the final target resource 😥.</p> k.aditya00@gmail.com (adityaa30) Sun, 30 Aug 2020 12:59:23 +0000 https://blogs.python-gsoc.org/en/adityaa30s-blog/weekly-check-in-12-5/ Weekly Check In - 11 https://blogs.python-gsoc.org/en/adityaa30s-blog/weekly-check-in-11-6/ <h2>What did I do till now?</h2> <p>Last week, I finalized the PR for the basic implementation of the <strong>H2ClientProtocol</strong>. The protocol now works with all the request methods except the <strong>CONNECT</strong> method. The work on tunneling using the CONNECT method is still in progress. I started creating another protocol for negotiation, which uses ALPN or NPN (whichever is available) to negotiate a protocol (presently one of HTTP/1.1 or HTTP/2) with the remote server based on the priority given by the user via the Scrapy project's settings, and then uses the respective download handler to complete the request.
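</p> <p>The negotiation step described above can be sketched roughly as follows. This is a simplified, hypothetical illustration and not the code from the PR: <strong>HANDLERS</strong>, <strong>negotiate</strong> and <strong>pick_handler</strong> are names made up for this example, and the real ALPN/NPN exchange happens inside the TLS handshake rather than in a plain function.</p>
<pre><code class="language-python"># Hypothetical sketch: these names are illustrative, not Scrapy's actual API.
HANDLERS = {
    b"h2": "H2DownloadHandler",
    b"http/1.1": "HTTP11DownloadHandler",
}


def negotiate(client_priority, server_supported):
    """Return the first protocol in the user's priority order that the server supports."""
    for protocol in client_priority:
        if protocol in server_supported:
            return protocol
    return None


def pick_handler(client_priority, server_supported, fallback=b"http/1.1"):
    """Map the negotiated protocol to the name of the download handler to use."""
    protocol = negotiate(client_priority, server_supported) or fallback
    return HANDLERS[protocol]


# The user prefers HTTP/2, but this server only speaks HTTP/1.1.
print(pick_handler([b"h2", b"http/1.1"], [b"http/1.1"]))
</code></pre>
<p>Falling back to HTTP/1.1 when nothing matches is an assumption of this sketch; the priority list stands in for the value a user would provide through the project's settings.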
</p> <h2>What's coming up next?</h2> <p>This week I am mainly working on finishing the Negotiation Protocol.</p> <h2>Did I get stuck anywhere?</h2> <p>Nope. I spent more time on finalizing a clean architecture last week, so most of my time went into planning. Apart from that there were no major blockers :)</p> k.aditya00@gmail.com (adityaa30) Sat, 22 Aug 2020 07:38:04 +0000 https://blogs.python-gsoc.org/en/adityaa30s-blog/weekly-check-in-11-6/ Weekly Check In - 10 https://blogs.python-gsoc.org/en/adityaa30s-blog/weekly-check-in-10-7/ <h2>What did I do till now?</h2> <p>I started implementing the CONNECT method for tunneling via HTTP/2. After a lot of testing, I realized the approach I was taking was not really feasible, so next I plan to work on an approach which initially uses HTTP/1.1 CONNECT to establish a connection with the proxy and then shifts to HTTP/2 for all the requests made via the proxy.</p> <h2>What's coming up next?</h2> <p>Next week, I plan to</p> <ul> <li>Make the PR for H2ClientProtocol ready to be merged with master: verify that all cases are covered by tests, that the other tests pass and that no bugs are introduced</li> <li>Implement the CONNECT method using a combination of HTTP/1.1 and HTTP/2</li> </ul> <h2>Did I get stuck anywhere?</h2> <p>Yes, this week I had many problems while adding support for tunneling through proxies. I have planned a completely different approach for next week using HTTP/1.1 and HTTP/2. Let's see how it goes :)</p> k.aditya00@gmail.com (adityaa30) Thu, 13 Aug 2020 00:38:21 +0000 https://blogs.python-gsoc.org/en/adityaa30s-blog/weekly-check-in-10-7/ Weekly Check In - 9 https://blogs.python-gsoc.org/en/adityaa30s-blog/weekly-check-in-9-8/ <h2>What did I do till now?</h2> <p>Last week I completed the <strong>ScrapyH2ProxyAgent</strong> implementation and added the required tests. I was also going through the codebase of the <strong>hyper-h2</strong> library to get insight into how it implements the CONNECT method for HTTP/2.
</p> <h2>What's coming up next?</h2> <p>Next week I plan to finish working on <strong>ScrapyTunnelingH2Agent</strong>, which enables a user to create an SSL tunnel and proxy requests.</p> <h2>Did I get stuck anywhere?</h2> <p>Yeah, I am stuck on a weird problem where two unrelated test cases collide: they fail when I run them together but pass when I run them separately. I'm still working on finding a working fix!</p> k.aditya00@gmail.com (adityaa30) Thu, 06 Aug 2020 01:51:33 +0000 https://blogs.python-gsoc.org/en/adityaa30s-blog/weekly-check-in-9-8/ Weekly Check In - 8 https://blogs.python-gsoc.org/en/adityaa30s-blog/weekly-check-in-8-7/ <h2>What did I do till now?</h2> <p>Last week I added tests for <strong>H2Agent</strong> and <strong>H2DownloadHandler</strong>.</p> <h2>What's coming up next?</h2> <p>Next week I plan to continue working on <strong>ScrapyTunnelingH2Agent</strong>.</p> <h2>Did I get stuck anywhere?</h2> <p>Yes. I got stuck for a long time while setting up the testing environment of <strong>H2DownloadHandler</strong>. The problem was a bit of a weird one: until now Scrapy was using Twisted's <strong>WrappingFactory</strong> class to wrap the <strong>Site</strong> instance, which allows only up to HTTP/1.1 (for unknown reasons), and it took me a long time to realize this. After removing the <strong>WrappingFactory</strong>, the test environment was set up as required. Apart from this, another hurdle I'm still facing is the CONNECT protocol in HTTP/2.0: I couldn't really find many blogs or articles on it to get a better idea. I plan to look at how some open-source libraries implement HTTP/2.0 CONNECT now.
</p> k.aditya00@gmail.com (adityaa30) Thu, 30 Jul 2020 03:41:48 +0000 https://blogs.python-gsoc.org/en/adityaa30s-blog/weekly-check-in-8-7/ Weekly Check In - 7 https://blogs.python-gsoc.org/en/adityaa30s-blog/weekly-check-in-7-10/ <h2>What did I do till now?</h2> <p>This week I implemented the <b>ScrapyH2Agent</b>, which is used directly by <strong>H2DownloadHandler</strong> to issue requests. Internally the <strong>ScrapyH2Agent</strong> uses</p> <ul> <li><strong>H2Agent ✅</strong></li> <li><strong>ScrapyProxyH2Agent ✅</strong></li> <li><strong>ScrapyTunnelingH2Agent</strong></li> </ul> <p>The <strong>ScrapyTunnelingH2Agent</strong> is still a work in progress. Besides the coding part, I read articles on how the CONNECT protocol works for HTTP/2 in order to implement the tunneling agent.</p> <h2>What's coming up next?</h2> <p>This week I plan to</p> <ul> <li>Complete the <strong>ScrapyTunnelingH2Agent</strong> implementation</li> <li>Add public documentation on how to use <strong>H2DownloadHandler</strong></li> <li>Add unit tests for <strong>H2Agent</strong></li> </ul> <h2>Did I get stuck anywhere?</h2> <p>This week I did not face any major blockers 🙂</p> k.aditya00@gmail.com (adityaa30) Thu, 23 Jul 2020 14:46:01 +0000 https://blogs.python-gsoc.org/en/adityaa30s-blog/weekly-check-in-7-10/ Weekly Check In - 6 https://blogs.python-gsoc.org/en/adityaa30s-blog/weekly-check-in-6-12/ <h2>What did I do till now?</h2> <p>Last week I was implementing</p> <ul> <li>H2Agent</li> <li>H2ConnectionPool</li> <li>H2DownloadHandler (Work In Progress)</li> </ul> <p>The above classes add the following features:</p> <ul> <li><strong>H2ConnectionPool</strong> maintains a pool of all HTTP/2 connections. It works by creating a map from <strong>(uri.scheme, uri.host, uri.port)</strong> to the <strong>H2ClientProtocol</strong> instance.
Suppose we get a total of <strong>N</strong> requests, each with its own remote URL, and there are <strong>M</strong> unique base URLs among them; then the pool maintains at most M connections, where <strong>M &lt;= N</strong> always. For any request, we simply check whether an HTTP/2 connection to its base URL is already established: if so, we reuse it; otherwise we create a new one.</li> <li><strong>H2Agent</strong> is responsible for issuing the request, internally using the <strong>H2ConnectionPool</strong> to establish a new connection if required or to reuse a cached one. The <strong>H2Agent</strong> also wraps the context factory passed to its constructor in an <strong>H2WrappedContextFactory</strong>, which updates the <strong>ClientTLSOptions</strong> context to use only <em>h2</em> as the acceptable protocol during NPN or ALPN. The constructor signature of <strong>H2Agent</strong> is exactly the same as that of Twisted's <strong>Agent</strong> class, so that it is easy to integrate into Scrapy.</li> <li><strong>H2DownloadHandler</strong> is Scrapy's way of issuing requests. There are similar download handlers for HTTP/1.x and other protocols. I have completed a basic implementation which supports HTTPS requests. I'm still working on integrating it fully into Scrapy.</li> </ul> <p>Apart from the above classes, I added an idle timeout to H2ClientProtocol using Twisted's <strong>TimeoutMixin</strong>. If the connection is idle for too long (~240 seconds), it closes itself and fires a Deferred which is handled by <strong>H2ConnectionPool</strong>, so that any upcoming requests will not use a closed connection &amp; will instead create a new one if required.</p> <h2>What's coming up next?</h2> <p>This week I plan to</p> <ul> <li>Write tests for the idle timeout and <strong>H2ClientProtocol</strong></li> <li>Complete the implementation of <strong>H2DownloadHandler</strong></li> </ul> <h2>Did I get stuck anywhere?</h2> <p>Yes.
I spent most of last week solving a bug where the <strong>_StandardEndpointFactory</strong> would not establish a proper HTTP/2 connection. The only error I got was "<em>Connection was closed in an un-clean manner</em>", which did not really help. The error stack was not very helpful either. I really had to dive deep for this, which gave me some amazing insights into how <strong>Twisted</strong> &amp; the TLS handshake work internally. I found that the connection was actually established, but the problem was in the TLS handshake. For some reason, specifying the acceptable protocols as <strong>h2</strong> in SSL.Context before the connection even starts to be established works, but anything else, including updating the acceptable protocols list during the handshake, does not! I still don't know what the exact problem is, but I do have a working fix now. I do wonder why the connection fails when we specify the acceptable protocols list during the TLS handshake in Twisted 🤔; I'll probably look again if I find some time this week. To integrate the fix into my codebase, I created a wrapper class which wraps any context factory that implements <strong>IPolicyForHTTPS</strong> and updates the acceptable protocols list to [b'h2'].
</p> <pre><code class="language-python">from twisted.internet._sslverify import ClientTLSOptions, _setAcceptableProtocols
from twisted.web.iweb import IPolicyForHTTPS
from zope.interface import implementer
from zope.interface.verify import verifyObject


@implementer(IPolicyForHTTPS)
class H2WrappedContextFactory:
    """Wrap any IPolicyForHTTPS provider so that only h2 is offered during NPN/ALPN."""

    def __init__(self, context_factory) -&gt; None:
        # Fail early if the wrapped object does not provide IPolicyForHTTPS
        verifyObject(IPolicyForHTTPS, context_factory)
        self._wrapped_context_factory = context_factory

    def creatorForNetloc(self, hostname, port) -&gt; ClientTLSOptions:
        options = self._wrapped_context_factory.creatorForNetloc(hostname, port)
        # Set the protocol list before the connection (and TLS handshake) starts
        _setAcceptableProtocols(options._ctx, [b'h2'])
        return options
</code></pre> <p>Apart from the above bug, I did have some minor issues, but those were quick to fix 🙂</p> k.aditya00@gmail.com (adityaa30) Wed, 15 Jul 2020 20:04:20 +0000 https://blogs.python-gsoc.org/en/adityaa30s-blog/weekly-check-in-6-12/ Weekly Check In - 5 https://blogs.python-gsoc.org/en/adityaa30s-blog/weekly-check-in-5-7/ <h2>What did I do till now?</h2> <p>I was going through <strong>Twisted</strong>'s implementation of HTTP/1.x and how it handles multiple requests. I focused on its implementation of <strong>HTTPConnectionPool</strong>, which is responsible for establishing a new connection whenever required &amp; reusing an existing (cached) connection.</p> <p>Besides this, I made the requested changes to the HTTP/2 Client implementation.</p> <h2>What's coming up next?</h2> <p>Next week I plan to finish coding <strong>H2ConnectionPool</strong> and its integration with <strong>HTTP2ClientProtocol</strong>. Along with the integration I plan to write unit tests as well.</p> <h2>Did I get stuck anywhere?</h2> <p>No. I mostly read lots of documentation &amp; the Twisted codebase throughout this week and fixed the bugs found in the HTTP/2 Client implementation.
</p> k.aditya00@gmail.com (adityaa30) Mon, 06 Jul 2020 16:16:49 +0000 https://blogs.python-gsoc.org/en/adityaa30s-blog/weekly-check-in-5-7/ Weekly Check In - 4 https://blogs.python-gsoc.org/en/adityaa30s-blog/weekly-check-in-4-8/ <h2>What did I do till now?</h2> <p>Last week I was working on</p> <ul> <li>Writing tests for HTTP2ClientProtocol</li> <li>Adding support for a large number of requests over a single connection</li> </ul> <p>I finished both of the tasks above. I added inline docstrings for most of the methods. Still working on public documentation!</p> <h2>What's coming up next?</h2> <p>Next week I plan to</p> <ul> <li>Start working on <strong>H2ConnectionPool</strong> and <b>H2ClientFactory</b>, which are responsible for handling multiple connections to different authorities. The present implementation can handle a large number of requests over a single connection to only one authority.</li> <li>Finish the public documentation of HTTP2ClientProtocol</li> </ul> <h2>Did I get stuck anywhere?</h2> <p>I am very new to writing tests using <b>Twisted Trial</b>, so I ran into minor bugs while setting up the testing environment and writing tests. Apart from this there were no major blockers during the last week 😁</p> k.aditya00@gmail.com (adityaa30) Tue, 30 Jun 2020 02:44:48 +0000 https://blogs.python-gsoc.org/en/adityaa30s-blog/weekly-check-in-4-8/ Weekly Check In - 3 https://blogs.python-gsoc.org/en/adityaa30s-blog/weekly-check-in-3-9/ <h2>What did I do till now?</h2> <p>Finished the HTTP/2 Client Protocol implementation.</p> <h2>What's coming up next?</h2> <p>Next week I plan to</p> <ul> <li>Write unit tests for the HTTP/2 Client Protocol</li> <li>Add the required documentation</li> </ul> <p>I have kept the goals for next week simple, as I think unit testing will uncover some errors, which can take time to fix.
As the HTTP/2 Client Protocol is the core component of this project, I have set aside this whole week for it.</p> <h2>Did I get stuck anywhere?</h2> <p>Yes, I was stuck on a bug where the HTTP/2 client worked only for requests whose body fits in one DATA frame. When the request body became large, it had to be broken into many chunks and sent frame by frame, and along with this I had to manage flow control for the stream on which the request was initiated. This was not working at all. Generally, the remote peer should send a WINDOW_UPDATE frame to signal that it has received the data chunks sent so far and can now receive more. I was getting a WINDOW_UPDATE for the whole HTTP/2 connection but not for the stream on which the request was initiated. Initially I didn't know what to do because this was something very new to me and unexpected at the same time 😟. After some discussions with my mentors and reading the <a href="https://http2.github.io/http2-spec/">HTTP/2 RFC</a>, I realized that it was okay to <strong>not receive</strong> a WINDOW_UPDATE frame for a specific stream and instead <strong>receive</strong> one for the whole connection, and that in terms of flow control the two are the same. So, finally I was able to fix this bug and finish a working implementation of the HTTP/2 Client. Yaaay 🥳.</p> k.aditya00@gmail.com (adityaa30) Mon, 22 Jun 2020 19:37:21 +0000 https://blogs.python-gsoc.org/en/adityaa30s-blog/weekly-check-in-3-9/ Weekly Check In - 2 https://blogs.python-gsoc.org/en/adityaa30s-blog/weekly-check-in-2-8/ <h2>What did I do till now?</h2> <p>Added support for both GET and POST requests in the HTTP/2 Client. I also read up on setting up tests with Twisted.
</p> <h2>What's coming up next?</h2> <p>Next week I plan to</p> <ul> <li>Finish up the HTTP/2 Client Protocol implementation</li> <li>Add tests &amp; documentation</li> </ul> <h2>Did I get stuck anywhere?</h2> <p>Initially, while testing my first approach, I realized that the client works for requests whose response size is less than the total flow control window. However, when a really large response was expected, the client waited indefinitely and eventually timed out. The fix was relatively simple: acknowledge each data frame received 😁. This week I also tried to set up a testing environment using Scrapy's inbuilt MockServer, which I have not yet managed to do because of an issue with setting up the HTTP/2 connection between my client and the custom server. Still working on that!</p> k.aditya00@gmail.com (adityaa30) Mon, 15 Jun 2020 23:18:18 +0000 https://blogs.python-gsoc.org/en/adityaa30s-blog/weekly-check-in-2-8/ Weekly Check In - 1 https://blogs.python-gsoc.org/en/adityaa30s-blog/weekly-check-in-1-6/ <h2>What did I do till now?</h2> <p>As the Community Bonding phase finished, I started coding the HTTP/2 Client Protocol. I started simple by adding support for GET requests.</p> <h2>What's coming up next?</h2> <p>Next week I plan to</p> <ul> <li>Add support for GET and POST requests for HTTP/2</li> <li>Set up base classes used for testing the Client Protocol</li> </ul> <h2>Did I get stuck anywhere?</h2> <p>Initially I was intimidated by some of the libraries that I was using for my project. Now, I am comfortable working with them. I was stuck on the issue of combining, in proper order, the different chunks of data received from the server for multiple streams, but now it's fixed 😊</p> k.aditya00@gmail.com (adityaa30) Thu, 11 Jun 2020 06:02:53 +0000 https://blogs.python-gsoc.org/en/adityaa30s-blog/weekly-check-in-1-6/ Weekly Check In - 0 https://blogs.python-gsoc.org/en/adityaa30s-blog/weekly-check-in-0-1/ <p>Hello, I am Aditya Kumar.
I will be contributing to Scrapy during GSoC'20. This is my first blog of the series.</p> <h2><strong>What did I do till now?</strong></h2> <ul> <li>I had two meetings with my mentors to discuss the project goals and deadlines</li> <li>I was looking into the implementations of an <strong>HTTP/2 Client</strong> in various libraries to get a better picture</li> </ul> <h2><strong>What's coming up next?</strong></h2> <p>Next week, I will work on implementing a simple HTTP/2 Client which can handle GET, POST &amp; HEAD requests.</p> <h2><strong>Did I get stuck anywhere?</strong></h2> <p>Last week, I was mainly working with already-tested code from tutorials, so I didn't come across any bugs.</p> k.aditya00@gmail.com (adityaa30) Sun, 31 May 2020 21:48:17 +0000 https://blogs.python-gsoc.org/en/adityaa30s-blog/weekly-check-in-0-1/