当前位置 博文首页 > python用700行代码实现http客户端

    python用700行代码实现http客户端

    作者:终末之冬 时间:2021-02-10 18:04

    本文用python在TCP的基础上实现一个HTTP客户端, 该客户端能够复用TCP连接, 使用HTTP1.1协议. 

    一. 创建HTTP请求

      HTTP是基于TCP连接的, 它的请求报文格式如下:

      

      因此, 我们只需要创建一个到服务器的TCP连接, 然后按照上面的格式写好报文并发给服务器, 就实现了一个HTTP请求.

    1. HTTPConnection类

      基于以上的分析, 我们首先定义一个HTTPConnection类来管理连接和请求内容:

    class HTTPConnection:
      default_port = 80
      _http_vsn = 11
      _http_vsn_str = 'HTTP/1.1'
    
      def __init__(self, host: str, port: int = None) -> None:
        self.sock = None
        self._buffer = []
        self.host = host
        self.port = port if port is not None else self.default_port
        self._state = _CS_IDLE
        self._response = None
        self._method = None
        self.block_size = 8192
    
      def _output(self, s: Union[str, bytes]) -> None:
        if hasattr(s, 'encode'):
          s = s.encode('latin-1')
        self._buffer.append(s)
    
      def connect(self) -> None:
        self.sock = socket.create_connection((self.host, self.port))

      对于这个HTTPConnection对象, 我们只需要创建TCP连接, 然后按照HTTP协议的格式把请求数据写入buffer中, 最后把buffer中的数据发送出去就行了.

    2. 编写请求行

      请求行的内容比较简单, 就是说明请求方法, 请求路径和HTTP协议. 使用下面的方法来编写一个请求行:

    def put_request(self, method: str, url: str) -> None:
      self._method = method
    
      url = url or '/'
    
      request = f'{method} {url} {self._http_vsn_str}'
      self._output(request)

    3. 添加请求头

      HTTP请求头和python的字典类似, 每行都是一个字段名与值的映射关系. HTTP协议并不要求设置所有合法的请求头的值, 我们只需要按照需要, 设置特定的请求头即可. 使用如下代码添加请求头:

    def put_header(self, header: Union[bytes, str], value: Union[bytes, str, int]) -> None:
      if hasattr(header, 'encode'):
        header = header.encode('ascii')
    
      if hasattr(value, 'encode'):
        value = value.encode('latin-1')
      elif isinstance(value, int):
        value = str(value).encode('ascii')
    
      header = header + b': ' + value
      self._output(header)

      此外, 在HTTP请求中, Host请求头字段是必须的, 否则网站可能会拒绝响应. 因此, 如果用户没有设置这个字段, 这里就应该主动把它加上去:

    def _add_host(self, url: str) -> None:
      # 所有HTTP / 1.1请求报文中必须包含一个Host头字段
      # 如果用户没给,就调用这个函数来生成
      netloc = ''
      if url.startswith('http'):
        nil, netloc, nil, nil, nil = urllib.parse.urlsplit(url)
    
      if netloc:
        try:
          netloc_enc = netloc.encode('ascii')
        except UnicodeEncodeError:
          netloc_enc = netloc.encode('idna')
        self.put_header('Host', netloc_enc)
      else:
        host = self.host
        port = self.port
    
        try:
          host_enc = host.encode('ascii')
        except UnicodeEncodeError:
          host_enc = host.encode('idna')
    
        # 对IPv6的地址进行额外处理
        if host.find(':') >= 0:
          host_enc = b'[' + host_enc + b']'
    
        if port == self.default_port:
          self.put_header('Host', host_enc)
        else:
          host_enc = host_enc.decode('ascii')
          self.put_header('Host', f'{host_enc}:{port}')

    4. 发送请求正文

      我们接受两种形式的body数据: 一个基于io.IOBase的可读文件对象, 或者是一个能通过迭代得到数据的对象. 在传输数据之前, 我们首先要确定数据是否采用分块传输:

    def request(self, method: str, url: str, headers: dict = None, body: Union[io.IOBase, Iterable] = None,
          encode_chunked: bool = False) -> None:
      ...
      if 'content-length' not in header_names:
        if 'transfer-encoding' not in header_names:
          encode_chunked = False
          content_length = self._get_content_length(body, method)
          if content_length is None:
            if body is not None:
              # 在这种情况下, body一般是个生成器或者可读文件之类的东西,应该分块传输
              encode_chunked = True
              self.put_header('Transfer-Encoding', 'chunked')
          else:
            self.put_header('Content-Length', str(content_length))
        else:
          # 如果设置了transfer-encoding,则根据用户给的encode_chunked参数决定是否分块
          pass
      else:
        # 只要给了content-length,那么一定不是分块传输
        encode_chunked = False
      ...
    
    
    @staticmethod
    def _get_content_length(body: Union[str, bytes, bytearray, Iterable, io.IOBase], method: str) -> Optional[int]:
      if body is None:
        # PUT,POST,PATCH三个方法默认是有body的
        if method.upper() in _METHODS_EXPECTING_BODY:
          return 0
        else:
          return None
    
      if hasattr(body, 'read'):
        return None
    
      try:
        # 对于bytes或者bytearray格式的数据,通过memoryview获取它的长度
        return memoryview(body).nbytes
      except TypeError:
        pass
    
      if isinstance(body, str):
        return len(body)
    
      return None

       在确定了是否分块之后, 就可以把正文发出去了. 如果body是一个可读文件的话, 就调用_read_readable方法把它封装为一个生成器:

    def _send_body(self, message_body: Union[str, bytes, bytearray, Iterable, io.IOBase], encode_chunked: bool) -> None:
      if hasattr(message_body, 'read'):
        chunks = self._read_readable(message_body)
      else:
        try:
          memoryview(message_body)
        except TypeError:
          try:
            chunks = iter(message_body)
          except TypeError:
            raise TypeError(
              f'message_body should be a bytes-like object or an iterable, got {repr(type(message_body))}')
        else:
          # 如果是字节类型的,通过一次迭代把它发出去
          chunks = (message_body,)
    
      for chunk in chunks:
        if not chunk:
          continue
    
        if encode_chunked:
          chunk = f'{len(chunk):X}\r\n'.encode('ascii') + chunk + b'\r\n'
        self.send(chunk)
    
      if encode_chunked:
        self.send(b'0\r\n\r\n')
    
    
    def _read_readable(self, readable: io.IOBase) -> Generator[bytes, None, None]:
      need_encode = False
      if isinstance(readable, io.TextIOBase):
        need_encode = True
      while True:
        data_block = readable.read(self.block_size)
        if not data_block:
          break
        if need_encode:
          data_block = data_block.encode('utf-8')
        yield data_block

    二. 获取响应数据

      HTTP响应报文的格式与请求报文大同小异, 它大致是这样的:

      因此, 我们只要用HTTPConnection的socket对象读取服务器发送的数据, 然后按照上面的格式对数据进行解析就行了.

    1. HTTPResponse类

      我们首先定义一个简单的HTTPResponse类. 它的属性大致上就是socket的文件对象以及一些请求的信息等等, 调用它的begin方法来解析响应行和响应头的数据, 然后调用read方法读取响应正文:

    class HTTPResponse:
    
      def __init__(self, sock: socket.socket, method: str = None) -> None:
        self.fp = sock.makefile('rb')
        self._method = method
        self.headers = None
        self.version = _UNKNOWN
        self.status = _UNKNOWN
        self.reason = _UNKNOWN
        self.chunked = _UNKNOWN
        self.chunk_left = _UNKNOWN
        self.length = _UNKNOWN
        self.will_close = _UNKNOWN
    
      def begin(self) -> None:
        ...
    
      def read(self, amount: int = None) -> bytes:
        ...

    2. 解析状态行

      状态行的解析比较简单, 我们只需要读取响应的第一行数据, 然后把它解析为HTTP协议版本,状态码和原因短语三部分就行了:

    def _read_status(self) -> Tuple[str, int, str]:
      line = str(self._read_line(), 'latin-1')
      if not line:
        raise RemoteDisconnected('Remote end closed connection without response')
      try:
        version, status, reason = line.split(None, 2)
      except ValueError:
        # reason只是给人看的, 一般和status对应, 所以它有可能不存在
        try:
          version, status = line.split(None, 1)
          reason = ''
        except ValueError:
          version, status, reason = '', '', ''
      if not version.startswith('HTTP/'):
        self._close_conn()
        raise BadStatusLine(line)
    
      try:
        status = int(status)
        if status < 100 or status > 999:
          raise BadStatusLine(line)
      except ValueError:
        raise BadStatusLine(line)
      return version, status, reason.strip()

      如果状态码为100, 则客户端需要解析多个响应状态行. 它的原理是这样的: 在请求数据过大的时候, 有的客户端会先不发送请求数据, 而是先在header中添加一个Expect: 100-continue, 如果服务器愿意接收数据, 会返回100的状态码, 这时候客户端再把数据发过去. 因此, 如果读取到100的状态码, 那么后面往往还会收到一个正式的响应数据, 应该继续读取响应头. 这部分的代码如下:

    def begin(self) -> None:
      while True:
        version, status, reason = self._read_status()
        if status != HTTPStatus.CONTINUE:
          break
        # 跳过100状态码部分的响应头
        while True:
          skip = self._read_line().strip()
          if not skip:
            breakself.status = status
      self.reason = reason
      if version in ('HTTP/1.0', 'HTTP/0.9'):
        self.version = 10
      elif version.startswith('HTTP/1.'):
        self.version = 11
      else:
        # HTTP2还没研究, 这里就不写了
        raise UnknownProtocol(version)
    
      ...

    3. 解析响应头

      解析响应头比响应行还要简单. 因为每个header字段占一行, 我们只需要一直调用read_line方法读取字段, 直到读完header为止就行了.

    def _parse_header(self) -> None:
      headers = {}
      while True:
        line = self._read_line()
        if len(headers) > _MAX_HEADERS:
          raise HTTPException('got more than %d headers' % _MAX_HEADERS)
        if line in _EMPTY_LINE:
          break
        line = line.decode('latin-1')
        i = line.find(':')
        if i == -1:
          raise BadHeaderLine(line)
        # 这里默认没有重名的情况
        key, value = line[:i].lower(), line[i + 1:].strip()
        headers[key] = value
      self.headers = headers

    4. 接收响应正文

      在接收响应正文之前, 首先要确定它的传输方式和长度:

    def _set_chunk(self) -> None:
      transfer_encoding = self.get_header('transfer-encoding')
      if transfer_encoding and transfer_encoding.lower() == 'chunked':
        self.chunked = True
        self.chunk_left = None
      else:
        self.chunked = False
    
    
    def _set_length(self) -> None:
      # 首先要知道数据是否是分块传输的
      if self.chunked == _UNKNOWN:
        self._set_chunk()
    
      # 如果状态码是1xx或者204(无响应内容)或者304(使用上次缓存的内容),则没有响应正文
      # 如果这是个HEAD请求,那么也不能有响应正文
      if (self.status == HTTPStatus.NO_CONTENT or
          self.status == HTTPStatus.NOT_MODIFIED or
          100 <= self.status < 200 or
          self._method == 'HEAD'):
        self.length = 0
        return
    
      length = self.get_header('content-length')
      if length and not self.chunked:
        try:
          self.length = int(length)
        except ValueError:
          self.length = None
        else:
          if self.length < 0:
            self.length = None
      else:
        self.length = None

       然后, 我们实现一个read方法, 从body中读取指定大小的数据:

    def read(self, amount: int = None) -> bytes:
      if self.is_closed():
        return b''
      if self._method == 'HEAD':
        self.close()
        return b''
      if amount is None:
        return self._read_all()
      return self._read_amount(amount)

      如果没有指定需要的数据大小, 就默认读取所有数据:

    def _read_all(self) -> bytes:
      if self.chunked:
        return self._read_all_chunk()
      if self.length is None:
        s = self.fp.read()
      else:
        try:
          s = self._read_bytes(self.length)
        except IncompleteRead:
          self.close()
          raise
        self.length = 0
      self.close()
      return s
    
    
    def _read_all_chunk(self) -> bytes:
      assert self.chunked != _UNKNOWN
      value = []
      try:
        while True:
          chunk = self._read_chunk()
          if chunk is None:
            break
          value.append(chunk)
        return b''.join(value)
      except IncompleteRead:
        raise IncompleteRead(b''.join(value))
    
    
    def _read_chunk(self) -> Optional[bytes]:
      try:
        chunk_size = self._read_chunk_size()
      except ValueError:
        raise IncompleteRead(b'')
      if chunk_size == 0:
        self._read_and_discard_trailer()
        self.close()
        return None
      chunk = self._read_bytes(chunk_size)
      # 每块的结尾会有一个\r\n,这里把它读掉
      self._read_bytes(2)
      return chunk
    
    
    def _read_chunk_size(self) -> int:
      line = self._read_line(error_message='chunk size')
      i = line.find(b';')
      if i >= 0:
        line = line[:i]
      try:
        return int(line, 16)
      except ValueError:
        self.close()
        raise
    
    
    def _read_and_discard_trailer(self) -> None:
      # chunk的尾部可能会挂一些额外的信息,比如MD5值,过期时间等等,一般会在header中用trailer字段说明
      # 当chunk读完之后调用这个函数, 这些信息就先舍弃掉得了
      while True:
        line = self._read_line(error_message='chunk size')
        if line in _EMPTY_LINE:
          break

      否则的话, 就读取部分数据, 如果正好是分块数据的话, 就比较复杂了. 简单来说, 就是用bytearray制造一个所需大小的数组, 然后依次读取chunk把数据往里面填, 直到填满或者没数据为止.  然后用chunk_left记录下当前块剩余的量, 以便下次读取.

    def _read_amount(self, amount: int) -> bytes:
      if self.chunked:
        return self._read_amount_chunk(amount)
      if isinstance(self.length, int) and amount > self.length:
        amount = self.length
      container = bytearray(amount)
      n = self.fp.readinto(container)
      if not n and container:
        # 如果读不到字节了,也就可以关了
        self.close()
      elif self.length is not None:
        self.length -= n
        if not self.length:
          self.close()
      return memoryview(container)[:n].tobytes()
    
    
    def _read_amount_chunk(self, amount: int) -> bytes:
      # 调用这个方法,读取amount大小的chunk类型数据,不足就全部读取
      assert self.chunked != _UNKNOWN
      total_bytes = 0
      container = bytearray(amount)
      mvb = memoryview(container)
      try:
        while True:
          # mvb可以理解为容器的空的那一部分
          # 这里一直调用_full_readinto把数据填进去,让mvb越来越小,同时记录填入的量
          # 等没数据或者当前数据足够把mvb填满之后,跳出循环
          chunk_left = self._get_chunk_left()
          if chunk_left is None:
            break
          if len(mvb) <= chunk_left:
            n = self._full_readinto(mvb)
            self.chunk_left = chunk_left - n
            total_bytes += n
            break
          temp_mvb = mvb[:chunk_left]
          n = self._full_readinto(temp_mvb)
          mvb = mvb[n:]
          total_bytes += n
          self.chunk_left = 0
    
      except IncompleteRead:
        raise IncompleteRead(bytes(container[:total_bytes]))
    
      return memoryview(container)[:total_bytes].tobytes()
    
    
    def _full_readinto(self, container: memoryview) -> int:
      # 返回读取的量.如果没能读满,这个方法会报警
      amount = len(container)
      n = self.fp.readinto(container)
      if n < amount:
        raise IncompleteRead(bytes(container[:n]), amount - n)
      return n
    
    
    def _get_chunk_left(self) -> Optional[int]:
      # 如果当前块读了一半,那么直接返回self.chunk_left就行了
      # 否则,有三种情况
      # 1). chunk_left为None,说明body压根没开始读,于是返回当前这一整块的长度
      # 2). chunk_left为0,说明这块读完了,于是返回下一块的长度
      # 3). body数据读完了,返回None,顺便做好善后工作
      chunk_left = self.chunk_left
      if not chunk_left:
        if chunk_left == 0:
          # 如果剩余零,说明上一块已经读完了,这里把\r\n读掉
          # 如果是None,就说明chunk压根没开始读
          self._read_bytes(2)
        try:
          chunk_left = self._read_chunk_size()
        except ValueError:
          raise IncompleteRead(b'')
        if chunk_left == 0:
          self._read_and_discard_trailer()
          self.close()
          chunk_left = None
        self.chunk_left = chunk_left
      return chunk_left

    三. 复用TCP连接

      HTTP通信本质上是基于TCP连接发送和接收HTTP请求和响应, 因此, 只要TCP连接不断开, 我们就可以继续用它进行HTTP请求, 这样就避免了创建和销毁TCP连接产生的消耗.

    1. 判断连接是否会断开

      在下面几种情况中, 服务端会自动断开连接:

    • HTTP协议小于1.1且没有在头部设置了keep-alive
    • HTTP协议大于等于1.1但是在头部设置了connection: close
    • 数据没有分块传输, 也没有说明数据的长度, 这种情况下, 服务器一般会在发送完成后断开连接, 让客户端知道数据发完了

      根据上面列出来的几种情况, 通过下面的代码来判断连接是否会断开:

    def _check_close(self) -> bool:
      conn = self.get_header('connection')
    
      if not self.chunked and self.length is None:
        return True
    
      if self.version == 11:
        if conn and 'close' in conn.lower():
          return True
        return False
      else:
        if self.headers.get('keep-alive'):
          return False
    
        if conn and 'keep-alive' in conn.lower():
          return False
    
      return True

    2. 正确地关闭HTTPResponse对象

      由于TCP连接的复用, 一个HTTPConnection可以产生多个HTTPResponse对象, 而这些对象在同一个TCP连接上, 会共用这个连接的读缓冲区. 这就导致, 如果上一个HTTPResponse对象没有把它的那部分数据读完, 就会对下一个响应产生影响.

      另一方面来看, 我们也需要及时地关闭与这个TCP关联的文件对象来避免占用资源. 因此, 我们定义如下的close方法关闭一个HTTPResponse对象:

    def close(self) -> None:
      if self.is_closed():
        return
      fp = self.fp
      self.fp = None
      fp.close()
    
    
    def is_closed(self) -> bool:
      return self.fp is None

      用户调用HTTPResponse对象的read方法, 把缓冲区数据读完之后, 就会自动调用close方法(具体实现见上一章的第四节: 读取响应数据这部分). 因此, 在获取下一个响应数据之前, 我们只需要调用这个对象的is_closed方法, 就能判断读缓冲区是否已经读完, 能否继续接收响应了.

    3. HTTP请求的生命周期

      不使用管道机制的话, 不同的HTTP请求必须按次序进行, 相互之间不能重叠. 基于这个原因, 我们为HTTPConnection对象设置IDLE, REQ_STARTED和REQ_SENT三种状态, 一个完整的请求应该经历这几种状态:

      根据上面的流程, 对HTTPConnection中对应的方法进行修改:

    def get_response(self) -> HTTPResponse:
      if self._response and self._response.is_closed():
        self._response = None
      if self._state != _CS_REQ_SENT or self._response:
        raise ResponseNotReady(self._state)
    
      response = HTTPResponse(self.sock, method=self._method)
    
      try:
        try:
          response.begin()
        except ConnectionError:
          self.close()
          raise
        assert response.will_close != _UNKNOWN
        self._state = _CS_IDLE
    
        if response.will_close:
          self.close()
        else:
          self._response = response
    
        return response
      except Exception as _:
        response.close()
        raise
    
    def put_request(self, method: str, url: str) -> None:
      # 调用这个函数开始新一轮的请求,它负责写好请求行输出到缓存里面去
      # 调用它的前提是当前处于空闲状态
      # 如果之前的response还在并且已结束,会自动把它消除掉
      if self._response and self._response.is_closed():
        self._response = None
    
      if self._state == _CS_IDLE:
        self._state = _CS_REQ_STARTED
      else:
        raise CannotSendRequest(self._state)
    
      ...
    
    def put_header(self, header: Union[bytes, str], value: Union[bytes, str, int]) -> None:
      if self._state != _CS_REQ_STARTED:
        raise CannotSendHeader()
    
      ...
    
    def end_headers(self, message_body=None, encode_chunked=False) -> None:
      if self._state == _CS_REQ_STARTED:
        self._state = _CS_REQ_SENT
      else:
        raise CannotSendHeader()
      ...

      需要注意的是, 如果第二个请求已经进入到获取响应的阶段了, 而上一个请求的响应还没关闭, 那么就应该直接报错, 否则读取到的会是上一个请求剩余的响应部分数据, 导致解析响应出现问题.

    事实上, HTTP1.1开始支持管道化技术, 也就是一次提交多个HTTP请求, 然后等待响应, 而不是在接收到上一个请求的响应后, 才发送后面的请求.
    基于这种处理模式, 管道化技术理论上可以减少IO时间的损耗, 提升效率, 不过, 需要服务端的支持, 而且会增加程序的复杂程度, 这里就不实现了.

    四. 总结

    1. 完整代码

      HTTPConnection的完整代码如下:

    class HTTPConnection:
      default_port = 80
      _http_vsn = 11
      _http_vsn_str = 'HTTP/1.1'
    
      def __init__(self, host: str, port: int = None) -> None:
        self.sock = None
        self._buffer = []
        self.host = host
        self.port = port if port is not None else self.default_port
        self._state = _CS_IDLE
        self._response = None
        self._method = None
        self.block_size = 8192
    
      def request(self, method: str, url: str, headers: dict = None, body: Union[io.IOBase, Iterable] = None,
            encode_chunked: bool = False) -> None:
        self.put_request(method, url)
        headers = headers or {}
        header_names = frozenset(k.lower() for k in headers.keys())
        if 'host' not in header_names:
          self._add_host(url)
    
        if 'content-length' not in header_names:
          if 'transfer-encoding' not in header_names:
            encode_chunked = False
            content_length = self._get_content_length(body, method)
            if content_length is None:
              if body is not None:
                encode_chunked = True
                self.put_header('Transfer-Encoding', 'chunked')
            else:
              self.put_header('Content-Length', str(content_length))
          else:
            # 如果设置了transfer-encoding,则根据用户给的encode_chunked参数决定是否分块
            pass
        else:
          # 只要给了content-length,那么一定不是分块传输
          encode_chunked = False
    
        for hdr, value in headers.items():
          self.put_header(hdr, value)
        if isinstance(body, str):
          body = _encode(body)
        self.end_headers(body, encode_chunked=encode_chunked)
    
      def send(self, data: bytes) -> None:
        if self.sock is None:
          self.connect()
    
        self.sock.sendall(data)
    
      def get_response(self) -> HTTPResponse:
        if self._response and self._response.is_closed():
          self._response = None
        if self._state != _CS_REQ_SENT or self._response:
          raise ResponseNotReady(self._state)
    
        response = HTTPResponse(self.sock, method=self._method)
    
        try:
          try:
            response.begin()
          except ConnectionError:
            self.close()
            raise
          assert response.will_close != _UNKNOWN
          self._state = _CS_IDLE
    
          if response.will_close:
            self.close()
          else:
            self._response = response
    
          return response
        except Exception as _:
          response.close()
          raise
    
      def connect(self) -> None:
        self.sock = socket.create_connection((self.host, self.port))
    
      def close(self) -> None:
        self._state = _CS_IDLE
        try:
          sock = self.sock
          if sock:
            self.sock = None
            sock.close()
        finally:
          response = self._response
          if response:
            self._response = None
            response.close()
    
      def put_request(self, method: str, url: str) -> None:
        # 调用这个函数开始新一轮的请求,它负责写好请求行输出到缓存里面去
        # 调用它的前提是当前处于空闲状态
        # 如果之前的response还在并且已结束,会自动把它消除掉
        if self._response and self._response.is_closed():
          self._response = None
    
        if self._state == _CS_IDLE:
          self._state = _CS_REQ_STARTED
        else:
          raise CannotSendRequest(self._state)
    
        self._method = method
    
        url = url or '/'
    
        request = f'{method} {url} {self._http_vsn_str}'
        self._output(request)
    
      def put_header(self, header: Union[bytes, str], value: Union[bytes, str, int]) -> None:
        if self._state != _CS_REQ_STARTED:
          raise CannotSendHeader()
    
        if hasattr(header, 'encode'):
          header = header.encode('ascii')
    
        if hasattr(value, 'encode'):
          value = value.encode('latin-1')
        elif isinstance(value, int):
          value = str(value).encode('ascii')
    
        header = header + b': ' + value
        self._output(header)
    
      def end_headers(self, message_body=None, encode_chunked=False) -> None:
        if self._state == _CS_REQ_STARTED:
          self._state = _CS_REQ_SENT
        else:
          raise CannotSendHeader()
        self._send_output(message_body, encode_chunked=encode_chunked)
    
      def _add_host(self, url: str) -> None:
        # 所有HTTP / 1.1请求报文中必须包含一个Host头字段
        # 如果用户没给,就调用这个函数来生成
        netloc = ''
        if url.startswith('http'):
          nil, netloc, nil, nil, nil = urlsplit(url)
    
        if netloc:
          try:
            netloc_enc = netloc.encode('ascii')
          except UnicodeEncodeError:
            netloc_enc = netloc.encode('idna')
          self.put_header('Host', netloc_enc)
        else:
          host = self.host
          port = self.port
    
          try:
            host_enc = host.encode('ascii')
          except UnicodeEncodeError:
            host_enc = host.encode('idna')
    
          # 对IPv6的地址进行额外处理
          if host.find(':') >= 0:
            host_enc = b'[' + host_enc + b']'
    
          if port == self.default_port:
            self.put_header('Host', host_enc)
          else:
            host_enc = host_enc.decode('ascii')
            self.put_header('Host', f'{host_enc}:{port}')
    
      def _output(self, s: Union[str, bytes]) -> None:
        # 将数据添加到缓冲区
        if hasattr(s, 'encode'):
          s = s.encode('latin-1')
        self._buffer.append(s)
    
      def _send_output(self, message_body=None, encode_chunked=False) -> None:
        # 发送并清空缓冲数据.然后,如果有请求正文,就也顺便发送
    
        self._buffer.extend((b'', b''))
        msg = b'\r\n'.join(self._buffer)
        self._buffer.clear()
        self.send(msg)
    
        if message_body is not None:
          self._send_body(message_body, encode_chunked)
    
      def _send_body(self, message_body: Union[bytes, str, bytearray, Iterable, io.IOBase], encode_chunked: bool) -> None:
        if hasattr(message_body, 'read'):
          chunks = self._read_readable(message_body)
        else:
          try:
            memoryview(message_body)
          except TypeError:
            try:
              chunks = iter(message_body)
            except TypeError:
              raise TypeError(
                f'message_body should be a bytes-like object or an iterable, got {repr(type(message_body))}')
          else:
            # 如果是字节类型的,通过一次迭代把它发出去
            chunks = (message_body,)
    
        for chunk in chunks:
          if not chunk:
            continue
    
          if encode_chunked:
            chunk = f'{len(chunk):X}\r\n'.encode('ascii') + chunk + b'\r\n'
          self.send(chunk)
    
        if encode_chunked:
          self.send(b'0\r\n\r\n')
    
      def _read_readable(self, readable: io.IOBase) -> Generator[bytes, None, None]:
        need_encode = False
        if isinstance(readable, io.TextIOBase):
          need_encode = True
        while True:
          data_block = readable.read(self.block_size)
          if not data_block:
            break
          if need_encode:
            data_block = data_block.encode('utf-8')
          yield data_block
    
      @staticmethod
      def _get_content_length(body: Union[str, bytes, bytearray, Iterable, io.IOBase], method: str) -> Optional[int]:
        if body is None:
          # PUT,POST,PATCH三个方法默认是有body的
          if method.upper() in _METHODS_EXPECTING_BODY:
            return 0
          else:
            return None
    
        if hasattr(body, 'read'):
          return None
    
        try:
          # 对于bytes或者bytearray格式的数据,通过memoryview获取它的长度
          return memoryview(body).nbytes
        except TypeError:
          pass
    
        if isinstance(body, str):
          return len(body)
    
        return None
    
    

      HTTPResponse的完整代码如下:

    class HTTPResponse:
    
      def __init__(self, sock: socket.socket, method: str = None) -> None:
        self.fp = sock.makefile('rb')
        self._method = method
        self.headers = None
        self.version = _UNKNOWN
        self.status = _UNKNOWN
        self.reason = _UNKNOWN
        self.chunked = _UNKNOWN
        self.chunk_left = _UNKNOWN
        self.length = _UNKNOWN
        self.will_close = _UNKNOWN
    
      def begin(self) -> None:
        if self.headers is not None:
          return
        self._parse_status_line()
        self._parse_header()
        self._set_chunk()
        self._set_length()
        self.will_close = self._check_close()
    
      def _read_line(self, limit: int = _MAX_LINE + 1, error_message: str = '') -> bytes:
        # 注意,这个方法默认不去除line尾部的\r\n
        line = self.fp.readline(limit)
        if len(line) > _MAX_LINE:
          raise LineTooLong(error_message)
        return line
    
      def _read_bytes(self, amount: int) -> bytes:
        data = self.fp.read(amount)
        if len(data) < amount:
          raise IncompleteRead(data, amount - len(data))
        return data
    
      def _parse_status_line(self) -> None:
        while True:
          version, status, reason = self._read_status()
          if status != HTTPStatus.CONTINUE:
            break
          while True:
            skip = self._read_line(error_message='header line').strip()
            if not skip:
              break
    
        self.status = status
        self.reason = reason
        if version in ('HTTP/1.0', 'HTTP/0.9'):
          self.version = 10
        elif version.startswith('HTTP/1.'):
          self.version = 11
        else:
          raise UnknownProtocol(version)
    
      def _read_status(self) -> Tuple[str, int, str]:
        line = str(self._read_line(error_message='status line'), 'latin-1')
        if not line:
          raise RemoteDisconnected('Remote end closed connection without response')
        try:
          version, status, reason = line.split(None, 2)
        except ValueError:
          # reason只是给人看的, 和status对应, 所以它有可能不存在
          try:
            version, status = line.split(None, 1)
            reason = ''
          except ValueError:
            version, status, reason = '', '', ''
        if not version.startswith('HTTP/'):
          self.close()
          raise BadStatusLine(line)
    
        try:
          status = int(status)
          if status < 100 or status > 999:
            raise BadStatusLine(line)
        except ValueError:
          raise BadStatusLine(line)
        return version, status, reason.strip()
    
      def _parse_header(self) -> None:
        headers = {}
        while True:
          line = self._read_line(error_message='header line')
          if len(headers) > _MAX_HEADERS:
            raise HTTPException('got more than %d headers' % _MAX_HEADERS)
          if line in _EMPTY_LINE:
            break
          line = line.decode('latin-1')
          i = line.find(':')
          if i == -1:
            raise BadHeaderLine(line)
          # 这里默认没有重名的情况
          key, value = line[:i].lower(), line[i + 1:].strip()
          headers[key] = value
        self.headers = headers
    
      def _set_chunk(self) -> None:
        transfer_encoding = self.get_header('transfer-encoding')
        if transfer_encoding and transfer_encoding.lower() == 'chunked':
          self.chunked = True
          self.chunk_left = None
        else:
          self.chunked = False
    
      def _set_length(self) -> None:
        # 首先要知道数据是否是分块传输的
        if self.chunked == _UNKNOWN:
          self._set_chunk()
    
        # 如果状态码是1xx或者204(无响应内容)或者304(使用上次缓存的内容),则没有响应正文
        # 如果这是个HEAD请求,那么也不能有响应正文
        assert isinstance(self.status, int)
        if (self.status == HTTPStatus.NO_CONTENT or
            self.status == HTTPStatus.NOT_MODIFIED or
            100 <= self.status < 200 or
            self._method == 'HEAD'):
          self.length = 0
          return
    
        length = self.get_header('content-length')
        if length and not self.chunked:
          try:
            self.length = int(length)
          except ValueError:
            self.length = None
          else:
            if self.length < 0:
              self.length = None
        else:
          self.length = None
    
      def _check_close(self) -> bool:
        conn = self.get_header('connection')
    
        if not self.chunked and self.length is None:
          return True
    
        if self.version == 11:
          if conn and 'close' in conn.lower():
            return True
          return False
        else:
          if self.headers.get('keep-alive'):
            return False
    
          if conn and 'keep-alive' in conn.lower():
            return False
    
        return True
    
      def close(self) -> None:
        if self.is_closed():
          return
        fp = self.fp
        self.fp = None
        fp.close()
    
      def is_closed(self) -> bool:
        return self.fp is None
    
      def read(self, amount: int = None) -> bytes:
        if self.is_closed():
          return b''
        if self._method == 'HEAD':
          self.close()
          return b''
        if amount is None:
          return self._read_all()
        print(amount, amount is None)
        return self._read_amount(amount)
    
      def _read_all(self) -> bytes:
        if self.chunked:
          return self._read_all_chunk()
        if self.length is None:
          s = self.fp.read()
        else:
          try:
            s = self._read_bytes(self.length)
          except IncompleteRead:
            self.close()
            raise
          self.length = 0
        self.close()
        return s
    
      def _read_all_chunk(self) -> bytes:
        assert self.chunked != _UNKNOWN
        value = []
        try:
          while True:
            chunk = self._read_chunk()
            if chunk is None:
              break
            value.append(chunk)
          return b''.join(value)
        except IncompleteRead:
          raise IncompleteRead(b''.join(value))
    
      def _read_chunk(self) -> Optional[bytes]:
        try:
          chunk_size = self._read_chunk_size()
        except ValueError:
          raise IncompleteRead(b'')
        if chunk_size == 0:
          self._read_and_discard_trailer()
          self.close()
          return None
        chunk = self._read_bytes(chunk_size)
        # 每块的结尾会有一个\r\n,这里把它读掉
        self._read_bytes(2)
        return chunk
    
      def _read_chunk_size(self) -> int:
        line = self._read_line(error_message='chunk size')
        i = line.find(b';')
        if i >= 0:
          line = line[:i]
        try:
          return int(line, 16)
        except ValueError:
          self.close()
          raise
    
      def _read_and_discard_trailer(self) -> None:
        # chunk的尾部可能会挂一些额外的信息,比如MD5值,过期时间等等,一般会在header中用trailer字段说明
        # 当chunk读完之后调用这个函数, 这些信息就先舍弃掉得了
        while True:
          line = self._read_line(error_message='chunk size')
          if line in _EMPTY_LINE:
            break
    
      def _read_amount(self, amount: int) -> bytes:
        if self.chunked:
          return self._read_amount_chunk(amount)
        if isinstance(self.length, int) and amount > self.length:
          amount = self.length
        container = bytearray(amount)
        n = self.fp.readinto(container)
        if not n and container:
          # 如果读不到字节了,也就可以关了
          self.close()
        elif self.length is not None:
          self.length -= n
          if not self.length:
            self.close()
        return memoryview(container)[:n].tobytes()
    
      def _read_amount_chunk(self, amount: int) -> bytes:
        # 调用这个方法,读取amount大小的chunk类型数据,不足就全部读取
        assert self.chunked != _UNKNOWN
        total_bytes = 0
        container = bytearray(amount)
        mvb = memoryview(container)
        try:
          while True:
            # mvb可以理解为容器的空的那一部分
            # 这里一直调用_full_readinto把数据填进去,让mvb越来越小,同时记录填入的量
            # 等没数据或者当前数据足够把mvb填满之后,跳出循环
            chunk_left = self._get_chunk_left()
            if chunk_left is None:
              break
            if len(mvb) <= chunk_left:
              n = self._full_readinto(mvb)
              self.chunk_left = chunk_left - n
              total_bytes += n
              break
            temp_mvb = mvb[:chunk_left]
            n = self._full_readinto(temp_mvb)
            mvb = mvb[n:]
            total_bytes += n
            self.chunk_left = 0
    
        except IncompleteRead:
          raise IncompleteRead(bytes(container[:total_bytes]))
    
        return memoryview(container)[:total_bytes].tobytes()
    
      def _full_readinto(self, container: memoryview) -> int:
        # 返回读取的量.如果没能读满,这个方法会报警
        amount = len(container)
        n = self.fp.readinto(container)
        if n < amount:
          raise IncompleteRead(bytes(container[:n]), amount - n)
        return n
    
      def _get_chunk_left(self) -> Optional[int]:
        # 如果当前块读了一半,那么直接返回self.chunk_left就行了
        # 否则,有三种情况
        # 1). chunk_left为None,说明body压根没开始读,于是返回当前这一整块的长度
        # 2). chunk_left为0,说明这块读完了,于是返回下一块的长度
        # 3). body数据读完了,返回None,顺便做好善后工作
        chunk_left = self.chunk_left
        if not chunk_left:
          if chunk_left == 0:
            # 如果剩余零,说明上一块已经读完了,这里把\r\n读掉
            # 如果是None,就说明chunk压根没开始读
            self._read_bytes(2)
          try:
            chunk_left = self._read_chunk_size()
          except ValueError:
            raise IncompleteRead(b'')
          if chunk_left == 0:
            self._read_and_discard_trailer()
            self.close()
            chunk_left = None
          self.chunk_left = chunk_left
        return chunk_left
    
      def get_header(self, name, default: str = None) -> Optional[str]:
        if self.headers is None:
          raise ResponseNotReady()
        return self.headers.get(name, default)
    
      @property
      def info(self) -> str:
        return repr(self.headers)
    
    
    
    下一篇:没有了