.. -*- coding: utf-8 -*-
.. :Project:   pglast — Parser module
.. :Created:   gio 10 ago 2017 10:19:26 CEST
.. :Author:    Lele Gaifax <lele@metapensiero.it>
.. :License:   GNU General Public License version 3 or later
.. :Copyright: © 2017, 2018, 2021, 2023, 2024 Lele Gaifax
..

==========================================================
 :mod:`pglast.parser` --- The interface with libpg_query
==========================================================

.. module:: pglast.parser
   :synopsis: The interface with libpg_query

This module is a C extension written in Cython__ that exposes a few functions from the
underlying ``libpg_query`` library it links against.

.. data:: LONG_MAX

   The highest integer that can be stored in a C ``long`` variable. It is used as a marker, for
   example by PG's ``FetchStmt.howMany``, where it is the value of the ``FETCH_ALL`` constant.
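
   As an illustration of the value only, and not a description of pglast's own code, the same
   number can be derived in pure Python with the standard :mod:`ctypes` module. The name
   ``c_long_max`` is hypothetical:

   .. code-block:: python

      import ctypes

      # Width in bits of a C ``long`` on this platform (it varies between platforms).
      bits = 8 * ctypes.sizeof(ctypes.c_long)

      # Largest value of a signed two's complement integer of that width.
      c_long_max = 2 ** (bits - 1) - 1

      print(c_long_max)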

.. exception:: ParseError

   Exception representing the error state returned by the parser.

.. exception:: DeparseError

   Exception representing the error state returned by the deparser.

.. class:: Displacements(string)

   Helper class used to find the index of a Unicode character from its offset in the
   corresponding UTF-8 encoded byte array.

   Example:

   .. doctest::

      >>> from pglast.parser import Displacements
      >>> unicode = '€ 0.01'
      >>> utf8 = unicode.encode('utf-8')
      >>> d = Displacements(unicode)
      >>> for offset in range(len(utf8)):
      ...   idx = d(offset)
      ...   print(f'{offset} [{utf8[offset]:2x}] -> {idx} [{unicode[idx]}]')
      ...
      0 [e2] -> 0 [€]
      1 [82] -> 0 [€]
      2 [ac] -> 0 [€]
      3 [20] -> 1 [ ]
      4 [30] -> 2 [0]
      5 [2e] -> 3 [.]
      6 [30] -> 4 [0]
      7 [31] -> 5 [1]

   The underlying ``libpg_query`` library operates on ``UTF-8`` strings: its parser functions
   emit tokens with a ``location`` that is actually the byte offset within the ``UTF-8``
   representation of the statement. With this class you can fix up those offsets, as in the
   following example:

   .. doctest::

      >>> import json
      >>> from pglast.parser import parse_sql_json
      >>> stmt = 'select alias.bar as alìbàbà from foo as alias'
      >>> parsed = json.loads(parse_sql_json(stmt))
      >>> select = parsed['stmts'][0]['stmt']['SelectStmt']
      >>> rangevar = select['fromClause'][0]['RangeVar']
      >>> loc = rangevar['location']
      >>> print(stmt[loc:loc+3])
       as
      >>> d = Displacements(stmt)
      >>> adjloc = d(loc)
      >>> print(stmt[adjloc:adjloc+3])
      foo
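
   The mapping itself can be sketched in pure Python. This is a minimal illustration of the
   idea, assuming nothing about how the Cython class is actually implemented; the names
   ``make_displacements`` and ``char_index`` are hypothetical:

   .. code-block:: python

      from bisect import bisect_right


      def make_displacements(text):
          """Return a function mapping UTF-8 byte offsets to indexes in `text`."""

          # starts[i] is the byte offset at which character i begins.
          starts = []
          offset = 0
          for char in text:
              starts.append(offset)
              offset += len(char.encode('utf-8'))

          def char_index(byte_offset):
              # The wanted character is the last one starting at or before the offset.
              return bisect_right(starts, byte_offset) - 1

          return char_index


      to_index = make_displacements('€ 0.01')
      print([to_index(offset) for offset in range(8)])

   The printed list is ``[0, 0, 0, 1, 2, 3, 4, 5]``, the same indexes shown in the first
   example above: every byte of a multi-byte character maps to the index of that character.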

.. function:: deparse_protobuf(buffer)

   :param bytes buffer: a ``Protobuf`` buffer
   :returns: str

   Return the ``SQL`` statement deparsed from the given `buffer` argument, typically one
   produced by :func:`parse_sql_protobuf()`.
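
   A round trip through the two functions might look like the following sketch. It assumes
   that pglast is installed; the statement and the variable names are made up for
   illustration:

   .. code-block:: python

      from pglast.parser import deparse_protobuf, fingerprint, parse_sql_protobuf

      original = 'select  a,b from   t where a=1'

      # Serialize the parse tree, then turn it back into SQL text.
      buffer = parse_sql_protobuf(original)
      regenerated = deparse_protobuf(buffer)
      print(regenerated)

      # The text may be formatted differently, but it should describe the same
      # statement, so the two fingerprints are expected to match.
      same = fingerprint(original) == fingerprint(regenerated)
      print(same)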

.. function:: fingerprint(query)

   :param str query: The SQL statement
   :returns: str

   Fingerprint the given `query`, a string with the ``SQL`` statement(s), and return a
   hash digest that identifies similar queries. Queries that differ only in the specific
   values being queried for (for example literal constants) or in formatting produce the
   same digest.
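
   The following sketch shows the intended use, assuming that pglast is installed; the
   queries are made up for illustration:

   .. code-block:: python

      from pglast.parser import fingerprint

      # Same statement shape, different literal value and formatting.
      a = fingerprint('select * from users where id = 1')
      b = fingerprint('SELECT *\n  FROM users\n WHERE id = 42')

      # A different target list makes it a different statement.
      c = fingerprint('select name from users where id = 1')

      print(a == b, a == c)

   Here ``a`` and ``b`` are expected to be equal, while ``c`` should differ.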

.. function:: get_postgresql_version()

   :returns: a tuple

   Return the PostgreSQL version as a tuple (`major`, `minor`, `patch`).

.. function:: parse_sql(query)

   :param str query: The SQL statement
   :returns: tuple

   Parse the given `query`, a string with the ``SQL`` statement(s), and return the
   corresponding *parse tree* as a tuple of :class:`pglast.ast.RawStmt` instances.
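
   As a short sketch, assuming that pglast is installed, the returned tuple can be iterated
   to inspect the kind of each statement. The statements are made up for illustration:

   .. code-block:: python

      from pglast.parser import parse_sql

      raw_stmts = parse_sql('select 1; update t set a = 2')

      # Each RawStmt wraps the actual statement node in its ``stmt`` attribute.
      kinds = [type(raw.stmt).__name__ for raw in raw_stmts]
      print(kinds)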

.. function:: parse_sql_json(query)

   :param str query: The SQL statement
   :returns: str

   Parse the given `query`, a string with the ``SQL`` statement(s), and return the
   ``libpg_query``\ 's ``JSON``\ -serialized parse tree.

.. function:: parse_sql_protobuf(query)

   :param str query: The SQL statement
   :returns: bytes

   Parse the given `query`, a string with the ``SQL`` statement(s), and return the
   ``libpg_query``\ 's ``Protobuf``\ -serialized parse tree.

.. function:: parse_plpgsql_json(query)

   :param str query: The PL/pgSQL statement
   :returns: str

   Parse the given `query`, a string with the ``PL/pgSQL`` statement(s), and return the
   ``libpg_query``\ 's ``JSON``\ -serialized parse tree.

.. function:: scan(query)

   :param str query: The SQL statement
   :returns: sequence of tuples

   Split the given `query` into its *tokens*. Each token is a `namedtuple` with the following
   slots:

   start : int
     the index of the first character of the token

   end : int
     the index of the last character of the token (inclusive)

   name : str
     the name of the token

   kind : str
     the kind of the token

   Example:

   .. doctest::

      >>> from pglast.parser import scan
      >>> stmt = 'select bar as alìbàbà from foo'
      >>> tokens = scan(stmt)
      >>> print(tokens[0])
      Token(start=0, end=5, name='SELECT', kind='RESERVED_KEYWORD')
      >>> print([stmt[t.start:t.end+1] for t in tokens])
      ['select', 'bar', 'as', 'alìbàbà', 'from', 'foo']

.. function:: split(query, with_parser=True, only_slices=False)

   :param str query: The SQL statement
   :param bool with_parser: Whether to use the parser or the scanner
   :param bool only_slices: Return slices instead of statement's text
   :returns: tuple

   Split the given `query` string into a sequence of its individual ``SQL`` statements.

   By default this uses the *parser* to perform the job; when `with_parser` is ``False``
   the *scanner* variant is used, which is preferable when the statements may contain parse
   errors.

   When `only_slices` is ``True``, return a sequence of :class:`slice` instances, one for each
   statement, instead of the statements' text.

   .. note:: Leading and trailing whitespace is removed from the statements.

   Example:

   .. doctest::

      >>> from pglast.parser import split
      >>> split('select 1 for; select 2')
      Traceback (most recent call last):
        ...
      pglast.parser.ParseError: syntax error at or near ";", at index 12
      >>> split('select 1 for; select 2', with_parser=False)
      ('select 1 for', 'select 2')
      >>> stmts = "select 'fòò'; select 'bàr'"
      >>> print([stmts[r] for r in split(stmts, only_slices=True)])
      ["select 'fòò'", "select 'bàr'"]

__ http://cython.org/
