PostgreSQL - Recursive queries with PostgreSQL

 
Vista:
Imágen de perfil de outrera

Recursive queries with PostgreSQL

Publicado por outrera (35 intervenciones) el 02/04/2014 15:32:25
FUENTE: http://www.niwi.be/2013/10/26/recursive-queries-postgresql/


Recursive queries with PostgreSQL


Sat 26 October 2013 - In blog.

Tags: postgresql

Sometimes we need store some graph/tree structures in your relational database (eg, postgresql) and query childs of some top level node of your stored tree.

This problem, has various soultions.

The most simple way to get all child nodes of one top level node is execute query for each tree depth. But for large trees with 5+ levels of nesting, this is pretty slow.

PostgreSQL offers recursive queries that helps a lot obtaining all child nodes.


Sample data

Here, we will try both methods for obtain a child nodes. For this experimet we will used these sql data:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
DROP TABLE IF EXISTS node;
 
CREATE TABLE node (
    id integer,
    parent_id integer,
    name varchar(255),
    PRIMARY KEY(id)
);
 
INSERT INTO node VALUES
    (1, null, 'node 1'),
    (2, null, 'node 2'),
    (3,    1, 'node 3'),
    (4,    3, 'node 4'),
    (5,    2, 'node 5'),
    (6,    4, 'node 6'),
    (7,    6, 'node 7'),
    (8,    7, 'node 8'),
    (9,    1, 'node 9');


Experiment code

Watching this snippet, we can obseve two main functions, firts implements a simple way for obtain all chinlds and second use postgresql recursive query for do same thing.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
# -*- coding: utf-8 -*-
 
import datetime
import pprint
import functools
 
from sqlalchemy import create_engine
from sqlalchemy.sql import text
 
engine = create_engine('postgresql+psycopg2://localhost/test', echo=False)
 
def bench(func=None, *, name=None):
    """
    Decorator used for benchmark execution of arbitrary function.
    """
    if func is None:
        return functools.partial(bench, name=name)
 
    def _wrapper(*args, **kwargs):
        t = datetime.datetime.now()
        result = func(*args, **kwargs)
 
        delta = (datetime.datetime.now() - t).total_seconds() * 1000.0
        print("[{}] Elapsed time: {} msecs.".format(name, delta))
        return result
    return _wrapper
 
@bench(name="simple")
def dump_simple():
    sm_s = text("select id, parent_id, name from node where parent_id = :parent_id order by name;")
    sm_n = text("select id, parent_id, name from node where id = :id;")
 
    def get_parents(conn, parent_id):
        result = conn.execute(sm_s, parent_id=parent_id)
        final_results = []
 
        for item in result:
            final_results.append(item)
            final_results.extend(get_parents(conn, item[0]))
 
        return final_results
 
    def dumps(conn, id):
        result = conn.execute(sm_n, id=id)
        first = result.fetchone()
        result.close()
        return [first] + get_parents(conn, id)
 
    with engine.begin() as conn:
        return dumps(conn, 1)
 
@bench(name="recursive")
def dump_recursive():
    sm_s = text("""
        WITH RECURSIVE nodes_cte AS (
            SELECT n.id, n.parent_id, n.name, n.id::TEXT AS path
                FROM node AS n
                WHERE n.parent_id = :id
            UNION ALL
            SELECT c.id, c.parent_id, c.name, (p.path || '->' || c.id::TEXT) AS path
                FROM nodes_cte AS p, node AS c
                WHERE c.parent_id = p.id
        )
        (
            SELECT id, parent_id, name, '' as path FROM node WHERE id = :id
            UNION ALL
            SELECT * FROM nodes_cte ORDER BY path ASC
        );
    """)
 
    with engine.begin() as conn:
        result = conn.execute(sm_s, id=1)
        try:
            return result.fetchall()
        finally:
            result.close()
 
if __name__ == "__main__":
    pprint.pprint(dump_simple())
    pprint.pprint(dump_recursive())



Benchmarking

This is a result of executing a code snippet:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
[simple] Elapsed time: 25.171 msecs.
[(1, None, 'node 1'),
 (3, 1, 'node 3'),
 (4, 3, 'node 4'),
 (6, 4, 'node 6'),
 (7, 6, 'node 7'),
 (8, 7, 'node 8'),
 (9, 1, 'node 9')]
[recursive] Elapsed time: 2.727 msecs.
[(1, None, 'node 1', ''),
 (3, 1, 'node 3', '3'),
 (4, 3, 'node 4', '3->4'),
 (6, 4, 'node 6', '3->4->6'),
 (7, 6, 'node 7', '3->4->6->7'),
 (8, 7, 'node 8', '3->4->6->7->8'),
 (9, 1, 'node 9', '9')]


Watching a postgresql log of execute dump_simple function:

LOG: duration: 0.039 ms statement: BEGIN
LOG: duration: 0.558 ms statement: select id, parent_id, name from node where id = 1;
LOG: duration: 0.861 ms statement: select id, parent_id, name from node where parent_id = 1 order by name;
LOG: duration: 0.238 ms statement: select id, parent_id, name from node where parent_id = 3 order by name;
LOG: duration: 0.234 ms statement: select id, parent_id, name from node where parent_id = 4 order by name;
LOG: duration: 0.235 ms statement: select id, parent_id, name from node where parent_id = 6 order by name;
LOG: duration: 0.234 ms statement: select id, parent_id, name from node where parent_id = 7 order by name;
LOG: duration: 0.229 ms statement: select id, parent_id, name from node where parent_id = 8 order by name;
LOG: duration: 0.227 ms statement: select id, parent_id, name from node where parent_id = 9 order by name;
LOG: duration: 0.027 ms statement: COMMIT


And this is a postgresql log of executing a dump_recursive function:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
LOG:  duration: 0.019 ms  statement: BEGIN
LOG:  duration: 1.334 ms  statement:
                WITH RECURSIVE nodes_cte AS (
                    SELECT n.id, n.parent_id, n.name, n.id::TEXT AS path
                        FROM node AS n
                        WHERE n.parent_id = 1
                    UNION ALL
                    SELECT c.id, c.parent_id, c.name, (p.path || '->' || c.id::TEXT) AS path
                        FROM nodes_cte AS p, node AS c
                        WHERE c.parent_id = p.id
                )
                (
                    SELECT id, parent_id, name, '' as path FROM node WHERE id = 1
                    UNION ALL
                    SELECT * FROM nodes_cte ORDER BY path ASC
                );
LOG: duration: 0.031 ms statement: COMMIT


We can observe, that using recursive query we can obtain a same results in more less time that using a simple way (execute query for each depth) of obtain child nodes.
Valora esta pregunta
Me gusta: Está pregunta es útil y esta claraNo me gusta: Está pregunta no esta clara o no es útil
0
Responder