DEV Community

Rijul Rajesh

Learn How to Navigate Code Structures and Extract Details Using Tree-sitter

If you’ve ever wished you could query code like data, Tree-sitter might be your new best friend.

Whether you're building a code analysis tool or an editor extension, or just exploring syntax trees, this guide will help you understand Tree-sitter queries from scratch using real Python examples. Let’s dive in!


What Is Tree-sitter?

Tree-sitter is a parser generator and runtime for building fast, accurate parsers for programming languages. Editors such as Neovim and Zed, along with some VS Code extensions, use it for:

  • Syntax highlighting

  • Structural editing

  • Code navigation

  • Language-aware tools

Tree-sitter Queries

Tree-sitter parses source code into a syntax tree. A Tree-sitter query is a way to search through that syntax tree to find specific code patterns. Think of it like a super-powered search tool that not only looks for words but also understands the structure of the code.
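To make the difference between searching words and searching structure concrete, here is a minimal sketch. It does not use Tree-sitter itself: it swaps in Python's built-in `ast` module as a stand-in, since it also turns code into a tree. The `SOURCE` snippet is made up for illustration, and `ast` node names (such as `FunctionDef`) differ from Tree-sitter's (such as `function_definition`).

```python
import ast

# Hypothetical snippet, for illustration only.
SOURCE = '''
def get(request):
    message = "call get to fetch the user"
    return message

def post(request):
    return get(request)
'''

# Plain text search: counts every occurrence of "get", including the one
# inside the string literal and the one in the call.
text_hits = SOURCE.count("get")

# Structural search: walk the syntax tree and keep only function definitions.
tree = ast.parse(SOURCE)
defined_functions = [
    node.name for node in ast.walk(tree) if isinstance(node, ast.FunctionDef)
]

print(text_hits)          # 3
print(defined_functions)  # ['get', 'post']
```

The text search cannot tell a definition from a string or a call, while the tree walk returns exactly the function definitions. A Tree-sitter query gives you that same kind of precision, written as a pattern instead of a loop.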

Let’s say we’re analyzing the following Python code using Tree-sitter:


from rest_framework.views import APIView
from rest_framework.response import Response
from rest_framework import generics, serializers
from django.contrib.auth.models import User


class UserView(APIView):
    def get(self, request):
        user_id = request.GET.get('id')
        if user_id:
            return Response({"user_id": user_id})
        return Response({"error": "User ID missing"}, status=400)

    def post(self, request):
        data = request.data
        username = data.get('username')
        return Response({"username": username})


class UserSerializer(serializers.ModelSerializer):
    class Meta:
        model = User
        fields = ['id', 'username', 'email']


class UserDetailUpdateView(generics.RetrieveUpdateAPIView):
    queryset = User.objects.all()
    serializer_class = UserSerializer
    lookup_field = 'pk'


We will go through various Tree-sitter queries to match parts of this code, so let's begin.

You can practice using Tree-sitter in the Tree-sitter playground. We will be using it for the demos here.

Every Tree-sitter query is composed of nodes. A pattern is written with the node type in parentheses, and an @name placed after a node captures it under that name. Let's go through some of the node types first.

Node Types

Every piece of code is represented as a node in the syntax tree.

Some Examples (Python):

  • identifier

  • call

  • string

  • assignment

  • parameters

  • argument_list

  • return_statement

  • attribute

  • if_statement


identifier

An identifier is a name that the programmer gives to things like variables, functions, classes, or parameters.

(identifier) @var-name


string

In Tree-sitter, a (string) node represents a string literal in the source code, i.e., any value enclosed in quotation marks, like "hello" or 'world'.

(string) @string-val


call

In Tree-sitter, a (call) node represents a function call, i.e., a place where a function is invoked in the code.

(call
  function: (identifier) @called-func)
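One thing worth noticing: because the pattern asks for an (identifier) in the function field, it only matches calls made through a plain name. In our sample, Response(...) matches, but request.GET.get('id') does not, since its function part is an (attribute) node. Here is a rough sketch of that behaviour using Python's built-in `ast` module as a stand-in for Tree-sitter (the helper name `directly_called_names` is made up for this example):

```python
import ast

# Three lines adapted from the sample code above.
SOURCE = '''
user_id = request.GET.get('id')
response = Response({"user_id": user_id})
queryset = User.objects.all()
'''


def directly_called_names(source: str) -> list[str]:
    """Return the names of functions called through a bare identifier."""
    names = []
    for node in ast.walk(ast.parse(source)):
        # ast.Call with an ast.Name callee is the closest ast analogue of
        # (call function: (identifier)); calls through attributes are skipped.
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            names.append(node.func.id)
    return names


print(directly_called_names(SOURCE))  # ['Response']
```

To also catch method-style calls in a real query, you would need a second pattern whose function field is an (attribute) node.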


assignment

In Tree-sitter, an (assignment) node represents an assignment statement, where a value is stored in a variable.


(assignment
  left: (identifier) @left-var
  right: (_) @right-value)
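Each match of this pattern yields a pair of captures: the variable on the left and whatever expression sits on the right (the (_) wildcard accepts any named node). As a hedged illustration of what those pairs look like, the sketch below again uses Python's `ast` module in place of Tree-sitter; `simple_assignments` is a hypothetical helper, not part of any library.

```python
import ast

# Assignments adapted from the sample code above.
SOURCE = '''
data = request.data
username = data.get('username')
lookup_field = 'pk'
'''


def simple_assignments(source: str) -> list[tuple[str, str]]:
    """Return (variable name, right-hand source text) for name = value lines."""
    pairs = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assign) and len(node.targets) == 1:
            target = node.targets[0]
            # Only keep assignments whose left side is a plain name,
            # mirroring left: (identifier) in the query.
            if isinstance(target, ast.Name):
                right_text = ast.get_source_segment(source, node.value)
                pairs.append((target.id, right_text))
    return pairs


for left, right in simple_assignments(SOURCE):
    print(left, "<-", right)
```

In the playground you would see the same pairing, with @left-var and @right-value highlighted separately for every assignment.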


parameters

In Tree-sitter, a (parameters) node represents the list of parameters that a function accepts.

This query captures each (identifier) inside the parameter list and tags it as @param-name.


(parameters
  (identifier) @param-name)


argument_list

In Tree-sitter, an (argument_list) node represents the list of arguments passed to a function when it's being called.


(argument_list
  (string) @arg)


return_statement

In Tree-sitter, a (return_statement) node represents a return statement in a function, used to send a value back to the caller.

(return_statement) @return-line


attribute

In Tree-sitter, an (attribute) node represents attribute access with dot notation, such as request.data. The object field holds the expression before the dot, and the attribute field holds the name after it.

(attribute
  object: (identifier) @object
  attribute: (identifier) @prop)


if_statement

In Tree-sitter, an (if_statement) node represents an if block. Its condition field holds the expression being tested, and its consequence field holds the block that runs when the condition is true.

(if_statement
  condition: (_) @cond
  consequence: (_) @if-body)
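For the get method in our sample, @cond would capture user_id and @if-body would capture the indented return line. The sketch below shows the same split using Python's `ast` module as a stand-in for Tree-sitter; the helper `if_parts` is made up for this example, and `ast` calls the two parts `test` and `body` rather than condition and consequence.

```python
import ast

# The get method from the sample code, written as a plain function.
SOURCE = '''
def get(request):
    user_id = request.GET.get('id')
    if user_id:
        return Response({"user_id": user_id})
    return Response({"error": "User ID missing"}, status=400)
'''


def if_parts(source: str) -> list[tuple[str, list[str]]]:
    """Return (condition text, list of body statement texts) per if block."""
    results = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.If):
            condition = ast.get_source_segment(source, node.test)
            body = [ast.get_source_segment(source, stmt) for stmt in node.body]
            results.append((condition, body))
    return results


print(if_parts(SOURCE))
```

Note that the final return statement is not part of the result, because it sits outside the if block.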

Named vs Anonymous Nodes

Named nodes are meaningful parts of the code defined by the grammar, like function calls, variable names, or statements.

Anonymous nodes are just syntax symbols or punctuation like =, (, ), or commas. They don’t have names of their own in the grammar, so a query refers to them by their text in double quotes, for example "=".

| Node Type | Description | Examples |
| --------- | ---------------------- | ---------------------------------- |
| Named | Grammar-defined | call, identifier, return_statement |
| Anonymous | Just syntax characters | '=', '(', ')', ',' |

Logical Operators in Tree-sitter Queries

Logical operators, which the Tree-sitter documentation calls predicates, help you choose exactly what you want when searching code with Tree-sitter.

Think of them like filters: they check whether the text of something you captured matches, or doesn't match, certain words or patterns.

You write them with a # before the name, inside the pattern's parentheses, and they work on captures (the @names) that the pattern has already defined.

1. #match?: Regex match


(function_definition
  name: (identifier) @func-name
  (#match? @func-name "^get"))

This matches any function whose name starts with get, such as get_user, getData, etc.

In the playground output, the matched text is highlighted in blue.
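If regex anchors are new to you, the ^ is doing real work here. As far as I know, #match? behaves like an unanchored regex search over the captured text, so dropping the ^ would also match names that merely contain get. The sketch below imitates the filter with Python's `re.search` and the built-in `ast` module standing in for Tree-sitter; the function names in `SOURCE` and the helper `functions_matching` are made up for illustration.

```python
import ast
import re

# Hypothetical function names, chosen to show the effect of the ^ anchor.
SOURCE = '''
def get(self, request): ...
def get_user(self): ...
def post(self, request): ...
def forget(self): ...
'''


def functions_matching(source: str, pattern: str) -> list[str]:
    """Return names of defined functions whose name matches the regex."""
    return [
        node.name
        for node in ast.walk(ast.parse(source))
        # re.search is unanchored, so the pattern itself must carry ^ or $.
        if isinstance(node, ast.FunctionDef) and re.search(pattern, node.name)
    ]


print(functions_matching(SOURCE, "^get"))  # ['get', 'get_user']
print(functions_matching(SOURCE, "get"))   # ['get', 'get_user', 'forget']
```

So "^get" means "starts with get", while a bare "get" means "contains get anywhere".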

Continue reading the full article here
