Prashant Garg — Library

Library

Code, open data, and the occasional essay.

Retrieving and Generating Data using LLMs

Python notebooks and slides on accessing LLMs through their APIs.

This open-source notebook collection and slides demonstrate two complementary LLM paradigms, retrieval and generation, for turning raw text into structured, research-ready data.

Retrieval notebooks show how to mine large document corpora to extract causal edges, stance labels, demographic attributes and other key fields (e.g., the pipeline powering www.causal.claims).

Generation notebooks start from minimal seed prompts and leverage the model's prior to build production networks, innovation profiles and context-aware keyword dictionaries (see aipnet.io and www.academicexpression.online).

Across both strands you will find hands-on modules for prompt engineering, JSON-schema enforcement, cost-efficient batch calling, embedding-based code mapping (HS6 / JEL) and validation routines such as modal voting and cosine sanity checks.
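As a rough illustration of the validation and mapping ideas above, here is a minimal standard-library sketch of modal voting, a cosine sanity check, and nearest-code mapping over embeddings. It is not taken from the notebooks: the function names, the thresholds (`min_share`, `threshold`), the placeholder code labels and the toy vectors are all assumptions made for the example.

```python
import math
from collections import Counter


def modal_vote(labels, min_share=0.5):
    """Return the most common label if its vote share exceeds min_share, else None."""
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count / len(labels) > min_share else None


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_u * norm_v)


def nearest_code(text_vec, code_vecs, threshold=0.7):
    """Map an embedding to its closest code; return (code, similarity, passed_check)."""
    best_code, best_sim = None, -1.0
    for code, vec in code_vecs.items():
        sim = cosine_similarity(text_vec, vec)
        if sim > best_sim:
            best_code, best_sim = code, sim
    # Sanity check: flag matches whose similarity falls below the assumed threshold.
    return best_code, best_sim, best_sim >= threshold


# Hypothetical labels returned by three repeated calls on the same passage.
votes = ["causal", "causal", "non-causal"]
print(modal_vote(votes))

# Toy 3-dimensional embeddings; real embeddings would come from an embedding API.
code_vecs = {"code_A": [1.0, 0.0, 0.0], "code_B": [0.0, 1.0, 0.0]}
print(nearest_code([0.9, 0.1, 0.0], code_vecs))
```

Requiring a strict majority means that a tie between labels returns `None`, which gives a natural hook for sending ambiguous items to manual review rather than picking one arbitrarily.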

By the end, users can scale or adapt each workflow — whether analysing messy policy PDFs or constructing supply-chain graphs — while keeping costs predictable and outputs auditable.

Mapping Bob Dylan's mind

I construct a Knowledge Graph from Dylan's lyrics (1962–2012).

Knowledge graph extracted from "Jokerman" — concepts in Dylan's lyrics and the stated relationships between them.

I tracked the evolution of key themes over time — from protest/political to mythic/biblical and movement/travel. The trends align closely with pivotal moments in Dylan's career.

Evolution of key themes (1962–2012): protest/political gives way to mythic/biblical and movement/travel, tracking pivotal career moments.

Next, I mapped transitions between different concept types (like person → abstract) and colour-coded them by sentiment. This alluvial diagram uncovers the emotional dynamics woven into Dylan's lyrical connections.

Transitions between concept types (person → abstract, …), colour-coded by sentiment — the emotional dynamics of Dylan's lyrical connections.
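The counts behind a diagram like this can be tallied in a few lines. The sketch below assumes each extracted edge is stored as a record with `source_type`, `target_type` and `sentiment` fields; those field names and the sample records are hypothetical, not the actual data format.

```python
from collections import Counter


def count_transitions(edges):
    """Count edges by (source concept type, target concept type, sentiment)."""
    return Counter(
        (e["source_type"], e["target_type"], e["sentiment"]) for e in edges
    )


# Hypothetical edge records; real ones would come from the extracted knowledge graph.
edges = [
    {"source_type": "person", "target_type": "abstract", "sentiment": "negative"},
    {"source_type": "person", "target_type": "abstract", "sentiment": "negative"},
    {"source_type": "person", "target_type": "place", "sentiment": "positive"},
    {"source_type": "abstract", "target_type": "person", "sentiment": "neutral"},
]

for (src, dst, sentiment), n in count_transitions(edges).most_common():
    print(f"{src} -> {dst} [{sentiment}]: {n}")
```

Each key corresponds to one flow in the alluvial diagram, with the count giving its width and the sentiment its colour.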

Dylan's lyrics shift from literal expression to an increasingly metaphorical style over the decades. This trend highlights his growing reliance on symbolic, emotionally charged language in his 70s.

Literal vs. metaphorical expression by decade: an increasingly symbolic style, strongest in his 70s.

Finally, by measuring the variance in eigenvector centrality, I quantified "dishabituation": the mix of mainstream versus peripheral concepts. The mid-career peak reveals Dylan's most eclectic and disruptive phase. Note that this is measured relative to his dishabituation in the 1960s.

"Dishabituation" — variance in eigenvector centrality, mixing mainstream and peripheral concepts. The mid-career peak is Dylan's most eclectic phase (relative to the 60s).
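To make the measure concrete, here is a minimal sketch of variance in eigenvector centrality for a small concept graph, using plain power iteration. It assumes an undirected, unweighted graph given as an edge list; whether the actual analysis uses weights, directed edges or a graph library is not stated, so treat the function names and toy graphs as illustrative only.

```python
import math
from statistics import pvariance


def eigenvector_centrality(edges, max_iter=1000, tol=1e-10):
    """Eigenvector centrality of an undirected graph via power iteration."""
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    nodes = sorted(neighbours)
    if not nodes:
        return {}
    score = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(max_iter):
        # Iterate with (A + I) rather than A so bipartite graphs also converge.
        new = {n: score[n] + sum(score[m] for m in neighbours[n]) for n in nodes}
        norm = math.sqrt(sum(x * x for x in new.values()))
        new = {n: x / norm for n, x in new.items()}
        if sum(abs(new[n] - score[n]) for n in nodes) < tol:
            return new
        score = new
    raise RuntimeError("power iteration did not converge")


def centrality_variance(edges):
    """Population variance of eigenvector centrality across all nodes."""
    return pvariance(eigenvector_centrality(edges).values())


# Toy graphs: a triangle (every concept equally central) and a star
# (one dominant concept with peripheral ones attached to it).
triangle = [("a", "b"), ("b", "c"), ("a", "c")]
star = [("hub", "x"), ("hub", "y"), ("hub", "z")]

print(centrality_variance(triangle))
print(centrality_variance(star))
```

In this toy comparison the triangle has essentially zero variance because all nodes are equally central, while the star has higher variance because one hub dominates. Computing the same quantity per period and dividing by a baseline period would give a relative series, again as an assumption about how such a comparison could be set up.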

Data — public goods

Datasets released alongside the papers, free to use with attribution.

Causal Claims in Economics — claim graphs

with Thiemo Fetzer

Evidence-annotated knowledge graph of ~45,000 economics papers (1980–2023): standardized concepts as nodes, stated causal and non-causal relationships as edges.

AIPNET — product-level input–output data

with Thiemo Fetzer, Peter John Lambert, Bennet Feld

AI-generated production network over 5,000 HS products: directed input–output edges plus measures of product importance, with full data download.

Academic Expression — academics on social media

with Thiemo Fetzer

Data on the political stance and expression of 100,000+ academics (2016–2022), linking social media content to academic records.
