
Insights from Implementing RAG and Its Connection with Jackdaw

The latest PoliRuralPlus webinar, held on December 15th, explored the practical aspects of implementing Retrieval-Augmented Generation (RAG) systems and their relevance to the ongoing development of PoliRuralPlus tools. The session was led by Aron Rynkiewicz from the Poznan Supercomputing and Networking Center (PSNC).

Exploring RAG in Practice

Aron began by addressing two persistent challenges in large language models (LLMs): hallucinations and missing data. He explained how RAG mitigates these issues by connecting AI models with external knowledge sources through vector databases. The RAG pipeline, which embeds the user query, retrieves matching context, and then generates a response grounded in that context, was illustrated using real examples from agricultural policy retrieval across Europe.
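To make the pipeline concrete, here is a minimal Python sketch of the embed, retrieve, and generate steps. It is not the PSNC implementation: the bag-of-words `embed` function stands in for a real embedding model, the in-memory `DOCUMENTS` list stands in for a vector database, and the sample sentences and all function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical sample documents; a real system would store embeddings
# in a vector database instead of a Python list.
DOCUMENTS = [
    "Rural broadband subsidies support digital farming in remote regions.",
    "Crop rotation rules are part of the agricultural policy on soil health.",
    "Regional innovation hubs connect farmers with research institutions.",
]


def embed(text):
    """Toy bag-of-words embedding (assumption: replaces a real embedding model)."""
    words = [w.strip(".,?!").lower() for w in text.split()]
    return Counter(w for w in words if w)


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, documents, top_k=1):
    """Return the top_k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:top_k]


def build_prompt(query, context):
    """Assemble the prompt that would be sent to the language model."""
    joined = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{joined}\n\nQuestion: {query}"
    )


query = "What does the policy say about soil health?"
context = retrieve(query, DOCUMENTS, top_k=1)
prompt = build_prompt(query, context)
print(prompt)
```

The final generation step is left out on purpose: in practice the assembled prompt would be passed to whichever LLM the deployment uses.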

Technical Workflow and Implementation

Participants were guided through the RAG implementation workflow developed at PSNC. The process included:

  • Collecting and chunking over 1,500 policy documents,
  • Using FastAPI, MinIO, and MongoDB for infrastructure,
  • Employing vector databases for contextual search and retrieval, and
  • Applying summarization techniques to manage language model efficiency.
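The chunking step in the list above can be sketched as follows. This is only an illustration under stated assumptions: it splits on whitespace and uses a fixed word count with overlap, whereas the webinar summary does not say which chunking strategy or sizes PSNC chose. The function name and default values are hypothetical.

```python
def chunk_text(text, chunk_size=200, overlap=40):
    """Split text into word-based chunks that overlap by `overlap` words.

    chunk_size and overlap are illustrative defaults, not values
    reported in the webinar.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current chunk already reaches the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
sample_chunks = chunk_text(sample, chunk_size=4, overlap=1)
print(sample_chunks)
```

Overlap keeps a sentence that straddles a chunk boundary retrievable from either side, at the cost of storing some words twice.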

Aron shared practical insights on selecting embedding models, chunk sizes, and database precision, emphasizing the balance between performance, scalability, and data privacy. A comparison between Chroma and Milvus databases illustrated the trade-offs between deployment simplicity and flexibility.

Performance and Data Privacy

The presentation also covered optimization strategies, such as reducing vector precision to minimize storage use while maintaining acceptable retrieval accuracy. Aron underlined that self-hosted vector databases can ensure better data control, a crucial factor when handling sensitive or proprietary datasets.
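One way to picture the storage trade-off is to serialize the same vector at two precisions and compare sizes. The sketch below assumes a reduction from 32-bit to 16-bit floats and a 384-dimensional vector; both are illustrative choices, since the summary does not state which precision or dimensionality was used in the PSNC setup.

```python
import math
import random
import struct


def to_float32_bytes(vector):
    """Serialize a vector as little-endian 32-bit floats (4 bytes each)."""
    return struct.pack(f"<{len(vector)}f", *vector)


def to_float16_bytes(vector):
    """Serialize a vector as little-endian 16-bit floats (2 bytes each)."""
    return struct.pack(f"<{len(vector)}e", *vector)


def from_float16_bytes(blob):
    """Decode a float16 blob back into a list of Python floats."""
    return list(struct.unpack(f"<{len(blob) // 2}e", blob))


def cosine(a, b):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


random.seed(0)
# Hypothetical 384-dimensional embedding with values in [-1, 1].
vector = [random.uniform(-1, 1) for _ in range(384)]

full = to_float32_bytes(vector)
half = to_float16_bytes(vector)
restored = from_float16_bytes(half)
similarity = cosine(vector, restored)

print(len(full), len(half))  # bytes needed at each precision
print(similarity)            # how closely the reduced vector matches the original
```

Halving the bytes per value halves storage for the raw vectors; whether retrieval accuracy stays acceptable has to be measured on the actual corpus and queries.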

Key Takeaways

The webinar concluded with a reflection on testing retrieval speed, recall, and real-world performance, reaffirming that effective RAG implementation depends on iterative experimentation and domain-specific fine-tuning.
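Retrieval quality and speed can be checked with very little code. The sketch below shows one common way to compute recall at k and to time a retrieval call; the document IDs, helper names, and the choice of recall@k as the metric are assumptions for illustration, not details reported from the session.

```python
import time


def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of relevant documents that appear in the top k results."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    top_k = set(retrieved_ids[:k])
    return len(top_k & relevant) / len(relevant)


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# Hypothetical ranked results and ground-truth labels for one query.
retrieved = ["a", "b", "c", "d"]
relevant = {"b", "d", "x"}

score, elapsed = timed(recall_at_k, retrieved, relevant, 3)
print(score, elapsed)
```

Averaging these numbers over a set of labeled queries, and repeating the run after each change to chunk size, embedding model, or vector precision, is one simple way to do the iterative experimentation described above.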

👉 Follow the PoliRuralPlus Tools Webinar Series through our website and social media channels to explore more on how AI and geospatial intelligence are transforming regional innovation and policy-making.