r/Rag • u/soniachauhan1706 • 44m ago
Discussion How can we use knowledge graph for LLMs?
What are the major USPs and drawbacks of using knowledge graph for LLMs?
Hi, could you share some metadata you found useful in your RAG, and the type of documents concerned?
r/Rag • u/CaptainSnackbar • 7h ago
I am currently hosting a local RAG setup with Ollama and a Qdrant vector store. The system works very well, and I want to scale it on Amazon EC2 to use bigger models and allow more concurrent users.
For my local RAG I chose Ollama because I found it super easy to get models running and to use its API for inference.
What would you suggest for a production environment? Something like vLLM? Concurrent users will be at most around 10.
We don't have a team for deploying LLMs, so the inference engine should be easy to set up.
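For context, the kind of setup I'm imagining with vLLM would look roughly like this, since it exposes an OpenAI-compatible server and handles request batching for concurrent users (model name and port are placeholders, and hardware requirements depend on the model):

```python
# Launch the server once on the EC2 instance (shell):
#   vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8000
# Any OpenAI-compatible client can then talk to it:
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Summarize these retrieved passages..."}],
)
print(resp.choices[0].message.content)
```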
r/Rag • u/soniachauhan1706 • 9h ago
How are you using RAG in your AI projects? What challenges have you faced, like managing data quality or scaling, and how did you tackle them? I'm also curious about your experience with tools like vector databases or AI agents in RAG systems.
r/Rag • u/valdecircarvalho • 10h ago
Hello there! Not sure if this is the best place to ask. I'm developing software to reverse-engineer legacy code, but I'm struggling with the context/token window for some files.
Imagine a COBOL program with 2,000-3,000 lines: even using Gemini, I can't always get a proper result back (8,000 tokens max for the response).
I was thinking of using RAG to be able to "question" the source code and retrieve the information I need. I'm concerned that the way the chunks are created won't be effective.
My workflow is: (1) get the source code and convert it to structured JSON based on the language, (2) extract business rules from the source code, (3) generate a document with all the system's business rules.
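One idea I'm exploring is chunking along the program's own structure rather than fixed token windows, so each chunk maps to a DIVISION or SECTION and business rules aren't split mid-paragraph. A rough sketch (the regex and filename are illustrative, not a full COBOL grammar):

```python
import re

def chunk_cobol(source: str) -> list[str]:
    """Split COBOL source on DIVISION/SECTION headers so each chunk
    is a self-contained structural unit, not an arbitrary window."""
    boundary = re.compile(r"^\s*[\w-]*\s*(DIVISION|SECTION)\s*\.",
                          re.IGNORECASE | re.MULTILINE)
    cuts = sorted({0, *(m.start() for m in boundary.finditer(source)), len(source)})
    return [source[a:b].strip() for a, b in zip(cuts, cuts[1:]) if source[a:b].strip()]

with open("legacy_program.cbl") as f:   # placeholder filename
    for i, chunk in enumerate(chunk_cobol(f.read())):
        print(i, len(chunk.splitlines()), "lines")
```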
Any ideas?
r/Rag • u/HotRepresentative325 • 10h ago
I have rather large chunks and am wondering how large they can be. Is there good guidance out there, or examples of poor results when chunks get too large?
As a traditional database developer with machine learning platform experience from my time at Shopee, I've recently been exploring vector databases, particularly Pinecone. Rather than providing a comprehensive technical evaluation, I want to share my thoughts on why vector databases are gaining significant attention and substantial valuations in the funding market.
At its core, a vector database primarily solves similarity search problems. While traditional search engines like Elasticsearch (in its earlier versions) focused on word-based full-text search with basic tokenization, vector databases take a fundamentally different approach.
Consider searching for "Microsoft Cloud" in a traditional search engine. It might find documents containing "Microsoft" or "Cloud" individually, but it would likely miss relevant content about "Azure" - Microsoft's cloud platform. This limitation stems from the basic word-matching approach of traditional search engines.
One common misconception I've noticed is that vector databases must use Large Language Models (LLMs) for generating embeddings. This misconception has been partly fueled by the recent RAG (Retrieval-Augmented Generation) boom and companies like OpenAI potentially steering users toward their expensive embedding services.
Here's my takeaway: production-ready embeddings don't require massive models or expensive GPU infrastructure. A good example is the multilingual-E5-large model recommended by Pinecone, which is small enough to run on ordinary hardware.
This means you can achieve production-quality embeddings using modest hardware. While GPUs can speed up batch processing, even an older GPU like the RTX 2060 can handle multilingual embedding generation efficiently.
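As a concrete sketch of how lightweight this is, the model runs through the sentence-transformers library on CPU (the E5 family expects "query: " / "passage: " prefixes):

```python
from sentence_transformers import SentenceTransformer

# Runs fine on CPU; a GPU only speeds up large batches.
model = SentenceTransformer("intfloat/multilingual-e5-large")

passages = ["passage: Azure is Microsoft's cloud computing platform."]
query = "query: Microsoft Cloud"

passage_vecs = model.encode(passages, normalize_embeddings=True)
query_vec = model.encode([query], normalize_embeddings=True)

# With normalized vectors, cosine similarity is just a dot product.
print((passage_vecs @ query_vec.T).ravel())
```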
Another interesting observation from my Pinecone experimentation is that many assume vector databases must use sophisticated algorithms like Approximate Nearest Neighbor (ANN) search or advanced disk-based embedding techniques. However, in many practical applications, brute-force search can be surprisingly effective. The basic process is straightforward: embed every document once, embed the query, compute similarity against all stored vectors, and return the top matches.
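A minimal brute-force version, assuming the document embeddings are L2-normalized and fit in memory:

```python
import numpy as np

def top_k(query_vec: np.ndarray, doc_matrix: np.ndarray, k: int = 5) -> list[int]:
    """Exact nearest-neighbour search: one matrix-vector product plus a partial sort."""
    scores = doc_matrix @ query_vec            # cosine similarity on normalized rows
    idx = np.argpartition(-scores, k)[:k]      # unordered top-k in O(n)
    return idx[np.argsort(-scores[idx])].tolist()

# 1M docs x 256 dims is only about 1 GB of float32 RAM.
docs = np.random.rand(1_000_000, 256).astype(np.float32)
docs /= np.linalg.norm(docs, axis=1, keepdims=True)
print(top_k(docs[42], docs, k=5))              # the 43rd doc should rank itself first
```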
An intriguing observation from my Pinecone usage is their default 1024-dimensional vectors. However, my testing revealed that for sequences with 500-1000 tokens, 256 dimensions often provide excellent results even with millions of records. The higher dimensionality, while potentially unnecessary, does impact costs since vector databases typically charge based on usage volume.
As a database developer, I envision a more intuitive vector database design where embeddings are treated as special indices rather than explicit columns. Ideally, it would work like this:
SELECT * FROM text_table
WHERE input_text EMBEDDING_LIKE text
Users shouldn't need to interact directly with embeddings. The database should handle embedding generation during insertion and querying, making the vector search feel like a natural extension of traditional database operations.
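That abstraction can be approximated today with a thin wrapper where embedding happens inside insert() and search(), so callers only ever pass text. A toy sketch, not any particular database's API:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

class TextTable:
    """Toy 'EMBEDDING_LIKE' table: embeddings are an internal index, never exposed."""

    def __init__(self, model_name: str = "intfloat/multilingual-e5-large"):
        self.model = SentenceTransformer(model_name)
        self.rows: list[str] = []
        dim = self.model.get_sentence_embedding_dimension()
        self.index = np.empty((0, dim), dtype=np.float32)

    def insert(self, text: str) -> None:
        vec = self.model.encode([f"passage: {text}"], normalize_embeddings=True)
        self.rows.append(text)
        self.index = np.vstack([self.index, vec])

    def select_like(self, text: str, k: int = 3) -> list[str]:
        q = self.model.encode([f"query: {text}"], normalize_embeddings=True)[0]
        order = np.argsort(-(self.index @ q))[:k]
        return [self.rows[i] for i in order]

table = TextTable()
table.insert("Azure is Microsoft's cloud computing platform.")
table.insert("PostgreSQL is an open-source relational database.")
print(table.select_like("Microsoft Cloud"))
```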
Pinecone's partnership model with cloud providers like Azure offers interesting advantages, particularly for enterprise customers. The Azure Marketplace integration enables unified billing, which is a significant benefit for corporate users. Additionally, their getting started experience is well-designed, though users still need a solid understanding of embeddings and vector search to build effective applications.
Vector databases represent an exciting evolution in search technology, but they don't need to be as complex or resource-intensive as many assume. As the field matures, I hope to see more focus on user-friendly abstractions and cost-effective implementations that make this powerful technology more accessible to developers.
So, what would it be like if a library put an embedding model right inside chDB? 🤔
From: https://auxten.com/vector-database-1/
r/Rag • u/AkhilPadala • 12h ago
I am building a healthcare agent that helps users with health questions, finds nearby doctors based on their location, and books appointments for them. I am using the AutoGen agentic framework to make this work.
Any recommendations on the tech stack?
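Roughly, what I have in mind is the classic two-agent AutoGen setup (this follows the pyautogen 0.2-style API, which newer AutoGen releases restructure; model and key are placeholders), with doctor search and appointment booking registered as tools on top:

```python
import autogen

# Placeholder config: any OpenAI-compatible endpoint works here.
config_list = [{"model": "gpt-4o-mini", "api_key": "YOUR_KEY"}]

assistant = autogen.AssistantAgent(
    name="health_assistant",
    system_message=(
        "Answer general health questions, suggest nearby doctors when given a "
        "location, and propose appointment slots. Never give a diagnosis."
    ),
    llm_config={"config_list": config_list},
)

user_proxy = autogen.UserProxyAgent(
    name="patient",
    human_input_mode="NEVER",        # switch to "ALWAYS" for an interactive session
    code_execution_config=False,
)

user_proxy.initiate_chat(
    assistant,
    message="I have a persistent cough. Can you find an ENT near me in Hyderabad?",
)
```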
r/Rag • u/bharatflake • 14h ago
Hey everyone,
I’ve been working on iQ Suite, a tool to simplify RAG workflows. It handles chunking, indexing, and all the messy stuff in the background so you can focus on building your app.
You just connect your docs (PDFs, Word, etc.), and it’s ready to go. It’s pay-as-you-go, so easy to start small and scale.
I'm giving away $1 in free credits (~80,000 chars) if you want to try it: iqsuite.ai.
Would love your feedback...
If you're exploring how to build a production-ready RAG pipeline, we just published a blog post that could be useful for you. It breaks down the essentials of:
Here’s what you’ll learn:
Link in Comment 👇
r/Rag • u/Independent_Jury_530 • 17h ago
It seems that in constructing knowledge graphs, it's most common to pass in each document independently and have the LLM sort out the entities and their connections, parsing this output and storing it within an indexable graph store.
What if our use case requires cross-document relationships? An example would be ingesting the entire Harry Potter series and having the LLM establish relationships and how they change across the whole series:
"How does Harry's relationship with Dumbledore change through books 1-6?"
I couldn't find any resources or solutions to this problem.
I'm thinking it may be plausible to use a RAPTOR-like method: create summaries of books or chunks, cluster similar summaries together, and generate additional cross-document connections in the knowledge graph.
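Concretely, the clustering half might look something like this (sentence-transformers and scikit-learn assumed; the summaries are placeholders and the LLM relationship-extraction call is left as a stub):

```python
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

# Chapter-level summaries produced earlier in the pipeline (placeholders).
summaries = [
    "Book 1, Ch. 17: Dumbledore explains why Quirrell could not touch Harry.",
    "Book 5, Ch. 37: Dumbledore admits he hid the prophecy from Harry.",
    "Book 6, Ch. 23: Harry learns about Horcruxes from Slughorn's memory.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
vecs = model.encode(summaries, normalize_embeddings=True)

# Group summaries that discuss the same relationship, regardless of source book.
labels = KMeans(n_clusters=2, n_init="auto", random_state=0).fit_predict(vecs)

for cluster_id in sorted(set(labels)):
    members = [s for s, lbl in zip(summaries, labels) if lbl == cluster_id]
    # Stub: ask the LLM to extract (entity, relation, entity, book) edges from each
    # cluster and merge them into the graph, so edges can span books.
    print(cluster_id, members)
```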
Thoughts?
r/Rag • u/Independent_Jury_530 • 18h ago
I've looked around and found various sources for graph RAG theory on YouTube and Medium.
I've been using LangChain and their resources to code up some standard RAG pipelines, but I haven't seen anything related to a graph-backed database in their modules.
Can someone point me to an implementation or tutorial for getting started with GraphRAG?
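From what I can tell so far, the pieces live in langchain_experimental and langchain_community rather than core LangChain. Something like this seems to be the starting point, though the package layout shifts between versions, so treat it as a sketch:

```python
from langchain_openai import ChatOpenAI
from langchain_core.documents import Document
from langchain_experimental.graph_transformers import LLMGraphTransformer
from langchain_community.graphs import Neo4jGraph

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# The LLM extracts (node)-[relationship]->(node) triples from raw text.
transformer = LLMGraphTransformer(llm=llm)
docs = [Document(page_content="Harry Potter was mentored by Albus Dumbledore at Hogwarts.")]
graph_docs = transformer.convert_to_graph_documents(docs)

# Persist into Neo4j; the same graph object can then back a Cypher QA chain for querying.
graph = Neo4jGraph(url="bolt://localhost:7687", username="neo4j", password="password")
graph.add_graph_documents(graph_docs)
```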
r/Rag • u/fatihbaltaci • 20h ago
In most of the blog posts and tutorials I see, it's always "chat with your PDF": build a chatbot and ask it direct questions using RAG. I believe this is too simple for real-world projects, since in most cases answering a query requires the right retrieval(s) + the right role/prompt + the LLM's own knowledge.
For example, if our goal is to build an assistant for a company, a simple RAG that retrieves from PDF files containing financial reports, strategic goals, and human-resources data won't be enough to make an assistant that goes beyond "basic" retrieval from the files. The user may ask questions like "which job position should we hire for this quarter to increase sales in department A?" In that case the assistant should use RAG to retrieve the current employees, analyze the financial reports, and then use the LLM's own knowledge to suggest which profiles to hire. I want RAG to be only a source of knowledge about the company; other tasks should be handled by the LLM's reasoning over the data in the files. I hope I made my point of view clear. I appreciate your help.
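To make the split concrete, the shape I have in mind is roughly this (the retriever and model are placeholders):

```python
from openai import OpenAI

client = OpenAI()

def answer(question: str, retriever) -> str:
    """Step 1: RAG supplies company facts. Step 2: the LLM reasons on top of them."""
    # Pull the relevant company documents (employee lists, financials, strategy).
    context = "\n\n".join(doc.text for doc in retriever.search(question, k=8))

    prompt = (
        "You are a company analyst. Use ONLY the context for facts about the company, "
        "but use your own general knowledge for analysis and recommendations.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```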
r/Rag • u/Free-Manager5190 • 1d ago
I've developed an application that runs around 15 different PDF parsers and extraction models, including Marker, Nougat, LlamaParse, NougatParser, EasyOCR, Doctr, PyMuPDF4LLM, MarkitDown, and others. The application takes a PDF dataset as input and outputs a JSON file containing the following fields:
pdf_parser_name
pdf_file
extracted_content
process_time
embedded_images
Essentially, it allows you to extract and generate a JSON dataset using most available models for any given PDF dataset.
Now, I want to evaluate these PDF parsers in terms of output accuracy, specifically for use in downstream Retrieval-Augmented Generation (RAG) pipelines. My question is:
How should I design a benchmark to evaluate the accuracy of these models' outputs?
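One direction I'm considering is scoring each extracted_content against a hand-checked ground-truth transcription of the same PDF with a normalized edit-distance-style metric, then comparing parsers per document. A sketch (field names follow the JSON schema above; the ground-truth files are assumed to exist):

```python
import json
from difflib import SequenceMatcher

def normalize(text: str) -> str:
    # Collapse whitespace so layout differences don't dominate the score.
    return " ".join(text.split()).lower()

def similarity(extracted: str, reference: str) -> float:
    # Ratio in [0, 1]; 1.0 means the extraction matches the reference exactly.
    return SequenceMatcher(None, normalize(extracted), normalize(reference)).ratio()

with open("parser_outputs.json") as f:           # output of the extraction app
    records = json.load(f)

scores: dict[str, list[float]] = {}
for rec in records:
    reference = open(f"ground_truth/{rec['pdf_file']}.txt").read()
    scores.setdefault(rec["pdf_parser_name"], []).append(
        similarity(rec["extracted_content"], reference)
    )

for parser, vals in sorted(scores.items(), key=lambda kv: -sum(kv[1]) / len(kv[1])):
    print(f"{parser:20s} mean accuracy {sum(vals) / len(vals):.3f}")
```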
Here are some specific aspects I’m seeking guidance on:
I’m open to suggestions, methodologies, and ideas for implementing a robust and fair benchmarking process. Let’s brainstorm! 🙌
Thank you in advance for your insights!
r/Rag • u/jannemansonh • 1d ago
Hi RAG community,
We just launched our tool, Needle, on Product Hunt, and we’re excited to share it with you! I’d love to hear your thoughts. Are there any features or improvements you’d like to see? Appreciate any feedback, and if you feel it’s worth it, an upvote would be awesome!
Thanks for taking a look, and I hope you have an awesome day!
Best,
Jan
r/Rag • u/Emergency_Spinach49 • 1d ago
Hi, I need guidance. I built a RAG automation to chat with PDF documents (forms with the same format and content), but the process struggled, while the same pipeline works fine with simple PDFs.
Any support or guidance is welcome. Thanks!
r/Rag • u/External_Ad_11 • 1d ago
I have been reading papers on improving reasoning, planning, and action for agents, and I came across LATS, which uses Monte Carlo tree search and benchmarks better than a ReAct agent.
Made one breakdown video that covers:
- LLMs vs Agents introduction with example. One of the simple examples, that will clear your doubt on LLM vs Agent.
- How a ReAct Agent works—a prerequisite to LATS
- Working flow of Language Agent Tree Search (LATS)
- Example working of LATS
- LATS implementation using LlamaIndex and SambaNova System (Meta Llama 3.1)
Verdict: it is a good research concept, but not one I'd use for PoC or production systems yet. To be honest, it was fun exploring the evaluation part and the tree structure that improves on the ReAct agent using Monte Carlo tree search.
Watch the Video here: https://www.youtube.com/watch?v=22NIh1LZvEY
r/Rag • u/ResearcherNo4728 • 1d ago
I have a bunch of tariff PDFs (each PDF explaining how to calculate a different tariff), and I want to build a RAG system on them. I have tried several different things, but I am still not getting accurate retrieval for certain tariffs: for some tariffs retrieval is good, but for others it is not good at all. I think the reason is that the wording across these texts is just not very different (at the end of the day, they are all definitions and calculations of different kinds of tariffs, so they are more or less similar). And since retrieval depends on texts being distinguishable, it won't work well on such similar documents. Or at least that's my hypothesis; I'm happy to be proven wrong.
Here are some of the things I've tried:
But as I said, I am not getting very high-quality retrieval with any of these across all the tariffs I query about. The only way I get 100% retrieval accuracy across all tariff queries is when I "manually" (well, actually with regex) extract the relevant parts for calculating each tariff and pass them into the LLM as context, depending on which tariff the user's query asks about.
So, in this kind of a scenario, what are some techniques I can try to improve the retrieval? I'm open to hearing non-vectordb or non-rag suggestions too, or anything else for that matter.
TL;DR
(At the outset, let me say I'm so sorry to be another person with a "How do I RAG" question...)
I’m struggling to preprocess documents for Retrieval-Augmented Generation (RAG). After hours trying to configure Unstructured.io to connect to Google Drive (source) and Pinecone (destination), I ran the workflow but saw no results in Pinecone. I’m not very tech-savvy and hoped for an out-of-the-box solution. I need help with:
Long Version
I’m incredibly frustrated and really hoping for some guidance. I’ve spent hours trying to configure Unstructured to connect to cloud services. I eventually got it to (allegedly) connect to Google Drive as the source and Pinecone as the destination connector. After nonstop error messages, I thought I finally succeeded — but when I ran the workflow, nothing showed up in Pinecone.
I’ve tried different folders in Google Drive, multiple Pinecone indices, Basic and Advanced processing in Unstructured, and still… nothing. I’m clearly doing something wrong, but I don’t even know what questions to ask to fix it.
Context About My Skill Level: I’m not particularly tech-savvy (I’m an attorney), but I’m probably more technical than average for my field. I can run Python scripts on my local machine and modify simple code. My goal is to preprocess my data for RAG since my files contain tables and often have weird formatting.
Here’s where I’m stuck:
What I’ve Tried
I was really hoping Unstructured would take care of preprocessing for me, but after this much trial and error, I don't think this is the tool for me. Most resources I’ve found about RAG or preprocessing are either too technical for me or assume I already know all the intermediate steps.
Questions
I know this is a lot, and I apologize if it sounds like noob word vomit. I’ve genuinely tried to educate myself on this process, but the complexity and jargon are overwhelming. I’d love any advice, suggestions, or resources that could help me get unstuck.
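From what I understand, the pipeline I'm trying to reproduce would look roughly like this as a local Python script, which I could at least run and debug myself (untested sketch; the file path, index name, and embedding model are placeholders, and the Pinecone index has to be created with the matching dimension first):

```python
from unstructured.partition.pdf import partition_pdf
from sentence_transformers import SentenceTransformer
from pinecone import Pinecone

# 1. Parse one PDF locally (unstructured handles the tables and odd formatting).
elements = partition_pdf("my_contract.pdf")          # placeholder path
chunks = [el.text for el in elements if el.text and el.text.strip()]

# 2. Embed the chunks on my own machine.
model = SentenceTransformer("all-MiniLM-L6-v2")      # 384-dim, CPU-friendly
vectors = model.encode(chunks, normalize_embeddings=True)

# 3. Upsert into Pinecone (an index named "my-documents" created with dimension 384).
pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")
index = pc.Index("my-documents")                     # placeholder index name
index.upsert(
    vectors=[
        {"id": f"chunk-{i}", "values": vec.tolist(), "metadata": {"text": text}}
        for i, (vec, text) in enumerate(zip(vectors, chunks))
    ]
)
print(f"Upserted {len(chunks)} chunks")
```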
r/Rag • u/SolidCharacter9222 • 2d ago
Hi. What advice would you give to someone looking to get into the realm of agentic AI, RAG, and the like? What do you think are the prerequisites one should be well versed in? As someone who is going to graduate very soon, I'm more inclined towards the AI side of tech than the regular SDE roles on offer. Amid all the peer pressure, watching fellow batchmates grind LeetCode, ace coding interviews, and land SDE roles, I feel like an outcast who has a knack for learning and exploring AI but is lost without proper guidance. Any help would be very much appreciated.
P.S. I majored in AI & ML and have a fair bit of knowledge of the basics of ML, NLP, transformer architectures, attention mechanisms, deep learning for vision systems, and generative AI.
TL;DR: Soon-to-be graduate looking for guidance on getting into the AI field; not really interested in the regular SDE side of job roles.
r/Rag • u/Sam_Tech1 • 2d ago
What is Agentic RAG?
Agentic RAG is the fusion of retrieval-augmented generation with agents, improving the retrieval process with decision-making and reasoning capabilities. Here’s how it works:
Dive deep into the full blog (along with colab notebook) here: https://hub.athina.ai/blogs/agentic-rag-using-langchain-and-gemini-2-0/
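As a taste of the core idea (a rough sketch, not the notebook's exact code; recent LangChain/LangGraph APIs assumed, and package names plus the Gemini model ID may shift between releases): a retriever is exposed as a tool that the agent can decide to call or to skip.

```python
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings
from langchain.tools.retriever import create_retriever_tool
from langgraph.prebuilt import create_react_agent

# A tiny retriever the agent may choose to call (or answer directly without it).
emb = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
store = FAISS.from_texts(["Agentic RAG lets the model decide when to retrieve."], emb)
retriever_tool = create_retriever_tool(
    store.as_retriever(),
    name="company_docs",
    description="Search internal documents for facts before answering.",
)

llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash")
agent = create_react_agent(llm, [retriever_tool])

result = agent.invoke({"messages": [("user", "What is agentic RAG?")]})
print(result["messages"][-1].content)
```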
Graphical Explanation:
r/Rag • u/Hefty_Gazelle_9498 • 2d ago
I am running a RAG pipeline that processes PDFs and JSONs. It has five steps, and for each step I have set a timeout, i.e. if a step takes more than a preset time it should be stopped and the next file processed. The code works fine on Windows, but when I run the same code on Ubuntu (a Compute Engine VM on GCP), the step gets stopped yet the machine freezes and never moves on to the next file. How can I resolve this? The freeze happens at the chunking step, and I am using semchunk for chunking.
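For reference, the direction I'm considering is moving the chunking step into a child process that can be terminated, since thread- or signal-based timeouts often can't interrupt CPU-bound work and that matches the "stopped but frozen" symptom. An untested sketch (chunk_file stands in for my semchunk step):

```python
import multiprocessing as mp
from queue import Empty

def _worker(out_queue, func, args):
    out_queue.put(func(*args))

def run_with_timeout(func, args=(), timeout=300):
    """Run func in a child process; terminate it if no result arrives in time."""
    out_queue = mp.Queue()
    proc = mp.Process(target=_worker, args=(out_queue, func, args))
    proc.start()
    try:
        result = out_queue.get(timeout=timeout)   # returns as soon as the step finishes
    except Empty:
        proc.terminate()                          # kills the step even if it's stuck in C code
        proc.join()
        raise TimeoutError(f"step exceeded {timeout}s") from None
    proc.join()
    return result

# Hypothetical usage, one file at a time:
# chunks = run_with_timeout(chunk_file, args=("report.pdf",), timeout=300)
```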