12/19/2023

Tutorial notion

Our loaded document is over 42k characters long. This is too long to fit in the context window of many models. And even for those models that could fit the full post in their context window, empirically models struggle to find the relevant context in very long prompts. So we'll split the Document into chunks for embedding and vector storage. This should help us retrieve only the most relevant bits of the blog post at run time.

In this case we'll split our documents into chunks of 1000 characters with 200 characters of overlap between chunks. The overlap helps mitigate the possibility of separating a statement from important context related to it. We use the RecursiveCharacterTextSplitter, which will (recursively) split the document using common separators (like new lines) until each chunk is the appropriate size.

DocumentSplitter: Object that splits a list of Documents into smaller chunks.

DocumentTransformer: Object that performs a transformation on a list of Documents.

Explore Context-aware splitters, which keep the location ("context") of each split in the original Document.
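To make the chunk size and overlap concrete, here is a minimal sketch in plain Python. It is not the RecursiveCharacterTextSplitter implementation: it ignores separators and just slides a fixed window over the text, which is enough to show how 1000-character chunks with 200 characters of overlap line up. The names `Chunk` and `split_with_overlap` are hypothetical and exist only for this illustration. Each chunk also records its start offset, a simple stand-in for the "context" that context-aware splitters keep.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    """Hypothetical container: a piece of text plus where it came from."""
    text: str
    start_index: int  # character offset of this chunk in the original text


def split_with_overlap(text: str, chunk_size: int = 1000,
                       chunk_overlap: int = 200) -> List[Chunk]:
    """Slide a fixed-size window over `text`.

    Simplified illustration only: a real recursive splitter would prefer
    to cut at separators such as blank lines or newlines.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks: List[Chunk] = []
    start = 0
    while start < len(text):
        chunks.append(Chunk(text[start:start + chunk_size], start))
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks
```

With the defaults, consecutive chunks share 200 characters, so a sentence that straddles a boundary still appears whole in at least one chunk as long as it is shorter than the overlap. The stored `start_index` lets you map any retrieved chunk back to its position in the original document.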