acreom | acreom is a dev-first knowledge base with tasks running on local mark... |
AirbyteLoader | Airbyte is a data integration platform for ELT pipelines from APIs, d... |
Airbyte CDK (Deprecated) | Note: AirbyteCDKLoader is deprecated. Please use AirbyteLoader instea... |
Airbyte Gong (Deprecated) | Note: This connector-specific loader is deprecated. Please use Airbyt... |
Airbyte Hubspot (Deprecated) | Note: AirbyteHubspotLoader is deprecated. Please use AirbyteLoader in... |
Airbyte JSON (Deprecated) | Note: AirbyteJSONLoader is deprecated. Please use AirbyteLoader inste... |
Airbyte Salesforce (Deprecated) | Note: This connector-specific loader is deprecated. Please use Airbyt... |
Airbyte Shopify (Deprecated) | Note: This connector-specific loader is deprecated. Please use Airbyt... |
Airbyte Stripe (Deprecated) | Note: This connector-specific loader is deprecated. Please use Airbyt... |
Airbyte Typeform (Deprecated) | Note: This connector-specific loader is deprecated. Please use Airbyt... |
Airbyte Zendesk Support (Deprecated) | Note: This connector-specific loader is deprecated. Please use Airbyt... |
Airtable | Get your API key here. |
Alibaba Cloud MaxCompute | Alibaba Cloud MaxCompute (previously known as ODPS) is a general purp... |
Amazon Textract | Amazon Textract is a machine learning (ML) service that automatically... |
Apify Dataset | Apify Dataset is a scalable append-only storage with sequential acces... |
ArcGIS | This notebook demonstrates the use of the langchain_community.document... |
ArxivLoader | arXiv is an open-access archive for 2 million scholarly articles in t... |
AssemblyAI Audio Transcripts | The AssemblyAIAudioTranscriptLoader lets you transcribe audio files ... |
AstraDB | DataStax Astra DB is a serverless vector-capable database built on Ca... |
Async Chromium | Chromium is one of the browsers supported by Playwright, a library us... |
AsyncHtml | AsyncHtmlLoader loads raw HTML from a list of URLs concurrently. |
Athena | Amazon Athena is a serverless, interactive analytics service built |
AWS S3 Directory | Amazon Simple Storage Service (Amazon S3) is an object storage service |
AWS S3 File | Amazon Simple Storage Service (Amazon S3) is an object storage servic... |
AZLyrics | AZLyrics is a large, legal, every day growing collection of lyrics. |
Azure AI Data | Azure AI Studio provides the capability to upload data assets to clou... |
Azure Blob Storage Container | Azure Blob Storage is Microsoft's object storage solution for the clo... |
Azure Blob Storage File | Azure Files offers fully managed file shares in the cloud that are ac... |
Azure AI Document Intelligence | Azure AI Document Intelligence (formerly known as Azure Form Recogniz... |
BibTeX | BibTeX is a file format and reference management system commonly used... |
BiliBili | Bilibili is one of the most beloved long-form video sites in China. |
Blackboard | Blackboard Learn (previously the Blackboard Learning Management Syste... |
Blockchain | Overview |
Brave Search | Brave Search is a search engine developed by Brave Software. |
Browserbase | Browserbase is a developer platform to reliably run, manage, and moni... |
Browserless | Browserless is a service that allows you to run headless Chrome insta... |
Cassandra | Cassandra is a NoSQL, row-oriented, highly scalable and highly availa... |
ChatGPT Data | ChatGPT is an artificial intelligence (AI) chatbot developed by OpenA... |
College Confidential | College Confidential gives information on 3,800+ colleges and univers... |
Concurrent Loader | Works just like the GenericLoader but concurrently for those who choo... |
Confluence | Confluence is a wiki collaboration platform that saves and organizes ... |
CoNLL-U | CoNLL-U is a revised version of the CoNLL-X format. Annotations are enc... |
Copy Paste | This notebook covers how to load a document object from something you... |
Couchbase | Couchbase is an award-winning distributed NoSQL cloud database that d... |
CSV | A comma-separated values (CSV) file is a delimited text file that use... |
Cube Semantic Layer | This notebook demonstrates the process of retrieving Cube's data mode... |
Datadog Logs | Datadog is a monitoring and analytics platform for cloud-scale applic... |
Dedoc | This sample demonstrates the use of Dedoc in combination with LangCha... |
Diffbot | Diffbot is a suite of ML-based products that make it easy to structur... |
Discord | Discord is a VoIP and instant messaging social platform. Users have t... |
Docugami | This notebook covers how to load documents from Docugami. It provides... |
Docusaurus | Docusaurus is a static-site generator which provides out-of-the-box d... |
Dropbox | Dropbox is a file hosting service that brings everything-traditional ... |
DuckDB | DuckDB is an in-process SQL OLAP database management system. |
Email | This notebook shows how to load email (.eml) or Microsoft Outlook (.m... |
EPub | EPUB is an e-book file format that uses the ".epub" file extension. T... |
Etherscan | Etherscan is the leading blockchain explorer, search, API and analyt... |
EverNote | EverNote is intended for archiving and creating notes in which photos... |
Facebook Chat | Messenger is an American proprietary instant messaging app and platf... |
Fauna | Fauna is a Document Database. |
Figma | Figma is a collaborative web application for interface design. |
FireCrawl | FireCrawl crawls and converts any website into LLM-ready data. It craw... |
Geopandas | Geopandas is an open-source project to make working with geospatial d... |
Git | Git is a distributed version control system that tracks changes in an... |
GitBook | GitBook is a modern documentation platform where teams can document e... |
GitHub | This notebook shows how you can load issues and pull requests (PRs) ... |
Glue Catalog | The AWS Glue Data Catalog is a centralized metadata repository that a... |
Google AlloyDB for PostgreSQL | AlloyDB is a fully managed relational database service that offers hi... |
Google BigQuery | Google BigQuery is a serverless and cost-effective enterprise data wa... |
Google Bigtable | Bigtable is a key-value and wide-column store, ideal for fast access ... |
Google Cloud SQL for SQL Server | Cloud SQL is a fully managed relational database service that offers ... |
Google Cloud SQL for MySQL | Cloud SQL is a fully managed relational database service that offers ... |
Google Cloud SQL for PostgreSQL | Cloud SQL for PostgreSQL is a fully-managed database service that hel... |
Google Cloud Storage Directory | Google Cloud Storage is a managed service for storing unstructured da... |
Google Cloud Storage File | Google Cloud Storage is a managed service for storing unstructured da... |
Google Firestore in Datastore Mode | Firestore in Datastore Mode is a NoSQL document database built for au... |
Google Drive | Google Drive is a file storage and synchronization service developed ... |
Google El Carro for Oracle Workloads | Google El Carro Oracle Operator |
Google Firestore (Native Mode) | Firestore is a serverless document-oriented database that scales to m... |
Google Memorystore for Redis | Google Memorystore for Redis is a fully-managed service that is power... |
Google Spanner | Spanner is a highly scalable database that combines unlimited scalabi... |
Google Speech-to-Text Audio Transcripts | The GoogleSpeechToTextLoader lets you transcribe audio files with th... |
Grobid | GROBID is a machine learning library for extracting, parsing, and re-... |
Gutenberg | Project Gutenberg is an online library of free eBooks. |
Hacker News | Hacker News (sometimes abbreviated as HN) is a social news website fo... |
Huawei OBS Directory | The following code demonstrates how to load objects from the Huawei O... |
Huawei OBS File | The following code demonstrates how to load an object from the Huawei... |
HuggingFace dataset | The Hugging Face Hub is home to over 5,000 datasets in more than 100 ... |
iFixit | iFixit is the largest, open repair community on the web. The site con... |
Images | This covers how to load images into a document format that we can use... |
Image captions | By default, the loader utilizes the pre-trained Salesforce BLIP image... |
IMSDb | IMSDb is the Internet Movie Script Database. |
Iugu | Iugu is a Brazilian services and software as a service (SaaS) company... |
Joplin | Joplin is an open-source note-taking app. Capture your thoughts and s... |
Jupyter Notebook | Jupyter Notebook (formerly IPython Notebook) is a web-based interacti... |
Kinetica | This notebook goes over how to load documents from Kinetica. |
lakeFS | lakeFS provides scalable version control over the data lake, and uses... |
LarkSuite (FeiShu) | LarkSuite is an enterprise collaboration platform developed by ByteDa... |
LLM Sherpa | This notebook covers how to use LLM Sherpa to load files of many type... |
Mastodon | Mastodon is a federated social media and social networking service. |
MediaWiki Dump | MediaWiki XML Dumps contain the content of a wiki (wiki pages with al... |
Merge Documents Loader | Merge the documents returned from a set of specified data loaders. |
mhtml | MHTML is used both for emails and for archived webpages. MH... |
Microsoft Excel | The UnstructuredExcelLoader is used to load Microsoft Excel files. Th... |
Microsoft OneDrive | Microsoft OneDrive (formerly SkyDrive) is a file hosting service oper... |
Microsoft OneNote | This notebook covers how to load documents from OneNote. |
Microsoft PowerPoint | Microsoft PowerPoint is a presentation program by Microsoft. |
Microsoft SharePoint | Microsoft SharePoint is a website-based collaboration system that use... |
Microsoft Word | Microsoft Word is a word processor developed by Microsoft. |
Near Blockchain | Overview |
Modern Treasury | Modern Treasury simplifies complex payment operations. It is a unifie... |
MongoDB | MongoDB is a NoSQL, document-oriented database that supports JSON-li... |
News URL | This covers how to load HTML news articles from a list of URLs into a... |
Notion DB 1/2 | Notion is a collaboration platform with modified Markdown support tha... |
Notion DB 2/2 | Notion is a collaboration platform with modified Markdown support tha... |
Nuclia | Nuclia automatically indexes your unstructured data from any internal... |
Obsidian | Obsidian is a powerful and extensible knowledge base |
Open Document Format (ODT) | The Open Document Format for Office Applications (ODF), also known as... |
Open City Data | Socrata provides an API for city open data. |
Oracle Autonomous Database | Oracle Autonomous Database is a cloud database that uses machine lear... |
Oracle AI Vector Search: Document Processing | Oracle AI Vector Search is designed for Artificial Intelligence (AI) ... |
Org-mode | An Org Mode document is a document editing, formatting, and organizing... |
Pandas DataFrame | This notebook goes over how to load data from a pandas DataFrame. |
Pebblo Safe DocumentLoader | Pebblo enables developers to safely load data and promote their Gen A... |
Polars DataFrame | This notebook goes over how to load data from a polars DataFrame. |
Psychic | This notebook covers how to load documents from Psychic. See here for... |
PubMed | PubMed® by The National Center for Biotechnology Information, Nationa... |
PyPDFLoader | This notebook provides a quick overview for getting started with PyPD... |
PySpark | This notebook goes over how to load data from a PySpark DataFrame. |
Quip | Quip is a collaborative productivity software suite for mobile and We... |
ReadTheDocs Documentation | Read the Docs is an open-sourced free software documentation hosting ... |
Recursive URL | The RecursiveUrlLoader lets you recursively scrape all child links fr... |
Reddit | Reddit is an American social news aggregation, content rating, and di... |
Roam | ROAM is a note-taking tool for networked thought, designed to create ... |
Rockset | Rockset is a real-time analytics database which enables queries on ma... |
rspace | This notebook shows how to use the RSpace document loader to import r... |
RSS Feeds | This covers how to load HTML news articles from a list of RSS feed UR... |
RST | A reStructured Text (RST) file is a file format for textual data used... |
scrapfly | ScrapFly |
ScrapingAnt | Overview |
Sitemap | Extends from the WebBaseLoader, SitemapLoader loads a sitemap from a ... |
Slack | Slack is an instant messaging program. |
Snowflake | This notebook goes over how to load documents from Snowflake. |
Source Code | This notebook covers how to load source code files using a special ap... |
Spider | Spider is the fastest and most affordable crawler and scraper that re... |
Spreedly | Spreedly is a service that allows you to securely store credit cards ... |
Stripe | Stripe is an Irish-American financial services and software as a serv... |
Subtitle | The SubRip file format is described on the Matroska multimedia contai... |
SurrealDB | SurrealDB is an end-to-end cloud-native database designed for modern ... |
Telegram | Telegram Messenger is a globally accessible freemium, cross-platform,... |
Tencent COS Directory | Tencent Cloud Object Storage (COS) is a distributed |
Tencent COS File | Tencent Cloud Object Storage (COS) is a distributed |
TensorFlow Datasets | TensorFlow Datasets is a collection of datasets ready to use, with Te... |
TiDB | TiDB Cloud is a comprehensive Database-as-a-Service (DBaaS) solution... |
2Markdown | 2markdown service transforms website content into structured markdown... |
TOML | TOML is a file format for configuration files. It is intended to be e... |
Trello | Trello is a web-based project management and collaboration tool that ... |
TSV | A tab-separated values (TSV) file is a simple, text-based file format... |
Twitter | Twitter is an online social media and social networking service. |
Unstructured | This notebook covers how to use Unstructured document loader to load ... |
Upstage | This notebook covers how to get started with UpstageLayoutAnalysisLoa... |
URL | This example covers how to load HTML documents from a list of URLs in... |
Vsdx | A Visio file (with extension .vsdx) is associated with Microsoft Visi... |
Weather | OpenWeatherMap is an open-source weather service provider |
WebBaseLoader | This covers how to use WebBaseLoader to load all text from HTML webpa... |
WhatsApp Chat | WhatsApp (also called WhatsApp Messenger) is a freeware, cross-platfo... |
Wikipedia | Wikipedia is a multilingual free online encyclopedia written and main... |
XML | The UnstructuredXMLLoader is used to load XML files. The loader works... |
Xorbits Pandas DataFrame | This notebook goes over how to load data from a xorbits.pandas DataFr... |
YouTube audio | Building chat or QA applications on YouTube videos is a topic of high... |
YouTube transcripts | YouTube is an online video sharing and social media platform created ... |
Yuque | Yuque is a professional cloud-based knowledge base for team collabora... |