| acreom | acreom is a dev-first knowledge base with tasks running on local mark... | 
| AirbyteLoader | Airbyte is a data integration platform for ELT pipelines from APIs, d... | 
| Airbyte CDK (Deprecated) | Note: AirbyteCDKLoader is deprecated. Please use AirbyteLoader instea... | 
| Airbyte Gong (Deprecated) | Note: This connector-specific loader is deprecated. Please use Airbyt... | 
| Airbyte Hubspot (Deprecated) | Note: AirbyteHubspotLoader is deprecated. Please use AirbyteLoader in... | 
| Airbyte JSON (Deprecated) | Note: AirbyteJSONLoader is deprecated. Please use AirbyteLoader inste... | 
| Airbyte Salesforce (Deprecated) | Note: This connector-specific loader is deprecated. Please use Airbyt... | 
| Airbyte Shopify (Deprecated) | Note: This connector-specific loader is deprecated. Please use Airbyt... | 
| Airbyte Stripe (Deprecated) | Note: This connector-specific loader is deprecated. Please use Airbyt... | 
| Airbyte Typeform (Deprecated) | Note: This connector-specific loader is deprecated. Please use Airbyt... | 
| Airbyte Zendesk Support (Deprecated) | Note: This connector-specific loader is deprecated. Please use Airbyt... | 
| Airtable | Get your API key here. | 
| Alibaba Cloud MaxCompute | Alibaba Cloud MaxCompute (previously known as ODPS) is a general purp... | 
| Amazon Textract | Amazon Textract is a machine learning (ML) service that automatically... | 
| Apify Dataset | Apify Dataset is a scalable append-only storage with sequential acces... | 
| ArcGIS | This notebook demonstrates the use of the langchain_community.document... | 
| ArxivLoader | arXiv is an open-access archive for 2 million scholarly articles in t... | 
| AssemblyAI Audio Transcripts | The AssemblyAIAudioTranscriptLoader allows you to transcribe audio files ... | 
| AstraDB | DataStax Astra DB is a serverless vector-capable database built on Ca... | 
| Async Chromium | Chromium is one of the browsers supported by Playwright, a library us... | 
| AsyncHtml | AsyncHtmlLoader loads raw HTML from a list of URLs concurrently. | 
| Athena | Amazon Athena is a serverless, interactive analytics service built | 
| AWS S3 Directory | Amazon Simple Storage Service (Amazon S3) is an object storage service | 
| AWS S3 File | Amazon Simple Storage Service (Amazon S3) is an object storage servic... | 
| AZLyrics | AZLyrics is a large, legal, every day growing collection of lyrics. | 
| Azure AI Data | Azure AI Studio provides the capability to upload data assets to clou... | 
| Azure Blob Storage Container | Azure Blob Storage is Microsoft's object storage solution for the clo... | 
| Azure Blob Storage File | Azure Files offers fully managed file shares in the cloud that are ac... | 
| Azure AI Document Intelligence | Azure AI Document Intelligence (formerly known as Azure Form Recogniz... | 
| BibTeX | BibTeX is a file format and reference management system commonly used... | 
| BiliBili | Bilibili is one of the most beloved long-form video sites in China. | 
| Blackboard | Blackboard Learn (previously the Blackboard Learning Management Syste... | 
| Blockchain | Overview | 
| Brave Search | Brave Search is a search engine developed by Brave Software. | 
| Browserbase | Browserbase is a developer platform to reliably run, manage, and moni... | 
| Browserless | Browserless is a service that allows you to run headless Chrome insta... | 
| Cassandra | Cassandra is a NoSQL, row-oriented, highly scalable and highly availa... | 
| ChatGPT Data | ChatGPT is an artificial intelligence (AI) chatbot developed by OpenA... | 
| College Confidential | College Confidential gives information on 3,800+ colleges and univers... | 
| Concurrent Loader | Works just like the GenericLoader but concurrently for those who choo... | 
| Confluence | Confluence is a wiki collaboration platform that saves and organizes ... | 
| CoNLL-U | CoNLL-U is a revised version of the CoNLL-X format. Annotations are enc... | 
| Copy Paste | This notebook covers how to load a document object from something you... | 
| Couchbase | Couchbase is an award-winning distributed NoSQL cloud database that d... | 
| CSV | A comma-separated values (CSV) file is a delimited text file that use... | 
| Cube Semantic Layer | This notebook demonstrates the process of retrieving Cube's data mode... | 
| Datadog Logs | Datadog is a monitoring and analytics platform for cloud-scale applic... | 
| Dedoc | This sample demonstrates the use of Dedoc in combination with LangCha... | 
| Diffbot | Diffbot is a suite of ML-based products that make it easy to structur... | 
| Discord | Discord is a VoIP and instant messaging social platform. Users have t... | 
| Docugami | This notebook covers how to load documents from Docugami. It provides... | 
| Docusaurus | Docusaurus is a static-site generator which provides out-of-the-box d... | 
| Dropbox | Dropbox is a file hosting service that brings everything-traditional ... | 
| DuckDB | DuckDB is an in-process SQL OLAP database management system. | 
| Email | This notebook shows how to load email (.eml) or Microsoft Outlook (.m... | 
| EPub | EPUB is an e-book file format that uses the ".epub" file extension. T... | 
| Etherscan | Etherscan is the leading blockchain explorer, search, API and analyt... | 
| EverNote | EverNote is intended for archiving and creating notes in which photos... | 
| Facebook Chat | Messenger is an American proprietary instant messaging app and platf... | 
| Fauna | Fauna is a Document Database. | 
| Figma | Figma is a collaborative web application for interface design. | 
| FireCrawl | FireCrawl crawls and converts any website into LLM-ready data. It craw... | 
| Geopandas | Geopandas is an open-source project to make working with geospatial d... | 
| Git | Git is a distributed version control system that tracks changes in an... | 
| GitBook | GitBook is a modern documentation platform where teams can document e... | 
| GitHub | This notebook shows how you can load issues and pull requests (PRs) ... | 
| Glue Catalog | The AWS Glue Data Catalog is a centralized metadata repository that a... | 
| Google AlloyDB for PostgreSQL | AlloyDB is a fully managed relational database service that offers hi... | 
| Google BigQuery | Google BigQuery is a serverless and cost-effective enterprise data wa... | 
| Google Bigtable | Bigtable is a key-value and wide-column store, ideal for fast access ... | 
| Google Cloud SQL for SQL server | Cloud SQL is a fully managed relational database service that offers ... | 
| Google Cloud SQL for MySQL | Cloud SQL is a fully managed relational database service that offers ... | 
| Google Cloud SQL for PostgreSQL | Cloud SQL for PostgreSQL is a fully-managed database service that hel... | 
| Google Cloud Storage Directory | Google Cloud Storage is a managed service for storing unstructured da... | 
| Google Cloud Storage File | Google Cloud Storage is a managed service for storing unstructured da... | 
| Google Firestore in Datastore Mode | Firestore in Datastore Mode is a NoSQL document database built for au... | 
| Google Drive | Google Drive is a file storage and synchronization service developed ... | 
| Google El Carro for Oracle Workloads | Google El Carro Oracle Operator | 
| Google Firestore (Native Mode) | Firestore is a serverless document-oriented database that scales to m... | 
| Google Memorystore for Redis | Google Memorystore for Redis is a fully-managed service that is power... | 
| Google Spanner | Spanner is a highly scalable database that combines unlimited scalabi... | 
| Google Speech-to-Text Audio Transcripts | The GoogleSpeechToTextLoader allows you to transcribe audio files with th... | 
| Grobid | GROBID is a machine learning library for extracting, parsing, and re-... | 
| Gutenberg | Project Gutenberg is an online library of free eBooks. | 
| Hacker News | Hacker News (sometimes abbreviated as HN) is a social news website fo... | 
| Huawei OBS Directory | The following code demonstrates how to load objects from the Huawei O... | 
| Huawei OBS File | The following code demonstrates how to load an object from the Huawei... | 
| HuggingFace dataset | The Hugging Face Hub is home to over 5,000 datasets in more than 100 ... | 
| iFixit | iFixit is the largest, open repair community on the web. The site con... | 
| Images | This covers how to load images into a document format that we can use... | 
| Image captions | By default, the loader utilizes the pre-trained Salesforce BLIP image... | 
| IMSDb | IMSDb is the Internet Movie Script Database. | 
| Iugu | Iugu is a Brazilian services and software as a service (SaaS) company... | 
| Joplin | Joplin is an open-source note-taking app. Capture your thoughts and s... | 
| Jupyter Notebook | Jupyter Notebook (formerly IPython Notebook) is a web-based interacti... | 
| Kinetica | This notebook goes over how to load documents from Kinetica | 
| lakeFS | lakeFS provides scalable version control over the data lake, and uses... | 
| LarkSuite (FeiShu) | LarkSuite is an enterprise collaboration platform developed by ByteDa... | 
| LLM Sherpa | This notebook covers how to use LLM Sherpa to load files of many type... | 
| Mastodon | Mastodon is a federated social media and social networking service. | 
| MediaWiki Dump | MediaWiki XML Dumps contain the content of a wiki (wiki pages with al... | 
| Merge Documents Loader | Merge the documents returned from a set of specified data loaders. | 
| mhtml | MHTML is used both for emails and for archived webpages. MH... | 
| Microsoft Excel | The UnstructuredExcelLoader is used to load Microsoft Excel files. Th... | 
| Microsoft OneDrive | Microsoft OneDrive (formerly SkyDrive) is a file hosting service oper... | 
| Microsoft OneNote | This notebook covers how to load documents from OneNote. | 
| Microsoft PowerPoint | Microsoft PowerPoint is a presentation program by Microsoft. | 
| Microsoft SharePoint | Microsoft SharePoint is a website-based collaboration system that use... | 
| Microsoft Word | Microsoft Word is a word processor developed by Microsoft. | 
| Near Blockchain | Overview | 
| Modern Treasury | Modern Treasury simplifies complex payment operations. It is a unifie... | 
| MongoDB | MongoDB is a NoSQL, document-oriented database that supports JSON-li... | 
| News URL | This covers how to load HTML news articles from a list of URLs into a... | 
| Notion DB 1/2 | Notion is a collaboration platform with modified Markdown support tha... | 
| Notion DB 2/2 | Notion is a collaboration platform with modified Markdown support tha... | 
| Nuclia | Nuclia automatically indexes your unstructured data from any internal... | 
| Obsidian | Obsidian is a powerful and extensible knowledge base | 
| Open Document Format (ODT) | The Open Document Format for Office Applications (ODF), also known as... | 
| Open City Data | Socrata provides an API for city open data. | 
| Oracle Autonomous Database | Oracle autonomous database is a cloud database that uses machine lear... | 
| Oracle AI Vector Search: Document Processing | Oracle AI Vector Search is designed for Artificial Intelligence (AI) ... | 
| Org-mode | An Org Mode document is a document editing, formatting, and organizing... | 
| Pandas DataFrame | This notebook goes over how to load data from a pandas DataFrame. | 
| Pebblo Safe DocumentLoader | Pebblo enables developers to safely load data and promote their Gen A... | 
| Polars DataFrame | This notebook goes over how to load data from a polars DataFrame. | 
| Psychic | This notebook covers how to load documents from Psychic. See here for... | 
| PubMed | PubMed® by The National Center for Biotechnology Information, Nationa... | 
| PyPDFLoader | This notebook provides a quick overview for getting started with PyPD... | 
| PySpark | This notebook goes over how to load data from a PySpark DataFrame. | 
| Quip | Quip is a collaborative productivity software suite for mobile and We... | 
| ReadTheDocs Documentation | Read the Docs is an open-sourced free software documentation hosting ... | 
| Recursive URL | The RecursiveUrlLoader lets you recursively scrape all child links fr... | 
| Reddit | Reddit is an American social news aggregation, content rating, and di... | 
| Roam | ROAM is a note-taking tool for networked thought, designed to create ... | 
| Rockset | Rockset is a real-time analytics database which enables queries on ma... | 
| rspace | This notebook shows how to use the RSpace document loader to import r... | 
| RSS Feeds | This covers how to load HTML news articles from a list of RSS feed UR... | 
| RST | A reStructured Text (RST) file is a file format for textual data used... | 
| scrapfly | ScrapFly | 
| ScrapingAnt | Overview | 
| Sitemap | Extends from the WebBaseLoader, SitemapLoader loads a sitemap from a ... | 
| Slack | Slack is an instant messaging program. | 
| Snowflake | This notebook goes over how to load documents from Snowflake | 
| Source Code | This notebook covers how to load source code files using a special ap... | 
| Spider | Spider is the fastest and most affordable crawler and scraper that re... | 
| Spreedly | Spreedly is a service that allows you to securely store credit cards ... | 
| Stripe | Stripe is an Irish-American financial services and software as a serv... | 
| Subtitle | The SubRip file format is described on the Matroska multimedia contai... | 
| SurrealDB | SurrealDB is an end-to-end cloud-native database designed for modern ... | 
| Telegram | Telegram Messenger is a globally accessible freemium, cross-platform,... | 
| Tencent COS Directory | Tencent Cloud Object Storage (COS) is a distributed | 
| Tencent COS File | Tencent Cloud Object Storage (COS) is a distributed | 
| TensorFlow Datasets | TensorFlow Datasets is a collection of datasets ready to use, with Te... | 
| TiDB | TiDB Cloud is a comprehensive Database-as-a-Service (DBaaS) solution... | 
| 2Markdown | 2markdown service transforms website content into structured markdown... | 
| TOML | TOML is a file format for configuration files. It is intended to be e... | 
| Trello | Trello is a web-based project management and collaboration tool that ... | 
| TSV | A tab-separated values (TSV) file is a simple, text-based file format... | 
| Twitter | Twitter is an online social media and social networking service. | 
| Unstructured | This notebook covers how to use Unstructured document loader to load ... | 
| Upstage | This notebook covers how to get started with UpstageLayoutAnalysisLoa... | 
| URL | This example covers how to load HTML documents from a list of URLs in... | 
| Vsdx | A Visio file (with extension .vsdx) is associated with Microsoft Visi... | 
| Weather | OpenWeatherMap is an open-source weather service provider | 
| WebBaseLoader | This covers how to use WebBaseLoader to load all text from HTML webpa... | 
| WhatsApp Chat | WhatsApp (also called WhatsApp Messenger) is a freeware, cross-platfo... | 
| Wikipedia | Wikipedia is a multilingual free online encyclopedia written and main... | 
| XML | The UnstructuredXMLLoader is used to load XML files. The loader works... | 
| Xorbits Pandas DataFrame | This notebook goes over how to load data from a xorbits.pandas DataFr... | 
| YouTube audio | Building chat or QA applications on YouTube videos is a topic of high... | 
| YouTube transcripts | YouTube is an online video sharing and social media platform created ... | 
| Yuque | Yuque is a professional cloud-based knowledge base for team collabora... |