How to Scrape Instagram with Python in 2026

A 2026 take on scraping Instagram with Python. What still works, what got harder, the libraries and APIs we reach for, and the legal landmines to avoid.

Updated May 2026. Refreshed with the current state of Instagram scraping: what actually works in 2026, which libraries are still maintained, the legal landmines, and the API-led alternatives that are usually a better answer.

Scraping Instagram in 2026 is harder than it was three years ago. Meta has tightened the unauthenticated endpoints. Login walls now appear on the second profile view. The instaloader library still works, with caveats. Selenium-based scrapers still work, with more caveats. The Graph API works reliably, but only for accounts you own or that have authorised your app. Most of the “ultimate guide” content online describes a 2021 reality that no longer exists.

We are Osher Digital, an automation and AI consultancy in Brisbane. We have built data pipelines that pull from Instagram for client marketing teams, social listening tools, and influencer analytics platforms. The advice below is what we actually deploy. It is opinionated, current as of May 2026, and includes the cases where scraping is the wrong answer.

This guide assumes you can read Python and have a clear use case. If you are after general social media data for analytics dashboards, jump to the “When scraping is the wrong answer” section first; you may not need to scrape at all. For broader context on data pipeline work see our data integration and automation consulting work.


The legal landmines

Instagram’s terms of service prohibit automated data collection without permission. The hiQ vs. LinkedIn case in the US clarified that scraping public data is not in itself a Computer Fraud and Abuse Act violation, but that ruling does not exempt you from the platform’s terms or from privacy laws.

Practical implications for an Australian operator:

  • Scraping public profile data for internal analysis is a grey area. Doing it at scale, building a commercial product on it, or republishing the data is not a grey area; that is where the terms, privacy law, and copyright all start to bite.
  • Personal information collected through scraping is still personal information under the Australian Privacy Principles. APP 3 (collection), APP 5 (notification), APP 6 (use) all apply.
  • Meta will block IPs and disable accounts associated with scraping. This is an operational risk rather than a legal one, but it ends pipelines.
  • Republishing copyrighted content (photos, videos, captions) without permission is a separate copyright issue.

If you are about to invest engineering effort in scraping, get a one-page legal review first. The cost is small. The cost of a takedown notice or an APP investigation is not.


Approach 1. Use the Graph API where you can

Always check this first. If your use case fits the Graph API, you do not need to scrape at all. The API is stable, fast, and within Meta’s terms.

What you can do with the Graph API in 2026:

  • Pull insights, posts, comments, and DMs for any Instagram account you own (or that has authorised your app)
  • Read public hashtag data through the Hashtag Search endpoint, with quota limits
  • Read public business account information through the Business Discovery endpoint, again with quota
  • Receive webhook events for things like comments and mentions on accounts that authorised your app
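
To make the Business Discovery bullet concrete, here is a minimal sketch of how that lookup is usually expressed: as a field expansion on your own account's node rather than a separate URL. The helper name, the default field list, and the v21.0 version are assumptions for illustration, and the graph.facebook.com host assumes the Facebook-login flavour of the API, so check Meta's current reference before relying on it.

```python
GRAPH_BASE = "https://graph.facebook.com/v21.0"  # assumed version; pin whichever one Meta currently supports


def business_discovery_request(own_user_id, target_username, access_token,
                               fields=("username", "followers_count", "media_count")):
    """Build the URL and query params for a Business Discovery lookup.

    own_user_id is YOUR Instagram Business account ID; the target is named
    inside the field expression. Only Business and Creator targets resolve.
    """
    field_expr = "business_discovery.username({}){{{}}}".format(
        target_username, ",".join(fields)
    )
    url = f"{GRAPH_BASE}/{own_user_id}"
    params = {"fields": field_expr, "access_token": access_token}
    return url, params


# Usage (needs the third-party requests package and real credentials):
# url, params = business_discovery_request(USER_ID, "natgeo", ACCESS_TOKEN)
# data = requests.get(url, params=params, timeout=30).json()["business_discovery"]
```

Keeping the request-building step separate from the HTTP call makes it easy to log or unit test the exact field expression before spending quota on it.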

What you cannot do:

  • Read posts from accounts that have not authorised your app (other than business discovery and hashtag search, which are limited)
  • Read personal accounts (only Business and Creator accounts are accessible via Graph API)
  • Bulk-pull thousands of public profiles for influencer analysis without each one going through OAuth

A minimal Graph API call to read your own account’s recent media:

import os
import requests

ACCESS_TOKEN = os.environ["IG_ACCESS_TOKEN"]
USER_ID = os.environ["IG_USER_ID"]

url = f"https://graph.instagram.com/v21.0/{USER_ID}/media"
params = {
    "fields": "id,caption,media_type,permalink,timestamp,like_count,comments_count",
    "access_token": ACCESS_TOKEN,
    "limit": 25,
}

response = requests.get(url, params=params, timeout=30)
response.raise_for_status()

# Use .get() for the count so a post without a visible like count does not raise KeyError.
for post in response.json()["data"]:
    print(post["timestamp"], post.get("like_count"), post["permalink"])

Setup involves creating a Meta developer app, connecting it to a Facebook page, linking the page to your Instagram Business account, and going through the access token flow. Tedious the first time. Permanent after that.
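
The example above reads a single page of 25 items. Graph API list responses normally include a paging object whose next field is a ready-made URL for the following page, so a rough sketch of walking the full feed could look like this. The function name, the max_items cap, and the injected fetch_json callable are assumptions for illustration, not part of Meta's API.

```python
def iter_media(first_url, first_params, fetch_json, max_items=200):
    """Yield media objects across pages by following paging.next.

    fetch_json(url, params) must return the decoded JSON body. It is
    injected so the paging logic can be exercised without network access.
    """
    url, params = first_url, first_params
    yielded = 0
    while url and yielded < max_items:
        body = fetch_json(url, params)
        for item in body.get("data", []):
            if yielded >= max_items:
                return
            yield item
            yielded += 1
        # The "next" URL already carries the cursor and token,
        # so no extra params are sent on later pages.
        url = body.get("paging", {}).get("next")
        params = None


# Usage with requests (third-party):
# def fetch_json(url, params):
#     r = requests.get(url, params=params, timeout=30)
#     r.raise_for_status()
#     return r.json()
#
# for post in iter_media(url, params, fetch_json):
#     print(post["timestamp"], post["permalink"])
```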


Approach 2. instaloader for public profile data

instaloader is the most-used Instagram scraping library in 2026. Maintained, well-documented, and treats Instagram’s structure as something it has to keep up with.

What it does well:

  • Public profile metadata, posts, captions, hashtags, follower/following counts
  • Downloading post images and videos
  • Optional authenticated mode for stories, highlights, and content from private accounts you follow

What you have to handle yourself:

  • Rate limiting. Meta will block your IP if you hit the public endpoints too hard. We pace at 1 request per 4 to 8 seconds for unauthenticated calls and have not been blocked.
  • Login walls on second profile view from the same session. Persistent sessions help; rotating sessions helps more.
  • Periodic breakage when Meta changes its internal API. Pin the version, watch the GitHub issues, expect to upgrade quarterly.

A minimal example:

import time
import instaloader

L = instaloader.Instaloader(
    sleep=True,
    quiet=True,
    download_pictures=False,
    download_videos=False,
    save_metadata=False,
)

profile = instaloader.Profile.from_username(L.context, "natgeo")

print(f"{profile.username}: {profile.followers} followers")

for i, post in enumerate(profile.get_posts()):
    print(post.date_local, post.likes, post.caption[:80] if post.caption else "")
    if i >= 9:
        break
    time.sleep(5)

Notes from production. The sleep=True flag asks instaloader to back off automatically. We add an explicit sleep on top because the library’s internal pacing is conservative but not bulletproof. download_pictures=False keeps the call cheap when you only need metadata.
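
Since the library's pacing is not bulletproof, one common complement is to wrap individual calls in a retry with exponential backoff. The sketch below is generic rather than anything instaloader ships: the function name, the delay values, and the 25% jitter are assumptions, and you choose which exception types count as retryable.

```python
import random
import time


def with_backoff(call, retry_on=(Exception,), attempts=4, base_delay=30.0,
                 sleep=time.sleep):
    """Run call(), retrying with exponential backoff plus jitter.

    Delays grow as base_delay, 2x, 4x and so on, with up to 25% random
    jitter added. The last failure is re-raised so the caller can stop.
    """
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.25))


# Usage with instaloader (third-party):
# profile = with_backoff(
#     lambda: instaloader.Profile.from_username(L.context, "natgeo"),
#     retry_on=(instaloader.exceptions.ConnectionException,),
# )
```

Passing the sleep function in as a parameter keeps the helper testable without real waiting.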


Approach 3. Third-party APIs and managed scrapers

Several services maintain Instagram scrapers as a paid product. They handle the cat-and-mouse game with Meta so you do not have to. The trade-off is per-call cost and trust in the vendor.

The ones we have used most:

Apify. Marketplace of scrapers, including several mature Instagram ones. Pay-per-use pricing, runs on their infrastructure. Good for one-off projects and moderate ongoing volume. AUD pricing as of mid-2026 sits around $0.40 to $1 per 1,000 records depending on the scraper.

Bright Data and Oxylabs. Enterprise-grade scraping APIs with structured output. More expensive, more reliable, and the right answer if your business depends on the data flow. Plans start in the hundreds of USD per month.

Calling Apify’s Instagram scraper from Python:

import os
from apify_client import ApifyClient

client = ApifyClient(os.environ["APIFY_TOKEN"])

run_input = {
    "directUrls": ["https://www.instagram.com/natgeo/"],
    "resultsType": "posts",
    "resultsLimit": 25,
}

run = client.actor("apify/instagram-scraper").call(run_input=run_input)

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    # A caption can be missing or None, so fall back to "" before slicing.
    print(item["timestamp"], item.get("likesCount"), (item.get("caption") or "")[:80])

Honest verdict. For most client work where the data is for analysis (not republishing), a managed scraper API is the right answer. Engineering time to maintain a custom scraper across Meta’s changes runs to several days per quarter. The vendors absorb that cost in their pricing.
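
If you do go the managed route, it is worth checking that a run actually succeeded before reading its dataset. A minimal sketch, reusing the same actor and input fields as the call above; the fetch_posts name and the decision to raise RuntimeError are assumptions for illustration.

```python
def fetch_posts(client, profile_url, limit=25):
    """Run the actor and return its items, failing loudly on a bad run.

    client is an apify_client.ApifyClient (or anything exposing the same
    actor()/dataset() methods, which keeps this testable offline).
    """
    run_input = {
        "directUrls": [profile_url],
        "resultsType": "posts",
        "resultsLimit": limit,
    }
    run = client.actor("apify/instagram-scraper").call(run_input=run_input)
    # call() waits for the run to finish; treat a missing or
    # non-successful run as an error instead of reading a partial dataset.
    if run is None or run.get("status") != "SUCCEEDED":
        status = None if run is None else run.get("status")
        raise RuntimeError(f"Apify run did not succeed (status={status})")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


# Usage:
# client = ApifyClient(os.environ["APIFY_TOKEN"])
# posts = fetch_posts(client, "https://www.instagram.com/natgeo/")
```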


Approach 4. Playwright when you need full control

Playwright (and Selenium) are the heavyweight options. Drive a real browser, log in, scroll, screenshot, click. They work for almost anything Instagram lets a logged-in user see. They are also slow, resource-heavy, and the most likely to trigger Meta’s automation detection.

When this is the right tool. Stories and highlights for accounts you follow. Reels with all comments. UI flows that no library wraps. Anything that requires interacting with the page rather than reading static endpoints.

A minimal Playwright session that opens Instagram and reads a profile’s bio:

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    context = browser.new_context(
        # Chrome-style UA so the string matches the Chromium engine being driven.
        user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                   "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
        viewport={"width": 1280, "height": 800},
    )
    page = context.new_page()

    page.goto("https://www.instagram.com/natgeo/", wait_until="networkidle")

    bio = page.locator("header section div span").nth(0).text_content()
    print("Bio fragment:", bio)

    browser.close()

Things that break this. Login walls after the first or second profile view; you will need to authenticate. Selectors change weekly, so anything that depends on specific class names is brittle by design. Headless detection is the third problem; third-party stealth plugins for Playwright help but do not eliminate it.
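
On the authentication point, Playwright can save a logged-in context's cookies and local storage to a file and load them into a later context, which avoids repeating the login flow on every run. A minimal sketch under stated assumptions: the helper names and the ig_state.json filename are hypothetical, and the login steps themselves are left out.

```python
from pathlib import Path


def open_context(browser, state_path="ig_state.json", **context_kwargs):
    """Create a browser context, reusing saved login state if the file exists."""
    path = Path(state_path)
    if path.exists():
        return browser.new_context(storage_state=str(path), **context_kwargs)
    return browser.new_context(**context_kwargs)


def save_context(context, state_path="ig_state.json"):
    """Write the context's cookies and local storage to disk after login."""
    context.storage_state(path=str(state_path))


# Usage inside a sync_playwright() block:
# context = open_context(browser, viewport={"width": 1280, "height": 800})
# ... log in manually or by script on the first run ...
# save_context(context)
```

Treat the state file like a password: it grants access to the account it was saved from.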

Use Playwright when no other approach works, not as the default.


Rate limiting and detection avoidance

Things we do on every Instagram scraping job, regardless of approach:

Pace requests. 1 request every 4 to 8 seconds for public unauthenticated. 1 every 15 to 30 seconds for authenticated calls. Faster than that and you are testing how patient Meta is feeling.
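
A fixed interval is itself a recognisable pattern, so the ranges above are easiest to apply as a random delay between each pair of requests. A minimal sketch; the paced name and the injectable sleep parameter are assumptions for illustration.

```python
import random
import time


def paced(items, min_delay=4.0, max_delay=8.0, sleep=time.sleep):
    """Yield items one at a time, pausing a random interval between them.

    No pause happens before the first item, and none is added after the
    last one has been yielded.
    """
    for index, item in enumerate(items):
        if index:
            sleep(random.uniform(min_delay, max_delay))
        yield item


# Usage:
# for username in paced(usernames):            # unauthenticated pacing
#     ...
# for username in paced(usernames, 15, 30):    # authenticated pacing
#     ...
```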

Use real user agents and modern headers. The default Python requests UA is a flag. We use realistic Chrome and Safari UA strings that match a real OS combination.

Persist sessions. instaloader’s session save/load lets you avoid re-authenticating on every run, which avoids login challenges that escalate into account locks. Save the session to disk after first login, load on subsequent runs.
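
The save-then-load pattern can be sketched with instaloader's load_session_from_file, login, and save_session_to_file methods. The ensure_session wrapper and its return value are assumptions for illustration, and two-factor prompts or login challenges are not handled here.

```python
def ensure_session(loader, username, get_password):
    """Load a saved session for username, logging in only if none exists.

    loader is an instaloader.Instaloader instance. get_password is a
    zero-argument callable (for example getpass.getpass) so the password
    is only requested when a fresh login is actually needed.
    Returns True if a saved session was reused, False after a fresh login.
    """
    try:
        loader.load_session_from_file(username)
        return True
    except FileNotFoundError:
        # No session file yet: log in once and persist it for later runs.
        loader.login(username, get_password())
        loader.save_session_to_file()
        return False


# Usage:
# import getpass, instaloader
# L = instaloader.Instaloader(sleep=True, quiet=True)
# ensure_session(L, "my_scraper_account", getpass.getpass)
```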

Rotate proxies for production volume. Residential proxy services (Bright Data, Smartproxy, IPRoyal) cost $5 to $15 USD per GB depending on plan. We use them when running at any meaningful volume; otherwise it is just a matter of time before the IP gets blocked.
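
At its simplest, rotation means cycling through a pool of proxy URLs and handing one to each request. A minimal round-robin sketch; the function name and the placeholder URLs in the usage comment are hypothetical, and real providers often rotate for you behind a single gateway address.

```python
from itertools import cycle


def proxy_rotation(proxy_urls):
    """Return a function that hands out requests-style proxy dicts in turn."""
    if not proxy_urls:
        raise ValueError("at least one proxy URL is required")
    pool = cycle(proxy_urls)

    def next_proxies():
        url = next(pool)
        # requests expects a mapping from scheme to proxy URL.
        return {"http": url, "https": url}

    return next_proxies


# Usage with requests (third-party); the URLs below are placeholders:
# next_proxies = proxy_rotation([
#     "http://user:pass@proxy-1.example.com:8000",
#     "http://user:pass@proxy-2.example.com:8000",
# ])
# requests.get(url, proxies=next_proxies(), timeout=30)
```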

Use a separate dedicated Instagram account for scraping. Never your personal account. Never an account tied to anything you cannot afford to lose. Treat it as expendable; assume Meta will eventually disable it.


Storage, processing, and downstream use

The scraping is half the job. The other half is making the data useful. Three patterns we see most often.

Raw to Parquet, processed to Postgres. Land the raw API responses as JSON in S3 or local Parquet files. Process into a clean Postgres schema for querying. Cheap, durable, replayable when the schema needs to change.
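
A rough sketch of the two halves of that pattern, using only the standard library: append raw responses as JSON lines for replay, and flatten each one into a row shaped for a posts table. The column names are assumptions, the input is assumed to look like the Graph API media objects shown earlier, and the actual Parquet and Postgres writes are left to whichever libraries you already use.

```python
import json
from datetime import datetime


def land_raw(raw_posts, path):
    """Append raw responses as JSON lines so they can be replayed later."""
    with open(path, "a", encoding="utf-8") as fh:
        for raw in raw_posts:
            fh.write(json.dumps(raw, ensure_ascii=False) + "\n")


def normalise_post(raw):
    """Flatten one raw media object into a row for a posts table."""
    return {
        "post_id": raw["id"],
        # Graph API timestamps look like 2026-05-01T10:00:00+0000.
        "posted_at": datetime.strptime(raw["timestamp"], "%Y-%m-%dT%H:%M:%S%z"),
        "media_type": raw.get("media_type"),
        "caption": raw.get("caption") or "",
        "like_count": raw.get("like_count"),
        "comments_count": raw.get("comments_count"),
        "permalink": raw.get("permalink"),
    }
```

Because the raw file is kept untouched, a change to normalise_post only needs a replay over the landed data rather than a re-scrape.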

Pandas for ad-hoc analysis. The 2026 stack is essentially the same as the 2021 stack here. Polars is faster for large data; pandas is faster to write. Pick based on volume.

LLM-based content analysis. The interesting modern angle. Once you have post captions and comments, an LLM can summarise themes, classify sentiment, extract named entities, and group posts by topic far more reliably than the keyword-based approaches that used to be standard. This is where most of the post-2024 value of social data has shifted. For implementation patterns see our piece on building AI agents in Python.
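
As a small illustration of the sentiment case, the sketch below batches captions into one prompt and validates the reply before trusting it. Everything here is an assumption for illustration: the prompt wording, the label set, and the complete callable, which stands in for whichever LLM client you use and simply takes a prompt string and returns the model's text.

```python
import json

DEFAULT_LABELS = ("positive", "neutral", "negative")


def build_prompt(captions, labels=DEFAULT_LABELS):
    """Number the captions and ask for a JSON list of labels in order."""
    numbered = "\n".join(f"{i}. {c}" for i, c in enumerate(captions, start=1))
    return (
        "Classify the sentiment of each Instagram caption below as one of: "
        f"{', '.join(labels)}.\n"
        "Reply with a JSON list of labels only, one per caption, in order.\n\n"
        f"{numbered}"
    )


def classify(captions, complete, labels=DEFAULT_LABELS):
    """Return (caption, label) pairs, rejecting malformed model replies."""
    reply = complete(build_prompt(captions, labels))
    result = json.loads(reply)
    if len(result) != len(captions) or any(r not in labels for r in result):
        raise ValueError("model reply did not match the expected shape")
    return list(zip(captions, result))
```

Validating the length and the label set catches the most common failure, where the model skips or merges items in a long batch.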


When scraping is the wrong answer

Three cases where we tell clients not to build a scraper.

The data is for influencer marketing analytics. Use Modash, HypeAuditor, Influencity, or Klear. They have official partnerships and provide better data than you will scrape together. AUD pricing $200 to $2,000 per month depending on plan. Cheaper than building.

The data is your own brand or campaign performance. Use the Graph API. It returns better data, more reliably, with no risk of account bans. The OAuth setup is two hours of work; the data is permanent.

The volume is small (under 100 profiles a month). Manual data collection takes one person a few hours. Build a Notion database. The maintenance time of any scraper exceeds the manual time at small volumes.

Conversely, scraping makes sense when the volume is meaningful (thousands of profiles or posts), the use case is internal analysis (not republishing), and the alternatives have been ruled out for cost or coverage reasons.


Ready to build the right pipeline?

If you are sizing up an Instagram or social-data project and want a sanity check on the approach before committing, we run those calls most weeks. Often the answer is “use the API, not a scraper” and we will say so. Get in touch or book a call. For broader automation work, see our AI consulting page.


Frequently Asked Questions

How do I scrape Instagram in Python?

For your own accounts, use Meta’s Graph API. For public profile data on other accounts, instaloader is the maintained library most people reach for. For production-scale work where you need reliability, a managed scraper API like Apify or Bright Data is usually a better answer than building and maintaining your own. Playwright is the fallback for cases nothing else handles.

What is the best Instagram scraper Python library?

instaloader is still the leader in 2026 for public profile and post scraping. It is actively maintained, well-documented, and handles the bulk of common cases. For more complex flows that need full browser control, use Playwright. Both have their place, and many production setups combine them.

Is it legal to scrape Instagram?

It is a grey area. Meta’s terms of service prohibit it, but court rulings (notably hiQ vs. LinkedIn in the US) have established that scraping public data is not automatically a criminal violation. In Australia, the Australian Privacy Principles still apply to any personal information you collect. Get a one-page legal review before building anything at scale, especially if the data flows into a commercial product.

What is a good instaloader alternative?

For Python, the realistic alternatives are Playwright (heavier, more flexible) or paid scraping services like Apify and Bright Data. Most other dedicated Instagram libraries are unmaintained or block-prone. If your use case fits the Graph API, that is not really an instaloader alternative; it is a different category and a better answer when it applies.

How much does Instagram scraping cost in Australia?

Self-hosted with instaloader on a small VPS: $30 to $100 AUD/month including proxy spend, plus engineering time. Managed scraper APIs like Apify run roughly $20 to $500 AUD/month at moderate volumes. Enterprise scraping platforms (Bright Data, Oxylabs) start in the high hundreds. Custom builds with full Playwright instrumentation usually run $15,000 to $40,000 AUD upfront with several days of maintenance per quarter.

How do I scrape Instagram posts specifically?

For your own posts, use the Graph API’s media endpoint. For public posts from other accounts, instaloader’s get_posts() generator is the standard approach. For comments under each post, instaloader supports those when authenticated. Volume above a few hundred posts per day starts to need proxy rotation and probably a managed service.

How do I scrape an Instagram account’s data?

For public accounts: instaloader’s Profile.from_username gives you metadata (followers, following, bio, post count) in one call, and Profile.get_posts gives you the post stream. For private accounts, you need to be following them with an authenticated session, and even then much of what’s available is limited compared to public profiles.

How do I avoid being blocked while scraping Instagram?

Pace requests at 4 to 8 seconds apart for unauthenticated work, 15 to 30 seconds for authenticated. Use realistic browser user agents. Persist sessions across runs to avoid repeated logins. Rotate residential proxies once you cross moderate volume. Use a dedicated, expendable Instagram account for the scraping role rather than your main account. Even with all of this, expect periodic blocks; design the pipeline to retry and resume rather than fall over.
