Flickr Metadata Export Script

This page documents a local Python script used to export Flickr metadata and image URLs to CSV for later reuse in Archives workflows and other descriptive systems.

Purpose

Use this script when you need a spreadsheet of Flickr item metadata after upload, especially when you want stable Flickr URLs for linking in finding aids, digital object records, or local project tracking.

Requirements

  • Python 3
  • The requests package installed in the Python environment used to run the script
  • A Flickr API key
  • The Flickr account user ID for the account being exported
  • A local configuration file named flickr_accounts.ini stored in the same folder as the script

If you are new to running Python scripts, a practical baseline setup is to install Python 3, then install the requests package into that same environment (for example, pip install requests).

After installation, confirm Python is available in PowerShell:

python --version

If that does not work on Windows, try:

py --version

VS Code is recommended here because it provides a clean text editor, an integrated terminal, and a straightforward way for future staff to inspect or modify the script safely.

Configuration File

Expected format:

[default]
api_key = YOUR_FLICKR_API_KEY_HERE
user_id = YOUR_FLICKR_USER_ID_NSID_HERE

Example with obvious placeholders:

[default]
api_key = abc123replace_me_with_your_real_key
user_id = 12345678@N00

Notes:

  • api_key is the Flickr API application key
  • user_id is the Flickr NSID for the account owner
  • the account alias or screen name is not a substitute for user_id
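Because the alias is not a substitute for the NSID, you can look it up with the flickr.urls.lookupUser API method. Below is a minimal sketch that builds the request URL using only the standard library; paste the result into a browser (or fetch it with requests) and copy the user.id value from the JSON response into flickr_accounts.ini. The profile URL shown is a placeholder.

```python
from urllib.parse import urlencode

API_URL = "https://api.flickr.com/services/rest/"

def build_lookup_user_url(api_key, profile_url):
    """Build a flickr.urls.lookupUser request URL. The JSON response's
    user.id field is the NSID to place in flickr_accounts.ini as user_id."""
    params = {
        "method": "flickr.urls.lookupUser",
        "api_key": api_key,
        "url": profile_url,
        "format": "json",
        "nojsoncallback": 1,
    }
    return API_URL + "?" + urlencode(params)

# Example (placeholder values):
# build_lookup_user_url("YOUR_API_KEY", "https://www.flickr.com/people/your_alias/")
```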

How the Script Works

The script prompts for one export mode:

  • album: calls flickr.photosets.getPhotos for one Flickr album
  • date: calls flickr.people.getPhotos for a Flickr upload-date range

For each photo, it collects or derives values such as:

  • Flickr photo ID
  • title
  • description
  • upload date
  • taken date
  • owner name and path alias
  • tags
  • view count
  • original format and dimensions
  • direct image URLs
  • Flickr photo page URL

If an original-size URL is not already present in the first API response, the script makes an additional flickr.photos.getSizes request for that photo.
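The selection step of that fallback can be sketched as a pure function over an already-fetched getSizes response; the sample data below is illustrative, not a real API response.

```python
def pick_original(sizes_response):
    """Return the Original source URL from a flickr.photos.getSizes
    response dict, or "" if no Original size is exposed."""
    for size in sizes_response.get("sizes", {}).get("size", []):
        if size.get("label") == "Original":
            return size.get("source", "")
    return ""
```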

Identifier Extraction

The script attempts to populate image_supplier_image_id from either:

  • recognized machine tags
  • an ID: pattern inside the Flickr description text

This is useful when the local workflow embeds a legacy identifier or filename stem in the description or machine tags before upload.
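For example, with a hypothetical description and machine-tag value (both placeholders), the extraction behaves like this: the ID: pattern yields the identifier directly, and a filename-style machine-tag value is reduced to its stem.

```python
import re
from pathlib import Path

# Same pattern the script uses for "ID:" in description text.
ID_PATTERN = re.compile(r"\bID:\s*([A-Za-z0-9_-]+)\b", re.IGNORECASE)

# Hypothetical description containing an embedded legacy identifier.
description = "Students on the front lawn. ID: ua12-b03-f07_004"
match = ID_PATTERN.search(description)
identifier = match.group(1) if match else ""

# A machine-tag value that is a filename keeps only its stem.
machine_tag_value = "ua12-b03-f07_004.tif"
stem = Path(machine_tag_value).stem
```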

Running the Script

Example:

python .\flickr_metadata_export.py

Interactive prompts:

  1. Choose album or date.
  2. Enter the album ID or album URL, or enter the minimum and maximum upload dates.
  3. Enter an output CSV path or accept the default filename.

Output Columns

The current script writes these columns:

  • image_supplier_image_id
  • photo_id
  • title
  • description
  • date_upload_unix
  • date_taken
  • owner_nsid
  • owner_name
  • path_alias
  • license
  • media
  • tags
  • views
  • original_format
  • original_width
  • original_height
  • original_url
  • embed_url_original
  • large_url
  • medium_url
  • small_url
  • photo_page_url

Using the Output

  • Save the CSV beside the project files or processing documentation for that Flickr batch.
  • Reuse exported URLs in downstream systems instead of manually copying them from the Flickr interface one at a time.
  • Keep the script and the local INI file out of public repos unless secrets are removed and the setup is rewritten safely.

Script Copy

#!/usr/bin/env python3
import configparser
import csv
import datetime as dt
import re
import sys
import time
from pathlib import Path
from urllib.parse import parse_qs, urlparse

import requests

API_URL = "https://api.flickr.com/services/rest/"
EXTRAS = ",".join([
    "date_upload", "date_taken", "description", "license", "media",
    "machine_tags", "o_dims", "original_format", "owner_name", "path_alias",
    "tags", "url_o", "url_l", "url_m", "url_z", "views"
])

ID_PATTERN = re.compile(r"\bID:\s*([A-Za-z0-9_-]+)\b", re.IGNORECASE)

# Configuration note:
# This script expects a `flickr_accounts.ini` file in the same directory as
# this script. That INI file should include the Flickr API key and the Flickr
# user ID (NSID) for the account being exported.
#
# Important:
# - `api_key` is your Flickr API application key.
# - `user_id` is the Flickr account NSID for the account owner.
# - Do not put the API secret in the `user_id` field.
#
# How to get an API key:
# - You must have a Flickr subscription.
# - In Flickr, go to Account Settings > Sharing and Extending > API Keys.
# - Create a new key there, or view an existing one if you already have it.
#
# How to get the Flickr user ID (NSID):
# - The Flickr screen name or account URL alias is not the same as the user ID.
# - Use the Flickr API method `flickr.urls.lookupUser` with your account URL.
# - Example:
#   https://www.flickr.com/services/rest/?method=flickr.urls.lookupUser&api_key=YOUR_API_KEY&url=https://www.flickr.com/people/valdosta_archives/&format=json&nojsoncallback=1
# - The response will include `user.id`, which is the value to place in
#   `flickr_accounts.ini` as `user_id`.


def flickr_call(api_key, method, pause=0, **params):
    r = requests.get(API_URL, params={
        "method": method,
        "api_key": api_key,
        "format": "json",
        "nojsoncallback": 1,
        **params,
    }, timeout=60)
    r.raise_for_status()
    data = r.json()
    if data.get("stat") != "ok":
        raise RuntimeError(f"{method} failed: {data.get('message')}")
    if pause:
        time.sleep(pause)
    return data


def load_settings():
    script_dir = Path(__file__).resolve().parent
    config_path = script_dir / "flickr_accounts.ini"

    if not config_path.exists():
        raise FileNotFoundError(f"Config file not found: {config_path}")

    config = configparser.ConfigParser()
    config.read(config_path, encoding="utf-8")

    if "default" not in config:
        raise KeyError(f"Missing [default] section in {config_path}")

    api_key = config["default"].get("api_key", "").strip()
    user_id = config["default"].get("user_id", "").strip()

    if not api_key:
        raise ValueError("Missing api_key in [default] section.")
    if not user_id:
        raise ValueError("Missing user_id in [default] section.")

    return {
        "api_key": api_key,
        "user_id": user_id,
        "config_path": config_path,
        "script_dir": script_dir,
    }


def parse_album_id(value):
    value = value.strip()
    if not value:
        raise ValueError("Album ID or album URL is required.")

    if value.isdigit():
        return value

    parsed = urlparse(value)
    parts = [p for p in parsed.path.split("/") if p]
    if "albums" in parts:
        i = parts.index("albums")
        if i + 1 < len(parts):
            return parts[i + 1]

    query = parse_qs(parsed.query)
    for key in ("set", "photoset"):
        if query.get(key):
            return query[key][0]

    raise ValueError(f"Could not extract album ID from: {value}")


def parse_date(value):
    return int(
        dt.datetime.strptime(value, "%Y-%m-%d")
        .replace(tzinfo=dt.timezone.utc)
        .timestamp()
    )


def get_original_url(api_key, photo_id):
    data = flickr_call(api_key, "flickr.photos.getSizes", photo_id=photo_id)
    for size in data.get("sizes", {}).get("size", []):
        if size.get("label") == "Original":
            return size.get("source", "")
    return ""


def photo_page_url(photo):
    owner = photo.get("pathalias") or photo.get("owner", "")
    return f"https://www.flickr.com/photos/{owner}/{photo['id']}/"


def normalize_description(photo):
    desc = photo.get("description", "")
    return desc.get("_content", "") if isinstance(desc, dict) else desc


def extract_machine_tag_value(machine_tags):
    for tag in machine_tags.split():
        if ":" not in tag or "=" not in tag:
            continue
        namespace_predicate, value = tag.split("=", 1)
        _, predicate = namespace_predicate.split(":", 1)
        normalized = predicate.lower().replace("-", "").replace("_", "")
        if normalized in {
            "imagesupplierimageid",
            "supplierimageid",
            "imageid",
            "identifier",
        }:
            return value.strip('"')
    return ""


def extract_image_supplier_image_id(photo):
    value = extract_machine_tag_value(photo.get("machine_tags", ""))
    if value:
        return Path(value).stem

    description = normalize_description(photo)
    match = ID_PATTERN.search(description)
    if match:
        return match.group(1)

    return ""


def prompt_mode():
    while True:
        value = input("Export by album or upload date? Enter 'album' or 'date': ").strip().lower()
        if value in ("album", "date"):
            return value
        print("Please enter 'album' or 'date'.")


def prompt_album_id():
    value = input("Enter the Flickr album ID or album URL: ").strip()
    return parse_album_id(value)


def prompt_date_range():
    min_date = input("Enter the minimum upload date (YYYY-MM-DD): ").strip()
    max_date = input("Enter the maximum upload date (YYYY-MM-DD): ").strip()
    parse_date(min_date)
    parse_date(max_date)
    return min_date, max_date


def prompt_output_path(script_dir, mode):
    default_name = f"flickr_export_{mode}.csv"
    value = input(f"Enter output CSV path [{default_name}]: ").strip()
    if not value:
        value = default_name
    output_path = Path(value)
    if not output_path.is_absolute():
        output_path = script_dir / output_path
    return output_path


def collect_album(api_key, user_id, album_id):
    page, rows = 1, []
    while True:
        data = flickr_call(
            api_key,
            "flickr.photosets.getPhotos",
            user_id=user_id,
            photoset_id=album_id,
            extras=EXTRAS,
            per_page=500,
            page=page,
        )
        photoset = data["photoset"]
        rows.extend(photoset.get("photo", []))
        if page >= int(photoset["pages"]):
            return rows
        page += 1


def collect_date_range(api_key, user_id, min_upload_date, max_upload_date):
    page, rows = 1, []
    min_date = parse_date(min_upload_date)
    max_date = parse_date(max_upload_date) + 86399

    while True:
        data = flickr_call(
            api_key,
            "flickr.people.getPhotos",
            user_id=user_id,
            extras=EXTRAS,
            per_page=500,
            page=page,
            min_upload_date=min_date,
            max_upload_date=max_date,
        )
        photos = data["photos"]
        rows.extend(photos.get("photo", []))
        if page >= int(photos["pages"]):
            return rows
        page += 1


def to_row(api_key, photo):
    original_url = photo.get("url_o", "")
    if not original_url:
        original_url = get_original_url(api_key, photo["id"])

    return {
        "image_supplier_image_id": extract_image_supplier_image_id(photo),
        "photo_id": photo.get("id", ""),
        "title": photo.get("title", ""),
        "description": normalize_description(photo),
        "date_upload_unix": photo.get("dateupload", ""),
        "date_taken": photo.get("datetaken", ""),
        "owner_nsid": photo.get("owner", ""),
        "owner_name": photo.get("ownername", ""),
        "path_alias": photo.get("pathalias", ""),
        "license": photo.get("license", ""),
        "media": photo.get("media", ""),
        "tags": photo.get("tags", ""),
        "views": photo.get("views", ""),
        "original_format": photo.get("originalformat", ""),
        "original_width": photo.get("width_o", ""),
        "original_height": photo.get("height_o", ""),
        "original_url": original_url,
        "embed_url_original": original_url,
        "large_url": photo.get("url_l", ""),
        "medium_url": photo.get("url_z", ""),  # url_z is Medium 640
        "small_url": photo.get("url_m", ""),   # url_m is Small 240
        "photo_page_url": photo_page_url(photo),
    }


def write_csv(output_path, rows):
    fieldnames = [
        "image_supplier_image_id", "photo_id", "title", "description",
        "date_upload_unix", "date_taken", "owner_nsid", "owner_name",
        "path_alias", "license", "media", "tags", "views",
        "original_format", "original_width", "original_height",
        "original_url", "embed_url_original", "large_url", "medium_url",
        "small_url", "photo_page_url"
    ]

    output_path.parent.mkdir(parents=True, exist_ok=True)
    with output_path.open("w", newline="", encoding="utf-8-sig") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def main():
    settings = load_settings()
    api_key = settings["api_key"]
    user_id = settings["user_id"]
    script_dir = settings["script_dir"]

    print("Flickr Metadata Export")
    print(f"Using config: {settings['config_path']}")
    print("Using profile: [default]")
    print()

    mode = prompt_mode()

    if mode == "album":
        album_id = prompt_album_id()
        output_path = prompt_output_path(script_dir, "album")
        photos = collect_album(api_key, user_id, album_id)
    else:
        min_date, max_date = prompt_date_range()
        output_path = prompt_output_path(script_dir, "date")
        photos = collect_date_range(api_key, user_id, min_date, max_date)

    rows = [to_row(api_key, photo) for photo in photos]
    write_csv(output_path, rows)

    print()
    print(f"Exported {len(rows)} photos to:")
    print(output_path)


if __name__ == "__main__":
    try:
        main()
    except Exception as e:
        print(f"Error: {e}", file=sys.stderr)
        input("Press Enter to close...")
        sys.exit(1)

Extended EXIF Export Script

Use this variant when you want a richer export for local preservation tracking, website reuse, or downstream metadata cleanup. Compared with the basic script, it adds:

  • per-photo EXIF export, packed into a JSON exif_json column containing all returned EXIF tags
  • a thumbnail URL column

and keeps the direct Flickr page link, the medium URL, and the original URL.

Note

This version makes additional API calls (flickr.photos.getSizes and flickr.photos.getExif) for every photo, so large exports take noticeably longer than with the basic script.
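The exif_json column can be unpacked later with the standard json module. Below is a minimal sketch; the helper name and the sample EXIF entries are illustrative, matching the entry shape the script writes (tagspace, tag, label, raw, clean).

```python
import json

def exif_value(exif_json, tag):
    """Return one EXIF value from an exif_json cell,
    preferring the clean value over the raw one."""
    if not exif_json:
        return ""
    for entry in json.loads(exif_json):
        if entry.get("tag") == tag:
            return entry.get("clean") or entry.get("raw", "")
    return ""
```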

#!/usr/bin/env python3
import configparser
import csv
import datetime as dt
import json
import re
import sys
import time
from pathlib import Path
from urllib.parse import parse_qs, urlparse

import requests

API_URL = "https://api.flickr.com/services/rest/"
EXTRAS = ",".join([
    "date_upload", "date_taken", "description", "license", "media",
    "machine_tags", "o_dims", "original_format", "owner_name", "path_alias",
    "tags", "url_o", "url_l", "url_m", "url_z", "views"
])

ID_PATTERN = re.compile(r"\bID:\s*([A-Za-z0-9_-]+)\b", re.IGNORECASE)


def flickr_call(api_key, method, pause=0, **params):
    response = requests.get(API_URL, params={
        "method": method,
        "api_key": api_key,
        "format": "json",
        "nojsoncallback": 1,
        **params,
    }, timeout=60)
    response.raise_for_status()
    data = response.json()
    if data.get("stat") != "ok":
        raise RuntimeError(f"{method} failed: {data.get('message')}")
    if pause:
        time.sleep(pause)
    return data


def load_settings():
    script_dir = Path(__file__).resolve().parent
    config_path = script_dir / "flickr_accounts.ini"

    if not config_path.exists():
        raise FileNotFoundError(f"Config file not found: {config_path}")

    config = configparser.ConfigParser()
    config.read(config_path, encoding="utf-8")

    if "default" not in config:
        raise KeyError(f"Missing [default] section in {config_path}")

    api_key = config["default"].get("api_key", "").strip()
    user_id = config["default"].get("user_id", "").strip()

    if not api_key:
        raise ValueError("Missing api_key in [default] section.")
    if not user_id:
        raise ValueError("Missing user_id in [default] section.")

    return {
        "api_key": api_key,
        "user_id": user_id,
        "config_path": config_path,
        "script_dir": script_dir,
    }


def parse_album_id(value):
    value = value.strip()
    if not value:
        raise ValueError("Album ID or album URL is required.")

    if value.isdigit():
        return value

    parsed = urlparse(value)
    parts = [part for part in parsed.path.split("/") if part]
    if "albums" in parts:
        index = parts.index("albums")
        if index + 1 < len(parts):
            return parts[index + 1]

    query = parse_qs(parsed.query)
    for key in ("set", "photoset"):
        if query.get(key):
            return query[key][0]

    raise ValueError(f"Could not extract album ID from: {value}")


def parse_date(value):
    return int(
        dt.datetime.strptime(value, "%Y-%m-%d")
        .replace(tzinfo=dt.timezone.utc)
        .timestamp()
    )


def normalize_description(photo):
    desc = photo.get("description", "")
    return desc.get("_content", "") if isinstance(desc, dict) else desc


def extract_machine_tag_value(machine_tags):
    for tag in machine_tags.split():
        if ":" not in tag or "=" not in tag:
            continue
        namespace_predicate, value = tag.split("=", 1)
        _, predicate = namespace_predicate.split(":", 1)
        normalized = predicate.lower().replace("-", "").replace("_", "")
        if normalized in {
            "imagesupplierimageid",
            "supplierimageid",
            "imageid",
            "identifier",
        }:
            return value.strip('"')
    return ""


def extract_image_supplier_image_id(photo):
    value = extract_machine_tag_value(photo.get("machine_tags", ""))
    if value:
        return Path(value).stem

    description = normalize_description(photo)
    match = ID_PATTERN.search(description)
    if match:
        return match.group(1)

    return ""


def photo_page_url(photo):
    owner = photo.get("pathalias") or photo.get("owner", "")
    return f"https://www.flickr.com/photos/{owner}/{photo['id']}/"


def prompt_mode():
    while True:
        value = input("Export by album or upload date? Enter 'album' or 'date': ").strip().lower()
        if value in ("album", "date"):
            return value
        print("Please enter 'album' or 'date'.")


def prompt_album_id():
    value = input("Enter the Flickr album ID or album URL: ").strip()
    return parse_album_id(value)


def prompt_date_range():
    min_date = input("Enter the minimum upload date (YYYY-MM-DD): ").strip()
    max_date = input("Enter the maximum upload date (YYYY-MM-DD): ").strip()
    parse_date(min_date)
    parse_date(max_date)
    return min_date, max_date


def prompt_output_path(script_dir, mode):
    default_name = f"flickr_export_exif_{mode}.csv"
    value = input(f"Enter output CSV path [{default_name}]: ").strip()
    if not value:
        value = default_name
    output_path = Path(value)
    if not output_path.is_absolute():
        output_path = script_dir / output_path
    return output_path


def collect_album(api_key, user_id, album_id):
    page, rows = 1, []
    while True:
        data = flickr_call(
            api_key,
            "flickr.photosets.getPhotos",
            user_id=user_id,
            photoset_id=album_id,
            extras=EXTRAS,
            per_page=500,
            page=page,
        )
        photoset = data["photoset"]
        rows.extend(photoset.get("photo", []))
        if page >= int(photoset["pages"]):
            return rows
        page += 1


def collect_date_range(api_key, user_id, min_upload_date, max_upload_date):
    page, rows = 1, []
    min_date = parse_date(min_upload_date)
    max_date = parse_date(max_upload_date) + 86399

    while True:
        data = flickr_call(
            api_key,
            "flickr.people.getPhotos",
            user_id=user_id,
            extras=EXTRAS,
            per_page=500,
            page=page,
            min_upload_date=min_date,
            max_upload_date=max_date,
        )
        photos = data["photos"]
        rows.extend(photos.get("photo", []))
        if page >= int(photos["pages"]):
            return rows
        page += 1


def get_sizes_map(api_key, photo_id):
    data = flickr_call(api_key, "flickr.photos.getSizes", photo_id=photo_id)
    sizes = {}
    for size in data.get("sizes", {}).get("size", []):
        label = size.get("label", "").lower()
        sizes[label] = size.get("source", "")
    return sizes


def get_exif_json(api_key, photo_id):
    try:
        data = flickr_call(api_key, "flickr.photos.getExif", photo_id=photo_id)
    except RuntimeError as err:
        message = str(err).lower()
        if "permission denied" in message or "photo not found" in message:
            return ""
        raise

    exif_entries = []
    for entry in data.get("photo", {}).get("exif", []):
        exif_entries.append({
            "tagspace": entry.get("tagspace", ""),
            "tagspaceid": entry.get("tagspaceid", ""),
            "tag": entry.get("tag", ""),
            "label": entry.get("label", ""),
            "raw": entry.get("raw", {}).get("_content", ""),
            "clean": entry.get("clean", {}).get("_content", ""),
        })

    return json.dumps(exif_entries, ensure_ascii=False)


def to_row(api_key, photo):
    sizes = get_sizes_map(api_key, photo["id"])
    original_url = photo.get("url_o", "") or sizes.get("original", "")
    medium_url = photo.get("url_z", "") or sizes.get("medium 640", "") or sizes.get("medium", "")  # url_z is Medium 640
    thumbnail_url = sizes.get("thumbnail", "") or sizes.get("square", "") or sizes.get("small square", "")

    return {
        "image_supplier_image_id": extract_image_supplier_image_id(photo),
        "photo_id": photo.get("id", ""),
        "title": photo.get("title", ""),
        "description": normalize_description(photo),
        "date_upload_unix": photo.get("dateupload", ""),
        "date_taken": photo.get("datetaken", ""),
        "owner_nsid": photo.get("owner", ""),
        "owner_name": photo.get("ownername", ""),
        "path_alias": photo.get("pathalias", ""),
        "license": photo.get("license", ""),
        "media": photo.get("media", ""),
        "tags": photo.get("tags", ""),
        "views": photo.get("views", ""),
        "original_format": photo.get("originalformat", ""),
        "original_width": photo.get("width_o", ""),
        "original_height": photo.get("height_o", ""),
        "thumbnail_url": thumbnail_url,
        "medium_url": medium_url,
        "original_url": original_url,
        "photo_page_url": photo_page_url(photo),
        "exif_json": get_exif_json(api_key, photo["id"]),
    }


def write_csv(output_path, rows):
    fieldnames = [
        "image_supplier_image_id", "photo_id", "title", "description",
        "date_upload_unix", "date_taken", "owner_nsid", "owner_name",
        "path_alias", "license", "media", "tags", "views",
        "original_format", "original_width", "original_height",
        "thumbnail_url", "medium_url", "original_url", "photo_page_url",
        "exif_json",
    ]

    output_path.parent.mkdir(parents=True, exist_ok=True)
    with output_path.open("w", newline="", encoding="utf-8-sig") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def main():
    settings = load_settings()
    api_key = settings["api_key"]
    user_id = settings["user_id"]
    script_dir = settings["script_dir"]

    print("Flickr Metadata + EXIF Export")
    print(f"Using config: {settings['config_path']}")
    print("Using profile: [default]")
    print()

    mode = prompt_mode()

    if mode == "album":
        album_id = prompt_album_id()
        output_path = prompt_output_path(script_dir, "album")
        photos = collect_album(api_key, user_id, album_id)
    else:
        min_date, max_date = prompt_date_range()
        output_path = prompt_output_path(script_dir, "date")
        photos = collect_date_range(api_key, user_id, min_date, max_date)

    rows = [to_row(api_key, photo) for photo in photos]
    write_csv(output_path, rows)

    print()
    print(f"Exported {len(rows)} photos to:")
    print(output_path)


if __name__ == "__main__":
    try:
        main()
    except Exception as err:
        print(f"Error: {err}", file=sys.stderr)
        input("Press Enter to close...")
        sys.exit(1)