Hansard Search

Contents

1. Quick start
2. The layout
3. Searching

Basic search
Boolean operators

4. Add terms (related terms)
5. Reading results — cards
6. The reading pane & day context
7. The timeseries chart
8. Filters
9. Downloading
10. Mobile
11. Methodology

Source data
Coverage and scale
What is a speech turn?
Member enrichment
Validation
Known limitations

1. Quick start

Type a word or phrase, press Enter. Click any result card to read the speech in context. That's it — everything else is optional.

2. The layout

The interface has three main areas:

Left — filter sidebar. On desktop, open it with the FILTERS tab on the left edge. On mobile, tap the floating ⊟ button at bottom-left.
Centre — result cards. Matching speech turns, sorted by date by default. Sort pills above the list let you reorder by speaker, party, or chamber.
Right — reading pane. Opens when you click a card. Shows every speech turn for that sitting day, with the selected turn highlighted.

On mobile, results fill the screen. Tap a card to open the reading pane full-screen. Tap ← Back to return to results.

3. Searching

Basic search

Type a word or phrase into the search bar and press Enter or click Search. Single words do not need quotes. Multi-word phrases must be enclosed in single quotes.

'climate change'

renewables

The search is case-insensitive by default. Tick the Case sensitive checkbox in the filter sidebar if you need exact-case matching.

Boolean operators

Combine terms using boolean logic to build precise queries.

Operator	Symbol	Meaning	Example
AND	`&`	Both terms must appear in the speech	`'climate change' & 'fuel subsidies'`
OR	`|`	Either term may appear	`'global warming' | 'climate crisis'`
NOT	`!`	Term must not appear	`'nuclear energy' & !'nuclear waste'`
Grouping	`( )`	Control evaluation order	`('climate change' | 'global warming') & 'fuel subsidies'`

Tip: Without parentheses, & binds more tightly than | — the same precedence as multiplication vs addition in arithmetic. Use parentheses whenever the logic might be ambiguous.

Example queries

('climate change' | 'global warming') & 'fuel subsidies'

'first nations' & ('land rights' | 'native title') & !'Northern Territory'

'WorkChoices' | 'industrial relations' | 'Fair Work'
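
To see how the precedence rules play out, the following is a minimal sketch of a parser and matcher for this query syntax. It is not the search engine's own implementation: the function names (tokenize, parse, matches) are hypothetical, and matching is assumed to be a plain case-insensitive substring test.

```python
import re

# One token: a 'quoted phrase', a bare word, or an operator symbol.
TOKEN = re.compile(r"\s*(?:'([^']*)'|(\w+)|([&|!()]))")

def tokenize(query):
    """Split a query into ('TERM', text) and ('OP', symbol) tokens."""
    query = query.strip()
    tokens, pos = [], 0
    while pos < len(query):
        m = TOKEN.match(query, pos)
        if not m:
            raise ValueError(f"Unexpected input at position {pos}: {query[pos:]!r}")
        quoted, bare, op = m.groups()
        if op:
            tokens.append(("OP", op))
        else:
            tokens.append(("TERM", quoted if quoted is not None else bare))
        pos = m.end()
    return tokens

def parse(tokens):
    """Build a nested tuple tree: '|' binds loosest, then '&', then '!'."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        pos += 1

    def parse_or():
        node = parse_and()
        while peek() == ("OP", "|"):
            take()
            node = ("or", node, parse_and())
        return node

    def parse_and():
        node = parse_not()
        while peek() == ("OP", "&"):
            take()
            node = ("and", node, parse_not())
        return node

    def parse_not():
        if peek() == ("OP", "!"):
            take()
            return ("not", parse_not())
        return parse_atom()

    def parse_atom():
        tok = peek()
        if tok is None:
            raise ValueError("Unexpected end of query")
        take()
        if tok == ("OP", "("):
            node = parse_or()
            if peek() != ("OP", ")"):
                raise ValueError("Missing closing parenthesis")
            take()
            return node
        if tok[0] == "TERM":
            return ("term", tok[1])
        raise ValueError(f"Unexpected operator {tok[1]!r}")

    tree = parse_or()
    if peek() is not None:
        raise ValueError(f"Unexpected token {peek()[1]!r}")
    return tree

def matches(tree, text):
    """Evaluate a parsed query against one speech body (assumed substring match)."""
    kind = tree[0]
    if kind == "term":
        return tree[1].lower() in text.lower()
    if kind == "not":
        return not matches(tree[1], text)
    left, right = matches(tree[1], text), matches(tree[2], text)
    return (left and right) if kind == "and" else (left or right)

# Without parentheses, & groups first: coal | (gas & subsidies)
print(parse(tokenize("coal | gas & subsidies")))
# ('or', ('term', 'coal'), ('and', ('term', 'gas'), ('term', 'subsidies')))
```

Writing (coal | gas) & subsidies instead moves the OR group inside the AND, which is why parentheses are worth adding whenever both operators appear in one query.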

4. Add terms (related terms)

After a search, the Search button turns green and a row of suggestion chips appears below the search bar. Each chip is an alternative phrasing, synonym, or abbreviation for one of the terms in your query.

Suggestions are generated in two stages. First, the terms that co-occur most frequently in your actual results are extracted from the corpus. Those terms are then passed to an AI model as context, so suggestions are grounded in the language actually used in Australian parliamentary debate rather than in generic synonyms.

Click a chip to add it to your search. The chip turns green and the expanded query is shown in small text below the search bar. Click it again to remove it. The boolean structure of your original expression is preserved — each alias is inserted inline alongside its source term using an OR group.

Example: If your expression is ('climate change' | 'global warming') & 'fuel subsidies' and you click climate emergency as an alias for climate change, the effective query becomes ('climate change' | 'climate emergency' | 'global warming') & 'fuel subsidies'.
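
The inline expansion can be sketched as a simple string rewrite. This is an illustration only, assuming every term is written in single quotes; add_alias is a hypothetical helper, not part of the tool. It wraps the source term and its alias in their own OR group, which is logically equivalent to the flattened query shown in the example.

```python
def add_alias(expression, term, alias):
    """Replace 'term' with an OR group containing the term and its alias."""
    quoted = f"'{term}'"
    if quoted not in expression:
        raise ValueError(f"{quoted} does not appear in the expression")
    # The parentheses keep the alias attached to its source term,
    # whatever operators surround it.
    return expression.replace(quoted, f"({quoted} | '{alias}')")

expanded = add_alias(
    "('climate change' | 'global warming') & 'fuel subsidies'",
    "climate change",
    "climate emergency",
)
print(expanded)
# (('climate change' | 'climate emergency') | 'global warming') & 'fuel subsidies'
```

The extra parentheses matter when the source term sits next to an &. Expanding 'fuel subsidies' without them would produce ... & 'fuel subsidies' | 'alias', and because & binds more tightly than |, the alias would match on its own rather than as an alternative to the term.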

Suggestions update automatically as you type, approximately 700 ms after you pause. Only new or changed terms trigger a new fetch; unchanged terms keep their existing suggestions.

5. Reading results — cards

Each result card shows:

Chamber badge (Senate or House), speaker name, and date
Party, state or electorate, and government/opposition status
Type badges where applicable: Question, Written question, Answer, First speech, Interjection
A 280-character preview of the speech body with matched terms highlighted

Sort pills above the list let you reorder results by date, speaker, party, or chamber. Click the same pill twice to reverse the sort direction.

Click any card to open the reading pane on the right (or full-screen on mobile).

6. The reading pane & day context

Clicking a card opens the reading pane, which shows every speech turn for that sitting day — not just the matched speech. This gives you the full debate context around the result you clicked.

The selected turn is highlighted with a blue border.
Matched search terms are highlighted throughout all turns on the page.
A sticky header shows the chamber and matched terms on the left, and the sitting date as a ↗ link to the official APH Hansard on the right.
Use the Previous and Next buttons to step through the turns in sequence.

On mobile, the reading pane opens full-screen. Tap ← Back to return to the results list.

7. The timeseries chart

A bar chart above the results shows matching speeches per year, coloured by chamber (Senate in blue, House in orange). It updates with every search and filter change.
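
If you want to reproduce the chart's numbers from your own copy of the results, the per-year, per-chamber counts amount to a simple tally. The sketch below assumes each result is a dictionary with hypothetical date and chamber fields; the sample rows are made up.

```python
from collections import Counter
from datetime import date

def counts_by_year(turns):
    """Tally matching turns per (year, chamber) pair."""
    return Counter((turn["date"].year, turn["chamber"]) for turn in turns)

def pin_year(turns, year):
    """Keep only turns from one year, like pinning a bar in the chart."""
    return [turn for turn in turns if turn["date"].year == year]

# Made-up sample rows for illustration.
sample = [
    {"date": date(2019, 7, 2), "chamber": "Senate"},
    {"date": date(2019, 9, 10), "chamber": "House"},
    {"date": date(2019, 11, 26), "chamber": "Senate"},
    {"date": date(2020, 2, 4), "chamber": "House"},
]

bars = counts_by_year(sample)
print(bars[(2019, "Senate")])  # 2
```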

Passive highlight

When you click a result card, the chart quietly marks that year's bar with an accent colour. This is informational only — it does not filter results or lock the year.

Active pin

Clicking a bar in the chart locks the view to that year. While pinned:

Other years' bars are dimmed; the pinned bar stays full opacity.
Sidebar filters are paused and a lock banner appears in the sidebar.
The result list shows only speeches from the pinned year.

To release the pin, click the same bar again, or use the Unpin year & unlock filters button in the sidebar banner.

8. Filters

Open the filter sidebar with the FILTERS tab on desktop, or the floating ⊟ button on mobile. Counts next to each option update in real time as you apply filters. The Reset button clears all filters at once.

Date range

Restrict results to a specific period. Coverage runs from 1998-01-01 to the present. Click + Add range to add multiple non-contiguous date windows.
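
Multiple date windows combine as alternatives: a speech is kept if its date falls inside any one of them. A minimal sketch, assuming both ends of each window are inclusive (the interface may treat the end date differently) and using a hypothetical in_any_range helper:

```python
from datetime import date

def in_any_range(day, ranges):
    """True if day falls inside at least one inclusive (start, end) window."""
    if not ranges:
        return True  # no date filter applied
    return any(start <= day <= end for start, end in ranges)

# Two non-contiguous windows, chosen only for illustration.
windows = [
    (date(1998, 1, 1), date(1999, 12, 31)),
    (date(2010, 1, 1), date(2010, 12, 31)),
]

print(in_any_range(date(1998, 6, 1), windows))  # True
print(in_any_range(date(2005, 6, 1), windows))  # False
```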

Parliament number

Filter to a specific parliament (e.g. 47th Parliament). Useful for studying a single term.

Party

Filter to speeches by members of a specific party. Independents have their own group. Party membership is assigned as at the date of the speech, so members who changed parties mid-term are correctly attributed.

Gender

Filter to speeches by female, male, or non-binary members.

State / senators and Electorate / MPs

Filter by state (Senate) or electorate (House). The electorate filter includes region-type quick-filters to narrow by metropolitan, regional, or rural electorates.

Row type and speech flags

Fine-grained filters for the nature of the speech turn:

First speech — inaugural speeches only
Questions — questions put to the government
Answers — ministerial responses
Written questions — questions submitted in writing
No interjections — exclude turns that are purely interjections
Embedded interjections — include turns that contain inline interjections from other members
Main chamber only — exclude Federation Chamber / Main Committee proceedings

Case sensitive

Tick this checkbox to make the search exact-case. Unticked (the default) means matching is case-insensitive.

9. Downloading

Click ↓ CSV (10) to download the first 10 results of your current search as a comma-separated file. The download respects all active filters and the current sort order. Columns include: date, chamber, speaker name, party, state/electorate, government flag, speech type flags, full body text, name ID, unique ID, partyfacts ID, and matched terms.
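
Because the body text column can contain commas and line breaks, it is safer to read the download with a CSV parser than to split lines by hand. The sketch below uses Python's csv module on a small made-up sample; the column names (date, chamber, speaker, party, matched_terms) are assumptions, so check them against the header row of your own file.

```python
import csv
import io
from collections import Counter

# Made-up sample standing in for a downloaded file.
SAMPLE = """date,chamber,speaker,party,matched_terms
2019-07-02,Senate,Speaker A,ALP,climate change
2019-07-03,House,Speaker B,LP,climate change
2020-02-04,Senate,Speaker C,ALP,"global warming, climate change"
"""

def party_counts(csv_file):
    """Count rows per party from an open CSV file object."""
    reader = csv.DictReader(csv_file)
    return Counter(row["party"] for row in reader)

counts = party_counts(io.StringIO(SAMPLE))
print(counts["ALP"])  # 2
```

For a real download, pass an open file instead of the sample, for example open("results.csv", newline="", encoding="utf-8"), where results.csv is a placeholder for whatever name your browser gave the file.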

10. Mobile

On a phone or tablet:

Tap the floating ⊟ button at bottom-left to open the filter drawer.
Tap any result card to open the reading pane full-screen.
Tap ← Back to return to the results list.

11. Methodology

Source data

The corpus is built from the official XML Hansard files published by the Parliament of Australia. These files are the authoritative record of proceedings in both chambers and are made available for download from the APH website.

Two XML schema versions are present across the date range:

v2.1 — used for older files, featuring <talk> / <talker> / <para> structure
v2.2 — used for files from approximately 2021 onwards, featuring HPS span and anchor markup for attribution

Each variant is parsed by its own schema-aware parser. Body text extraction handles inline attribution spans, including the trailing text that follows a span's closing tag (el.tail in Python's XML libraries), which schema-naive parsers discard. Capturing it maximises word coverage.
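
The tail-text problem is easy to reproduce with Python's standard xml.etree.ElementTree module. In the sketch below, the XML fragment, its class name, and the two helper functions are made up for illustration; they are not taken from the corpus parsers.

```python
import xml.etree.ElementTree as ET

# Made-up fragment: an attribution span followed by the spoken words.
SAMPLE = (
    '<para><span class="HPS-Example">Senator EXAMPLE:</span>'
    " I rise to speak on this bill.</para>"
)

def naive_text(el):
    """Collect only .text values, so anything after a child's closing tag is lost."""
    parts = [el.text or ""]
    parts += [child.text or "" for child in el]
    return "".join(parts)

def full_text(el):
    """Walk the element recursively, keeping each child's trailing .tail text."""
    parts = [el.text or ""]
    for child in el:
        parts.append(full_text(child))
        parts.append(child.tail or "")
    return "".join(parts)

para = ET.fromstring(SAMPLE)
print(naive_text(para))  # Senator EXAMPLE:
print(full_text(para))   # Senator EXAMPLE: I rise to speak on this bill.
```

In ElementTree, the words after the closing span tag belong to the span's tail attribute rather than to the paragraph's text, which is why the naive version drops the entire speech. The standard library's "".join(para.itertext()) gives the same result as full_text here.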

Coverage and scale

Measure	Senate	House
Speech turns	555,265	630,173
Sitting days	1,482	1,722

Coverage runs from the first sitting day of 1998 through to 2026. The corpus does not include committee Hansard, which is held in a separate dataset.

What is a speech turn?

Each row in the corpus represents a single continuous speech turn — an uninterrupted block of text attributed to one speaker. A speech turn ends when the next speaker begins, or when the presiding officer intervenes. A single member may contribute many turns in a single sitting day across different debates.

Speech turns are distinct from speeches in the colloquial sense. A minister answering a question in Question Time may have a turn of a single sentence; a second reading contribution may run to several thousand words. Both are single rows.

Interjections — short interruptions attributed to a member while another member holds the floor — are captured as separate rows and can be filtered out using the No interjections filter.

Member enrichment

Member lookup

Speaker name strings in the XML are resolved to a canonical member record using hand-curated lookup tables built from APH member lists. The Senate lookup covers 666 senators; the House lookup covers 1,280 members. Each record includes name, party, state or electorate, gender, and date-range information for members who served in multiple parliaments.

Party attribution

Party membership is assigned as at the date of the speech. Members who changed parties during their term — such as crossbenchers who moved between minor parties — are correctly attributed using dated party override entries in the lookup. A standardised party abbreviation is used throughout; common aliases (e.g. Liberal National Party of Queensland → LNP) are normalised.
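
A dated override lookup of this kind can be sketched as follows. The record layout, the member ID X01, and the dates are all hypothetical; only the LNP alias comes from the description above.

```python
from datetime import date

# Hypothetical lookup: a default party plus dated overrides (start, end, party).
MEMBERS = {
    "X01": {
        "party": "ALP",
        "overrides": [(date(2015, 3, 1), date(2016, 6, 30), "IND")],
    },
    "X02": {
        "party": "Liberal National Party of Queensland",
        "overrides": [],
    },
}

# Long-form party names mapped to the standardised abbreviation.
ALIASES = {"Liberal National Party of Queensland": "LNP"}

def party_on(name_id, day):
    """Return the standardised party for a member on the date of a speech."""
    record = MEMBERS[name_id]
    party = record["party"]
    for start, end, override in record["overrides"]:
        if start <= day <= end:  # assumed inclusive at both ends
            party = override
            break
    return ALIASES.get(party, party)

print(party_on("X01", date(2014, 5, 1)))  # ALP
print(party_on("X01", date(2015, 9, 1)))  # IND
print(party_on("X02", date(2020, 1, 1)))  # LNP
```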

Gender

Gender is sourced from Wikidata via SPARQL query, matched on the member's Wikidata QID. Where Wikidata records are absent or ambiguous, manual review has been applied.
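
For readers who want to run a similar lookup, the sketch below builds a SPARQL query for a list of QIDs. It relies on Wikidata's P21 property ("sex or gender"); the exact query used to build this corpus is not published here, so treat the function name and query shape as assumptions. The QIDs in the example are placeholders, not members of parliament.

```python
import re

def build_gender_query(qids):
    """Build a SPARQL query returning the P21 (sex or gender) label per QID."""
    for qid in qids:
        if not re.fullmatch(r"Q\d+", qid):
            raise ValueError(f"Not a Wikidata QID: {qid!r}")
    values = " ".join(f"wd:{qid}" for qid in qids)
    return (
        "SELECT ?member ?genderLabel WHERE {\n"
        f"  VALUES ?member {{ {values} }}\n"
        "  ?member wdt:P21 ?gender .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}"
    )

# Placeholder QIDs for illustration only.
query = build_gender_query(["Q1", "Q2"])
print(query)
```

The resulting text can be pasted into the Wikidata Query Service. Members with no P21 statement simply return no row, which is one way the absent records mentioned above show up.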

partyfacts ID

Each party abbreviation is mapped to a partyfacts numeric ID where a verified match exists, enabling cross-national comparative research using the partyfacts dataset as a bridge.

Government / opposition flag

Each speech turn is flagged as government (1) or opposition/crossbench (0) based on whether the speaker's party held government in the relevant chamber on the date of the speech.
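
One way to derive such a flag is a table of dated government periods. The sketch below uses placeholder party codes and dates, and assumes each period runs from its start date up to, but not including, its end date; none of these values come from the corpus.

```python
from datetime import date

# Placeholder periods: (start inclusive, end exclusive, governing parties).
PERIODS = [
    (date(2000, 1, 1), date(2004, 1, 1), {"AAA"}),
    (date(2004, 1, 1), date(2008, 1, 1), {"BBB", "CCC"}),
]

def in_government(party, day, periods=PERIODS):
    """Return 1 if the party governed on that day, otherwise 0."""
    for start, end, parties in periods:
        if start <= day < end:
            return 1 if party in parties else 0
    raise ValueError(f"No government period covers {day}")

print(in_government("AAA", date(2002, 6, 1)))  # 1
print(in_government("AAA", date(2005, 6, 1)))  # 0
```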

Validation

Each sitting day file is run through seven automated validation tests before inclusion in the corpus:

T1 — Date match: The date in the file header matches the session calendar.
T2 — Consecutive duplicates: No two adjacent turns by the same speaker are identical (flags copy-paste artifacts in the XML).
T3 — Time expired: Turns flagged as "time expired" procedural text are excluded from body content.
T4 — Party/state multiplicity: A single member does not appear with conflicting party or state values on the same day.
T5 — Unknown name IDs: All speaker name strings resolve to a known member record.
T6 — Birth/death dates: No speech is attributed to a member outside their known lifespan.
T7 — Active on date: No speech is attributed to a member outside their period of service in the chamber.
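
Two of these checks are simple enough to sketch. The functions below illustrate the idea behind T2 and T7 only; the field names (name_id, body, date) and the service-period table are assumptions, not the project's actual validation code.

```python
from datetime import date

def consecutive_duplicates(turns):
    """T2-style check: indexes of turns identical to the turn just before."""
    return [
        i
        for i in range(1, len(turns))
        if turns[i]["name_id"] == turns[i - 1]["name_id"]
        and turns[i]["body"] == turns[i - 1]["body"]
    ]

def outside_service(turns, service):
    """T7-style check: indexes of turns dated outside the speaker's service."""
    flagged = []
    for i, turn in enumerate(turns):
        periods = service.get(turn["name_id"], [])
        if not any(start <= turn["date"] <= end for start, end in periods):
            flagged.append(i)
    return flagged

# Made-up rows and service periods for illustration.
day = date(2010, 5, 12)
turns = [
    {"name_id": "X01", "body": "I move the motion.", "date": day},
    {"name_id": "X01", "body": "I move the motion.", "date": day},
    {"name_id": "X02", "body": "I second it.", "date": day},
]
service = {
    "X01": [(date(2008, 7, 1), date(2014, 6, 30))],
    "X02": [(date(2011, 7, 1), date(2017, 6, 30))],
}

print(consecutive_duplicates(turns))    # [1]
print(outside_service(turns, service))  # [2]
```

Both functions return row indexes rather than a pass or fail result, so a failing day can stay in the corpus while the flagged rows are reviewed.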

Current validation pass rates: 76.3% of Senate sitting days and 89.7% of House sitting days are fully clean. A failure at the sitting-day level does not exclude the day from the corpus; it flags specific rows for review. The majority of failures are residual edge cases that do not affect the accuracy of the bulk of the corpus.

Known limitations

Garbled name IDs (Senate, ~78 cases): A small number of Senate XML files contain malformed speaker attribution tags (e.g. XH4, /7E4). These turns cannot be attributed to a member and are retained in the corpus with null member fields.
Residual duplicate turns (~24 Senate, ~25 House): A small number of speech turns appear twice in the source XML. These are genuine artifacts in the official files and cannot be removed without risk of incorrectly deduplicating legitimate repeated quotations.
Body text coverage: The word count of the corpus body text is approximately 0.3–0.7% below an independent reference corpus (Katz & Alexander, Zenodo 2023) for Senate files, due to minor differences in how procedural text and interjection formatting are handled across schema versions.
Committee Hansard: Committee hearings, estimates hearings, and joint committee proceedings are not included in this corpus. They are held in a separate dataset and are not yet searchable through this interface.
Currency: The corpus is updated periodically. Very recent sittings may not yet be included.

Reference: The independent Senate Hansard corpus used for validation benchmarking is: Lindsay Katz & Ryan Alexander (2023). Australian Senate Hansard 1998–2022. Zenodo. https://zenodo.org/records/8121950