HANSARD SEARCH v0.1
Australian Federal Parliament · House & Senate · 1998–2026
Help & Methodology

1. Quick start

Type a word or phrase, press Enter. Click any result card to read the speech in context. That's it — everything else is optional.

2. The layout

The interface has three main areas:

On mobile, results fill the screen. Tap a card to open the reading pane full-screen. Tap ← Back to return to results.

3. Searching

Basic search

Type a word or phrase into the search bar and press Enter or click Search. Single words do not need quotes. Multi-word phrases must be enclosed in single quotes.

'climate change'
renewables

The search is case-insensitive by default. Tick the Case sensitive checkbox in the filter sidebar if you need exact-case matching.

Boolean operators

Combine terms using boolean logic to build precise queries.

Operator   Symbol   Meaning                                Example
AND        &        Both terms must appear in the speech   'climate change' & 'fuel subsidies'
OR         |        Either term may appear                 'global warming' | 'climate crisis'
NOT        !        Term must not appear                   'nuclear energy' & !'nuclear waste'
Grouping   ( )      Control evaluation order               ('climate change' | 'global warming') & 'fuel subsidies'
Tip: Without parentheses, & binds more tightly than |, just as multiplication binds more tightly than addition in arithmetic. For example, renewables | coal & gas is read as renewables | (coal & gas). Use parentheses whenever the logic might be ambiguous.

Example queries

('climate change' | 'global warming') & 'fuel subsidies'
'first nations' & ('land rights' | 'native title') & !'Northern Territory'
'WorkChoices' | 'industrial relations' | 'Fair Work'
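
The operator rules above can be illustrated with a small evaluator. This is a minimal sketch, not the search engine's actual implementation: it assumes terms are single-quoted phrases or bare words, that ! binds tightest, then &, then |, and that a term "matches" when it occurs as a case-insensitive substring of the speech text. The function names tokenize and matches are hypothetical.

```python
import re

# A quoted phrase, a single operator character, or a bare word.
TOKEN = re.compile(r"\s*(?:'([^']*)'|([&|!()])|([^\s&|!()']+))")

def tokenize(query):
    """Split a query into ("term", text) and ("op", symbol) tokens."""
    tokens, pos = [], 0
    query = query.strip()
    while pos < len(query):
        m = TOKEN.match(query, pos)
        if not m:
            raise ValueError(f"unexpected input at position {pos}")
        phrase, op, word = m.groups()
        if op:
            tokens.append(("op", op))
        else:
            tokens.append(("term", phrase if phrase is not None else word))
        pos = m.end()
    return tokens

def matches(query, text):
    """Evaluate a boolean query against one speech (assumed semantics)."""
    tokens = tokenize(query)
    haystack = text.lower()
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def parse_or():  # lowest precedence
        value = parse_and()
        while peek() == ("op", "|"):
            advance()
            rhs = parse_and()  # always parse the right side, then combine
            value = value or rhs
        return value

    def parse_and():  # binds more tightly than |
        value = parse_not()
        while peek() == ("op", "&"):
            advance()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not():  # unary, binds tightest
        if peek() == ("op", "!"):
            advance()
            return not parse_not()
        return parse_atom()

    def parse_atom():
        tok = peek()
        if tok is None:
            raise ValueError("unexpected end of query")
        advance()
        if tok == ("op", "("):
            value = parse_or()
            if peek() != ("op", ")"):
                raise ValueError("missing closing parenthesis")
            advance()
            return value
        if tok[0] == "term":
            return tok[1].lower() in haystack
        raise ValueError(f"unexpected operator {tok[1]!r}")

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return result
```

Under these assumptions, matches("coal | gas & solar", "we need coal") is true because the query is read as coal | (gas & solar), while matches("(coal | gas) & solar", "we need coal") is false.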

4. Add terms (related terms)

After a search, the Search button turns green and a row of suggestion chips appears below the search bar. Each chip is an alternative phrasing, synonym, or abbreviation for one of the terms in your query.

Suggestions are generated in two stages. First, the terms that most frequently co-occur with your query terms in your actual results are extracted from the corpus. These are then passed to an AI model as context, so suggestions are grounded in the language actually used in Australian parliamentary debate rather than in generic synonyms.

Click a chip to add it to your search. The chip turns green and the expanded query is shown in small text below the search bar. Click the chip again to remove it. The boolean structure of your original expression is preserved: each alias is inserted inline alongside its source term using an OR group.

Example: If your expression is ('climate change' | 'global warming') & 'fuel subsidies' and you click climate emergency as an alias for climate change, the effective query becomes ('climate change' | 'climate emergency' | 'global warming') & 'fuel subsidies'.
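
The inline expansion can be sketched as a text substitution. This is an illustration only, assuming every term in the expression is single-quoted; the function name expand_aliases and the shape of the aliases mapping are hypothetical. Unlike the example above, this sketch always wraps the expanded term in its own parentheses, which is logically equivalent and keeps a preceding ! applied to the whole group.

```python
import re

def expand_aliases(query, aliases):
    """Replace each quoted term that has aliases with a parenthesised OR group.

    aliases maps a source term to the list of alias chips selected for it.
    """
    def replace(match):
        term = match.group(1)
        extra = aliases.get(term, [])
        if not extra:
            return match.group(0)  # no chips selected: leave the term untouched
        group = " | ".join(f"'{t}'" for t in [term, *extra])
        return f"({group})"

    return re.sub(r"'([^']*)'", replace, query)

expanded = expand_aliases(
    "('climate change' | 'global warming') & 'fuel subsidies'",
    {"climate change": ["climate emergency"]},
)
print(expanded)
```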

Suggestions update automatically as you type, approximately 700 ms after you pause. Only new or changed terms trigger a new fetch; unchanged terms keep their existing suggestions.
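
The "only new or changed terms" behaviour amounts to a per-term cache. The sketch below shows one way this could work; it does not implement the 700 ms pause itself, and extract_terms, refresh_suggestions, and the fetch callback are hypothetical names, not part of the tool.

```python
import re

# A quoted phrase or a bare word; operators and parentheses are skipped.
TERM = re.compile(r"'([^']*)'|([^\s&|!()']+)")

def extract_terms(query):
    """Return the set of quoted phrases and bare words found in a query."""
    found = {phrase or word for phrase, word in TERM.findall(query)}
    found.discard("")  # an empty pair of quotes carries no term
    return found

def refresh_suggestions(query, cache, fetch):
    """Fetch suggestions only for terms that are not already cached."""
    terms = extract_terms(query)
    for term in terms - set(cache):
        cache[term] = fetch(term)   # new or changed term: one fetch
    for stale in set(cache) - terms:
        del cache[stale]            # term removed from the query
    return cache
```

Calling refresh_suggestions again after editing one term of the query leaves the cached entries for the other terms untouched and calls fetch once, for the edited term.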

5. Reading results — cards

Each result card shows:

Sort pills above the list let you reorder results by date, speaker, party, or chamber. Click the same pill twice to reverse the sort direction.

Click any card to open the reading pane on the right (or full-screen on mobile).

6. The reading pane & day context

Clicking a card opens the reading pane, which shows every speech turn for that sitting day — not just the matched speech. This gives you the full debate context around the result you clicked.

On mobile, the reading pane opens full-screen. Tap ← Back to return to the results list.

7. The timeseries chart

A bar chart above the results shows matching speeches per year, coloured by chamber (Senate in blue, House in orange). It updates with every search and filter change.
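
The data behind such a chart is a count of matching speech turns grouped by year and chamber. A minimal sketch, assuming each result row is a dict with an ISO-format "date" string and a "chamber" value (both field names are assumptions made for illustration):

```python
from collections import Counter, defaultdict

def counts_by_year(rows):
    """Return {year: Counter({chamber: number of matching rows})}."""
    counts = defaultdict(Counter)
    for row in rows:
        year = int(row["date"][:4])  # "2019-07-02" -> 2019
        counts[year][row["chamber"]] += 1
    return dict(counts)

sample = [
    {"date": "2019-07-02", "chamber": "Senate"},
    {"date": "2019-07-04", "chamber": "House"},
    {"date": "2019-09-10", "chamber": "House"},
    {"date": "2020-02-05", "chamber": "Senate"},
]
print(counts_by_year(sample))
```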

Passive highlight

When you click a result card, the chart quietly marks that year's bar with an accent colour. This is informational only — it does not filter results or lock the year.

Active pin

Clicking a bar in the chart locks the view to that year. While pinned:

To release the pin, click the same bar again, or use the Unpin year & unlock filters button in the sidebar banner.

8. Filters

Open the filter sidebar with the FILTERS tab on desktop, or the floating button on mobile. Counts next to each option update in real time as you apply filters. The Reset button clears all filters at once.

Date range

Restrict results to a specific period. Coverage runs from 1998-01-01 to the present. Click + Add range to add multiple non-contiguous date windows.
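
With several windows defined, a speech is kept if its date falls inside any one of them. A minimal sketch, assuming both endpoints of each range are inclusive (the function name in_any_range is hypothetical):

```python
from datetime import date

def in_any_range(day, ranges):
    """True if day falls within at least one (start, end) window, inclusive."""
    return any(start <= day <= end for start, end in ranges)

windows = [
    (date(2007, 1, 1), date(2007, 12, 31)),
    (date(2019, 1, 1), date(2019, 12, 31)),
]
print(in_any_range(date(2019, 7, 2), windows))
print(in_any_range(date(2013, 7, 2), windows))
```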

Parliament number

Filter to a specific parliament (e.g. 47th Parliament). Useful for studying a single term.

Party

Filter to speeches by members of a specific party. Independents have their own group. Party membership is assigned as at the date of the speech, so members who changed parties mid-term are correctly attributed.

Gender

Filter to speeches by female, male, or non-binary members.

State / senators and Electorate / MPs

Filter by state (Senate) or electorate (House). The electorate filter includes region-type quick-filters to narrow by metropolitan, regional, or rural electorates.

Row type and speech flags

Fine-grained filters for the nature of the speech turn:

Case sensitive

Tick this checkbox to make the search exact-case. Unticked (the default) means matching is case-insensitive.

9. Downloading

Click ↓ CSV (10) to download the first 10 results of your current search as a comma-separated file. The download respects all active filters and the current sort order. Columns include: date, chamber, speaker name, party, state/electorate, government flag, speech type flags, full body text, name ID, unique ID, partyfacts ID, and matched terms.

10. Mobile

On a phone or tablet:

11. Methodology

Source data

The corpus is built from the official XML Hansard files published by the Parliament of Australia. These files are the authoritative record of proceedings in both chambers and are made available for download from the APH website.

Two XML schema versions are present across the date range:

Each variant is handled by its own schema-aware parser. Body text extraction handles inline attribution spans, including the trailing el.tail text that schema-naive parsers discard, to maximise word coverage.
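
The tail-text issue is easy to reproduce with Python's standard xml.etree.ElementTree, where text that follows a child element's closing tag is stored on that child's tail attribute rather than on the parent. The sketch below is illustrative only: the element names in the sample are invented and do not reflect either Hansard schema.

```python
import xml.etree.ElementTree as ET

def naive_text(el):
    """Collect el.text recursively but ignore tail text (loses words)."""
    parts = [el.text or ""]
    for child in el:
        parts.append(naive_text(child))
    return "".join(parts)

def full_text(el):
    """Collect el.text plus each child's text and trailing tail text."""
    parts = [el.text or ""]
    for child in el:
        parts.append(full_text(child))
        parts.append(child.tail or "")  # text after the child's closing tag
    return "".join(parts)

# Hypothetical markup: an inline attribution span followed by speech text.
sample = ET.fromstring(
    '<para><span class="name">Senator SMITH:</span>'
    " I rise to speak on <i>this bill</i> today.</para>"
)
print(naive_text(sample))
print(full_text(sample))
```

Here naive_text returns only the text held inside the span and i elements, while full_text recovers the whole sentence. The standard library's "".join(el.itertext()) gives the same result as full_text.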

Coverage and scale

Senate speech turns: 555,265
House speech turns: 630,173
Senate sitting days: 1,482
House sitting days: 1,722

Coverage runs from the first sitting day of 1998 through to 2026. The corpus does not include committee Hansard, which is held in a separate dataset.

What is a speech turn?

Each row in the corpus represents a single continuous speech turn — an uninterrupted block of text attributed to one speaker. A speech turn ends when the next speaker begins, or when the presiding officer intervenes. A single member may contribute many turns in a single sitting day across different debates.

Speech turns are distinct from speeches in the colloquial sense. A minister answering a question in Question Time may have a turn of a single sentence; a second reading contribution may run to several thousand words. Both are single rows.

Interjections — short interruptions attributed to a member while another member holds the floor — are captured as separate rows and can be filtered out using the No interjections filter.

Member enrichment

Member lookup

Speaker name strings in the XML are resolved to a canonical member record using hand-curated lookup tables built from APH member lists. The Senate lookup covers 666 senators; the House lookup covers 1,280 members. Each record includes name, party, state or electorate, gender, and date-range information for members who served in multiple parliaments.

Party attribution

Party membership is assigned as at the date of the speech. Members who changed parties during their term, such as crossbenchers who moved between minor parties, are correctly attributed using dated party override entries in the lookup. A standardised party abbreviation is used throughout; common aliases (e.g. Liberal National Party of Queensland → LNP) are normalised.
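
A dated override lookup can be sketched as follows. The record layout, the field names, and the member shown are all invented for illustration; the real lookup tables may be structured differently.

```python
from datetime import date

def party_on(member, speech_date):
    """Return the member's party as at speech_date.

    Dated overrides take priority; otherwise the default party applies.
    """
    for start, end, party in member.get("overrides", []):
        if start <= speech_date <= end:  # inclusive window (an assumption)
            return party
    return member["party"]

# Fictional member who sat with party "BBB" for part of a term.
example_member = {
    "name": "Example Senator",
    "party": "AAA",
    "overrides": [(date(2015, 3, 1), date(2016, 6, 30), "BBB")],
}
print(party_on(example_member, date(2014, 8, 1)))
print(party_on(example_member, date(2015, 9, 1)))
```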

Gender

Gender is sourced from Wikidata via SPARQL query, matched on the member's Wikidata QID. Where Wikidata records are absent or ambiguous, manual review has been applied.
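
A query of this kind might look like the sketch below, which only builds the SPARQL text and does not send it. It relies on Wikidata property P21 ("sex or gender") and the public query service endpoint; the function name gender_query is hypothetical, and the QIDs in the usage line are placeholders rather than parliamentarians.

```python
import re

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

def gender_query(qids):
    """Build a SPARQL query returning the P21 label for each QID."""
    for qid in qids:
        if not re.fullmatch(r"Q[1-9]\d*", qid):
            raise ValueError(f"not a Wikidata QID: {qid!r}")
    values = " ".join(f"wd:{qid}" for qid in qids)
    return (
        "SELECT ?member ?genderLabel WHERE {\n"
        f"  VALUES ?member {{ {values} }}\n"
        # OPTIONAL keeps members with no P21 statement in the result set,
        # so missing records can be flagged for manual review.
        "  OPTIONAL { ?member wdt:P21 ?gender. }\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}"
    )

print(gender_query(["Q1", "Q2"]))  # placeholder QIDs
```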

partyfacts ID

Each party abbreviation is mapped to a partyfacts numeric ID where a verified match exists, enabling cross-national comparative research using the partyfacts dataset as a bridge.

Government / opposition flag

Each speech turn is flagged as government (1) or opposition/crossbench (0) based on whether the speaker's party held government in the relevant chamber on the date of the speech.
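
The flag can be thought of as a lookup against a table of government periods. The sketch below assumes a single table keyed on changeover dates, with each period's end date exclusive; the dates and party abbreviations are illustrative and should be checked against an authoritative source before reuse. The names GOVERNMENTS and government_flag are hypothetical.

```python
from datetime import date

# Illustrative table: (start, end, governing party abbreviations).
# The end date is exclusive, so a changeover day belongs to the new government.
GOVERNMENTS = [
    (date(1996, 3, 11), date(2007, 12, 3), {"LP", "NATS", "CLP"}),
    (date(2007, 12, 3), date(2013, 9, 18), {"ALP"}),
    (date(2013, 9, 18), date(2022, 5, 23), {"LP", "NATS", "LNP", "CLP"}),
    (date(2022, 5, 23), date.max, {"ALP"}),
]

def government_flag(party, speech_date):
    """Return 1 if party was in government on speech_date, else 0."""
    for start, end, parties in GOVERNMENTS:
        if start <= speech_date < end:
            return 1 if party in parties else 0
    raise ValueError(f"no government period covers {speech_date}")

print(government_flag("ALP", date(2010, 6, 1)))
print(government_flag("LP", date(2010, 6, 1)))
```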

Validation

Each sitting day file passes seven automated validation tests before inclusion in the corpus:

Current validation pass rates: Senate 76.3% of sitting days fully clean; House 89.7%. Failures at the sitting-day level do not mean the day is excluded — they flag specific rows for review. The majority of failures are residual edge cases that do not affect the accuracy of the bulk of the corpus.

Known limitations

Reference: The independent Senate Hansard corpus used for validation benchmarking is: Lindsay Katz & Rohan Alexander (2023). Australian Senate Hansard 1998–2022. Zenodo. https://zenodo.org/records/8121950