Type a word or phrase into the search bar and press Enter or click Search. Single words do not need quotes. Multi-word phrases must be enclosed in single quotes.
The search is case-sensitive by default. Toggle the case-sensitivity checkbox in the sidebar if you want case-insensitive matching. Case-insensitive searches are slower on large result sets.
Combine terms using boolean logic to build precise queries.
| Operator | Symbol | Meaning | Example |
|---|---|---|---|
| AND | & | Both terms must appear in the speech | 'climate change' & 'fuel subsidies' |
| OR | \| | Either term may appear | 'global warming' \| 'climate crisis' |
| NOT | ! | Term must not appear | 'nuclear energy' & !'nuclear waste' |
| Grouping | ( ) | Control evaluation order | ('climate change' \| 'global warming') & 'fuel subsidies' |
& binds more tightly than |, just as multiplication binds more tightly than addition in arithmetic. For example, 'coal' | 'gas' & 'solar' is evaluated as 'coal' | ('gas' & 'solar'). Use parentheses whenever the logic might be ambiguous.
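The precedence rule can be illustrated with a small recursive-descent evaluator. This is a sketch only, not the tool's actual parser: the names `tokenize` and `matches` are hypothetical, matching is a plain case-sensitive substring test, and it assumes that ! binds more tightly than & (the text above only states how & and | relate).

```python
import re

# Hypothetical tokenizer: 'quoted phrases', bare words, and the operators & | ! ( )
TOKEN_RE = re.compile(r"\s*(?:'([^']*)'|([&|!()])|([^\s&|!()']+))")

def tokenize(query):
    tokens, pos = [], 0
    query = query.strip()
    while pos < len(query):
        m = TOKEN_RE.match(query, pos)
        if not m:
            raise ValueError(f"Unexpected input at position {pos}")
        phrase, op, word = m.groups()
        if op:
            tokens.append(("OP", op))
        else:
            tokens.append(("TERM", phrase if phrase is not None else word))
        pos = m.end()
    return tokens

def matches(query, text):
    """Return True if `text` satisfies the boolean `query`."""
    tokens = tokenize(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def parse_or():            # lowest precedence: a | b
        result = parse_and()
        while peek() == ("OP", "|"):
            advance()
            rhs = parse_and()  # always parse the right side, then combine
            result = result or rhs
        return result

    def parse_and():           # binds more tightly than |
        result = parse_not()
        while peek() == ("OP", "&"):
            advance()
            rhs = parse_not()
            result = result and rhs
        return result

    def parse_not():           # assumption: ! binds tightest
        if peek() == ("OP", "!"):
            advance()
            return not parse_not()
        return parse_atom()

    def parse_atom():
        tok = peek()
        if tok is None:
            raise ValueError("Unexpected end of query")
        advance()
        if tok == ("OP", "("):
            result = parse_or()
            if peek() != ("OP", ")"):
                raise ValueError("Missing closing parenthesis")
            advance()
            return result
        if tok[0] == "TERM":
            return tok[1] in text
        raise ValueError(f"Unexpected token {tok[1]!r}")

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("Unexpected trailing input")
    return result
```

With this sketch, `matches("'coal' | 'gas' & 'solar'", "coal exports rose")` is true because the & group is evaluated first, whereas `matches("('coal' | 'gas') & 'solar'", "coal exports rose")` is false.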
After a search, a Related panel appears below the search bar. It suggests alternative phrasings, synonyms, and abbreviations for each term in your query — one group of suggestions per term.
Suggestions are generated in two stages. First, the most frequently co-occurring terms from your actual search results are extracted from the corpus. These are then passed to an AI model as context, so suggestions are grounded in the language actually used in Australian parliamentary debate rather than generic synonyms.
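The first stage can be pictured roughly as follows. This is a loose sketch rather than the tool's actual extraction method: the function name `top_cooccurring_terms`, the tiny stopword list, and the choice to count each word once per speech are all assumptions made for illustration.

```python
import re
from collections import Counter

# Illustrative stopword list only; a real one would be much longer
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "on", "for", "that", "we"}

def top_cooccurring_terms(speeches, query_term, k=5):
    """Count words that appear in the same speeches as `query_term`."""
    needle = query_term.lower()
    query_words = set(needle.split())
    counts = Counter()
    for speech in speeches:
        lowered = speech.lower()
        if needle not in lowered:
            continue  # only speeches that match the search term contribute
        words = set(re.findall(r"[a-z]+", lowered))  # count once per speech
        counts.update(w for w in words if w not in STOPWORDS and w not in query_words)
    return [term for term, _ in counts.most_common(k)]
```

The resulting list is the kind of corpus-derived context that could then be handed to the model in the second stage.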
Click a suggestion chip to add it to your search. The chip turns green and the full expanded query is shown in small text below the search bar. Click the chip again to remove it. The boolean structure of your original expression is preserved: each alias is inserted inline alongside its source term using an OR group.
For example, if your search is ('climate change' | 'global warming') & 'fuel subsidies' and you click climate emergency as an alias for climate change, the effective query becomes ('climate change' | 'climate emergency' | 'global warming') & 'fuel subsidies'.
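A minimal sketch of inline alias insertion is shown here. The function `add_alias` is hypothetical and makes two simplifying assumptions: every term in the query is single-quoted, and the alias is always wrapped with its source term in a new parenthesised OR group. That form is logically equivalent to the flattened query shown above, though not textually identical to it.

```python
def add_alias(query, source_term, alias):
    """Insert `alias` next to `source_term` as an OR group, leaving the rest of the query intact."""
    quoted = f"'{source_term}'"
    if quoted not in query:
        raise ValueError(f"{source_term!r} is not a quoted term in the query")
    # Wrapping in parentheses keeps the surrounding & / | structure unchanged
    return query.replace(quoted, f"({quoted} | '{alias}')")
```

For instance, `add_alias("'climate change' & 'fuel subsidies'", "climate change", "climate emergency")` returns `('climate change' | 'climate emergency') & 'fuel subsidies'`. Always parenthesising is the conservative choice, because & binds more tightly than | and an unwrapped alias could otherwise attach to the wrong neighbour.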
Related terms update automatically as you type — approximately 700ms after you pause. Only new or changed terms trigger a new fetch; unchanged terms keep their existing suggestions.
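The refresh rule can be sketched as a pure function. The name `plan_refresh`, the cache shape (a dict from term to its list of suggestions), and the exact 0.7-second threshold are assumptions for illustration, not the tool's implementation.

```python
DEBOUNCE_SECONDS = 0.7  # the approximate pause described above

def plan_refresh(cached, current_terms, seconds_since_last_keystroke):
    """Decide which terms need a new fetch and which cached suggestions survive."""
    if seconds_since_last_keystroke < DEBOUNCE_SECONDS:
        return [], cached  # still typing: leave everything as it is
    # Unchanged terms keep their existing suggestions
    kept = {term: cached[term] for term in current_terms if term in cached}
    # Only new or changed terms trigger a fetch
    to_fetch = [term for term in current_terms if term not in cached]
    return to_fetch, kept
```

Editing one term in a three-term query would therefore produce a single fetch, with the other two groups of suggestions left untouched.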
The sidebar contains metadata filters that narrow results without changing your search expression.
Filter to Senate, House, or both. Defaults to both chambers.
Restrict results to a specific date range. Dates are in YYYY-MM-DD format. Coverage runs from 1998-01-01 to mid-2025.
Filter to speeches by members of a specific party. Party membership is assigned as at the date of the speech, so members who changed parties mid-term are correctly attributed.
Partial-match text filters. Speaker matches on the member's display name; state/electorate matches on their state (Senate) or electorate (House).
Filter to speeches made while the member's party was in government or in opposition. This is determined at the chamber level by the date of the speech.
Filter to speeches by female, male, or non-binary members.
Filter by the nature of the speech turn:
Choose 10, 20, 50, or 100 results per page. Larger pages are slower to render but reduce pagination.
Each result card shows:
Click ▶ expand to read the full speech text. Matched search terms are highlighted in the expanded view. Click again to collapse.
Results can be sorted by date, speaker name, party, chamber, or government/opposition status using the pill buttons above the results list. Click the same pill twice to reverse the sort direction.
Click ↓ CSV (10) to download the first 10 results of your current search as a comma-separated file. The download respects all active filters and sort order. Columns include: date, chamber, speaker name, party, state/electorate, government flag, speech type flags, full body text, name ID, unique ID, partyfacts ID, and matched terms.
The 10-result limit applies to the free version of this tool.
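The export behaviour can be sketched with Python's standard `csv` module. The column names below are hypothetical stand-ins for a subset of the columns listed above, and `export_csv` is an illustrative name rather than part of the tool.

```python
import csv
import io

FREE_TIER_LIMIT = 10

# Hypothetical field names for some of the documented columns
COLUMNS = ["date", "chamber", "speaker_name", "party", "state_electorate",
           "in_government", "body", "unique_id", "matched_terms"]

def export_csv(results, limit=FREE_TIER_LIMIT):
    """Write the first `limit` rows of already filtered and sorted results as CSV text."""
    buffer = io.StringIO()
    # Missing keys are written as empty cells; unknown keys are ignored
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, extrasaction="ignore")
    writer.writeheader()
    for row in results[:limit]:
        writer.writerow(row)
    return buffer.getvalue()
```

Because the slice is taken after filtering and sorting, the exported rows match the first rows shown on screen. The `csv` module quotes any body text that contains commas or line breaks, so full speeches survive a round trip.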
The corpus is built from the official XML Hansard files published by the Parliament of Australia. These files are the authoritative record of proceedings in both chambers and are made available for download from the APH website.
Two XML schema versions are present across the date range:
<talk> / <talker> / <para> structure
Both variants are parsed by separate schema-aware parsers. Body text extraction handles inline attribution spans (including capture of trailing el.tail text that schema-naive parsers discard) to maximise word coverage.
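The el.tail point is easiest to see with a small example using Python's `xml.etree.ElementTree`. The sample markup is invented for illustration (the `<span>` and `<inline>` element names are assumptions, not taken from the Hansard schemas), and the two functions are a sketch rather than the corpus parsers themselves.

```python
import xml.etree.ElementTree as ET

def naive_text(el):
    """Collect only .text values; anything following a child element is lost."""
    return (el.text or "") + "".join(naive_text(child) for child in el)

def full_text(el):
    """Collect .text plus each child's .tail, which holds the text after its closing tag."""
    parts = [el.text or ""]
    for child in el:
        parts.append(full_text(child))
        parts.append(child.tail or "")
    return "".join(parts)

sample = ET.fromstring(
    "<para><span class='attribution'>Senator SMITH:</span> I rise to speak "
    "<inline>briefly</inline> on this bill.</para>"
)

print(naive_text(sample))  # Senator SMITH:briefly
print(full_text(sample))   # Senator SMITH: I rise to speak briefly on this bill.
```

In ElementTree, text that follows a child's closing tag belongs to that child's `tail` attribute rather than to the parent's `text`, which is why a parser that reads only `text` drops most of the sentence here.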
Coverage runs from the first sitting day of 1998 through to mid-2025. The corpus does not include committee Hansard, which is handled separately.
Each row in the corpus represents a single continuous speech turn — an uninterrupted block of text attributed to one speaker. A speech turn ends when the next speaker begins, or when the presiding officer intervenes. A single member may contribute many turns in a single sitting day across different debates.
Speech turns are distinct from speeches in the colloquial sense. A minister answering a question in Question Time may have a turn of a single sentence; a second reading contribution may run to several thousand words. Both are single rows.
Interjections — short interruptions attributed to a member while another member holds the floor — are captured as separate rows and can be filtered out using the No interjections filter.
Speaker name strings in the XML are resolved to a canonical member record using hand-curated lookup tables built from APH member lists. The Senate lookup covers 666 senators; the House lookup covers 1,280 members. Each record includes name, party, state or electorate, gender, and date-range information for members who served in multiple parliaments.
Party membership is assigned as at the date of the speech. Members who changed parties during their term — such as crossbenchers who moved between minor parties — are correctly attributed using dated party override entries in the lookup. A standardised party abbreviation is used throughout; common aliases (e.g. Liberal National Party of Queensland → LNP) are normalised.
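Date-based attribution can be sketched as a lookup over dated override entries. The record layout, the `party_on` function, and the example member are all hypothetical. Only the LNP alias comes from the text above.

```python
from datetime import date

# Alias normalisation; this single entry is the example given above
PARTY_ALIASES = {"Liberal National Party of Queensland": "LNP"}

def party_on(member, speech_date):
    """Return the member's standardised party as at `speech_date`."""
    party = member["party"]  # default party for the record
    for start, end, override in member.get("party_overrides", []):
        if start <= speech_date <= end:
            party = override  # a dated override takes precedence
            break
    return PARTY_ALIASES.get(party, party)

# Invented record, for illustration only
example_member = {
    "name": "Example Member",
    "party": "Party A",
    "party_overrides": [(date(2015, 3, 1), date(2016, 6, 30), "Party B")],
}
```

Here `party_on(example_member, date(2015, 9, 1))` gives `"Party B"`, while a speech dated `date(2017, 1, 1)` falls outside the override window and is attributed to `"Party A"`.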
Gender is sourced from Wikidata via SPARQL query, matched on the member's Wikidata QID. Where Wikidata records are absent or ambiguous, manual review has been applied.
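A query of roughly this shape could retrieve the gender statement for a batch of QIDs. It is a sketch under stated assumptions: the helper `build_gender_query` is hypothetical, the exact query used to build the corpus is not documented here, and the QIDs in the usage note are arbitrary placeholders. P21 is Wikidata's "sex or gender" property.

```python
import re

def build_gender_query(qids):
    """Build a SPARQL query returning the P21 (sex or gender) label for each QID."""
    for qid in qids:
        if not re.fullmatch(r"Q[1-9]\d*", qid):
            raise ValueError(f"Not a Wikidata QID: {qid!r}")
    values = " ".join(f"wd:{qid}" for qid in qids)
    return (
        "SELECT ?member ?genderLabel WHERE {\n"
        f"  VALUES ?member {{ {values} }}\n"
        "  ?member wdt:P21 ?gender .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}"
    )
```

Calling `build_gender_query(["Q42", "Q80"])` produces a query that can be submitted to the Wikidata Query Service. Members with no P21 statement simply return no row, which corresponds to the absent records that go to manual review.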
Each party abbreviation is mapped to a partyfacts numeric ID where a verified match exists, enabling cross-national comparative research using the partyfacts dataset as a bridge.
Each speech turn is flagged as government (1) or opposition/crossbench (0) based on whether the speaker's party held government in the relevant chamber on the date of the speech.
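The flag can be thought of as a date-range lookup. In the sketch below the periods and party codes are made up purely for illustration, `government_flag` is a hypothetical name, and a single table is used for simplicity where a fuller version could keep one table per chamber.

```python
from datetime import date

# Made-up periods and party codes, for illustration only
GOVERNMENT_PERIODS = [
    (date(2001, 1, 1), date(2004, 12, 31), {"AAA", "BBB"}),
    (date(2005, 1, 1), date(2008, 12, 31), {"CCC"}),
]

def government_flag(party, speech_date, periods=GOVERNMENT_PERIODS):
    """Return 1 if `party` was in government on `speech_date`, otherwise 0."""
    for start, end, governing_parties in periods:
        if start <= speech_date <= end:
            return 1 if party in governing_parties else 0
    raise LookupError(f"No government period covers {speech_date}")
```

Because the lookup is keyed on the speech date, the same member can be flagged 1 in one parliament and 0 in the next without any change to their member record.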
Each sitting-day file is run through seven automated validation tests as it is added to the corpus:
Current validation pass rates: Senate 76.3% of sitting days fully clean; House 89.7%. Failures at the sitting-day level do not mean the day is excluded — they flag specific rows for review. The majority of failures are residual edge cases that do not affect the accuracy of the bulk of the corpus.
XH4, /7E4). These turns cannot be attributed to a member and are retained in the corpus with null member fields.