HereSay Voice AI Classifications Dataset

Version 2026-Q2-v2 · Released by HereSay · ODC-BY 1.0

10 aggregate classification datasets computed from 54 anonymized voice AI conversations (2,334 turns). Includes OpenAI's Asking/Doing/Expressing trichotomy, voice-specific facets, per-turn sentiment (VADER), question-type and pronoun distributions, conversation-arc transitions, and time-of-day patterns. Raw conversation text is not included.

License: ODC-BY 1.0. Free to use and free to redistribute; attribution is required. When you use this dataset in research, journalism, or commercial work, include this attribution notice:

"Contains information from the HereSay Voice AI Classifications Dataset (2026-Q2-v2) by HereSay (heresay.live), which is made available under the ODC Attribution License (ODC-BY 1.0)."

Downloading requires a free HereSay account, which ensures you see the license terms at download time.

What's inside the ZIP

01_conversation_stats.csv — workhorse table (turns, char counts, redactions)
02_asking_doing_expressing.csv — OpenAI taxonomy label per conversation
03_voice_facets.csv — mic_test / practice / casual / emotional / info_seeking
04_turn_lengths.csv — char + word counts per turn
05_voice_signals.csv — filler words, mic-check phrases, ASR garble
06_question_types.csv — what / how / why / yes-no question counts
07_pronoun_usage.csv — 1st / 2nd / 3rd person counts per role
08_arc_transitions.csv — opening facet → closing facet (Sankey input)
09_sentiment.csv — VADER compound score per turn
10_time_of_day.csv — UTC hour bucket counts
Plus README, METHODOLOGY, DATASHEET, CODEBOOK, CITATION.cff, LICENSE, CHANGELOG
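
To get oriented after downloading, the files listed above can be inspected with a few lines of Python. This is a minimal sketch using only the standard library. It assumes the CSVs are UTF-8 encoded with a header row, which this page does not state; the column names are not listed here either, so the sketch reads them from each file rather than guessing them (the CODEBOOK in the ZIP is the place to check what they mean). The function names are illustrative and are not part of any HereSay tooling.

```python
import csv
import io
import zipfile


def list_csv_members(zip_file):
    """Return the names of all CSV files in the archive, sorted."""
    with zipfile.ZipFile(zip_file) as archive:
        return sorted(
            name for name in archive.namelist() if name.lower().endswith(".csv")
        )


def read_csv_member(zip_file, member):
    """Read one CSV from the archive into a list of dicts keyed by header.

    Assumption: the file is UTF-8 (a leading BOM is tolerated) and its
    first row is a header. All values come back as strings.
    """
    with zipfile.ZipFile(zip_file) as archive:
        with archive.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            return list(csv.DictReader(text))
```

For example, with the download saved under a hypothetical name such as heresay_dataset.zip, calling list_csv_members("heresay_dataset.zip") returns the CSV names, and read_csv_member("heresay_dataset.zip", name) returns that file's rows. The keys of the first row show the actual column names.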
