Last Updated: May 2026 · By Iris Crier · HereSay

Voice AI conversations are inverted from text AI conversations. When OpenAI analyzed 1.1 million ChatGPT messages, only 11% were "Expressing" (social, emotional, casual). When we analyzed 54 voice AI conversations from our app at heresay.live using the same taxonomy, 63% were Expressing. Voice is mostly social. Text is mostly utility.

We're releasing the data so anyone can verify and build on it. The dataset is 10 aggregate classification files under ODC-BY 1.0 — free to use, free to redistribute, attribution required, no raw conversation text. Download below.

What's in the dataset

The HereSay Voice AI Classifications Dataset (2026-Q2-v2) covers:

54 anonymized voice conversations (2,334 total turns: 870 user, 1,464 bot)
10 classification dimensions (see methodology below)
Zero raw conversation text. Only aggregate stats and per-conversation/per-turn labels.
License: ODC-BY 1.0 (attribution required)
Version: 2026-Q2-v2 — quarterly releases planned

→ Download the dataset (free account required)

Why this matters

Existing public datasets of AI conversations are almost entirely text-based. WildChat (Allen Institute for AI, ICLR 2024) released 1.04 million conversations, all text. LMSYS Chatbot Arena released conversations with model comparisons, all text. Anthropic's Clio system analyzes Claude conversations, all text. OpenAI's NBER Working Paper 34255 ("How People Use ChatGPT", 2025) analyzed 1.1M ChatGPT messages, all text.

To our knowledge, no open dataset captures voice AI conversation patterns. This release is a first attempt.

The dataset is small. We're deliberately transparent about this: 54 conversations is roughly 4 orders of magnitude smaller than the major studies. We're not making representative claims about "all voice AI users". We're publishing a slice of real data with full methodology, anonymized for safety, and inviting others to compare or contribute.

The headline finding: voice flips the trichotomy

OpenAI's working paper split ChatGPT messages into three buckets — what they called "Asking, Doing, and Expressing":

Asking = seeking information ("How do I…", "What is…")
Doing = creating something with the model ("Write me…", "Practice with me…")
Expressing = social, emotional, casual ("How are you?", "I'm frustrated about…")

Their text-based distribution, next to ours for voice:

Category	OpenAI (1.1M ChatGPT msgs)	HereSay (54 voice convos)
Asking	49%	15%
Doing	40%	22%
Expressing	11%	63%

The shift is striking. In text, ChatGPT mostly answers utility questions. In voice (at least in our small sample), the AI is mostly used for social chat, chiefly casual conversation with occasional emotional support. Practice scenarios, which fall under Doing, come second.
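One way to put a single number on the shift is the total variation distance between the two distributions in the table. The sketch below is an illustration added for readers, not part of the released methodology, and it inherits the table's caveat that one column counts messages while the other counts conversations.

```python
# Shares from the comparison table above, as fractions of each sample.
openai_text = {"Asking": 0.49, "Doing": 0.40, "Expressing": 0.11}
heresay_voice = {"Asking": 0.15, "Doing": 0.22, "Expressing": 0.63}


def total_variation_distance(p, q):
    """Half the sum of absolute differences between two distributions."""
    categories = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in categories)


tvd = total_variation_distance(openai_text, heresay_voice)
print(f"Total variation distance: {tvd:.2f}")
```

A value of 0 would mean identical distributions and 1 would mean no overlap at all, so a little over half of the probability mass sits in a different bucket.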

This isn't entirely surprising. As MIT professor Sherry Turkle has argued for two decades, voice is "the most intimate medium": humans evolved to use it for relationship work. Typing "What is the capital of France?" feels efficient. Asking it out loud feels weird. The selection effect is real.

But the size of the shift is the story. Voice AI behavior isn't just slightly different from text AI behavior — it appears to be roughly inverted.

What kinds of voice conversations are people having?

Beyond Asking/Doing/Expressing, we built a six-bucket classification specific to voice:

Voice facet	Share of conversations	Example utterance pattern
Casual chat	54%	"How are you?" / "What's up?" / general chitchat
Practice session	22%	"Let's practice a job interview" / "Pretend to be my boss"
Information seeking	15%	Sustained question-asking
Mic test	9%	"Hear me?" / "Testing" / 3-5 turns max
Emotional support	<2%	Sustained venting about work/relationships
Curiosity about AI	0% in this sample	"Are you conscious?" / "How were you made?"

(Shares sum to slightly more than 100% because some facets overlap; where a conversation matched several, we reported its dominant facet.)

Casual chat dominates our voice sample, far more than any prior text study we know of would suggest. The closest text comparison is Pew Research's 2025 ChatGPT survey, which found "social or recreational conversation" to be a minority use case (around 18% of ChatGPT users reported "for entertainment or fun" as a primary use). Voice flips that into the majority.

Practice sessions are the second-largest category (22%). People specifically ask the AI to role-play job interviews, difficult conversations, dates, and small-talk scenarios. This is the "Doing" bucket in OpenAI's taxonomy — but voice users seem to use it for interpersonal skill rehearsal rather than productivity tasks.

Voice conversations run longer than text

Conversation length:

HereSay voice (this dataset): median 44 turns
WildChat (text): 2.54 turns on average (source: arXiv:2405.01470, Table 2)

That's roughly 17× longer. The interpretation isn't that voice users are 17× more engaged. Voice utterances are short: user turns average 30.8 characters in our data, about half the bot's 60.7, so it takes more turns to communicate the same amount of information.

A useful number: in our data, the bot produced about 3.3× as many total characters as the user. The bot does most of the talking. Whether this is the AI being overly verbose or filling silence in a voice medium is a question worth more investigation.
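The two ratios in this section can be checked from figures quoted in this post. The sketch below assumes the per-turn character figures are per-turn averages, so that multiplying by the turn counts gives total characters; that reading is an assumption, though it does reproduce the 3.3× figure.

```python
# Figures quoted in this post.
user_turns, bot_turns = 870, 1464
user_chars_per_turn, bot_chars_per_turn = 30.8, 60.7  # assumed to be averages

# Total characters per side, then the bot-to-user ratio.
user_total = user_turns * user_chars_per_turn
bot_total = bot_turns * bot_chars_per_turn
char_ratio = bot_total / user_total

# Conversation length: HereSay turns per conversation vs the WildChat figure.
heresay_turns = 44
wildchat_turns = 2.54
length_ratio = heresay_turns / wildchat_turns

print(f"Bot/user character ratio: {char_ratio:.1f}x")
print(f"Voice/text length ratio: {length_ratio:.0f}x")
```

Note that the bot also takes more turns than the user (1,464 vs 870), so the character gap comes from both longer and more frequent bot turns.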

Sentiment skews positive

Using VADER (Hutto & Gilbert, ICWSM 2014), a widely used rule-based sentiment analyzer, we scored every turn on a -1 (negative) to +1 (positive) scale:

User turns: mean compound score = +0.06 (slightly positive, mostly neutral)
Bot turns: mean compound score = +0.41 (notably positive)

The bot is much more cheerful than the user. This matches what you'd expect from a friendly assistant persona — the bot is trained to be warm and encouraging. The user, speaking briefly and casually, lands closer to neutral.

The big caveat: VADER was developed on social-media text, not voice transcripts. Short voice utterances confuse it ("Hmm." has no detected polarity). Treat these as directional rather than precise.
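The aggregation step behind those two means can be sketched as follows. The release contains no raw text, so the turns here are invented, and the word-list scorer is a hypothetical stand-in that keeps the example free of extra packages; the comment shows how a VADER scorer from the vaderSentiment package could be swapped in.

```python
from collections import defaultdict
from statistics import mean


def mean_compound_by_speaker(turns, score):
    """Average a per-turn sentiment score separately for each speaker.

    turns: iterable of (speaker, text) pairs
    score: callable mapping text to a compound score in [-1, 1]
    """
    by_speaker = defaultdict(list)
    for speaker, text in turns:
        by_speaker[speaker].append(score(text))
    return {speaker: mean(values) for speaker, values in by_speaker.items()}


# Hypothetical stand-in scorer. With vaderSentiment installed it could be:
#   from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
#   analyzer = SentimentIntensityAnalyzer()
#   score = lambda text: analyzer.polarity_scores(text)["compound"]
def toy_score(text):
    positive = {"great", "love", "wonderful"}
    negative = {"bad", "hate", "awful"}
    words = [w.strip(".,!?").lower() for w in text.split()]
    hits = sum(w in positive for w in words) - sum(w in negative for w in words)
    return max(-1.0, min(1.0, hits / 2))


# Invented turns, for illustration only.
sample_turns = [
    ("user", "Hmm."),
    ("bot", "That sounds wonderful, I love that idea!"),
    ("user", "Yeah it was a bad day."),
    ("bot", "Great to hear from you."),
]
means = mean_compound_by_speaker(sample_turns, toy_score)
print(means)
```

Short fillers like "Hmm." score zero under a lexicon-based approach, which pulls the user-side mean toward neutral.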

When do people talk to the bot?

UTC hour	Conversations
02:00 (10pm ET)	21
03:00 (11pm ET)	12
00:00 (8pm ET)	5
04:00 (12am ET)	3
13:00 (9am ET)	2
Other hours	≤2 each

Voice AI usage in our sample concentrates in late-evening Eastern time. This is a HereSay-specific artifact — our app holds nightly voice meetups at 10pm ET — but the magnitude is striking. The bot sits idle most of the day, then has a small burst during the evening social window.

We do not think this generalizes. We include it here because it's a verifiable property of the dataset, not because it's a finding about voice AI use in general.
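For readers bucketing their own timestamps the same way, the UTC-to-Eastern mapping in the table can be sketched like this. The fixed UTC-4 offset is an assumption that matches the ET labels above, and the start times are made up.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-4 offset (Eastern Daylight Time), matching the ET
# labels in the table. zoneinfo.ZoneInfo("America/New_York") would handle
# daylight-saving transitions for data that spans the whole year.
EASTERN_DAYLIGHT = timezone(timedelta(hours=-4), "EDT")


def eastern_hour(utc_timestamp):
    """Return the local Eastern hour (0-23) for a timezone-aware datetime."""
    return utc_timestamp.astimezone(EASTERN_DAYLIGHT).hour


# Hypothetical conversation start times, for illustration only.
starts = [
    datetime(2026, 5, 2, 2, 15, tzinfo=timezone.utc),
    datetime(2026, 5, 2, 2, 40, tzinfo=timezone.utc),
    datetime(2026, 5, 2, 3, 5, tzinfo=timezone.utc),
    datetime(2026, 5, 2, 13, 30, tzinfo=timezone.utc),
]
per_hour = Counter(eastern_hour(ts) for ts in starts)
print(per_hour.most_common())
```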

How we built it (methodology)

The full methodology is in METHODOLOGY.md inside the dataset zip. The short version:

Source. Production AI bot's persistent transcript log (/var/log/heresay/ai-bot.log). The bot is a friendly general-purpose voice AI integrated into the HereSay app. Users interact via the in-app voice call interface.
Filtering. We kept conversations with ≥2 total turns and ≥1 user turn. Silent calls were dropped.
Anonymization. Microsoft Presidio with spaCy's en_core_web_lg model redacted PII (names → PERSON_1, locations → LOCATION_1, etc., reusing the same token for repeated mentions of an entity within a conversation). We then stripped HereSay-specific bot intro turns and brand mentions.
Classification. Heuristic regex matching on user-side turns, validated by manual spot-check of a 20% sample. The full classifier rules are documented in METHODOLOGY.md inside the dataset archive. We deliberately did NOT use LLM-mediated classification at this stage. At n=54, regex + manual review is more transparent and reproducible.
Output. Ten classification CSVs + a STATS.json snapshot + a Datasheet (Gebru et al., 2018), all bundled in a versioned ZIP.
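The filtering and classification steps above can be sketched roughly as follows. The filter mirrors the stated rule; the regex patterns, facet keys, and turn format are hypothetical placeholders, since the real rules live in METHODOLOGY.md.

```python
import re

# Hypothetical patterns for illustration; the real rules are in METHODOLOGY.md.
FACET_PATTERNS = {
    "practice_session": re.compile(
        r"\b(let'?s practice|pretend to be|role[- ]?play)\b", re.I
    ),
    "mic_test": re.compile(r"\b(testing|can you hear me|hear me)\b", re.I),
    "casual_chat": re.compile(r"\b(how are you|what'?s up)\b", re.I),
}


def keep_conversation(turns):
    """Filtering rule from the text: >= 2 turns total and >= 1 user turn."""
    user_turns = [t for t in turns if t["speaker"] == "user"]
    return len(turns) >= 2 and len(user_turns) >= 1


def dominant_facet(turns):
    """Count pattern hits on user turns and return the most frequent facet."""
    hits = {facet: 0 for facet in FACET_PATTERNS}
    for turn in turns:
        if turn["speaker"] != "user":
            continue
        for facet, pattern in FACET_PATTERNS.items():
            if pattern.search(turn["text"]):
                hits[facet] += 1
    best = max(hits, key=hits.get)
    return best if hits[best] > 0 else "unclassified"


# Invented conversations, for illustration only.
practice = [
    {"speaker": "user", "text": "Let's practice a job interview"},
    {"speaker": "bot", "text": "Sure. Tell me about yourself."},
    {"speaker": "user", "text": "Pretend to be my boss"},
]
silent_call = [{"speaker": "bot", "text": "Hi there!"}]

print(keep_conversation(practice), dominant_facet(practice))
print(keep_conversation(silent_call))
```

Rules this simple are easy to audit by hand, which is the point of preferring them at this sample size, but they will miss paraphrases that a manual spot-check has to catch.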

We follow the Datasheets for Datasets standard for documentation. Microsoft Presidio is the same tool the WildChat team used for PII removal; the OpenAI NBER paper relied on an automated, privacy-preserving classification pipeline on its own data, comparable in spirit to Anthropic's Clio.
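The numbered placeholders in the anonymization step (PERSON_1, LOCATION_1, and so on) imply a mapping that stays stable within a conversation. A minimal sketch of that bookkeeping follows; it assumes entities have already been detected by a tool such as Presidio, and the input format and function name are hypothetical rather than Presidio's API.

```python
import re
from collections import defaultdict


def pseudonymize(turn_texts, detected_entities):
    """Replace detected entities with stable placeholders in one conversation.

    turn_texts: list of turn strings
    detected_entities: list of (surface_text, entity_type) pairs produced by
        a PII detector beforehand (hypothetical input format)
    """
    counters = defaultdict(int)
    placeholder_for = {}
    for surface, entity_type in detected_entities:
        key = (surface.lower(), entity_type)
        if key not in placeholder_for:
            counters[entity_type] += 1
            placeholder_for[key] = f"{entity_type}_{counters[entity_type]}"

    # Replace longer surfaces first so a full name wins over a partial one.
    ordered = sorted(placeholder_for.items(), key=lambda item: -len(item[0][0]))
    redacted = []
    for text in turn_texts:
        for (surface, _), placeholder in ordered:
            pattern = r"\b" + re.escape(surface) + r"\b"
            text = re.sub(pattern, placeholder, text, flags=re.IGNORECASE)
        redacted.append(text)
    return redacted


# Invented example, for illustration only.
turns = ["My name is Dana.", "Nice to meet you, Dana!", "I live in Boston with Sam."]
entities = [("Dana", "PERSON"), ("Boston", "LOCATION"), ("Sam", "PERSON")]
redacted = pseudonymize(turns, entities)
print(redacted)
```

Resetting the counters per conversation keeps placeholders from linking the same person across different calls.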

Limitations (please read before quoting)

n=54 is tiny. Direction of effects is suggestive, not representative. Anthropic's Clio system uses a minimum cluster size of k=1000 for its public privacy threshold; we're well below that. Do not quote our percentages as "AI users do X%". Quote them as "HereSay's voice AI users in this sample did X%."
Sample bias. Our users opted into HereSay (a real-time voice chat app focused on stranger conversations). They are not representative of the general public or even of general voice AI users.
Selection bias toward the social. People who launch HereSay are already in a social/chat mindset. They land on a "Talk to AI" button as one option among several. The 63% Expressing rate partly reflects that funnel.
Voice ≠ text. These results are not directly comparable to text AI baselines without adjustment for medium. We compare to OpenAI's Asking/Doing/Expressing primarily for taxonomy, not for direct apples-to-apples baseline.
Heuristic classifiers. Our voice-facet labels come from regex pattern matching, not LLM-mediated classification. Inter-annotator agreement (Cohen's κ) is not computed; we plan to add this in a future release using a second human annotator.
ASR transcription quality. User turns are transcribed by automatic speech recognition. Some short utterances ("Eight Eight seven seven, yeah") are mumbled numbers or false starts that may not represent real intent.
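To make the first limitation concrete, a Wilson score interval shows how wide the uncertainty around a share is at n=54. This is a sketch under two assumptions that the post does not state: that 63% corresponds to 34 of 54 conversations, and that conversations can be treated as independent draws.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Assumption: 63% of 54 conversations corresponds to 34 conversations.
low, high = wilson_interval(34, 54)
print(f"Expressing share: 63% (95% interval roughly {low:.0%} to {high:.0%})")
```

Under those assumptions the interval spans roughly 50% to 75%, which is why the percentages are best quoted as properties of this sample.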

How to cite

The dataset is available for download at heresay.live/dataset/voice-ai (free account required).

For attribution in any context (required by license):

"Contains information from the HereSay Voice AI Classifications Dataset (2026-Q2-v2) by HereSay (heresay.live), made available under the ODC Attribution License (ODC-BY 1.0)."

A machine-readable CITATION.cff file is included in the archive.

Future releases

This is the first quarterly release. Planned cadence: roughly every 3 months as more conversations accumulate. Future versions may add:

Politeness markers (Stanford Politeness Corpus methodology)
Dialog acts (DAMSL-lite labels)
LLM-mediated topic categorization (alongside the regex-based labels)
Inter-annotator agreement (Cohen's κ on a 30-conversation subsample)

All schema changes are documented in CHANGELOG.md inside each release.
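For the planned inter-annotator agreement item, Cohen's κ compares observed agreement between two annotators with the agreement expected by chance. A minimal sketch with invented labels:

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and the same length")
    n = len(labels_a)
    # Observed agreement: fraction of items where both labels match.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Invented labels, for illustration only.
annotator_1 = ["casual", "casual", "practice", "info", "casual", "practice"]
annotator_2 = ["casual", "practice", "practice", "info", "casual", "casual"]
kappa = cohens_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}")
```

Because one facet accounts for more than half of the conversations, chance agreement is high, so κ will read noticeably lower than raw percent agreement.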

Get the data

Download the HereSay Voice AI Classifications Dataset (2026-Q2-v2) — free account required so we can show the license terms at download time.

Questions, errata, or want to contribute methodology? Email [email protected].

What People Ask AI Voice Bots (Free Open Dataset)