How to actually do research (a guide nobody gives you)

Nobody teaches you research. You get a desk, a problem someone else picked, and a vague instruction to produce something novel. So most people reverse-engineer the job from what they can see, which is papers and announcements and threads, and what they end up learning is how to look like a researcher rather than how to be one.

The actual skill is a stack of smaller skills. Picking problems. Reading well. Writing to think. Running experiments with a fast loop. Staring at your own failures. Finding the right people. Almost every one of these can be deliberately trained, and almost none of them are taught in any formal setting.

This guide draws on the research traditions of Hamming, Feynman, Shannon, and Darwin, alongside more recent thinking from John Schulman, Andrej Karpathy, Chris Olah, and Andrew Ng. The ideas apply well beyond any single field.

Pick your own problems

Richard Hamming had a habit at Bell Labs that made him unpopular at lunch. He'd sit down next to someone and ask what the important problems in their field were, then ask why they weren't working on them. People changed tables. The question stings because most researchers don't have a good answer. They don't choose problems so much as absorb them: from an advisor, from whatever a major lab announced last quarter, from whatever paper is getting shared this week.

The trouble with an absorbed problem is that you hold the conclusion without the reasoning. You know some famous group cares about a direction, but you don't know why, what they expect to find, or what would make them drop it. When they pivot, you find out a year later. And on a problem that's already fashionable, you're racing a crowd who started earlier and have more resources.

John Schulman's guide to ML research describes two modes. In one, you read the literature and look for things to improve. In the other, you choose an outcome you actually want to exist and reason backwards to the experiments. He argues for the second, and the quiet reason is that it manufactures originality. A goal you actually care about will pull you into territory that no survey paper covers, because the survey was written by someone who didn't have your specific question.

This applies far beyond ML. A historian who picks a question because it matters to them will produce different work from one who picks the next obvious gap in the existing literature. A designer who starts from a problem they've personally experienced will find things that a literature-driven approach misses. The research workflow is the same either way. The starting point changes everything about where you end up.

Build taste

Taste in research (the ability to tell important questions from trivial ones, and promising approaches from dead ends) gets talked about like a gift. It behaves more like a muscle.

Claude Shannon opened a 1952 talk on creative thinking with simplification: shrink a problem until it is nearly trivial, crack the small version, then reintroduce the difficulty one piece at a time. That single technique will carry you through more walls than any modern productivity advice.

A few practices that build taste over time: predict the result of every experiment before you run it. Cover a paper's results section and guess the numbers from the method alone. Keep a note of which of this month's releases will matter in two years and check your hit rate later. A forecast plus a correction, repeated a few hundred times, is how every good model gets trained, including the one in your head.
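One way to run the forecast-and-correct loop deliberately is to write predictions down in a form you can score later. The sketch below is a minimal illustration, not a prescribed method: it assumes each prediction is stated as a probability and resolved to true or false once the answer is known. `Forecast` and `score` are hypothetical names, and the Brier score (the mean squared gap between your stated probability and what happened; lower is better) is one standard scoring rule swapped in here for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Forecast:
    claim: str                      # the prediction, in one sentence
    probability: float              # your confidence that the claim comes true, 0.0 to 1.0
    outcome: Optional[bool] = None  # filled in later, once the answer is known


def score(forecasts):
    """Return (hit_rate, brier_score) over resolved forecasts, or (None, None) if none have resolved."""
    resolved = [f for f in forecasts if f.outcome is not None]
    if not resolved:
        return None, None
    # A "hit" means you leaned the right way: probability >= 0.5 and it happened, or < 0.5 and it didn't.
    hits = sum((f.probability >= 0.5) == f.outcome for f in resolved)
    # Brier score: mean squared error between the stated probability and the 0/1 outcome.
    brier = sum((f.probability - float(f.outcome)) ** 2 for f in resolved) / len(resolved)
    return hits / len(resolved), brier


# Hypothetical example entries.
log = [
    Forecast("Method X beats the baseline on the held-out set", 0.7, outcome=True),
    Forecast("This month's release is still widely used in two years", 0.8, outcome=False),
    Forecast("Removing component Z changes nothing", 0.4),  # not resolved yet
]
hit_rate, brier = score(log)
```

A hit rate ignores how confident you were; the Brier score also penalises being confidently wrong, which is usually the more useful correction.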

Upgrade your inputs

Shared reading lists produce shared ideas. If your information diet is the trending page of a preprint server plus whatever survives the group chat filter, you'll reliably reach the same conclusions as everyone else, at the same time, which makes those conclusions worth approximately nothing.

Old material is criminally underpriced. Most fields rerun their own history on a delay. Rich Sutton needed about a thousand words in 2019 to write "The Bitter Lesson," and it predicts the shape of the AI field better than surveys ten times its length. Shannon's 1952 creative thinking lecture is still more useful than most contemporary advice on the topic. The commonplace book tradition exists precisely because the best ideas from old reading compound over years in ways that this week's trending paper never will.

Range matters as much as depth. Interpretability borrows from neuroscience. Evaluation design is mechanism design wearing a lab coat. A working understanding of how hardware actually moves data tells you which architecture papers are doomed before the benchmarks arrive. The researchers who produce the most interesting work tend to read more broadly than their peers, not just more deeply.

And read the paper itself, not the thread summarising it. The appendix is where the bodies are buried, and the limitations section is usually the most honest paragraph in the document. Annotating as you read forces you to engage with the actual arguments rather than absorbing someone else's interpretation of them.

Write everything down

Paul Graham points out that an idea can feel fully formed right up until you try to put it into words. The page finds gaps your head papers over: the assumption you never tested, the step that doesn't actually follow, the two claims that quietly contradict each other.

Feynman's rule was that the first person you must avoid fooling is yourself, because you're the easiest target. Writing is the cheapest defence against self-deception ever invented. Darwin went further and made it procedural: any fact that cut against his theory got written down on the spot, because he'd caught his own memory deleting inconvenient evidence faster than the convenient kind. Your memory does the same thing to your failed experiments.

Keep a research log: hypothesis, setup, expectation, result, updated belief. Rereading last month's entries is humbling in a way that no peer reviewer can match. The log also compounds: six months of documented thinking is a resource you can search, pattern-match against, and build on. Six months of what you think you remember is unreliable, and it only gets worse.
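The five-part entry maps naturally onto a small record type. As a minimal sketch, assuming an append-only JSON Lines file as storage, the code below writes one entry per line and never rewrites old ones, so past beliefs stay on the page even after they turn out to be wrong. `LogEntry`, `append_entry`, `load_entries`, and `search` are hypothetical names, and the search is a plain keyword match rather than semantic search.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import date
from pathlib import Path


@dataclass
class LogEntry:
    hypothesis: str       # what you think is true
    setup: str            # what you ran, and on what
    expectation: str      # what you predicted before running it
    result: str           # what actually happened
    updated_belief: str   # what you now think, in light of the result
    project: str = ""
    day: str = field(default_factory=lambda: date.today().isoformat())


def append_entry(path: Path, entry: LogEntry) -> None:
    """Append one entry as a single JSON line; existing entries are never rewritten."""
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(entry)) + "\n")


def load_entries(path: Path) -> list[LogEntry]:
    """Read every entry back, oldest first."""
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        return [LogEntry(**json.loads(line)) for line in f if line.strip()]


def search(entries: list[LogEntry], term: str) -> list[LogEntry]:
    """Case-insensitive keyword match across all fields (not semantic search)."""
    term = term.lower()
    return [e for e in entries if any(term in str(v).lower() for v in asdict(e).values())]
```

Keeping the expectation as its own field, written before the result, is what makes the humbling reread possible: you can't quietly revise what you predicted.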

A notes system with semantic search means your research log stays findable by meaning rather than by the filing conventions you may or may not have followed when you wrote it. "That experiment where I tested the learning rate hypothesis" finds the entry even if you filed it under a different project name.

Put some of it in public

Chris Olah and Shan Carter's essay on research debt makes the case that fields choke on undigested ideas, and that a clear explanation is a contribution in its own right rather than a service job. A lot of people working in interpretability today found the field through readable blog posts, not conference papers.

A body of public writing also functions as the strongest credential you can hold, because it's an unfakeable sample of how you think. A digital garden or a blog that accumulates your thinking over months gives potential collaborators and employers a window into your mind that a CV never can.

Float your half-formed ideas in public too, because being wrong on the timeline is far cheaper than being wrong in print.

Tighten the loop

The stories about the most productive researchers rarely involve a single stroke of genius. They involve volume: more experiments per day, more wrong ideas discarded per week, a model of reality that updates faster than anyone else's. Research speed is mostly the speed at which you discover you're wrong.

Which makes your tooling a first-class research activity. Launching an experiment should be one command. Plotting results should be one more. Every experiment should be reproducible from its config, and comparing two runs should take seconds, not an afternoon of archaeology. Shrink everything until it's cheap, get it right at small scale, then spend the resources.
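As an illustration of "reproducible from its config" and "comparing two runs should take seconds", here is a minimal sketch, assuming the whole experiment is determined by a plain dictionary config that includes its random seed. `run_experiment` is a hypothetical stand-in for a real job (it just averages seeded noisy samples), and `config_id` and `compare_runs` are likewise illustrative names, not any particular tool's API.

```python
import hashlib
import json
import random
import statistics


def config_id(config: dict) -> str:
    """Short, stable ID derived from the config contents (key order does not matter)."""
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def run_experiment(config: dict) -> dict:
    """Hypothetical stand-in for a real job: every source of randomness is seeded from the config."""
    rng = random.Random(config["seed"])
    samples = [rng.gauss(config["lr"], config["noise"]) for _ in range(config["n"])]
    return {
        "id": config_id(config),
        "config": dict(config),  # store a copy so later edits to the caller's dict can't change the record
        "metrics": {"mean": statistics.mean(samples), "stdev": statistics.stdev(samples)},
    }


def compare_runs(a: dict, b: dict) -> dict:
    """Report only what differs: changed config keys, and metric deltas (b minus a)."""
    keys = sorted(set(a["config"]) | set(b["config"]))
    config_diff = {
        k: (a["config"].get(k), b["config"].get(k))
        for k in keys
        if a["config"].get(k) != b["config"].get(k)
    }
    metric_diff = {k: b["metrics"][k] - a["metrics"][k] for k in a["metrics"] if k in b["metrics"]}
    return {"config": config_diff, "metrics": metric_diff}


# One call to launch each run, one more to compare them.
baseline = run_experiment({"seed": 0, "lr": 0.1, "noise": 0.05, "n": 200})
variant = run_experiment({"seed": 0, "lr": 0.2, "noise": 0.05, "n": 200})
diff = compare_runs(baseline, variant)
```

Deriving the run ID from a hash of the sorted config means the same settings always map to the same ID, so a rerun can be checked against the original record rather than reconstructed from memory.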

And retire the idea that engineering is the junior partner. At the frontier of many fields, the two jobs have fused. The researcher who can build the harness, the evaluation, and the data pipeline is the one whose hypotheses actually get tested. Everyone else is waiting in a queue.

The same principle applies to your information infrastructure. If finding a source you read three months ago takes twenty minutes of folder-hunting, your loop is too slow. If your research notes live in five different apps, you're adding friction to every synthesis step. Consolidating your research materials into one searchable library is the information equivalent of writing a good experiment harness: it makes everything downstream faster.

Stare at the outputs

A descending loss curve is not analysis. It's reassurance. Your experiments throw off far more information than you consume: transcripts, failure cases, the strange tail of the distribution. Most of it dies unread in a logs folder.

Andrew Ng has taught the same unglamorous move for over a decade because nothing beats it: pull a hundred failures, read all of them, sort them into piles, attack the biggest pile. It works on models and it works on evaluations, where a benchmark you've never read transcripts from is a benchmark you don't actually understand.
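The error-analysis loop is simple enough to sketch. As a minimal illustration, assuming your failures are already collected in a list, the code below samples up to a hundred of them, applies a labelling step, and returns the piles largest first. `triage` is a hypothetical helper, and its `label` argument stands in for the part that cannot be automated away: you reading each failure and naming what went wrong.

```python
import random
from collections import Counter


def triage(failures, label, k=100, seed=0):
    """Sample up to k failures, label each one, and return (label, count) piles, biggest first."""
    rng = random.Random(seed)  # fixed seed so the same sample can be pulled again later
    sample = rng.sample(failures, min(k, len(failures)))
    piles = Counter(label(f) for f in sample)
    return piles.most_common()


# Hypothetical failures; in practice `label` is you, reading each transcript.
failures = (
    [{"id": i, "note": "answer cut off mid-sentence"} for i in range(40)]
    + [{"id": 40 + i, "note": "wrong output format"} for i in range(20)]
)
piles = triage(failures, label=lambda f: f["note"])
biggest_pile = piles[0]
```

The tally only tells you which pile to attack first; the reading that produced the labels is still the work.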

One transcript of something going properly wrong will teach you more than the next decimal point of accuracy ever will. The outputs are talking to you. Most researchers aren't listening because the listening is tedious and the loss curve is right there, going down, looking encouraging.

Find your people

Hamming noticed a pattern in who ended up doing important work. Colleagues who kept their office doors closed got more done in the short term. Colleagues who left their doors open were, a decade later, the ones doing the work that mattered, because the interruptions carried clues about what the world actually needed.

Generosity compounds in research like nothing else. Replicate a result and share what you find. Release the tool you built for yourself. Explain something hard in plain language. The returns arrive sideways, months later, as the collaboration or the reference or the role you couldn't have applied for.

The collaborator who tells you an idea is bad before you invest three months is worth more than compute. That relationship can't be bought. It can only be earned by being the person who does the same for others.

The long game

Pasteur said luck favours the prepared mind, and Hamming built a career philosophy on top of it: knowledge and productivity compound like interest. The daily edges look trivial in isolation. What you read, what you record, how fast your loop runs, who you argue with. Give them a few years and they produce careers that look like luck from the outside.

Start compounding earlier than feels necessary. Future you already knows this was the cheap part.

Frequently asked questions

How do I find important problems in my field?

Read broadly, especially outside your immediate subfield. Talk to practitioners who use research outputs and ask what's actually hard in their work. Keep a list of questions that surprise you and revisit it monthly. The important problems are usually not the fashionable ones; they're the ones that, if solved, would change what's possible.

How do I get better at reading papers?

Read the abstract and conclusion first, so you know the claim before you read the method. Cover the results and predict them from the method. Read the limitations section, which is usually the most honest part. Annotate as you go and write a brief literature note afterwards. The goal is engagement, not consumption.

How much time should I spend reading vs doing?

There's no universal ratio, but if you haven't run an experiment in two weeks, you're probably reading too much. If you're running experiments without being able to articulate what question they answer, you're probably not reading enough. The two should feed each other: reading generates hypotheses, experiments test them, results send you back to reading with sharper questions.

How do I build a network as an early-career researcher?

By producing useful public work. Write clear explanations. Share tools and replication results. Engage with other people's work specifically and substantively rather than generically. The network forms around the work, not around the networking.