Cheminformatics Roadmap, Schemes and Mind Maps of Chemistry

Cheminformatics Roadmap using python

Typology: Schemes and Mind Maps

2023/2024

Uploaded on 07/01/2026

nazar-krasnobryzhev


Three tracks run simultaneously

The roadmap is not purely sequential. From month 6, you build technical skills, presence, and network all at once. Here's how they overlap.

TRACK A · TECHNICAL
Python → Data Science → RDKit → ML on Molecules. The main technical progression. Phases 0–4. Referenced from the Python Roadmap guide.

TRACK B · PRESENCE
GitHub portfolio, LinkedIn profile, writing about your work, open source. Starts month 6, runs to offer. Phases 5–6.

TRACK C · CAREER
Networking, job searching, applications, interviews, offer evaluation. Starts month 12, intensifies at month 18. Phases 7–8.

APPROXIMATE TIMELINE
M1 · M6 · M12 · M18 · M24

COMPLETE CAREER ROADMAP · FROM ZERO TO HIRED

Get Hired in Chemoinformatics

Everything, not just the technical skills: the field orientation, the portfolio strategy, the LinkedIn build, the networking plan, the applications, the interviews, and what to do when the offer lands.

// YOUR STARTING POINT

Python level: Complete beginner

Pace: 1–2 hrs/day

Goal: First cheminformatics job

Horizon: 18–24 months to first application · 20–26 to offer

8 PHASES · 24 MO REALISTIC HORIZON · 3 PARALLEL TRACKS · 100+ ACTIONABLE STEPS

◆ Phase 00 · Track A

Field Orientation

⏱ 1–2 weeks · do this first

WHY THIS COMES BEFORE ANY CODING

Most people start learning Python without knowing what job they're actually aiming for. Cheminformatics has at least six distinct role types, each with a different skill emphasis. Two weeks of orientation now prevents 12 months of learning the wrong things.

What cheminformatics actually is
The intersection of chemistry, computer science, and data science, applied to molecular problems like drug discovery, materials design, and toxicity prediction. Not a single job; a cluster of related roles.

The six role archetypes
Computational medicinal chemist, cheminformatics scientist, ML researcher (drug discovery), materials informatics engineer, academic researcher, and software engineer in a biotech. Each needs a different emphasis, so understand which one you're targeting.

The ecosystem of employers
Big pharma (AstraZeneca, Pfizer, Novartis), AI drug discovery startups (Recursion, Insilico, Exscientia), CROs, academic labs, and materials companies. Each differs in hiring timeline, culture, and what "cheminformatics" means to them.

Reading a job posting correctly
Learn to separate "required" from "nice to have." Most job postings are wishlists. If you meet 60–70% of the requirements, apply. Identify the 2–3 skills that appear in every posting for your target role: those are your priority learning targets.

The field's entry points
Junior roles are rare but exist. More common entry routes: research assistant at an academic lab, internship at a startup, graduate placement scheme at pharma, or demonstrating skills through public projects and a warm referral.

Set a concrete target
By the end of this phase, name one role type and two companies you want to work at. Everything you build (your portfolio, your LinkedIn, your network) should point toward that target specifically.
10 THINGS TO DO THIS WEEK

→ Read Pat Walters' "So you want to be a cheminformatician" blog post (search for it)
→ Search "cheminformatics" on LinkedIn Jobs: read 15 postings and note what appears in all of them
→ Identify 5 companies you'd genuinely want to work at, and follow them all on LinkedIn now
→ Read the Wikipedia article on cheminformatics in full: imperfect, but a fast orientation
→ Watch one 30-min YouTube talk on cheminformatics or drug discovery ML (search Valence Labs or CICAG)
→ Join r/Cheminformatics on Reddit and read the top 20 posts of all time
→ Follow Pat Walters, Greg Landrum, and Iwatobipen on LinkedIn: they will become your field antennae
→ Look up 3 people with "cheminformatics" in their LinkedIn title and study their career paths and listed skills
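The "read 15 postings, note what appears in all of them" step can be partly automated once the postings are saved as text. A minimal sketch in plain Python, assuming you paste each posting's text into a list yourself; the three postings and the skill list below are made-up placeholders, not real job ads, and `skill_frequency` is a hypothetical helper.

```python
import re
from collections import Counter


def skill_frequency(postings, skills):
    """Count how many postings mention each skill at least once.

    Matching is case-insensitive and whole-word, so "SQL" does not match "MySQL".
    """
    counts = Counter({skill: 0 for skill in skills})
    for text in postings:
        for skill in skills:
            pattern = r"\b" + re.escape(skill) + r"\b"
            if re.search(pattern, text, flags=re.IGNORECASE):
                counts[skill] += 1
    return counts


# Hypothetical postings: replace with the text of the postings you actually read
postings = [
    "Cheminformatics Scientist: Python, RDKit and SQL required. Docking a plus.",
    "We need strong Python and machine learning skills; RDKit experience preferred.",
    "Data Scientist, Drug Discovery: Python, pandas, SQL, machine learning.",
]
skills = ["Python", "RDKit", "SQL", "machine learning", "docking", "pandas"]

freq = skill_frequency(postings, skills)
in_every_posting = [s for s in skills if freq[s] == len(postings)]

for skill, n in freq.most_common():
    print(f"{skill}: {n}/{len(postings)} postings")
print("In every posting:", in_every_posting)
```

With 15 real postings, the skills whose counts sit at or near 15 are the 2–3 priority learning targets this phase asks you to identify.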

Companion Guide: Python for Chemoinformatics — Deep Learning Guide

The full technical roadmap with core concepts, annotated code, beginner traps, per-phase resources, and active recall quizzes is in the cheminformatics_deep_guide.html file. Open it alongside this document. The five sub-phases are:

Phase 1 · Python Core · 10 wk
Phase 2 · Data Science · 8 wk
Phase 3 · RDKit & Chem · 12 wk
Phase 4 · ML on Molecules · 14 wk
Phase 5 · Portfolio & Jobs · Ongoing

KEY TECHNICAL MILESTONES & WHEN TO START PARALLEL TRACKS

Month 1: Python installed, first script runs, Exercism track started
Month 2–3: pandas fluent, first ChEMBL analysis notebook on GitHub
Month 4–5: RDKit installed and working, first fingerprint script written (start Track B now)
Month 6–7: Lipinski filter and Butina clustering project complete and on GitHub (start Track C networking now)
Month 10–12: First QSAR model trained with scaffold split, results documented
Month 14–16: GNN trained on MoleculeNet, model card written (start applying now)
Month 18–20: Full technical stack complete, 3+ portfolio projects live

⚠ THE BIGGEST TECHNICAL CAREER TRAPS

→ Learning in isolation without building public projects. A private notebook nobody can see is worth nothing in a job search.
→ Chasing the newest framework before mastering RDKit and scikit-learn. Depth beats breadth at the junior level.
→ Waiting until you feel "ready" to start Tracks B and C. You will never feel ready. Start building publicly at month 4, not month
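The month 6–7 milestone (a Lipinski filter followed by Butina clustering) boils down to two small algorithms. The sketch below shows both in plain Python under simplifying assumptions: descriptor values are supplied directly instead of being computed, and each fingerprint is a set of on-bit indices. In the real project those inputs would come from RDKit (for example `Descriptors.MolWt`, `Crippen.MolLogP`, `Lipinski.NumHDonors`, `Lipinski.NumHAcceptors`, plus Morgan fingerprints), and RDKit ships its own Butina implementation in `rdkit.ML.Cluster.Butina`. The compound names, descriptor values, bit sets, and the 0.5 distance cutoff are all hypothetical.

```python
def lipinski_violations(mw, logp, hbd, hba):
    """Count Rule-of-Five violations: MW > 500, logP > 5, H-bond donors > 5, acceptors > 10."""
    return sum([mw > 500, logp > 5, hbd > 5, hba > 10])


def passes_lipinski(desc, max_violations=1):
    """Common convention: allow at most one violation."""
    return lipinski_violations(**desc) <= max_violations


def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints stored as sets of on-bit indices."""
    shared = len(a & b)
    union = len(a) + len(b) - shared
    return shared / union if union else 0.0


def butina_cluster(fps, dist_threshold):
    """Taylor-Butina clustering on Tanimoto distance (1 - similarity).

    Returns a list of index tuples; the first index in each tuple is the centroid.
    """
    n = len(fps)
    # j is a neighbour of i if their distance is within the threshold
    neighbours = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if 1.0 - tanimoto(fps[i], fps[j]) <= dist_threshold:
                neighbours[i].append(j)
                neighbours[j].append(i)
    # Molecules with the most neighbours become centroids first (ties broken by index)
    order = sorted(range(n), key=lambda i: (-len(neighbours[i]), i))
    assigned, clusters = set(), []
    for i in order:
        if i in assigned:
            continue
        members = [i] + [j for j in neighbours[i] if j not in assigned]
        assigned.update(members)
        clusters.append(tuple(members))
    return clusters


# Hypothetical compounds: (descriptors, fingerprint on-bits)
compounds = {
    "cmpd_A": (dict(mw=320.4, logp=2.1, hbd=1, hba=4), {1, 2, 3, 4}),
    "cmpd_B": (dict(mw=334.4, logp=2.5, hbd=1, hba=4), {1, 2, 3, 5}),
    "cmpd_C": (dict(mw=348.4, logp=2.9, hbd=2, hba=5), {1, 2, 3, 4, 5}),
    "cmpd_D": (dict(mw=410.5, logp=3.8, hbd=0, hba=6), {10, 11, 12}),
    "cmpd_E": (dict(mw=424.5, logp=4.2, hbd=0, hba=6), {10, 11, 12, 13}),
    "cmpd_F": (dict(mw=780.9, logp=6.3, hbd=7, hba=14), {20, 21}),  # 4 violations
}

# Step 1: Lipinski filter
kept = [name for name, (desc, _) in compounds.items() if passes_lipinski(desc)]
# Step 2: cluster the survivors
fps = [compounds[name][1] for name in kept]
clusters = [[kept[i] for i in c] for c in butina_cluster(fps, dist_threshold=0.5)]
print(kept)
print(clusters)
```

Visiting the densest molecules first is what makes Butina centroid-based and deterministic. The distance cutoff is the main knob, and a sensible value depends on the fingerprint type, so treat 0.5 here purely as a toy setting.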

PHASE 02 · TRACK B BEGINS AT MONTH 4

◆ Phase 02 · Track B

GitHub Portfolio

⏱ Starts month 4 · runs to offer ⟳ runs parallel to Phase 1

YOUR GITHUB IS YOUR CV. FULL STOP.

In cheminformatics, hiring managers look at code. A well-documented project solving a real problem gets you an interview that a degree alone won't. The goal is 3–5 high-quality, reproducible projects by the time you start applying, not 15 half-finished ones.

Profile setup
Professional photo, concise bio, pinned repositories. Link your LinkedIn. Add a short profile README (a special repo named after your username) that describes what you're building and what you know.

Project selection strategy
Each project should demonstrate one specific skill clearly. Don't build five similar projects. Aim for one EDA project, one QSAR model, one GNN or transformer, and one tool or pipeline: diverse, not repetitive.

The README formula
Every project: a one-sentence problem statement, the dataset used, the method and why you chose it, results with a number, limitations, and exact reproduce instructions. Write it for a reader who knows Python but not your project.

Reproducibility as professionalism
An environment.yml, and notebooks that survive Restart & Run All from scratch on a clean machine. Hiring managers test this. Failing reproducibility is a silent rejection you'll never hear about.

Commit discipline
Commit often, with descriptive messages. A repo with 1 commit labelled "final" looks amateur. A repo with 30 commits showing iterative development looks like someone who actually knows how to work.

The Pinned section
Pin your best 2–3 repositories on your GitHub profile, each with a one-sentence description. This is the first thing a recruiter sees. Make it immediately clear what each project does and what it demonstrates.

YOUR TARGET PORTFOLIO: 4 PROJECTS, EACH PROVING SOMETHING DIFFERENT

01 ChEMBL Bioactivity EDA
Proves: pandas fluency, data cleaning, visualisation. Download a target dataset, compute pIC50, classify activity, produce 3 publication-quality plots. Start: month 3

02 Drug-Like Compound Analyser
Proves: RDKit fluency, Lipinski filtering, Murcko scaffolds, UMAP visualisation. Take 200 approved drugs, compute descriptors, cluster, visualise chemical space. Start: month 5

03 QSAR Property Predictor
Proves: ML pipeline, scaffold splitting, honest evaluation, model card. RF + GNN on the BBBP (classification) or ESOL (regression) dataset. Compare models, report AUROC for BBBP or RMSE for ESOL, and write a 200-word model card. Start: month 12

04 A Tool or Pipeline (your choice)
Proves: software thinking, reusability. Options: a PAINS alert CLI tool, a SMILES-to-report notebook, a FastAPI property prediction endpoint, or a virtual screening mini-pipeline. Start: month 16

PHASE 2 RESOURCES

GUIDE · FREE · GitHub Docs: Getting Started
Official guide to Git and GitHub basics: repos, commits, branches, pull requests. Do the interactive "Hello World" tutorial before anything else. docs.github.com/en/get-started

VIDEO · FREE · Corey Schafer: Git Tutorial for Beginners
30-minute walkthrough of the Git commands you'll actually use. Watch once before starting your first project, then use it as a reference when you get confused. youtube.com/@coreyms

GUIDE · FREE · makeareadme.com
Interactive tool that generates a professional README template. Use it for every project. A great README is the single highest-leverage
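Project 01's "compute pIC50, classify activity" step is a unit conversion plus a thresholding rule. A minimal sketch in plain Python, assuming IC50 values are already in nanomolar (check the units column of your ChEMBL export first; ChEMBL also provides a precomputed `pchembl_value` field). The cutoffs of pIC50 ≥ 6 (IC50 ≤ 1 µM) for active and pIC50 ≤ 5 (IC50 ≥ 10 µM) for inactive are a common tutorial convention rather than a standard, and the compound IDs and values are made up.

```python
import math


def pic50_from_nm(ic50_nm):
    """pIC50 = -log10(IC50 in mol/L) = 9 - log10(IC50 in nmol/L)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be a positive concentration")
    return 9.0 - math.log10(ic50_nm)


def activity_class(pic50, active_cutoff=6.0, inactive_cutoff=5.0):
    """Three-way label using assumed cutoffs (tutorial convention; adjust per target)."""
    if pic50 >= active_cutoff:
        return "active"
    if pic50 <= inactive_cutoff:
        return "inactive"
    return "intermediate"


# Hypothetical IC50 values in nM, standing in for a ChEMBL bioactivity export
ic50_nm = {"cmpd_1": 12.0, "cmpd_2": 1000.0, "cmpd_3": 3500.0, "cmpd_4": 25000.0}

results = {}
for cid, value in ic50_nm.items():
    p = pic50_from_nm(value)
    results[cid] = (round(p, 2), activity_class(p))
    print(f"{cid}: IC50 = {value:g} nM -> pIC50 = {p:.2f} ({results[cid][1]})")
```

In a pandas DataFrame the same conversion is vectorised, e.g. `9 - np.log10(df["standard_value"])`, once you have kept only rows whose `standard_units` are nM.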

One post per week. Formats that work: "I built X and learned Y" (project update), "I read this paper and here's what matters" (field signal), "Here's a mistake I made and how I fixed it" (authenticity). Consistency beats quality every time.

Engagement strategy
Comment thoughtfully on 3–5 posts per week from people in your target field. Skip the "great post!" and add a sentence of genuine insight. In a small field, your name starts appearing in feeds before your application does.

Writing about projects
After finishing each GitHub project, write a 150-word LinkedIn post: what problem you solved, how you did it, one result with a number, and a GitHub link. This is your most reliable way to get profile views from relevant people.

The LinkedIn algorithm
Early engagement (within 60 minutes of posting) strongly influences reach. Post in the morning and reply to comments immediately. Native content (posts, articles) outperforms external links. Video and images outperform text-only.

PHASE 3 RESOURCES

TOOL · FREE · LinkedIn Skill Assessments
Take the Python and Data Analysis assessments. A verified badge on your profile boosts your ranking in recruiter searches. Takes 15 minutes each. linkedin.com/skill-assessments

TOOL · FREE · Aragon.ai: AI Headshots
Free tier generates professional-looking headshots from your photos. LinkedIn reports that profiles with a photo get up to 21× more views. No excuse to skip this. aragon.ai

TOOL · FREE · Loom: Project Walkthroughs
Record a 2-minute screen walkthrough of your project and post the link on LinkedIn. Video posts get 5× the reach of text posts. Free tier is plenty. loom.com

PHASE 04 · TRACK C BEGINS AT MONTH 6

◆ Phase 04 · Track C

Strategic Networking

⏱ Starts month 6 · 2–3 hrs/week ⟳ runs parallel to Phase 1 & 2

MOST JOBS IN THIS FIELD ARE NOT POSTED PUBLICLY

A significant proportion of cheminformatics roles, especially junior ones, are filled by referral before or instead of a public posting. Your network is not optional. It is the primary hiring mechanism. You need to be known before you need a job, not after.

The right mindset Networking is not asking people for jobs. It's building genuine professional relationships over time by offering value: sharing interesting work, asking thoughtful questions, engaging with their content. The job comes later, as a byproduct.

Online community first
Start where it's low-pressure: Reddit, Discord, LinkedIn comments. Contribute to discussions, answer beginner questions once you know the answer, share what you're building. Be a visible, helpful presence before you need anything.

Cold outreach (the right way)
Message 2–3 people per month whose work you've genuinely read. One paragraph: what you've been learning, what you found interesting in their work, and one specific question. No job ask, no "can we hop on a call." Just curiosity.

Conferences and events
RDKit User Group Meeting (free, virtual), RSC CICAG workshops, Gordon Research Conferences, and SCI Drug Discovery events. Attend virtually first; in person later, when you have something to present.

Open source as networking
Contributing to RDKit, DeepChem, or TeachOpenCADD puts you in direct contact with senior practitioners in a context where your work speaks for itself. Even a documentation fix gets your name in the commit history.

Academic lab connections
Email a PhD student or postdoc doing cheminformatics research you find interesting. Ask about their work. Offer to help with something small. Academic labs are often looking for motivated people, and those contacts become referrers later.

MONTHLY NETWORKING HABIT (FROM MONTH 6)

→ Post one project update or insight on LinkedIn
→ Comment thoughtfully on 10–15 posts from people in the field
→ Send 2 cold messages: genuine, specific, no ask
→ Attend one virtual event or webinar
→ Contribute something, even a comment or issue, to an open source project
→ Update your LinkedIn with any new project, skill, or achievement

PHASE 4 RESOURCES

COMMUNITY · FREE · RDKit UGM (User Group Meeting)
Annual free virtual meeting. Practitioners from industry and academia present work. Attend to learn the latest, but also to put names to the people whose code you use every day. github.com/rdkit/UGM_

COMMUNITY · FREE · ChemAI Network Discord
Active real-time community for AI in drug discovery. The best place to ask "does anyone know about X" questions and get answers from practitioners within hours. Find the invite via r/Cheminformatics

EVENTS · SOME FREE · RSC CICAG & SCI Events
The Royal Society of Chemistry's Chemical Information and Computer Applications Group runs free workshops. SCI Drug Discovery events are paid but often have student rates or recordings. rsc.org/membership-and-community/connect/cicag

If you haven't heard back within 10 business days, one polite follow-up email to the recruiter is appropriate: "I'm still very interested in this role and wanted to confirm my application was received." After that, move on.

PHASE 5 RESOURCES

TOOL · FREE · resumake.io: CV Builder
Clean, ATS-friendly CV templates. Generates a plain LaTeX-style PDF that parsers handle correctly. Avoid fancy Canva templates; they break ATS scanning. resumake.io

TOOL · FREE · Jobscan.co: ATS Keyword Checker
Paste your CV and a job description; it shows which keywords are missing. Free tier gives 5 scans/month. Use it for every application at a company you really want. jobscan.co

SALARY · FREE · Glassdoor + Levels.fyi
Research salary ranges before interviews. Glassdoor for pharma and CROs; Levels.fyi for biotech startups. Know your number before any compensation conversation. glassdoor.com · levels.fyi

⚠ JOB SEARCH TRAPS

→ Applying to 100 jobs with the same generic CV. Eight tailored applications outperform 100 generic ones in this field; it's small enough that people talk.
→ Only applying to perfect-fit roles. Apply if you meet 60–70% of requirements. The job description is a wish list, not a legal contract.
→ Disappearing into applications without maintaining your network and public presence. Keep posting, keep engaging: the next job often comes from someone who already knows your work.

PHASE 06 · INTERVIEWS

◆ Phase 06 · Track C

Interview Mastery

⏱ Prepare 2–3 weeks before first application

INTERVIEW PREP IS A SKILL YOU CAN PRACTISE: START BEFORE YOU NEED IT

The worst time to learn how to interview is during an interview. Spend 2–3 weeks before your first application learning the question types, preparing your project walkthroughs, and doing at least one mock technical screen. The field is small, and a poor interview leaves an impression.

THE FOUR INTERVIEW FORMATS YOU'LL FACE

1 · Recruiter Screen (30 min)
Motivation, background, salary expectations. No technical content. Be clear about why this company and this role specifically. Have your elevator pitch (90 seconds) memorised.

2 · Technical Screen (45–60 min)
Conceptual chemistry/ML questions, live code or whiteboard. They want to see how you think, not just what you know. Talk through your reasoning out loud; silence is the enemy.

3 · Take-Home Challenge (3–7 days)
A dataset, a problem, a notebook. They assess code quality, analysis depth, and communication. Treat it like a portfolio project: README, reproducibility, and written interpretation of results.

4 · Final Panel (half day)
Project presentation, deep technical discussion, behavioural questions, meet the team. Prepare a 10-minute presentation of your best project with clear slides. Know every line of your code.

QUESTIONS YOU WILL ALMOST CERTAINLY BE ASKED

TECHNICAL · CORE

"Explain what a Morgan fingerprint is and why it's useful for machine learning on molecules."

TECHNICAL · CORE

"Why do we use scaffold splitting instead of random splitting? What problem does it solve?"

TECHNICAL · CORE

"Walk me through your QSAR project. Why did you choose that model? What would you do differently?"

TECHNICAL · DEEPER

"What are the limitations of molecular fingerprints? When would a graph neural network be preferable?"

TECHNICAL · DEEPER

"You have a highly imbalanced dataset (95% inactive, 5% active). How do you handle that?"

TECHNICAL · DEEPER

"How would you assess whether a model is applicable to a new set of compounds?"

BEHAVIOURAL

"Tell me about a time you got stuck on a technical problem. How did you get unstuck?"

BEHAVIOURAL

"Why cheminformatics specifically? What draws you to the intersection of chemistry and ML?"

MOTIVATION

"Why this company? What do you know about our pipeline / approach / technology?"

OPEN-ENDED

"What paper or technique in the field have you found most interesting recently?"

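The scaffold-splitting question comes down to this: a random split scatters close analogues from the same chemical series across train and test, so the test score overestimates how well the model generalises to new chemotypes. A scaffold split keeps every molecule that shares a scaffold in the same partition. A minimal sketch in plain Python, assuming scaffold strings are precomputed (with RDKit, `MurckoScaffold.MurckoScaffoldSmiles(smiles=...)` from `rdkit.Chem.Scaffolds` is one way to get them). The greedy largest-group-first assignment follows the same idea as scaffold splitters in libraries such as DeepChem, but this is not that library's code, and the 80/10/10 fractions and example scaffolds are illustrative.

```python
from collections import defaultdict


def scaffold_split(scaffolds, frac_train=0.8, frac_valid=0.1):
    """Split molecule indices so that no scaffold appears in more than one partition.

    scaffolds[i] is the scaffold string (e.g. a Murcko scaffold SMILES) of molecule i.
    Whole scaffold groups are assigned greedily, largest first: they fill the training
    set, then validation, and anything that fits in neither goes to test.
    """
    groups = defaultdict(list)
    for idx, scaffold in enumerate(scaffolds):
        groups[scaffold].append(idx)
    # Largest groups first; ties broken by the group's first index for reproducibility
    ordered = sorted(groups.values(), key=lambda g: (-len(g), g[0]))

    n = len(scaffolds)
    train_cutoff = frac_train * n
    valid_cutoff = (frac_train + frac_valid) * n
    train, valid, test = [], [], []
    for group in ordered:
        if len(train) + len(group) <= train_cutoff:
            train.extend(group)
        elif len(train) + len(valid) + len(group) <= valid_cutoff:
            valid.extend(group)
        else:
            test.extend(group)
    return train, valid, test


# Illustrative scaffolds for 10 molecules: two series plus three singletons
scaffolds = (
    ["c1ccccc1"] * 4  # benzene series, molecules 0-3
    + ["c1ccncc1"] * 3  # pyridine series, molecules 4-6
    + ["C1CCCCC1", "c1ccc2ccccc2c1", "c1ccoc1"]  # singletons, molecules 7-9
)
train, valid, test = scaffold_split(scaffolds)
print("train:", train)
print("valid:", valid)
print("test: ", test)
```

Because the rare scaffolds land in validation and test, scaffold-split scores are usually lower than random-split scores on the same data. That gap is the honest number, and it belongs in the model card.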
PREPARE YOUR "PAPER OF THE WEEK" ANSWER NOW

The open-ended question about a recent paper or technique trips up nearly every candidate who hasn't prepared. Pick one paper from the last 12 months in drug discovery ML: read it properly, understand the method, know the limitations. Be able to discuss it for 3 minutes. This single preparation separates you from 90% of candidates.

PHASE 6 RESOURCES

GUIDE · FREE · STAR Method: Behavioural Interviews
Situation, Task, Action, Result. Write 5 STAR stories from your projects before any interview. Reuse them across questions. The format works because it's concrete and memorable. Search "STAR method interview" for many free guides.

PRACTICE · FREE · Pramp.com: Mock Technical Interviews
Free peer-to-peer mock interviews. Find someone in a related field and practice live. The discomfort of being watched while you code is

The field moves fast. Keep reading papers, following practitioners, and building side projects even after you're employed. The people who advance fastest treat their career as a continuing learning project, not a destination.

Pay it forward
Answer beginner questions on Reddit. Mentor someone earlier on this path than you. Write about what you've learned. The cheminformatics community is small; your reputation as someone who gives back compounds over a career.

REFERENCE SECTIONS

◆ Reference

Job Roles Decoded

Cheminformatics Scientist
Core role. Builds and maintains chemical databases, writes analysis pipelines, supports chemistry teams with data. Needs strong RDKit, pandas, SQL. Often the most accessible entry point.

Computational Medicinal Chemist
More chemistry-heavy. Molecular docking, pharmacophore modelling, ADMET prediction, SAR analysis. Needs domain chemistry knowledge plus computational tools. Often requires a chemistry degree.

ML Scientist (Drug Discovery)
Most competitive. QSAR/GNN/generative model building at scale. Needs strong ML engineering plus chemistry domain knowledge. PhD common but not universal at startups. Highest salary band.

Materials Informatics Engineer
Applies cheminformatics methods to materials (batteries, polymers, semiconductors). Materials Project, DFT data, crystal structure ML. Growing fast. Less crowded than drug discovery.

Bioinformatics / Chemogenomics
Intersection with genomics: compound-target interaction prediction, proteochemometrics. Needs cheminformatics plus sequence data skills. Increasingly in demand at pharma.

Software Engineer (Cheminformatics)
Builds the tools others use: cheminformatics libraries, pipeline infrastructure, data platforms. Needs strong software engineering. Less domain chemistry required; accessible from an SWE background.

Who Hires

AI DRUG DISCOVERY STARTUPS
Recursion · Exscientia · Insilico Medicine · BenevolentAI · Isomorphic Labs
Move fastest, hire most aggressively in ML. Strong engineering culture. Remote-friendly. Competitive salaries. High-risk, high-reward career bets.

BIG PHARMA
AstraZeneca · Novartis · Pfizer · GSK · Roche · Sanofi · Merck
More structured, larger teams, clearer career paths. Graduate schemes are the best entry point. Slower hiring cycles. Strong benefits and job stability.

COMPUTATIONAL CHEMISTRY SOFTWARE
Schrödinger · OpenEye (Cadence) · Chemical Computing Group · Cresset
Build the tools the rest of industry uses. Excellent for learning depth in the field. Good salaries. Often based in specific cities (NYC, Boston, Cambridge UK).

CROs & BIOTECH
Evotec · Charles River · Agilent · Chemaxon · Dotmatics
More accessible entry points. Broader exposure to different projects and clients. Good for building range early in a career before specialising.

ACADEMIC LABS
Volkamer Lab · OPIG Oxford · Shoichet Lab · Glorius Group + many more
Research assistant and PhD positions. Lower pay but excellent training. Often the best stepping stone into industry. Publishing in this role opens many doors.

MATERIALS & AGRICHEM
BASF · Dow · Syngenta · Materials Project (Berkeley) · Toyota Research Institute
Less crowded than drug discovery. Similar technical skills, different application domain. Faster hiring, comparable salaries, less competition from PhDs.

Salary Ranges
Approximate 2024–2025 figures · UK and EU · verify on Glassdoor/Levels before negotiating

ROLE | JUNIOR (0–2 YRS) | MID (2–5 YRS) | SENIOR (5+ YRS) | NOTES
Cheminformatics Scientist | £35–50k | £50–70k | £70–95k | Most accessible entry role
Computational Med. Chemist | £40–55k | £55–80k | £80–110k | Chemistry degree often required
ML Scientist (Drug Discovery) | £50–70k | £70–100k | £100–140k+ | PhD typical; startups pay more
Materials Informatics Eng. | £40–55k | £55–80k | £75–100k | Less competitive; growing fast
SWE (Cheminformatics tools) | £45–65k | £65–95k | £90–130k | Strong SWE skills required
Academic (Postdoc / RA) | £28–38k | £35–48k | N/A (→ faculty) | Trade pay for training & publishing

US AND EU NOTE
US salaries run roughly 1.5–2.5× the UK figures for equivalent roles, especially at AI startups in Boston/NYC/SF. EU salaries (Germany, Switzerland, Netherlands) are closer to UK levels but with stronger benefits. Pay for remote roles may be set at local market rates; verify this explicitly before accepting.

Get Hired in Chemoinformatics — Complete Roadmap Companion to the Python for Chemoinformatics Deep Learning Guide Built for a complete beginner targeting their first cheminformatics role · ~20–26 months to offer