Here's the thing
I'm going to make this quick. You do a carefully thought-through analysis. You present it to all the movers and shakers at your company. Everyone loves it. Six months later someone asks you a question you didn't cover, so you need to reproduce your analysis...
But you can't remember where the hell you saved the damn thing on your computer.
If you're a data scientist (especially the decision sciences/analysis-focused kind) this has happened to you. A bunch. You might laugh it off, shrug, whatever, but it's a real problem because now you have to spend hours, if not days, recreating work you've already done. It's a waste of time and money.
I used to be this person too, so I get it. I decided to experiment with a new method that sounds so simplistic and stupid you'll think it won't work.
Just. Try. It. It will change your life.
I now keep all of my analyses in a single folder. I call mine "Research". Call yours whatever you want, it doesn't fucking matter. Next, any time I start ANY analysis (SQL query, Python notebook, Excel workbook, etc.) I create a folder in my research folder. The folder is named with the date first and then a brief description of what it's about.
That's it. I have not lost a single analysis since I started doing this. I've been asked for analyses that are 6 months old and I can find them in <10 minutes.
Once you have this folder structure you have to work directly out of the folder for all of your work. There can't be extra effort to get your work into this folder or you'll start losing research again.
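If you want to take the typing out of creating the folder, here's a minimal sketch in Python. The `YYYY-MM-DD_description` name format, the function name `new_analysis_folder`, and the rule for cleaning up the description are all assumptions for illustration; the method only needs a date and a short description in the name.

```python
from datetime import date
from pathlib import Path
import re


def new_analysis_folder(research_root, description, today=None):
    """Create <research_root>/<YYYY-MM-DD>_<description> and return its path."""
    # Default to the current date; a date can be passed in for testing or backfilling.
    today = today or date.today()
    # Turn the description into a short, filesystem-friendly slug (an assumed convention).
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    folder = Path(research_root) / f"{today.isoformat()}_{slug}"
    # exist_ok=True means calling this twice on the same day is harmless.
    folder.mkdir(parents=True, exist_ok=True)
    return folder
```

Calling `new_analysis_folder(Path.home() / "Research", "Churn by plan tier")` would create a folder named with today's date followed by `_churn-by-plan-tier`. Putting the date first in ISO order means an alphabetical sort of the research folder is also a chronological one.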
But I won't remember what date I did an analysis!
You don't have to! But you will remember that you did it a few months ago. Or a week or two ago. If you remember nothing, you can probably ask the person with the question when you first presented it and they'll give you a time range. Really, the time range is there to eliminate 95% of the noise.
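Scrolling through the folder is usually enough, but narrowing by time range can also be scripted. This is a rough sketch that assumes every analysis folder name starts with an ISO date such as `2024-03-05`; the function name `folders_between` is hypothetical, and folders that don't follow the convention are simply skipped.

```python
from datetime import date
from pathlib import Path


def folders_between(research_root, start, end):
    """Return analysis folders whose date prefix falls in [start, end], oldest first."""
    matches = []
    for folder in Path(research_root).iterdir():
        if not folder.is_dir():
            continue
        try:
            # Assumes the first 10 characters of the name are an ISO date.
            folder_date = date.fromisoformat(folder.name[:10])
        except ValueError:
            continue  # Skip anything that doesn't follow the naming convention.
        if start <= folder_date <= end:
            matches.append(folder)
    # ISO date prefixes sort chronologically when sorted as plain strings.
    return sorted(matches, key=lambda p: p.name)
```

So "sometime last spring" becomes something like `folders_between(root, date(2024, 3, 1), date(2024, 5, 31))`, and you only have to read a handful of descriptions.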
But it's hard to make sure the descriptions are clear/useful!
They don't have to be perfect. If you work on two analyses that you would describe basically the same way (for some weird reason), that's where the dates come in handy! It's hard to mix up what you did last week with what you did last month in this system. Worst case, you find both and have to look at both. You still haven't lost anything, but maybe it takes you the full ten minutes to figure out which is which.
If this method is even an option for you, it's because you lose your research. Before you come up with more excuses, just try this out for a couple of weeks and see if it doesn't affect the quality of your life.
Tooling that has helped me
What's important is that this method is so light and simple that it works with almost any tool. If you use SQL as much as I do, I recommend getting a tool like PyCharm or DataGrip that lets you organize your SQL files in custom folders like you would code in a standard IDE. I can't imagine this being easy if you have to keep copying and pasting SQL code out of your research folder and into some other tool.
Again, you have to set up your workflow so you can work entirely out of this one folder.
Keep that tenet in mind as you look at your tools and evaluate new tools. It's key.
Writing a Spark job? Do the same. Just live in this folder for your analysis.
You can also combine this with software that does full-text search over folders to make finding things even easier.
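Any desktop search tool will do the job, but if you'd rather not install anything, a few lines of Python cover the basic case. This is only a sketch: the function name `search_research` and the list of file extensions are assumptions, and it reads plain-text files only, so it won't look inside Excel workbooks.

```python
from pathlib import Path


def search_research(research_root, term, suffixes=(".sql", ".py", ".txt", ".md")):
    """Yield (path, line_number, line) for each line containing term, ignoring case."""
    needle = term.lower()
    # rglob("*") walks every dated folder and any subfolders inside it.
    for path in sorted(Path(research_root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        # errors="ignore" keeps one oddly encoded file from stopping the search.
        text = path.read_text(encoding="utf-8", errors="ignore")
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line.lower():
                yield path, number, line.strip()
```

Searching for a table or column name you remember from the analysis is often the fastest way in, and because each hit's path includes the dated folder name, the results tell you when you did the work as well as where it lives.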
That's it. I know it's nothing flashy or exciting. No one who interviews you for a data science role will ever ask you about this either. The people who work with data scientists, though, will notice you have your shit together: while those around you can never keep track of their work, you're always able to pick up where you left off and get down to business immediately.
Just remember this format and try it.
If you've solved this problem another way, I'd love to hear your solution. For people who struggle with this I'd love to offer more than one approach. Share it with me via comments or Twitter or something. Thanks!