For Empirical Analysts: Plan and Manage Your Documents Sensibly

Published: 2020-10-10

Source: Planning, Organizing, and Documenting (PO&D)

Why are PO&D so important that they warrant an entire chapter?

  • REPLICABILITY
  • Hit by a bus rule: Could another researcher pick up your project and finish it if you were hit by a bus?
  • The publication process may take years from the time a project is started; proper documentation lets a researcher re-familiarize themselves with the project more easily.

Planning

  • Effective planning saves time and prevents errors.
  • Especially important during collaborative projects to avoid confusion over who is responsible for what.
  • Also especially important for long or complex research projects.
  • Makes working on multiple projects easier.

Planning Tips

  1. Start with general goals and publishing plans. This allows researchers to prioritize tasks so that the early stages of a project are not held up by data collection and/or data management.
  2. Plan a timeline with target dates for key stages of the project. Target dates may be missed, but they still serve as benchmarks for assessing your plan. A timeline also lets you plan around fixed deadlines such as conference submission deadlines.
  3. A well-thought-out plan allows for a smooth division of labor. Without a plan it is difficult to coordinate data management and use. A plan also lets collaborators agree on authorship up front. A failed collaboration certainly affects the success of a project, but it can also damage interpersonal relationships.
  4. Select an enforcer for collaborative projects – someone specifically designated to focus on PO&D. For example, the enforcer may be in charge of backing up the data or organizing files.
  5. Simple but informative variable names help avoid confusion.
  6. Plan for missing data. Does it matter whether the missing data are due to attrition, refusal, or a skip pattern in the survey? Do you need to account for these different forms of missing data with separate codes?
  7. Consider the software being used to analyze the data. The type of software being used affects data formats, naming conventions, and data structures.
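
The question raised in tip 6 can be made concrete with a small sketch. It assumes each reason for missingness gets its own code; the codes `.a`, `.b`, and `.c` (modeled loosely on Stata's extended missing values) and the function names are hypothetical, chosen only for illustration.

```python
from collections import Counter

# Hypothetical codes: one per reason a value can be missing.
MISSING_CODES = {
    ".a": "attrition",
    ".b": "refusal",
    ".c": "skip pattern",
}


def tabulate_missing(values):
    """Count how many values are missing for each documented reason."""
    counts = Counter(MISSING_CODES[v] for v in values if v in MISSING_CODES)
    return dict(counts)


def valid_values(values):
    """Return only the values that are not coded as missing."""
    return [v for v in values if v not in MISSING_CODES]


# Illustrative responses to a single survey item (made-up data).
responses = [3, ".b", 5, ".a", ".c", 2, ".b"]
print(tabulate_missing(responses))
print(valid_values(responses))
```

Keeping the reasons separate at the planning stage means they can still be collapsed into a single missing category later, whereas a single code cannot be split apart after the fact.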

Organizing

Successful organization determines what goes where, what to name a file, and how you find a file.

Organizing Tips

  1. Start early! Staying organized early in a project allows for easier organization towards the end of a project.
  2. Simple but not too simple. Large and small projects require different levels of complexity in organization. Not every project is going to be organized in the same way.
  3. Stay consistent. Using the same naming scheme for all projects reduces the amount of time you must spend thinking about organization.
  4. Use written documentation of your organization procedures. This prevents you from changing conventions mid-project. This is especially important during collaborative projects.
  5. Using dashes (-) or underscores (_) instead of spaces in file and directory names lets researchers avoid quotation marks in their program commands. Such names also make it easier to work on projects across operating systems (PC vs. Mac).
  6. Labeling files with a short mnemonic unique to each project lets a researcher quickly identify or locate the files associated with that project.
  7. “Work” and “Posted” directories are useful to keep straight which aspects of a project are finished and which are still in progress.
  8. Using a dash (-) at the beginning of a directory name pushes it to the top of your directory list.
  9. Mailbox directories can be set up on collaborative projects. For example, a directory may be named Researcher1toResearcher2 or Researcher2toResearcher1.
  10. Private directories can be set up for files that aren’t intended to be shared with the entire research team.
  11. A “- to clean” directory can hold files that you are unsure where to put. These files stay in this folder until they can be transferred to their proper place.
  12. A “- hold then delete” directory can be used as a sort of trash bin. It holds old files until it is clear that they can be deleted.
  13. Use directory names to highlight what the directory contains.
  14. A “- history” directory can be used to keep track of critical information about files in the project.
  15. Spreadsheets can be useful tools for organizing a directory. See Long (2009:29).
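
Several of the tips above can be combined into one small sketch that sets up a project directory. The directory names are spelled as they appear in the list; the exact set of directories, the mnemonic `abc`, and the helper names are assumptions made for illustration rather than a layout prescribed by the source.

```python
import tempfile
from pathlib import Path

# Directory names as spelled in the list above; the exact set is an assumption.
# A leading dash sorts a directory to the top of the listing.
SUBDIRECTORIES = [
    "- history",
    "- hold then delete",
    "- to clean",
    "Posted",
    "Private",
    "Work",
]


def create_project_skeleton(root, collaborators=()):
    """Create the standard directories plus one mailbox per ordered pair."""
    root = Path(root)
    names = list(SUBDIRECTORIES)
    for sender in collaborators:
        for receiver in collaborators:
            if sender != receiver:
                names.append(f"{sender}to{receiver}")
    for name in names:
        (root / name).mkdir(parents=True, exist_ok=True)
    return sorted(p.name for p in root.iterdir())


def project_filename(mnemonic, step, description, extension):
    """Build a file name with a project mnemonic and dashes instead of spaces."""
    slug = "-".join(description.lower().split())
    return f"{mnemonic}-{step:02d}-{slug}.{extension}"


# Demonstrate in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    created = create_project_skeleton(
        Path(tmp) / "abc", ["Researcher1", "Researcher2"]
    )
    print(created)

print(project_filename("abc", 1, "Clean data", "do"))
```

Generating the layout from a single list is one way to keep the same scheme across projects, which is the point of the consistency tip.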

Documenting

Long’s Law of Documentation: It is always faster to document it today than tomorrow.

  • Documentation reminds a researcher of decisions made, work completed, and plans for future work.
  • Without documentation, replication is nearly impossible.

Documenting Tips

  1. Document data sources so that you are aware of which release of a dataset you are using.
  2. Document data decisions, such as how variables were created and how cases were selected, along with why each decision was made.
  3. Document the type of analysis used, the order it was used in, and why it was used. Also account for other analyses that you may have run but decided not to use.
  4. Document the type of software used.
  5. Keep a record of where the results are stored.
  6. A research log can be used to chronicle what, when, and how decisions were made. It can be handwritten or constructed using a word processor. See the research log template in Long (2009:41).
    • a. A good research log keeps work on track
    • b. Helps deal with interruptions to work
    • c. Facilitates replication
  7. Include detailed comments in your do-files using an asterisk (*).
  8. Internally labeling documents with the author’s name, the date it was created, and the name of the document allows for better tracking of which document is the most recent. This information can be added at the end of a document.
  9. Include full dates and names so that there is no confusion down the road.
  10. Codebooks are also important for documenting variables. Good codebooks typically include:
    • a. Variable name and question number
    • b. Text of the original question and variable label used
    • c. If data were collected using a survey, any branching information should be included.
    • d. Descriptive statistics including value labels for categorical variables
    • e. Descriptions of how missing data can occur as well as codes for each type of missing data
    • f. Information on recoding or imputation. Include information about how missing data were handled if a variable was constructed from other variables.
    • g. An appendix that contains any abbreviations or conventions used.
  11. Manage datasets with a dataset registry. See Long (2009:45).
    • a. Dataset registries track your datasets as well as the do-files used to create them. A dataset registry makes it easier to troubleshoot potential data problems.
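
A dataset registry as described in tip 11 can be as simple as a CSV file with one row per dataset. The sketch below is a minimal illustration under that assumption; the column names, file names, and function names are hypothetical and are not taken from the template in Long (2009).

```python
import csv
import tempfile
from datetime import date
from pathlib import Path

# Hypothetical registry columns: one row per dataset.
FIELDS = ["dataset", "created_by_dofile", "created_on", "author", "notes"]


def register_dataset(registry_path, dataset, dofile, author, notes, created_on=None):
    """Append one dataset record, writing the header if the file is new."""
    registry_path = Path(registry_path)
    is_new = not registry_path.exists()
    row = {
        "dataset": dataset,
        "created_by_dofile": dofile,
        "created_on": (created_on or date.today()).isoformat(),
        "author": author,
        "notes": notes,
    }
    with registry_path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(row)


def find_source_dofile(registry_path, dataset):
    """Return the do-file recorded for a dataset, or None if unregistered."""
    with Path(registry_path).open(newline="", encoding="utf-8") as handle:
        matches = [
            record["created_by_dofile"]
            for record in csv.DictReader(handle)
            if record["dataset"] == dataset
        ]
    return matches[-1] if matches else None


# Demonstrate with made-up file names in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    registry = Path(tmp) / "abc-dataset-registry.csv"
    register_dataset(registry, "abc-survey-01.dta", "abc-data01-import.do",
                     "Researcher1", "Raw import", date(2020, 10, 10))
    register_dataset(registry, "abc-survey-02.dta", "abc-data02-recode.do",
                     "Researcher1", "Recoded missing values", date(2020, 10, 11))
    source = find_source_dofile(registry, "abc-survey-02.dta")
    unknown = find_source_dofile(registry, "abc-survey-99.dta")

print(source)
```

When a problem surfaces in a dataset, a lookup of this kind points directly to the do-file that produced it, which is the troubleshooting benefit the tip describes.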

Examples and templates cited from Long (2009) are also available in file format.
