Project tips + resources

Data sources

Example Project Ideas

  • You took Philosphy 106 (ethics) and you discussed the relationship between employers and employees. You submitted a signature assignment of a paper on unions (pro or against, it doesn’t matter). A possible dataset you could choose is the Right to Work Dataset and do an analysis on the relationship between state laws and their effect on union participation.
  • You took a BIO 101 class and discussed the biologcal mechanisms of diabetes. A possible dataset you could choose is the Diabetes Dataset and create a model that predict diabetes based on diagnostic measurements.
  • You took ARTS 104 (Creativity: Methods and Practices). A possible dataset you could choose is the Best Artworks of All Time and do a clustering analysis to see if there exists any patterns about great artists in our times (country, age, sex, etc.).
  • You took SOCIO 101. A possible dataset you could choose is the Student Performance Dataset that looks at student achievement in secondary education. You could create a regression model that predicts performance based upon a variety of factors.

Other data repositories

Tips

  • Ask questions if any of the expectations are unclear.

  • Code: In your write up your code should be hidden (echo = FALSE) so that your document is neat and easy to read. However your document should include all your code such that if I re-knit your qmd file I should be able to obtain the results you presented.

    • Exception: If you want to highlight something specific about a piece of code, you’re welcome to show that portion.
  • Merge conflicts will happen, issues will arise, and that’s fine! Commit and push often, and ask questions when stuck.

  • Make sure each team member is contributing, both in terms of quality and quantity of contribution (we will be reviewing commits from different team members).

  • All team members are expected to contribute equally to the completion of this assignment and group assessments will be given at its completion - anyone judged to not have sufficient contributed to the final product will have their grade penalized. While different teams members may have different backgrounds and abilities, it is the responsibility of every team member to understand how and why all code and approaches in the assignment works.

Formatting + communication tips

Headers

  • Use headers to clearly label each section.
  • Inspect the document outline to review your headers and sub-headers.

References

  • Include all references in a section called “References” at the end of the report.
  • This course does not have specific requirements for formatting citations and references.

Appendix

  • If you have additional work that does not fit or does not belong in the body of the report, you may put it at the end of the document in section called “Appendix”.
  • The items in the appendix should be properly labeled.
  • The appendix should only be for additional material. The reader should be able to fully understand your report without viewing content in the appendix.

Resize figures

Resize plots and figures, so you have more space for the narrative.

Arranging plots

Arrange plots in a grid, instead of one after the other. This is especially useful when displaying plots for exploratory data analysis and to check assumptions.

If you’re using ggplot2 functions, the patchwork package makes it easy to arrange plots in a grid. See the documentation and examples here.

Plot titles and axis labels

Be sure all plot titles and axis labels are visible and easy to read.

  • Use informative titles, not variable names, for titles and axis labels.

Do a little more to make the plot look professional!

  • Informative title and axis labels
  • Flipped coordinates to make names readable
  • Arranged bars based on count
  • Capitalized manufacturer names
  • Optional: Added color - Use a coordinated color scheme throughout paper / presentation
  • Optional: Applied a theme - Use same theme throughout paper / presentation

TODO example plot

Guidelines for communicating results

  • Don’t use variable names in your narrative! Use descriptive terms, so the reader understands your narrative without relying on the data dictionary.

    • ❌ There is a negative linear relationship between mpg and hp.
    • ✅ There is a negative linear relationship between a car’s fuel economy (in miles per gallon) and its horsepower.
  • Know your audience: Your report should be written for a general audience who has an understanding of statistics at the level of STA 210.

  • Avoid subject matter jargon: Don’t assume the audience knows all of the specific terminology related to your subject area. If you must use jargon, include a brief definition the first time you introduce a term.

  • Tell the “so what”: Your report and presentation should be more than a list of interpretations and technical definitions. Focus on what the results mean, i.e. what you want the audience to know about your topic after reading your report or viewing your presentation.

    • ❌ For every one unit increase in horsepower, we expect the miles per gallon to decrease by 0.068 units, on average.
    • ✅ If the priority is to have good fuel economy, then one should choose a car with lower horsepower. Based on our model, the fuel economy is expected to decrease, on average, by 0.68 miles per gallon for every 10 additional horsepower.
  • Tell a story: All visualizations, tables, model output, and narrative should tell a cohesive story!

  • Use one voice: Though multiple people are writing the report, it should read as if it’s from a single author. At least one team member should read through the report before submission to ensure it reads like a cohesive document.

Additional resources