Decision Methods and Predictive Analytics
Objective
This take-home project is one of three assessments (along with Tests 1 & 2) in this unit. It is worth 60% of the overall mark.
The main objective is to allow the participants in the unit to demonstrate their grasp of the fundamentals and practical use of statistical and machine learning methods for prediction/classification that have been discussed during the unit, and those that participants will research on their own.
The project should focus on the analysis of a substantive dataset that participants may obtain from online sources, or from their place of work.
Up to now, in both lectures and tutorials, we have analyzed data and fitted predictive models as if the steps to do so were clear, well-laid out, and led invariably to a ‘correct’ answer. Reality, however, is messier. There is not a linear path from problem and data to solution, and one of the pedagogical objectives of the project is to allow participants to get some sense of that.
Participants should work in teams of 2 people (with some exceptions). Analysis and reporting are to be carried out in R/RStudio using R Markdown.
Task | Mark | Due |
Proposal (2–3 pages) | 10% | TW7 |
Peer Review of Proposal | 5% | TW8 |
Written Report | 35% | TW12 |
Oral Presentation | 10% | TW12 |
Details of each of these assessments are shown below, and rubrics and other reference material will be available on Blackboard.
Participants may wish to use data from their own workplaces, as long as confidentiality requirements do not prevent them from writing a report that will be read by the lecturer or from speaking about their analysis to the other participants in the unit.
There are many public sources of data available, including open data websites such as OpenDataSoft. The appendix also contains a list of data websites compiled by an academic at the University of Idaho.
The idea is to find a dataset that is sufficiently complex to allow you to demonstrate your familiarity with the methods studied in the unit, and with those that were not covered. There may be several response variables for which prediction/classification methods have to be used. In addition, you will find yourself more motivated if you select a dataset from a field that is of interest to you.
The project proposal is a short (2-3 page) Word document produced using R Markdown that contains:
1. Title
2. Data & Analyses
a. Objective: What do you plan on predicting/classifying and why?
b. Where do the data come from? Have these data been analyzed before?
c. Describe context and variables and their types; show some plots/tables
d. What analyses do you propose to carry out?
e. How will you evaluate the predictive models?
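As one possible answer to item (e), a proposal could commit to a hold-out evaluation. The following is a minimal sketch in base R, not a prescribed method: the built-in `mtcars` data, the 70/30 split, the predictors `wt` and `hp`, and the training-mean baseline are all illustrative assumptions.

```r
# Minimal sketch: hold-out evaluation of a regression model using test-set MSE.
# mtcars, the 70/30 split and the predictors wt and hp are illustrative assumptions.
set.seed(1)                                     # make the random split reproducible
n <- nrow(mtcars)
train_idx <- sample(seq_len(n), size = floor(0.7 * n))
train <- mtcars[train_idx, ]
test  <- mtcars[-train_idx, ]

fit  <- lm(mpg ~ wt + hp, data = train)         # fit on the training rows only
pred <- predict(fit, newdata = test)            # predict the held-out rows

test_mse     <- mean((test$mpg - pred)^2)       # mean squared prediction error
baseline_mse <- mean((test$mpg - mean(train$mpg))^2)  # always predict the training mean

cat("Test MSE:    ", test_mse, "\n")
cat("Baseline MSE:", baseline_mse, "\n")
```

Reporting a naive baseline next to the model's error gives the reader a reference point for judging whether the model adds predictive value. For classification, the same split could be scored with a misclassification rate instead of MSE.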
Your proposal will be marked by one of your classmates.
You will be provided with a rubric and some general guidelines to help you evaluate a proposal written by one of your classmates.
The project report should be written as a formal technical report. It can be written wholly in R Markdown and then converted to Word, or some combination of R Markdown for technical appendices and Word for the main body. There is no prescribed structure, but it should contain the following elements:
• What is the problem you are trying to solve? Where do the data come from? Include background material as appropriate.
• What are the methods you used for exploratory analysis and for prediction/classification? Provide background information on methods that we did not cover in the unit.
• What hyper-parameter choices did you make and why?
• What data cleaning/wrangling did you have to do before analysis?
• Include methods that didn’t work as well as those that did.
• Provide a detailed description of your results. What are the performance measures you used to assess predictive/classification accuracy?
• If the data have been analyzed before, how well did your methods perform compared to those that others used?
• Use informative and interesting visualizations for EDA and for displaying your results.
• What would you have done differently? What other methods could you have used?
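The questions above about hyper-parameter choices and performance measures can often be answered with a cross-validation table. The sketch below shows one way to do this in base R. It is an illustration only: the `mtcars` data, the polynomial-in-`hp` model family, the candidate degrees 1 to 4, and the choice of 5 folds are assumptions made for the example.

```r
# Minimal sketch: 5-fold cross-validation to choose a hyper-parameter.
# The data set (mtcars), the model family (polynomial in hp) and the
# candidate degrees 1 to 4 are illustrative assumptions.
set.seed(1)
k <- 5
folds <- sample(rep(seq_len(k), length.out = nrow(mtcars)))  # random fold labels
degrees <- 1:4

cv_mse <- sapply(degrees, function(d) {
  fold_mse <- sapply(seq_len(k), function(i) {
    train <- mtcars[folds != i, ]               # fit on all folds except fold i
    valid <- mtcars[folds == i, ]               # validate on fold i
    fit <- lm(mpg ~ poly(hp, d), data = train)
    mean((valid$mpg - predict(fit, newdata = valid))^2)
  })
  mean(fold_mse)                                # average validation MSE over the folds
})

results <- data.frame(degree = degrees, cv_mse = cv_mse)
print(results)
best_degree <- degrees[which.min(cv_mse)]       # degree with the lowest CV error
cat("Selected degree:", best_degree, "\n")
```

Including the full `results` table in the report, not only the selected value, documents why a given hyper-parameter was chosen and how sensitive the error is to that choice.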
Depending on the complexity of the problem you have decided to tackle, the main body of the report will be 10–20 pages long, including important plots and tables. The appendix should contain the R Markdown file and the resulting output from your data wrangling, exploratory data analysis, and quantitative analysis. If you use any external resources such as books or websites – and you are encouraged to do so! – please make sure that you cite them appropriately.
A rubric will be provided on Blackboard to guide you as you write the report. If you are working in a team, please provide a breakdown of the effort of each member, and what each individual worked on.
1. This assignment is my/our own original work, except where I/we have appropriately cited the original source (appropriate citation of original work will vary from discipline to discipline).
2. This assignment has not previously been submitted in any form for this or any other unit, degree or diploma at any university or other institute of tertiary education.
3. I/we acknowledge that it is my/our responsibility to check that the file I/we have submitted is: a) readable, b) the correct file and c) fully complete.
The last lecture/workshop slot will be devoted to oral presentation of your work.
Depending on the number of presentations, each presentation will be between 8 and 12 minutes long, plus some time for questions. A rubric will be made available on Blackboard.
The course textbook and supporting materials should be your starting point for help on exploratory data analysis and predictive methods. There is plenty of online help on using R. For example, Stack Overflow has a section devoted to R that’s very useful. A searchable archive of the R help list may be found at this website. And, of course, you are welcome to contact me for guidance.
Good luck.
(Hyperlinked and compiled by Stephen Sauchi Lee, University of Idaho.)
1. 200,000+ Jeopardy questions
2. Awesome Public Datasets on github, curated by caesar0301.
3. AWS (Amazon Web Services) Public Data Sets, provides a centralized repository of public data sets that can be seamlessly integrated into AWS cloud-based applications.
4. Canada Open Data, pilot project with many government and geospatial datasets.
5. Causality Workbench data repository.
6. CDC Data — Medical data from the Centers for Disease Control and Prevention
7. Census.gov — US government source of data about the nation’s people and economy
8. CKAN — Open-source data portal platform
9. Corral Big Data repository at Texas Advanced Computing Center, supporting data centric science.
10. CrowdFlower Data for Everyone library.
11. Data Market — Portal for shared business data
12. Data Planet, The largest repository of standardized and structured statistical data, with over 25 billion data points, 4.3 billion datasets, 400+ source databases.
13. Data Source Handbook, A Guide to Public Data, by Pete Warden, O’Reilly (Jan 2011).
14. Data.gov — Source of machine readable datasets generated by the US government
15. Data.gov.uk, publicly available data from UK (also London datastore.)
16. Data.gov/Education, central guide for education data resources including high value data sets, data visualization tools, resources for the classroom, applications created from open data and more.
17. Datacatalogs.org, open government data from US, EU, Canada, CKAN, and more.
18. DataFerrett, a data mining tool that accesses and manipulates TheDataWeb, a collection of many on-line US Government datasets.
19. DataMarket, visualize the world’s economy, societies, nature, and industries, with 100 million time series from UN, World Bank, Eurostat and other important data providers.
20. DataSF.org, a clearinghouse of datasets available from the City & County of San Francisco, CA.
21. Dataverse Network — Repository for research datasets
22. Delve, Data for Evaluating Learning in Valid Experiments
23. Donors Choose: data related to their projects
24. EconData, thousands of economic time series, produced by a number of US Government agencies.
25. Enron Email Dataset, data from about 150 users, mostly senior management of Enron.
26. Europeana Data, contains open metadata on 20 million texts, images, videos and sounds gathered by Europeana – the trusted and comprehensive resource for European cultural heritage content.
27. FEDSTATS, a comprehensive source of US statistics and more
28. FIMI repository for frequent itemset mining, implementations and datasets.
29. Financial Data Finder at OSU, a large catalog of financial data sets.
30. FiveThirtyEight: data and code related to their articles
31. Free SVG Maps — Website for free geographic maps
32. GDELT: The Global Database of Events, Language, and Tone, described by the Guardian as “a big data history of life, the universe and everything.”
33. GeoDa Center, geographical and spatial data.
34. Google ngrams datasets, text from millions of books scanned by Google.
35. Google Public Data Explorer — Google’s public data portal to explore, visualize, and communicate large datasets
36. Grain Market Research, financial data including stocks, futures, etc.
37. Guardian DataBlog — Data journalism and data visualization from the Guardian
38. HitCompanies Datasets, comprehensive data on random 10,000 UK companies sampled from HitCompanies, updated automatically using AI/Machine Learning.
39. ICWSM-2009 dataset contains 44 million blog posts made between August 1st and October 1st, 2008.
40. IMDb Datasets — Webpage for access to IMDb datasets
41. Infochimps, an open catalog and marketplace for data. You can share, sell, curate, and download data about anything and everything.
42. Investor Links, includes financial data
43. Jake Hofman Data Links — Jake Hofman’s bookmarked computational social science data resources
44. Jerry Smith dataset collection, with Finance, Government, Machine Learning, Science, and other data.
45. Kaggle – home of Data Science
46. KDD Cup center, with all data, tasks, and results.
47. KDnuggets Data Repositories List — Data repository list maintained by KDnuggets, a popular data mining website
48. Kevin Chai list of datasets, for text, SNA, and other fields.
49. KONECT, the Koblenz Network Collection, with large network datasets of all types in order to perform research in the area of network mining.
50. Last.fm Datasets — Webpage for access to Last.fm datasets
51. Linked Data — Linkage site for distributed data
52. Linking Open Data project, aimed at making data freely available to everyone.
53. Million Song Dataset
54. MIT Cancer Genomics gene expression datasets and publications, from MIT Whitehead Center for Genome Research.
55. ML Data, the data repository of the EU Pascal2 networks.
56. mldata.org — A public repository for machine learning data
57. NASDAQ Data Store, provides access to market data.
58. National Government Statistical Web Sites, data, reports, statistical yearbooks, press releases, and more from about 70 web sites, including countries from Africa, Europe, Asia, and Latin America.
59. National Space Science Data Center (NSSDC), NASA data sets from planetary exploration, space and solar physics, life sciences, astrophysics, and more.
60. NetworkRepository: Interactive Data Repository, has many collections of graph and networks from social science, machine learning, scientific computing, and other areas.
61. Open Data Census, assesses the state of open data around the world.
62. Open Source Sports, many sports databases, including Baseball, Football, Basketball, and Hockey.
63. OpenData from Socrata, access to over 10,000 datasets including business, education, government, and fun.
64. Peter Skomoroch (LinkedIn) Data Links — Peter Skomoroch’s bookmarked machine learning data resources
65. PubGene(TM) Gene Database and Tools, genomic-related publications database
66. Quandl, a collaboratively curated portal to millions of financial and economic time-series datasets.
67. qunb, a platform to find and visualize quantitative data.
68. RealClimate Data — Aggregator for selected sources of code and data related to climate science
69. Reddit Open Data — Forum on the social news site reddit for open APIs and datasets
70. Reddit Top 2.5 Million: all-time top 1,000 posts from each of the top 2,500 subreddits
71. Robert Shiller data on housing, stock market, and more from his book Irrational Exuberance.
72. SMD: Stanford Microarray Database, stores raw and normalized data from microarray experiments.
73. SourceForge.net Research Data, includes historic and status statistics on approximately 100,000 projects and over 1 million registered users’ activities at the project management web site.
74. StateMaster — Reference site for data on US states
75. StatLib, CMU Datasets Archive.
77. The Upshot: data related to their articles
78. Time Series Data Library
79. UCI Datasets — The UC Irvine Machine Learning Repository, a popular source of machine learning datasets
80. UCI KDD Database Repository for large datasets used in machine learning and knowledge discovery research.
81. UCR Time Series Data Archive, offering datasets, papers, links, and code.
82. UFO reports: geolocated and time-standardized UFO reports for close to a century
83. UK’s Met Office Data — Climate station records from the UK’s National Weather Service
84. UK’s Office for National Statistics — Source of datasets generated by the UK’s Office for National Statistics
85. United States Census Bureau.
86. Visual Analytics Benchmark Repository.
87. Web Data Commons, structured data from the Common Crawl, the largest public web corpus.
88. Wikipedia Database — Webpage for access to complete Wikipedia database dumps
89. Wikiposit, a (virtual) amalgamation of (mostly financial) data from many different sites, allowing users to merge data from different sources
90. Wolfram Alpha disease and patient level data.
91. Wolfram|Alpha — Computational knowledge engine or answer engine
92. World Bank Catalog — World Bank data
93. Yahoo Sandbox datasets, Language, Graph, Ratings, Advertising and Marketing, Competition
94. Yelp Academic Dataset, all the data and reviews of the 250 closest businesses for 30 universities for students and academics to explore and research.
95. Yelp Dataset Challenge: Yelp reviews, business attributes, users, and more from 10 cities