Error in Script #33

ats1958 · 2018-04-23T13:49:02Z

Has anyone successfully downloaded all the data recently? I'm getting the following error:

```
Exception in thread "main" java.lang.RuntimeException: java.io.IOException: com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception: All access to this object has been disabled (Service: Amazon S3; Status Code: 403; Error Code:
```

lecy · 2018-04-23T15:04:47Z

I just did a test run and was able to execute this R code without problems:

```r
library( jsonlite )
library( R.utils )

# CREATE A DATA FRAME OF ELECTRONIC FILERS FROM IRS JSON FILES

dat1 <- fromJSON("https://s3.amazonaws.com/irs-form-990/index_2011.json")[[1]]
dat2 <- fromJSON("https://s3.amazonaws.com/irs-form-990/index_2012.json")[[1]]
dat3 <- fromJSON("https://s3.amazonaws.com/irs-form-990/index_2013.json")[[1]]
dat4 <- fromJSON("https://s3.amazonaws.com/irs-form-990/index_2014.json")[[1]]
dat5 <- fromJSON("https://s3.amazonaws.com/irs-form-990/index_2015.json")[[1]]
dat6 <- fromJSON("https://s3.amazonaws.com/irs-form-990/index_2016.json")[[1]]
dat7 <- fromJSON("https://s3.amazonaws.com/irs-form-990/index_2017.json")[[1]]

efiler.index <- rbind( dat1, dat2, dat3, dat4, dat5, dat6, dat7 )

head( efiler.index )
```
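
The seven `fromJSON()` calls differ only in the year, so the same index can be built in a loop. This is a minimal sketch that assumes two things: every yearly file keeps the single top-level wrapper that `[[1]]` unwraps, and all years share the same columns, which `rbind()` requires. `index_url()` and `read_efiler_index()` are hypothetical helper names. The URL pattern is copied from the script and may no longer resolve.

```r
# Hypothetical helper: build the yearly index URL(s); sprintf() is vectorised
# over `year`, so passing 2011:2017 returns seven URLs.
index_url <- function(year) {
  sprintf("https://s3.amazonaws.com/irs-form-990/index_%d.json", year)
}

# Hypothetical helper: download each yearly index and stack them.
read_efiler_index <- function(years = 2011:2017) {
  # [[1]] unwraps the single top-level element, as in the original script.
  pieces <- lapply(index_url(years), function(u) jsonlite::fromJSON(u)[[1]])
  # rbind() on data frames requires matching column names across years.
  do.call(rbind, pieces)
}

# Needs network access:
# efiler.index <- read_efiler_index()
# head(efiler.index)
```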


```r
library( xml2 )
library( dplyr )

### EXAMPLE ORGANIZATIONS FROM EACH PERIOD

V_990_2014 <- "https://s3.amazonaws.com/irs-form-990/201543089349301829_public.xml"
V_990_2012 <- "https://s3.amazonaws.com/irs-form-990/201322949349300907_public.xml"
V_990EZ_2014 <- "https://s3.amazonaws.com/irs-form-990/201513089349200226_public.xml"
V_990EZ_2012 <- "https://s3.amazonaws.com/irs-form-990/201313549349200311_public.xml"

### GENERATE ALL XPATHS: V 990 2014
doc <- read_xml( V_990_2014 )
xml_ns_strip( doc )
doc %>% xml_find_all( '//*') %>% xml_path()

### GENERATE ALL XPATHS: V 990 2012
doc <- read_xml( V_990_2012 )
xml_ns_strip( doc )
doc %>% xml_find_all( '//*') %>% xml_path()

### GENERATE ALL XPATHS: V 990EZ 2014
doc <- read_xml( V_990EZ_2014 )
xml_ns_strip( doc )
doc %>% xml_find_all( '//*') %>% xml_path()

### GENERATE ALL XPATHS: V 990EZ 2012
doc <- read_xml( V_990EZ_2012 )
xml_ns_strip( doc )
doc %>% xml_find_all( '//*') %>% xml_path()
```
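
The four XPath blocks repeat the same three steps, so they can be wrapped in one function and applied to every filing. This is a minimal sketch: `list_xpaths()` and `filings` are hypothetical names, while `read_xml()`, `xml_ns_strip()`, `xml_find_all()` and `xml_path()` are the xml2 calls the script already uses. The URLs come from the thread and may no longer resolve.

```r
library(xml2)

# Hypothetical helper: return the XPath of every element in one filing.
# read_xml() accepts a URL, a file path, or a literal XML string.
list_xpaths <- function(x) {
  doc <- read_xml(x)
  # Strip the default namespace (modifies doc in place) so the paths use
  # plain element names, as in the original script.
  xml_ns_strip(doc)
  xml_path(xml_find_all(doc, "//*"))
}

# The four example filings from the thread.
filings <- c(
  V_990_2014   = "https://s3.amazonaws.com/irs-form-990/201543089349301829_public.xml",
  V_990_2012   = "https://s3.amazonaws.com/irs-form-990/201322949349300907_public.xml",
  V_990EZ_2014 = "https://s3.amazonaws.com/irs-form-990/201513089349200226_public.xml",
  V_990EZ_2012 = "https://s3.amazonaws.com/irs-form-990/201313549349200311_public.xml"
)

# Needs network access; returns a named list of character vectors.
# xpaths <- lapply(filings, list_xpaths)
```

The helper also works offline on a literal string: `list_xpaths("<a><b/><c/></a>")` returns `c("/a", "/a/b", "/a/c")`.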

borenstein · 2018-04-24T01:32:15Z (via email)

Depending on how you're authenticating, you may be unable to fetch the data via the s3:// protocol even though downloading it anonymously over HTTPS works fine. When that happens, the cause is usually a permissions problem on your side: either your IAM role is too restrictive, or the client you're using to talk to S3 doesn't see your credentials. Do try to get it working via S3, though. Batch downloads via S3 are orders of magnitude faster than individual HTTPS requests, each of which requires a separate handshake between your machine and Amazon.
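If you go the s3:// route from R, one quick check for the "client doesn't see your credentials" case is whether the standard AWS environment variables are set in your session. This is a minimal sketch, and `visible_aws_env()` is a hypothetical helper. An empty result is not conclusive, because AWS clients can also pick up credentials from `~/.aws/credentials` or from an instance role.

```r
# Hypothetical helper: list which standard AWS credential/config environment
# variables are non-empty in the current R session. It does not check
# ~/.aws/credentials or instance roles.
visible_aws_env <- function(vars = c("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY",
                                     "AWS_SESSION_TOKEN", "AWS_DEFAULT_REGION",
                                     "AWS_PROFILE")) {
  values <- Sys.getenv(vars, unset = "")
  vars[nzchar(values)]
}

# visible_aws_env()
```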


-- David Bruce Borenstein, PhD 781.710.2789 (m) https://www.linkedin.com/in/davidborenstein

